And the prize for …

coolest but most obscure presentation with a mention for OpenCL goes to one by Dennis Wingo, on The Lunar Orbiter Image Recovery Project, for Apple’s WWDC 2009.


ATI Stream Software Development Kit (SDK) v2.0 Beta Program

Or should that be AMD? Either way, AMD have launched the Public Beta for v2.0 of their Stream Software Development Kit. The headline news here is that this is an OpenCL SDK. That’s the good news. The disappointing news is that there is only support for an OpenCL CPU runtime at the moment. We have to wait for a post v2.0-beta2 release, due to ship later in Q3 of 2009, to get to grips with the GPU-enabled version.

AMD have also posted the first in a series of Tutorials on using OpenCL, written by Ben Gaster who is the OpenCL Architect at AMD (and incidentally an ex-colleague of mine ;) ).


JPR Whitepaper on multi-AIB systems

The recent whitepaper from Jon Peddie on Multi GPU issues and opportunities is an interesting read. It’s a pretty even-handed analysis, as you would expect from someone of Jon’s experience. However, I do wonder about the way in which it is being reported. I’ve noticed this whitepaper picked up in various places today, with very subtly differing takes, but the headline that most people are pulling out is that JPR has predicted that in the next three years, nearly half of all PCs will have multiple GPU AIBs (Add In Boards). Unfortunately, while re-broadcasting the whitepaper as news, most commentators haven’t seen fit to suggest why this might be the case. I see it as an interesting exercise in cause and effect!

Jon spends much more time looking at CAGRs, IHV marketing decks and sales projections than I do, but having read the report I do wonder if he’s missed a trick or two while staring at his tea leaves. Reading the report it’s clear that JPR’s remit was to explore multi-GPU from the perspective of scaling the graphics performance by using multi-AIB and multi-GPU systems. Not too surprising given that the report was at least partially sponsored by LucidLogix and to a lesser extent AMD and NVIDIA.

As an aside, LucidLogix are an interesting entrant into the graphics market, as they are producing an IHV agnostic chip that potentially allows for multi-GPU scaling to take a very interesting turn indeed. It’s not clear yet how their product will be greeted by the market (or indeed the IHVs), but if it works as well as they say it does (and I haven’t seen it in action yet) it has the potential to break the current state of single vendor (and mostly single device variant) multi-GPU systems (SLI vs Crossfire). It also has the ability to give habitual AIB buyers a far longer working life for their previous purchases, as in theory a motherboard enhanced with a LucidLogix device means you can run your latest GPU in parallel with your previous primary graphics card.

ATI/AMD promised this a while back but I’m not sure it was ever delivered in any meaningful way (they were promoting it as a potential route for physics acceleration around the time NVIDIA acquired PhysX) and NVIDIA have shown systems where an NVIDIA IGP and an NVIDIA AIB co-exist and the driver selects the most appropriate device for a given workload (Hybrid SLI). With a LucidLogix Hydra device acting as a bridge, the theory is that the user sees something approaching additive scaling (certainly not a given from existing solutions) from a mix of different GPUs and best of all you won’t necessarily be tied to a single vendor either. Of course there’s likely to be a very long list of caveats to achieving this multi-GPU nirvana but that’s another (very interesting) blog entry and I digress.

My immediate thought while reading the JPR report was that the analysis curiously excluded IGPs (Integrated Graphics Processors) and thus I presume, devices coming from AMD and Intel with graphics integrated into the same package as the CPU (e.g. Fusion from AMD). The current trend to integrate graphics on the CPU package is a cost driven evolution from North Bridge IGPs (memory controllers already having moved to the CPU), with both AMD and Intel wanting to leverage their position as ‘platform’ vendors, and to offer price reductions to PC OEMs (but coincidentally increase the proportion of the PC’s BOM they see).

In terms of the number of people who only ever see an integrated graphics solution in action (be it in the chipset or in the future next to the CPU) we’re probably approaching 50%, given Intel’s current dominance in the mobile and mid to low cost desktop markets. If AMD’s Fusion range succeeds it will most likely take market share away from Intel rather than the AIB market. So this suggests one way in which JPR sees the market evolving, but apart from the continuing drive to improve rendering performance he doesn’t really offer any other market drivers. Should he have?

In recent years, what has driven the evolution of the PC and the growth of the GPU vendors has really been games. This isn’t going to stop any time soon (even with consoles accounting for an increasing proportion of that market). At the same time the move to use COTS (Commercial Off The Shelf) systems in HPC (High Performance Compute) has also been a significant evolutionary driver for x86 as a platform.

So where does that leave this new wave of heterogeneous compute that the GPU vendors are so keen to exploit these days? Speaking as a person who already works with heterogeneous systems (i.e. accelerated using GPUs and other specialist co-processors) to solve engineering and scientific problems for clients, the quest for better performance for ISV (and custom) codes will drive adoption in the personal workstation market, but that still leaves the consumer side of the equation.

This is where being involved in the Khronos Group and watching the evolution of OpenCL from the inside, has shown me just how wide a reach OpenCL (and other related APIs) potentially could have. It won’t just affect the software we write but it also has the potential to shape the direction that the future of PC architecture will take and ultimately this is bound up with the software that we run on them. This leads me back to my cause and effect observation: Just what would drive 50% of all PC owners by 2012 to have bought at least one additional AIB (or bought a machine that shipped with two AIBs)?

We have already seen that the market wasn’t quite ready for a single source, vendor specific and cost option as far as physics acceleration was concerned. This is not to say that PhysX has been a failure, but it’s not currently achieved the market penetration to really drive sales of NVIDIA AIBs in its own right. AMD have lately played smart, and offered a potential counter to the PhysX marketing bullet point that also addresses the issue of vendor specific solutions, by porting the Intel owned Havok physics engine to OpenCL (an interesting move in itself from a marketing perspective).

All of this leads to more questions than answers at the moment. Will the relatively recent existence of an open, cross platform and most importantly cross vendor programming target in OpenCL feed the growth of a non-game based software ecosystem, that is not just able to take advantage of heterogeneous acceleration, but will actually drive it in some quite remarkable directions? Will OpenCL be able to meet the Microsoft juggernaut (in the form of DirectX11 Compute shaders) head on?

I actually think these two approaches to tapping the compute horsepower available from GPUs are complementary rather than necessarily in direct opposition. DX11 Compute Shaders (and their evolutionary descendants) will undoubtedly enable performance improvements for games and open up a wealth of new options for game developers, but I think the smart money for ISVs wanting to develop other applications for heterogeneous systems will be with OpenCL. When the OpenCL standard matures a little and the IHVs work out some of the current interoperability issues, we will start to see truly heterogeneous software solutions arrive and this, I think, could be really important. A healthy software ecosystem will sustain, though it may not significantly grow, the current AIB market (much as JPR predicts), but a lot still depends on where the gamers end up because at the moment this is what’s driving the year on year improvement in GPU performance.

As luck would have it Neil Trevett, the Khronos Group President, and incidentally an NVIDIA VP, has an interesting interview on The TechReport today along these lines.


Siggraph 2009

I’m expecting to see a fairly significant number of announcements relating to heterogeneous computing at Siggraph this year. While fully accelerated, production quality rendering pipelines running on large scale heterogeneous render farms may not quite be here yet (and there is reason to suspect that will always be a pipedream), there are plenty of places in the production process which can benefit from the power of these systems. Look to see a lot of work-flow related innovations, designed to make an individual animator or TD more productive. We’ve already seen particular emphasis on accelerated previews for complex shading/lighting on GPUs, so we’ll see more of the same and also accelerated particle systems and physics solvers for complex interactive environments.


OpenCL API 1.0 Quick Reference Card

If you haven’t already found this on the Khronos web site you should definitely download and print out this little gem!

Update: Also available in poster format.


OpenCL Forum Resources

So imagine you want to ask a question about OpenCL that doesn’t seem to be covered by the published specification or you have observed behaviour that vendor specific documentation doesn’t describe, where should you turn for advice or help?

In the case of questions about the specification, or queries about forthcoming revisions of the standard (or even suggestions), one should definitely go to the Khronos Group’s own OpenCL forum. Here you will find forum members who are actually involved with the OpenCL standard body as well as developers and users of vendor implementations. It’s a fair bet that you will get most specification related questions answered here.

In the case that you believe you are observing something that does not meet the specification as stated, in a vendor implementation, then your first port of call should be the vendor’s own support forum. If it’s an obvious bug or known performance issue the odds are good that it will have already been documented, either in an answer to a prior forum posting or in a FAQ or Release Note.  It should go without saying that you should take care to follow their procedure for submitting queries or bugs before posting to a forum (i.e. do a forum search and read release notes – we don’t compile them just for fun).

Here’s a list of OpenCL related forum resources I have to hand:

  • Khronos Group OpenCL forum
  • NVIDIA OpenCL forum
  • AMD Stream forum (AMD don’t have a public OpenCL forum at time of writing)
  • Beyond3D GPGPU Technology & Programming forum
  • GPGPU General GPGPU Discussion forum

I’m sure there are more out there; if you find them, please send them to me and I’ll start keeping an OpenCL Resource page.


PGI Launch Compilers for Heterogeneous Computing

PGI have released version 9.0 of their Fortran and C99 compilers, which includes the PGI Accelerator™ support (effectively x86+GPU). PGI are making some grand claims for these extensions, but without having used the tools for myself I am doubtful that they are quite as all singing and dancing as PGI are making out. The Programming Model whitepaper does address some of the critical aspects of porting software to work effectively on heterogeneous systems (which is less about the compute and more about data movement and maximising bandwidths once on the accelerator) but it also serves to mask an important factor that is currently a sticking point for many people exploring the use of heterogeneous systems: that applications written for x86 may not be the best starting point for best performance on accelerators.
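To put the data movement point in concrete terms, here is a rough back-of-the-envelope sketch in Python. It is not taken from the PGI whitepaper; the function names and every bandwidth and throughput figure are hypothetical, picked only to show how the cost of copying data to and from an accelerator can outweigh a large advantage in raw compute.

```python
# Back-of-the-envelope cost model for offloading work to an accelerator.
# All names and figures here are hypothetical, chosen purely for illustration.

def offload_time(bytes_moved, flops, link_bandwidth, accel_rate):
    """Seconds to copy the data across the host/accelerator link and compute on it."""
    transfer = bytes_moved / link_bandwidth
    compute = flops / accel_rate
    return transfer + compute


def host_time(flops, host_rate):
    """Seconds to do the same arithmetic on the host CPU, with no copies."""
    return flops / host_rate


def worth_offloading(bytes_moved, flops, link_bandwidth, accel_rate, host_rate):
    """True when the accelerator wins even after paying for the data movement."""
    return offload_time(bytes_moved, flops, link_bandwidth, accel_rate) < host_time(flops, host_rate)


LINK = 4e9     # assumed host <-> accelerator bandwidth, bytes per second
ACCEL = 200e9  # assumed accelerator throughput, floating point ops per second
HOST = 20e9    # assumed host CPU throughput, floating point ops per second

n = 1_000_000          # number of elements
traffic = 12 * n       # two float32 inputs in, one float32 result out

# A vector add does one operation per element, so the copies dominate.
vec_add = worth_offloading(traffic, 1 * n, LINK, ACCEL, HOST)

# A kernel doing 1000 operations per element amortises the same copies.
heavy = worth_offloading(traffic, 1000 * n, LINK, ACCEL, HOST)

print("offload vector add:", vec_add)
print("offload heavy kernel:", heavy)
```

With these made-up figures the accelerator is ten times faster than the host at arithmetic, yet the simple vector add is still quicker left on the CPU. Only the kernel that does a lot of work per byte moved comes out ahead, which is why code structured for x86 may need rethinking rather than just recompiling.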

Update: Incidentally there is quite a bit of information on the PGI web site and a solid series of articles by Michael Wolfe published on HPCWire.


NVIDIA GPU Computing Tutorial Webinar Series

For those of you who haven’t seen this elsewhere, NVIDIA are running what they now refer to as GPU Computing Webinars. GPU Computing seems to be NVIDIA’s name for their OpenCL thrust; clearly they want to keep CUDA a separate product in users’ minds.

Two in particular look interesting:

  • An Introduction to GPU Computing and OpenCL (1.5 hours)
  • Best Practices for OpenCL programming – Advanced (1.5 hours)

Khronos Group BOFs at SIGGRAPH 2009

The Khronos Group (of which Petapath are a member) maintains standards which are fundamental to many of the technologies on display at SIGGRAPH.  Hear about the latest updates on OpenCL, OpenGL, and COLLADA at the BOF sessions running at SIGGRAPH this year.

“The OpenCL BOF is your chance to directly interact with the designers and implementers of this significant new standard for heterogeneous parallel programming on GPUs, CPUs and many other processors. Come and learn how OpenCL interoperates with OpenGL to enable advanced, cross-platform, visual computing applications.”

If you are attending SIGGRAPH follow the link below to sign up for the OpenCL BOF and also what promise to be two very interesting courses on programmable shading.

Siggraph 2009 New Orleans – Khronos Group Events, Seminars and Presentations.


Graphics Take a Hit in ‘09

According to Jon Peddie, estimated global graphics chip shipments for 2009 will see the worst ever year-over-year drop. The decrease in 2009 will be even worse than in the 2000-2001 recession. However, he also says that 2010 promises an amazing comeback. He goes on to say:

“New programming capabilities using OpenCL, DirectX 11, and Nvidia’s CUDA architecture will remove barriers to the exploitation of the GPU as a serious, economical, and powerful co-processor in all level of PCs.”

Certainly the wider availability of OpenCL 1.0 and DX11’s Compute Shaders (with the launch of Windows 7), as well as continued developments to CUDA, will mean a step change in the number of people who are exposed to applications taking advantage of heterogeneous compute. These technologies will undoubtedly lead to a proliferation of applications taking advantage of data parallel compute acceleration, with OpenCL slowly taking the lead for stand alone applications and DX11’s compute shaders enabling some nifty new tricks and performance improvements for games players.


 
