<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Petapath Blog &#187; NVIDIA</title>
	<atom:link href="http://www.petapath.com/blog/category/nvidia/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.petapath.com/blog</link>
	<description>Musings on HPC and heterogeneous systems</description>
	<lastBuildDate>Tue, 15 Jun 2010 13:49:35 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1</generator>
		<item>
		<title>GPU Technology Conference 2010</title>
		<link>http://www.petapath.com/blog/2010/03/26/gpu-technology-conference-2010/</link>
		<comments>http://www.petapath.com/blog/2010/03/26/gpu-technology-conference-2010/#comments</comments>
		<pubDate>Fri, 26 Mar 2010 13:14:30 +0000</pubDate>
		<dc:creator>Dairsie</dc:creator>
				<category><![CDATA[CUDA]]></category>
		<category><![CDATA[gpgpu]]></category>
		<category><![CDATA[Heterogeneous]]></category>
		<category><![CDATA[HPC]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[NVIDIA]]></category>
		<category><![CDATA[GPU Technology Conference]]></category>

		<guid isPermaLink="false">http://www.petapath.com/blog/?p=184</guid>
		<description><![CDATA[Having attended the 2009 GPU Technology Conference, I will be very interested to see how much things have moved on in the intervening twelve months. For heterogeneous computing to really show its commercial potential, practical results need to follow the marketing impetus created by the unveiling of Fermi last year. [...]]]></description>
			<content:encoded><![CDATA[<p>Having attended the 2009 GPU Technology Conference, I will be very interested to see how much things have moved on in the intervening twelve months. For heterogeneous computing to really show its commercial potential, practical results need to follow the marketing impetus created by the unveiling of Fermi last year. With Fermi-based parts actually available in the flesh this time round, I expect quite a lot of interesting announcements.</p>
<p>For the full press release see <a title="GPU Technology Conference 2010 Press Release" href="http://www.nvidia.com/object/io_1269574709099.html" target="_blank">here</a> and the landing page is <a title="GPU Technology Conference 2010" href="http://www.nvidia.com/object/gpu_technology_conference.html" target="_blank">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.petapath.com/blog/2010/03/26/gpu-technology-conference-2010/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CUDA 3.0 toolkit released</title>
		<link>http://www.petapath.com/blog/2010/03/22/cuda-3-0-toolkit-released/</link>
		<comments>http://www.petapath.com/blog/2010/03/22/cuda-3-0-toolkit-released/#comments</comments>
		<pubDate>Mon, 22 Mar 2010 12:23:45 +0000</pubDate>
		<dc:creator>Dairsie</dc:creator>
				<category><![CDATA[Compilers]]></category>
		<category><![CDATA[CUDA]]></category>
		<category><![CDATA[gpgpu]]></category>
		<category><![CDATA[Heterogeneous]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[NVIDIA]]></category>
		<category><![CDATA[OpenCL]]></category>
		<category><![CDATA[CUDA Toolkit]]></category>
		<category><![CDATA[Developer]]></category>

		<guid isPermaLink="false">http://www.petapath.com/blog/?p=177</guid>
		<description><![CDATA[NVIDIA has released version 3.0 of their CUDA Toolkit. As well as some significant new features, this release also provides tools to prepare for the upcoming Fermi-based GPUs. Highlights of this release include: Support for the Fermi architecture, with: Native 64-bit GPU support Multiple Copy Engine support ECC reporting Concurrent Kernel Execution Fermi HW debugging [...]]]></description>
			<content:encoded><![CDATA[<div>
<p>NVIDIA has released version 3.0 of their CUDA Toolkit. As well as some significant new features, this release also provides tools to prepare for the upcoming Fermi-based GPUs. Highlights of this release include:</p>
<ul>
<li>Support for the Fermi architecture, with:
<ul>
<li>Native 64-bit GPU support</li>
<li>Multiple Copy Engine support</li>
<li>ECC reporting</li>
<li>Concurrent Kernel Execution</li>
<li>Fermi HW debugging support in cuda-gdb</li>
<li>Fermi HW profiling support for CUDA C and OpenCL in Visual Profiler</li>
</ul>
</li>
<li>C++ Class Inheritance and Template Inheritance support for increased  programmer productivity</li>
<li>A new unified interoperability API for Direct3D and OpenGL, with  support for:
<ul>
<li>OpenGL texture interop</li>
<li>Direct3D 11 interop support</li>
<li>CUDA Driver / Runtime Buffer Interoperability, which allows  applications using the CUDA Driver API to also use libraries implemented  using the CUDA C Runtime such as CUFFT and CUBLAS.</li>
</ul>
</li>
</ul>
<ul>
<li>CUBLAS now supports all BLAS1, 2, and 3 routines including those for  single and double precision complex numbers</li>
<li>Up to 100x performance improvement while debugging applications with  cuda-gdb</li>
<li>cuda-gdb hardware debugging support for applications that use the  CUDA Driver API</li>
<li>cuda-gdb support for JIT-compiled kernels</li>
<li>New CUDA Memory Checker reports misalignment and out of bounds  errors, available as a stand-alone utility and debugging mode within  cuda-gdb</li>
<li>CUDA Toolkit libraries are now versioned, enabling applications to  require a specific version, support multiple versions explicitly, etc.</li>
<li>CUDA C/C++ kernels are now compiled to standard ELF format</li>
<li>Support for device emulation mode has been packaged in a separate  version of the CUDA C Runtime (CUDART), and is deprecated in this  release. Now that more sophisticated hardware debugging tools are  available and more are on the way, NVIDIA will be focusing on supporting  these tools instead of the legacy device emulation functionality.
<ul>
<li>On Windows, use the new Parallel Nsight development environment for  Visual Studio, with integrated GPU debugging and profiling tools (was  code-named “Nexus”). Please see www.nvidia.com/nsight for details.</li>
<li>On Linux, use cuda-gdb and cuda-memcheck, and check out the  solutions from Allinea and TotalView that will be available soon.</li>
</ul>
</li>
</ul>
<ul>
<li>Support for all the OpenCL features in the latest R195 production  driver package:
<ul>
<li>Double Precision</li>
<li>Graphics Interoperability with OpenGL, Direct3D9, Direct3D10, and Direct3D11 for high performance visualization</li>
<li>Query for Compute Capability, so you can target optimizations for  GPU architectures (cl_nv_device_attribute_query)</li>
<li>Ability to control compiler optimization settings via support for  pragma unroll in OpenCL kernels and an extension that allows programmers  to set compiler flags. (cl_nv_compiler_options)</li>
<li>OpenCL Images support, for better/faster image filtering</li>
<li>32-bit global and local atomics for fast, convenient data  manipulation</li>
<li>Byte Addressable Stores, for faster video/image processing and  compression algorithms</li>
<li>Support for the latest OpenCL spec revision 1.0.48 and latest  official Khronos OpenCL headers as of 2010-02-17</li>
</ul>
</li>
</ul>
<p>The toolkit, drivers, tools and documentation are available from <a onclick="javascript:pageTracker._trackPageview('/outbound/article/http://developer.nvidia.com/object/cuda_3_0_downloads.html');" href="http://developer.nvidia.com/object/cuda_3_0_downloads.html" target="_blank">http://developer.nvidia.com/object/cuda_3_0_downloads.html</a>.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.petapath.com/blog/2010/03/22/cuda-3-0-toolkit-released/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>JPR Whitepaper on multi-AIB systems</title>
		<link>http://www.petapath.com/blog/2009/08/04/jpr-whitepaper-on-multi-aib-systems/</link>
		<comments>http://www.petapath.com/blog/2009/08/04/jpr-whitepaper-on-multi-aib-systems/#comments</comments>
		<pubDate>Tue, 04 Aug 2009 20:22:40 +0000</pubDate>
		<dc:creator>Dairsie</dc:creator>
				<category><![CDATA[AMD]]></category>
		<category><![CDATA[DirectX11]]></category>
		<category><![CDATA[gpgpu]]></category>
		<category><![CDATA[Heterogeneous]]></category>
		<category><![CDATA[HPC]]></category>
		<category><![CDATA[Links]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[NVIDIA]]></category>
		<category><![CDATA[OpenCL]]></category>
		<category><![CDATA[Views]]></category>
		<category><![CDATA[Khronos]]></category>

		<guid isPermaLink="false">http://www.petapath.com/blog/?p=110</guid>
		<description><![CDATA[The recent whitepaper from Jon Peddie on Multi GPU issues and opportunities is an interesting read. It&#8217;s a pretty even-handed analysis, as you would expect from someone of Jon&#8217;s experience. However I do wonder about the way in which it is being reported. I&#8217;ve noticed this whitepaper picked up in various places today, with very [...]]]></description>
			<content:encoded><![CDATA[<p>The recent whitepaper from Jon Peddie on <a title="JPR GPU Report" href="http://www.jonpeddie.com/special/WhitePapers/Multi-GPU-issues-and-opportunities.pdf" target="_blank"><em>Multi GPU issues and opportunities</em></a> is an interesting read. It&#8217;s a pretty even-handed analysis, as you would expect from someone of Jon&#8217;s experience. However I do wonder about the way in which it is being reported. I&#8217;ve noticed this whitepaper picked up in various <a title="JP Report on AIB " href="http://en.expreview.com/2009/08/03/nearly-half-of-pcs-to-be-powered-by-multi-gpu-tech-in-2012.html" target="_blank">places</a> today, with very subtly differing <a href="http://www.vizworld.com/2009/08/new-jon-peddie-report-50-penetration-on-gpgpu/" target="_blank">takes</a>, but the headline that most people are pulling out is that JPR has predicted that in the next three years, nearly half of all PCs will have multiple GPU AIBs (Add In Boards). Unfortunately, while re-broadcasting the whitepaper as news, most commentators haven&#8217;t seen fit to suggest why this might be the case. I see it as an interesting exercise in cause and effect!</p>
<p>Jon spends much more time looking at CAGRs, IHV marketing decks and sales projections than I do, but having read the report I do wonder if he&#8217;s missed a trick or two while staring at his tea leaves. Reading the report it&#8217;s clear that JPR&#8217;s remit was to explore multi-GPU from the perspective of scaling the graphics performance by using multi-AIB and multi-GPU systems. Not too surprising given that the report was at least partially sponsored by LucidLogix and to a lesser extent AMD and NVIDIA.</p>
<p>As an aside, LucidLogix are an interesting entrant into the graphics market, as they are producing an IHV-agnostic chip that potentially allows for multi-GPU scaling to take a very interesting turn indeed. It&#8217;s not clear yet how their product will be greeted by the market (or indeed the IHVs), but if it works as well as they say it does (and I haven&#8217;t seen it in action yet) it has the potential to break the current state of single vendor (and mostly single device variant) multi-GPU systems (SLI vs Crossfire). It also has the ability to give habitual AIB buyers a far longer working life for their previous purchases, as in theory, a motherboard enhanced with a LucidLogix device means you can run your latest GPU in parallel with your previous primary graphics card.</p>
<p>ATI/AMD promised this a while back, but I&#8217;m not sure it was ever delivered in any meaningful way (they were promoting it as a potential route for physics acceleration around the time NVIDIA acquired PhysX). NVIDIA, for their part, have shown systems where an NVIDIA IGP and an NVIDIA AIB co-exist and the driver selects the most appropriate device for a given workload (Hybrid SLI). With a LucidLogix Hydra device acting as a bridge, the theory is that the user sees something approaching additive scaling (certainly not a given with existing solutions) from a mix of different GPUs and, best of all, you won&#8217;t necessarily be tied to a single vendor either. Of course there&#8217;s likely to be a very long list of caveats to achieving this multi-GPU nirvana, but that&#8217;s another (very interesting) blog entry and I digress.</p>
<p>My immediate thought while reading the JPR report was that the analysis curiously excluded IGPs (Integrated Graphics Processors) and thus, I presume, devices coming from AMD and Intel with graphics integrated into the same package as the CPU (e.g. Fusion from AMD). The current trend to integrate graphics on the CPU package is a cost-driven evolution from North Bridge IGPs (memory controllers already having moved to the CPU), with both AMD and Intel wanting to leverage their position as &#8216;platform&#8217; vendors, and to offer price reductions to PC OEMs (but coincidentally increase the proportion of the PC&#8217;s BOM they see).</p>
<p>In terms of the number of people who only ever see an integrated graphics solution in action (be it in the chipset or in the future next to the CPU) we&#8217;re probably approaching 50%, given Intel&#8217;s current dominance in the mobile and mid to low cost desktop markets. If AMD&#8217;s Fusion range succeeds it will most likely take market share away from Intel rather than the AIB market. So this suggests one way in which JPR sees the market evolving, but apart from the continuing drive to improve rendering performance he doesn&#8217;t really offer any other market drivers. Should he have?</p>
<p>In recent years what has driven the evolution of the PC and the growth of the GPU vendors has really been games. This isn&#8217;t going to stop any time soon (even with consoles accounting for an increasing proportion of that market). At the same time the move to use COTS (Commercial Off The Shelf) systems in HPC (High Performance Compute) has also been a significant evolutionary driver for x86 as a platform.</p>
<p>So where does that leave this new wave of heterogeneous compute that the GPU vendors are so keen to exploit these days? Speaking as a person who already works with heterogeneous systems (i.e. accelerated using GPUs and other specialist co-processors) to solve engineering and scientific problems for clients, the quest for better performance for ISV (and custom) codes will drive adoption in the personal workstation market, but that still leaves the consumer side of the equation.</p>
<p>This is where being involved in the Khronos Group, and watching the evolution of OpenCL from the inside, has shown me just how wide a reach OpenCL (and other related APIs) potentially could have. It won&#8217;t just affect the software we write; it also has the potential to shape the direction that PC architecture will take, and ultimately this is bound up with the software that we run on these machines. This leads me back to my cause and effect observation: Just what would drive 50% of all PC owners by 2012 to have bought at least one additional AIB (or bought a machine that shipped with two AIBs)?</p>
<p>We have already seen that the market wasn&#8217;t quite ready for a single source, vendor specific and cost option as far as physics acceleration was concerned. This is not to say that PhysX has been a failure, but it has not yet achieved the market penetration to really drive sales of NVIDIA AIBs in its own right. AMD have lately played smart, and offered a potential counter to the PhysX marketing bullet point that also addresses the issue of vendor specific solutions, by porting the Intel owned Havok physics engine to OpenCL (an interesting move in itself from a marketing perspective).</p>
<p>All of this leads to more questions than answers at the moment. Will the relatively recent existence of an open, cross platform and most importantly cross vendor programming target in OpenCL feed the growth of a non-game based software ecosystem, that is not just able to take advantage of heterogeneous acceleration, but will actually drive it in some quite remarkable directions? Will OpenCL be able to meet the Microsoft juggernaut (in the form of DirectX11 Compute shaders) head on?</p>
<p>I think these two approaches to tapping the compute horsepower available from GPUs are actually complementary rather than necessarily in direct opposition. DX11 Compute Shaders (and their evolutionary descendants) will undoubtedly enable performance improvements for games and open up a wealth of new options for game developers, but I think the smart money for ISVs wanting to develop other applications for heterogeneous systems will be with OpenCL. When the OpenCL standard matures a little and the IHVs work out some of the current interoperability issues, we will start to see truly heterogeneous software solutions arrive and this, I think, could be really important. A healthy software ecosystem will sustain, though it may not significantly grow, the current AIB market (much as JPR predicts), but a lot still depends on where the gamers end up because at the moment this is what&#8217;s driving the year on year improvement in GPU performance.</p>
<p>As luck would have it, Neil Trevett, the Khronos Group President, and incidentally an NVIDIA VP, has an interesting <a title="Tech Report interview with Neil Trevett" href="http://www.techreport.com/articles.x/17321" target="_blank">interview</a> on The TechReport today along these lines.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.petapath.com/blog/2009/08/04/jpr-whitepaper-on-multi-aib-systems/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

