Where: San Jose, CA
Date: March 18, 2013
GTC is the place to learn about and share how advances in GPU technology help scientists, developers, graphic artists, designers, researchers, engineers and IT managers tackle their day-to-day computational and graphics challenges.
New for 2013, GTC is expanding to include more sessions on how GPUs are being used in manufacturing, media and entertainment, game development, mobile computing and cloud graphics.
GTC 2013 will feature the latest breakthroughs and the most amazing content in GPU-enabled applications and will deliver 4 full days of world-class education by some of the greatest minds from a wide range of fields.
Acceleware tutorials at GTC
Back by popular demand, Acceleware will be presenting four informative tutorials at GTC:
Part 1 – An Introduction to GPU Programming (S3452)
Join us for an informative introduction to GPU Programming. The tutorial will begin with a brief overview of CUDA and data-parallelism before focusing on the GPU programming model. We will explore the fundamentals of GPU kernels, host and device responsibilities, CUDA syntax and thread hierarchy. A programming demonstration of two simple CUDA kernels will be provided. Find out more
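To illustrate the kind of material covered, here is a minimal sketch (not taken from the tutorial) of a simple CUDA kernel showing the host/device split and the thread hierarchy; the kernel and variable names are our own:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Device code: each thread computes one output element.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    // Global index from the thread hierarchy: block index, block size, thread index.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)              // guard threads past the end of the array
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host responsibilities: allocate and initialize input data.
    float *hA = (float*)malloc(bytes);
    float *hB = (float*)malloc(bytes);
    float *hC = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    // Allocate device memory and copy the inputs over.
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    // Launch configuration: a grid of blocks, 256 threads per block.
    int block = 256;
    int grid  = (n + block - 1) / block;
    vecAdd<<<grid, block>>>(dA, dB, dC, n);

    // Copy the result back and inspect it on the host.
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hC[0]);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return 0;
}
```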
Part 2 - How to Improve Performance using the CUDA Memory Model and Features of the new Kepler Architecture (S3453)
Explore the memory model of the GPU, the memory enhancements available in the new Kepler architecture, and how these affect performance. The tutorial will begin with an essential overview of GPU architecture and thread cooperation before focusing on the different memory types available on the GPU. We will define shared, constant and global memory and discuss the best locations to store your application data for optimized performance. The shuffle instruction, new shared memory configurations and Read-Only Cache of the Kepler architecture are introduced and optimization techniques discussed. A programming demonstration of shared and constant memory will be delivered. The demonstration code will then be re-written using the shuffle instruction for the Kepler architecture. Find out more
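As a flavor of the shared-memory-to-shuffle rewrite described above, here is a hedged sketch (our own illustration, not the tutorial's demonstration code): a block-level sum using shared memory, followed by a warp-level version using Kepler's shuffle instruction in its pre-CUDA-9 form:

```cuda
// Shared-memory reduction within one 256-thread block.
// Threads cooperate through on-chip shared memory and barriers.
__global__ void blockSumShared(const float *in, float *out)
{
    __shared__ float s[256];
    int tid = threadIdx.x;
    s[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();

    // Tree reduction: halve the active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            s[tid] += s[tid + stride];
        __syncthreads();
    }
    if (tid == 0)
        out[blockIdx.x] = s[0];
}

// Kepler rewrite: sum across a warp with the shuffle instruction.
// Registers are exchanged directly between lanes, with no shared
// memory traffic and no __syncthreads() inside the warp.
__device__ float warpSum(float v)
{
    for (int delta = 16; delta > 0; delta >>= 1)
        v += __shfl_down(v, delta);  // CUDA 5.0-era intrinsic (sm_30+)
    return v;
}
```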
Part 3 – Essential Optimization Techniques for NVIDIA Kepler and Fermi Architecture (S3454)
Learn how to optimize your algorithms for the Fermi and Kepler architectures. This informative tutorial will cover the key optimization strategies for compute and memory bound problems. The session will include techniques for ensuring peak utilization of CUDA cores by choosing the optimal block size and using dynamic parallelism on the Kepler architecture. For compute bound algorithms we will discuss improving branching efficiency, using intrinsic functions and unrolling loops. For memory bound algorithms, optimal access patterns for global and shared memory will be presented, highlighting the differences between the Fermi and Kepler architectures. This session will include code examples throughout and a programming demonstration highlighting the optimal global memory access pattern, which is applicable to all GPU architectures. Find out more
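The global memory access pattern mentioned above can be sketched as follows (our own minimal example, not the session's demonstration code): a coalesced copy, where consecutive threads touch consecutive addresses, against a strided copy that scatters each warp's accesses:

```cuda
// Coalesced access: thread i reads and writes element i, so the 32
// threads of a warp touch 32 consecutive floats and the hardware can
// combine them into the fewest possible memory transactions.
__global__ void copyCoalesced(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i];
}

// Strided access: threads in a warp touch addresses `stride` elements
// apart, spreading one warp's request over many transactions and
// wasting bandwidth. The penalty differs between Fermi and Kepler,
// but both architectures reward the coalesced pattern above.
__global__ void copyStrided(const float *in, float *out, int n, int stride)
{
    int i = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
    if (i < n)
        out[i] = in[i];
}
```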
Part 4 – CUDA Tools for Optimal Performance and Productivity (S3455)
Get the lowdown on debugging and profiling your GPU program! This tutorial dives deep into profiling techniques and the tools available to help you optimize your code. We will demonstrate NVIDIA’s Visual Profiler, nvcc flags and cuobjdump and highlight the various methods available for understanding the performance of your CUDA program. The second part of the session will focus on debugging techniques and available tools to help you identify issues in your kernels. The latest debugging tools provided in CUDA 5.0, including Nsight and cuda-memcheck, will be presented. A programming demo of the Visual Profiler and Nsight will be included. Find out more
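One habit that complements all of the tools above is checking CUDA runtime return codes in code; errors that would otherwise fail silently surface immediately, and the same faults are what cuda-memcheck reports in richer detail. A minimal sketch (the macro name is our own):

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Wrap every runtime call so a failure reports file, line, and cause.
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t err = (call);                                  \
        if (err != cudaSuccess) {                                  \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err));                      \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

__global__ void kernel(float *data) { data[threadIdx.x] *= 2.0f; }

int main()
{
    float *d;
    CUDA_CHECK(cudaMalloc(&d, 256 * sizeof(float)));

    kernel<<<1, 256>>>(d);
    CUDA_CHECK(cudaGetLastError());      // catch launch errors
    CUDA_CHECK(cudaDeviceSynchronize()); // catch errors during execution

    CUDA_CHECK(cudaFree(d));
    return 0;
}
```

Running the same binary under `cuda-memcheck ./app` additionally flags out-of-bounds and misaligned accesses inside the kernel itself.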