Table of Contents for GPGPU Programming for Games and Science

1 Introduction
2 CPU Computing
  2.1 Numerical Computing
      2.1.1 The Curse: An Example from Games
      2.1.2 The Curse: An Example from Science
      2.1.3 The Need to Understand Floating-Point Systems
  2.2 Balancing Robustness, Accuracy, and Speed
      2.2.1 Robustness
   Formal Definitions
   Algorithms and Implementations
   Practical Definitions
      2.2.2 Accuracy
      2.2.3 Speed
      2.2.4 Computer Science is a Study of Trade-offs
  2.3 IEEE Floating Point Standard
  2.4 Binary Scientific Notation
      2.4.1 Conversion from Rational to Binary Scientific Numbers
      2.4.2 Arithmetic Properties of Binary Scientific Numbers
   Addition of Binary Scientific Numbers
   Subtraction of Binary Scientific Numbers
   Multiplication of Binary Scientific Numbers
   Division of Binary Scientific Numbers
      2.4.3 Algebraic Properties of Binary Scientific Numbers
  2.5 Floating-Point Arithmetic
      2.5.1 Binary Encodings
   8-Bit Floating-Point Numbers
   16-Bit Floating-Point Numbers
   32-Bit Floating-Point Numbers
   64-Bit Floating-Point Numbers
   n-Bit Floating-Point Numbers
   Classifications of Floating-Point Numbers
      2.5.2 Rounding and Conversions
   Rounding with Ties-to-Even
   Rounding with Ties-to-Away
   Rounding Toward Zero
   Rounding Toward Positive
   Rounding Toward Negative
   Rounding from Floating-Point to Integral Floating-Point
   Conversion from Integer to Floating-Point
   Conversion from Floating-Point to Rational
   Conversion from Rational to Floating-Point
   Conversion to Wider Format
   Conversion to Narrower Format
      2.5.3 Arithmetic Operations
      2.5.4 Mathematical Functions
      2.5.5 Floating-Point Oddities
   Where Have All My Digits Gone?
   Have a Nice Stay!
   The Best I Can Do is That Bad?
   You Have Been More Than Helpful
   Hardware and Optimizing Compiler Issues
3 SIMD Computing
  3.1 Intel Streaming SIMD Extensions
      3.1.1 Shuffling Components
      3.1.2 Single-Component versus All-Component Access
      3.1.3 Load and Store Instructions
      3.1.4 Logical Instructions
      3.1.5 Comparison Instructions
      3.1.6 Arithmetic Instructions
      3.1.7 Matrix Multiplication and Transpose
      3.1.8 IEEE Floating-Point Support
      3.1.9 Keep the Pipeline Running
      3.1.10 Flattening of Branches
  3.2 SIMD Wrappers
  3.3 Function Approximations
      3.3.1 Minimax Approximations
      3.3.2 Inverse Square Root Function using Root Finding
      3.3.3 Square Root Function
      3.3.4 Inverse Square Root Function using a Minimax Algorithm
      3.3.5 Sine Function
      3.3.6 Cosine Function
      3.3.7 Tangent Function
      3.3.8 Inverse Sine Function
      3.3.9 Inverse Cosine Function
      3.3.10 Inverse Tangent Function
      3.3.11 Exponential Functions
      3.3.12 Logarithmic Functions
4 GPU Computing
  4.1 Drawing a 3D Object
      4.1.1 Model Space
      4.1.2 World Space
      4.1.3 View Space
      4.1.4 Projection Space
      4.1.5 Window Space
      4.1.6 Summary of the Transformations
      4.1.7 Rasterization
  4.2 High Level Shading Language (HLSL)
      4.2.1 Vertex and Pixel Shaders
      4.2.2 Geometry Shaders
      4.2.3 Compute Shaders
      4.2.4 Compiling HLSL Shaders
   Compiling the Vertex Coloring Shaders
   Compiling the Texturing Shaders
   Compiling the Billboard Shaders
   Compiling the Gaussian Blurring Shaders
      4.2.5 Reflecting HLSL Shaders
  4.3 Devices, Contexts, and Swap Chains
      4.3.1 Creating a Device and an Immediate Context
      4.3.2 Creating Swap Chains
      4.3.3 Creating the Back Buffer
  4.4 Resources
      4.4.1 Resource Usage and CPU Access
      4.4.2 Resource Views
      4.4.3 Subresources
      4.4.4 Buffers
   Constant Buffers
   Texture Buffers
   Vertex Buffers
   Index Buffers
   Structured Buffers
   Raw Buffers
   Indirect-Argument Buffers
      4.4.5 Textures
   1D Textures
   2D Textures
   3D Textures
      4.4.6 Texture Arrays
   1D Texture Arrays
   2D Texture Arrays
   Cubemap Textures
   Cubemap Texture Arrays
      4.4.7 Draw Targets
  4.5 States
  4.6 Shaders
      4.6.1 Creating Shaders
      4.6.2 Vertex, Geometry, and Pixel Shader Execution
      4.6.3 Compute Shader Execution
  4.7 Copying Data between CPU and GPU
      4.7.1 Mapped Writes for Dynamic Update
      4.7.2 Staging Resources
      4.7.3 Copy from CPU to GPU
      4.7.4 Copy from GPU to CPU
      4.7.5 Copy from GPU to GPU
  4.8 Multiple GPUs
      4.8.1 Enumerating the Adapters
      4.8.2 Copying Data between Multiple GPUs
  4.9 IEEE Floating-Point on the GPU
5 Practical Matters
  5.1 Engine Design and Architecture
      5.1.1 A Simple Low-Level D3D11 Application
      5.1.2 HLSL Compilation in Microsoft Visual Studio
      5.1.3 Design Goals for the Geometric Tools Engine
   An HLSL Factory
   Resource Bridges
   Visual Effects
   Visual Objects and Scene Graphs
  5.2 Debugging
      5.2.1 Debugging on the CPU
      5.2.2 Debugging on the GPU
      5.2.3 Be Mindful of Your Surroundings
   An Example of an HLSL Compiler Bug
   An Example of a Programmer Bug
  5.3 Performance
      5.3.1 Performance on the CPU
      5.3.2 Performance on the GPU
      5.3.3 Performance Guidelines
  5.4 Code Testing
      5.4.1 Topics in Code Testing
      5.4.2 Code Coverage and Unit Testing on the GPU
6 Linear Algebra
  6.1 Vectors
      6.1.1 Robust Length and Normalization Computations
      6.1.2 Orthogonality
   Orthogonality in 2D
   Orthogonality in 3D
   Orthogonality in 4D
   Gram-Schmidt Orthonormalization
      6.1.3 Orthonormal Sets
   Orthonormal Sets in 2D
   Orthonormal Sets in 3D
   Orthonormal Sets in 4D
      6.1.4 Barycentric Coordinates
      6.1.5 Intrinsic Dimensionality
  6.2 Matrices
      6.2.1 Matrix Storage and Transform Conventions
      6.2.2 Base Class Matrix Operations
      6.2.3 Square Matrix Operations in 2D
      6.2.4 Square Matrix Operations in 3D
      6.2.5 Square Matrix Operations in 4D
      6.2.6 The Laplace Expansion Theorem
  6.3 Rotations
      6.3.1 Rotations in 2D
      6.3.2 Rotations in 3D
      6.3.3 Rotations in 4D
      6.3.4 Quaternions
   Algebraic Operations
   Relationship of Quaternions to Rotations
   Spherical Linear Interpolation of Quaternions
      6.3.5 Euler Angles
   World Coordinates versus Body Coordinates
      6.3.6 Conversion between Representations
   Quaternion to Matrix
   Matrix to Quaternion
   Axis-Angle to Matrix
   Matrix to Axis-Angle
   Axis-Angle to Quaternion
   Quaternion to Axis-Angle
   Euler Angles to Matrix
   Matrix to Euler Angles
   Euler Angles to and from Quaternion or Axis-Angle
  6.4 Coordinate Systems
      6.4.1 Geometry and Affine Algebra
      6.4.2 Transformations
   Composition of Affine Transformations
   Decomposition of Affine Transformations
   A Simple Transformation Factory
      6.4.3 Coordinate System Conventions
      6.4.4 Converting Between Coordinate Systems
7 Sample Applications
  7.1 Video Streams
      7.1.1 The VideoStream Class
      7.1.2 The VideoStreamManager Class
  7.2 Root Finding
      7.2.1 Root Bounding
      7.2.2 Bisection
      7.2.3 Newton's Method
      7.2.4 Exhaustive Evaluation
   CPU Root Finding using a Single Thread
   CPU Root Finding using Multiple Threads
   GPU Root Finding
  7.3 Least Squares Fitting
      7.3.1 Fit a Line to 2D Points
      7.3.2 Fit a Plane to 3D Points
      7.3.3 Orthogonal Regression
   Fitting with Lines
   Fitting with Planes
      7.3.4 Estimation of Tangent Planes
  7.4 Partial Sums
  7.5 All-Pairs Triangle Intersection
  7.6 Shortest Path in a Weighted Graph
  7.7 Convolution
  7.8 Median Filtering
      7.8.1 Median by Sorting
      7.8.2 Median of 3x3 using Min-Max Operations
      7.8.3 Median of 5x5 using Min-Max Operations
  7.9 Level Surface Extraction
  7.10 Mass-Spring Systems
  7.11 Fluid Dynamics
      7.11.1 Numerical Methods
      7.11.2 Solving Fluid Flow in 2D
   Initialization of State
   Initialization of External Forces
   Updating the State with Advection
   Applying the State Boundary Conditions
   Computing the Divergence of Velocity
   Solving the Poisson Equation
   Updating the Velocity to be Divergence-Free
   Screen Captures from the Simulation
      7.11.3 Solving Fluid Flow in 3D
   Initialization of State
   Initialization of External Forces
   Updating the State with Advection
   Applying the State Boundary Conditions
   Computing the Divergence of Velocity
   Solving the Poisson Equation
   Updating the Velocity to be Divergence-Free
   Screen Captures from the Simulation