1 CPU: 0.59s
1 GPU: 0.12s Speedup: 4.9x
The next step is to do more of your computation on the GPU, including data generation. The more computation that can stay on the GPU between moving data there with gpuArray() and back with gather(), the greater your performance benefit will be. This example takes an FFT of a 2D function.
1 CPU: 25.66s
1 GPU: 1.38s Speedup: 18.59x
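The claim that the benefit grows with the amount of work done between gpuArray() and gather() can be illustrated with a toy cost model. This is a minimal Python sketch, not a measurement: it assumes a single upload and download per run with a fixed total transfer time, plus a constant per-operation GPU advantage. The function name `effective_speedup` and all of the numbers are hypothetical.

```python
def effective_speedup(cpu_op_s, gpu_op_s, transfer_s, n_ops):
    """Toy model (hypothetical): CPU run vs. GPU run of n_ops identical operations.

    The GPU run pays transfer_s once (upload plus download, analogous to
    gpuArray() followed by gather()) and then gpu_op_s per operation.
    """
    cpu_total = n_ops * cpu_op_s
    gpu_total = transfer_s + n_ops * gpu_op_s
    return cpu_total / gpu_total


if __name__ == "__main__":
    # Hypothetical numbers: each operation is 20x faster on the GPU,
    # but moving the data costs as much as two CPU operations.
    for n in (1, 10, 100, 1000):
        print(f"{n:5d} ops: speedup {effective_speedup(1.0, 0.05, 2.0, n):.2f}x")
```

With these made-up numbers, the GPU loses on a single operation (about 0.49x). As more work stays on the device, the speedup approaches, but never reaches, the 20x per-operation advantage.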
We used the vectorized form of the MATLAB operations, which should be optimal. Vectorized code has another benefit. Not only is it faster than nested loops (uncomment the loop code in the second example if you are curious), but MATLAB can also spread many vectorized functions across multiple cores when the data are large enough. How does this multicore CPU performance compare to a single GPU? In the timings below, speedups are relative to the single-core CPU run.
1 CPU: 25.66s
1 GPU: 1.38s Speedup: 18.59x
8 CPU: 4.61s Speedup: 5.56x
16 CPU: 2.65s Speedup: 9.68x
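Each speedup above is the single-core CPU time divided by the time for that configuration. The short Python sketch below recomputes these speedups from the reported times. The dictionary layout is only for illustration.

```python
# Wall-clock times reported above for the 2D FFT example, in seconds.
times = {
    "1 CPU": 25.66,
    "1 GPU": 1.38,
    "8 CPU": 4.61,
    "16 CPU": 2.65,
}

baseline = times["1 CPU"]

# Speedup relative to the single-core CPU run.
speedups = {name: baseline / t for name, t in times.items()}

for name, s in speedups.items():
    print(f"{name:>6}: {s:6.2f}x")
```

Rounded to two places, the 8-core figure comes out as 5.57x; the table shows 5.56x, which is the truncated value.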
Based on the Flux rates as of January 2014 ($60 per GPU-month and $6.60 per CPU-month), a code needs a speedup of more than 9x on a single GPU, relative to one CPU core, to make up for the cost, since $60 / $6.60 ≈ 9.1. Even this is not exactly correct, because each Flux GPU comes with 2 host CPU cores.
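The break-even figure is the ratio of the two monthly rates. The Python sketch below recomputes this ratio and checks the two single-GPU speedups reported above against it. The adjustment for the bundled host cores is an assumption, not something the rates sheet states. It values the two cores at the CPU rate and subtracts them from the GPU price, so the GPU alone must cover only the remainder.

```python
GPU_MONTH_USD = 60.00   # Flux rate per GPU-month, January 2014 (from the text)
CPU_MONTH_USD = 6.60    # Flux rate per CPU-month, January 2014 (from the text)
HOST_CORES_PER_GPU = 2  # host CPU cores bundled with each Flux GPU (from the text)

# Break-even speedup: one GPU must do the work of this many CPU cores.
break_even = GPU_MONTH_USD / CPU_MONTH_USD

# Assumption: value the bundled host cores at the CPU rate and subtract them,
# so only the remaining cost has to be justified by the GPU itself.
adjusted_break_even = (
    GPU_MONTH_USD - HOST_CORES_PER_GPU * CPU_MONTH_USD
) / CPU_MONTH_USD

# Single-GPU speedups over one CPU core, as reported above.
measured = {"first example": 4.9, "2D FFT example": 18.59}

print(f"break-even speedup: {break_even:.2f}x")
print(f"adjusted for host cores (assumption): {adjusted_break_even:.2f}x")
for name, s in measured.items():
    verdict = "pays off" if s > break_even else "does not pay off"
    print(f"{name}: {s}x -> {verdict}")
```

At these rates, the 2D FFT example's 18.59x clears the break-even comfortably, while the first example's 4.9x falls short. Under the host-core assumption, the threshold drops to about 7.09x.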