This is why so many researchers rely on the cloud in this era of Big Data. Individual cloud machines come in many state-of-the-art flavors: GPU-intensive, compute-intensive, memory-intensive, low-latency networking, general purpose and so on. The cloud also offers a double win for intensive scientific computation: you do not have to wait for resources to become available, and, if you can parallelize your work, you can spin up large (or very large) clusters to finish your tasks quickly.


Suppose you have a highly parallel task that takes 40 hours on 20 nodes with 16 cores each. If you share computing resources with other research groups, you may need to wait 24 hours, or even weeks, to use them, so your wall-clock time for one processing run becomes 64 hours or much longer. In the cloud, you can start immediately, distribute your work among 2,000 comparable machines and, if the task scales close to linearly, run it to completion in about 30 minutes. The cloud has many strengths, and scalability is one of the biggest.
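
The arithmetic behind this example can be sketched in a few lines of Python. This is a rough back-of-the-envelope model, not a benchmark: it assumes perfectly linear scaling, cloud machines equivalent to the original 16-core nodes, and no startup or data-transfer overhead. The function names `ideal_run_hours` and `wall_clock_hours` are made up for illustration.

```python
def ideal_run_hours(baseline_hours: float, baseline_nodes: int, target_nodes: int) -> float:
    """Estimate run time on target_nodes, assuming perfectly linear scaling."""
    # Total work, expressed in node-hours, stays constant under this assumption.
    node_hours = baseline_hours * baseline_nodes
    return node_hours / target_nodes


def wall_clock_hours(queue_wait_hours: float, run_hours: float) -> float:
    """Wall-clock time is the time spent waiting plus the time spent running."""
    return queue_wait_hours + run_hours


# Shared cluster: 24-hour wait, then 40 hours on 20 nodes.
shared = wall_clock_hours(queue_wait_hours=24, run_hours=40)

# Cloud: no wait (an assumption), same work spread over 2,000 machines.
cloud = wall_clock_hours(
    queue_wait_hours=0,
    run_hours=ideal_run_hours(baseline_hours=40, baseline_nodes=20, target_nodes=2000),
)

print(f"Shared cluster: {shared:.0f} hours")
print(f"Cloud, 2,000 machines: {cloud * 60:.0f} minutes")
```

Under these idealized assumptions the cloud run comes out at 24 minutes, so a figure of about 30 minutes leaves some room for the overhead that real clusters incur.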