Presentation Archive

Extreme Data-Intensive Computing in Astrophysics

Alex Szalay

October 24, 2011

Abstract: Scientific computing increasingly revolves around massive amounts of data. From the physical sciences and numerical simulations to high-throughput genomics and homeland security, we will soon be dealing with petabytes, if not exabytes, of data. This new, data-centric computing requires a fresh look at computing architectures and strategies. We will revisit Amdahl’s Law relating CPU and I/O in a balanced computer system, and use it to analyze current computing architectures and workloads. We will discuss how existing hardware can be used to build systems that are much closer to an ideal Amdahl machine. Scaling existing architectures to keep pace with the yearly doubling of data will soon require excessive amounts of electrical power. We will explore how low-power processors combined with GPGPUs might provide an ideal low-power platform with both excellent I/O and computational performance. We have deployed various scientific test cases, mostly drawn from astronomy, on different architectures and compared their performance and scaling laws. We will discuss a hypothetical inexpensive, yet high-performance, multi-petabyte system currently under consideration at JHU. We will also explore strategies for interacting with very large amounts of data, and compare various large-scale data analysis platforms.
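Amdahl's balanced-system rule of thumb is commonly stated as one bit of sequential I/O per second for every instruction per second. The sketch below computes that ratio (often called the Amdahl number), where a value near 1 indicates a balanced machine. The function name and the node figures are hypothetical, chosen only for illustration; they are not measurements from the talk.

```python
def amdahl_number(io_bytes_per_sec: float, instructions_per_sec: float) -> float:
    """Bits of sequential I/O per second divided by instructions per second.

    A value near 1.0 corresponds to Amdahl's balanced system;
    values well below 1.0 indicate an I/O-starved machine.
    """
    # Convert bytes/s to bits/s before dividing by the instruction rate.
    return (io_bytes_per_sec * 8) / instructions_per_sec


if __name__ == "__main__":
    # Hypothetical nodes, for illustration only.
    nodes = {
        "fast CPU, modest disks": (4e9, 100e9),   # 4 GB/s I/O, 100 G instr/s
        "low-power CPU, SSDs": (0.5e9, 4e9),      # 0.5 GB/s I/O, 4 G instr/s
    }
    for name, (io, ips) in nodes.items():
        print(f"{name}: Amdahl number = {amdahl_number(io, ips):.2f}")
```

Under these assumed figures, the low-power node is balanced (1.00) while the fast-CPU node reaches only 0.32, which illustrates why adding CPU without matching I/O moves a system away from the Amdahl ideal.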