High Performance Computing Lecture 1



*Parallel Scientific Computing: Algorithms and Tools
Lecture #1, APMA 2821A, Spring 2008
Instructors: George Em Karniadakis, Leopold Grinberg


*Logistics
Contact:
- Office hours: GK: M 2-4 pm; LG: W 2-4 pm
- Email: {gk,lgrinb}@dam.brown.edu
- Web: www.cfm.brown.edu/people/gk/APMA2821A
Textbook: Karniadakis & Kirby, "Parallel Scientific Computing in C++/MPI"
Other books:
- Shonkwiler & Lefton, "Parallel and Vector Scientific Computing"
- Wadleigh & Crawford, "Software Optimization for High Performance Computing"
- Foster, "Designing and Building Parallel Programs" (available online)


*Logistics
CCV accounts: email Sharon_King@brown.edu
Prerequisite: C/Fortran programming
Grading:
- 5 assignments/mini-projects: 50%
- 1 final project/presentation: 50%






*Course Objectives
- Understand the fundamental concepts and programming principles behind the development of high-performance applications
- Be able to program a range of parallel computers: PC → clusters → supercomputers
- Make efficient use of high-performance parallel computing in your own research


*Course Objectives


*Content Overview
Parallel computer architecture: 2-3 weeks
- CPU, memory; shared-/distributed-memory parallel machines; network connections
Parallel programming: 5 weeks
- MPI; OpenMP; UPC
Parallel numerical algorithms: 4 weeks
- Matrix algorithms; direct/iterative solvers; eigensolvers; Monte Carlo methods (simulated annealing, genetic algorithms)
Grid computing: 1 week
- Globus, MPICH-G2


*What & Why
What is high performance computing (HPC)?
- The use of the most efficient algorithms, on computers capable of the highest performance, to solve the most demanding problems.
Why HPC?
- Large problems, spatially and temporally: a 10,000 x 10,000 x 10,000 grid → 10^12 grid points → 4x10^12 double variables → 32x10^12 bytes = 32 terabytes. On top of that, simulations typically need tens of millions of time steps.
- On-demand/urgent computing; real-time computing
- Weather forecasting; protein folding; turbulence simulations/CFD; aerospace structures; full-body simulation/digital human ...


*HPC Examples: Blood Flow in the Human Vascular Network
- Cardiovascular disease accounts for about 50% of deaths in the western world
- The formation of arterial disease is strongly correlated with blood flow patterns
Computational challenges:
- Enormous problem size: in one minute, the heart pumps the entire blood supply of 5 quarts through 60,000 miles of vessels, about a quarter of the distance between the earth and the moon
- Blood flow involves multiple scales


*HPC Examples
- Earthquake simulation: surface velocity 75 seconds after the earthquake
- Flu pandemic simulation: 300 million people tracked; density of the infected population 45 days after the outbreak


*HPC Example: Homogeneous Turbulence
Direct numerical simulation of homogeneous turbulence on a 4096^3 grid (vorticity iso-surface, with successive zoom-ins)


*How HPC Fits into Scientific Computing
Air flow around an airplane → Navier-Stokes equations → algorithms, BCs, solvers → application codes, supercomputers → visualization software (the middle stages are the "HPC" component)


*Performance Metrics
FLOPS (or FLOP/s): floating-point operations per second
- MFLOPS: MegaFLOPS, 10^6 FLOPS
- GFLOPS: GigaFLOPS, 10^9 FLOPS (a home PC)
- TFLOPS: TeraFLOPS, 10^12 FLOPS (present-day supercomputers, www.top500.org)
- PFLOPS: PetaFLOPS, 10^15 FLOPS (expected by 2011)
- EFLOPS: ExaFLOPS, 10^18 FLOPS (expected by 2020)
MIPS = million instructions per second; equals the clock rate in MHz if the machine executes one instruction per cycle
Note: the von Neumann computer ran at about 0.00083 MIPS


*Performance Metrics
Theoretical peak performance R_theor: the maximum FLOPS a machine can reach in theory
- R_theor = clock_rate x no_cpus x no_FPUs_per_CPU
- Example: 3 GHz, 2 CPUs, 1 FPU/CPU → R_theor = 3x10^9 x 2 x 1 = 6 GFLOPS
Real performance R_real: FLOPS achieved on specific operations, e.g. vector multiplication
Sustained performance R_sustained: performance on a full application, e.g. CFD
- R_sustained << R_real << R_theor
- It is not uncommon that R_sustained < 10% of R_theor


*Top 10 Supercomputers
www.top500.org, November 2007: LINPACK performance (R_real vs. R_theor)


*Number of Processors


*Fastest Supercomputers
Performance over time, at present and projected (www.top500.org), with the Japanese Earth Simulator and "My Laptop" marked for comparison


A Growth Factor of a Billion in Performance in a Career
Machines on the curve: EDSAC 1, UNIVAC 1, IBM 7090, CDC 6600, IBM 360/195, CDC 7600, Cray 1, Cray X-MP, Cray 2, TMC CM-2, TMC CM-5, Cray T3D, ASCI Red, ASCI White/Pacific, IBM BG/L (1950-2010)
Milestones (floating-point operations per second):
- 1941: 1 FLOPS
- 1945: 100 FLOPS
- 1949: 1,000 (1 KFLOPS, 10^3)
- 1951: 10,000
- 1961: 100,000
- 1964: 1,000,000 (1 MFLOPS, 10^6)
- 1968: 10,000,000
- 1975: 100,000,000
- 1987: 1,000,000,000 (1 GFLOPS, 10^9)
- 1992: 10,000,000,000
- 1993: 100,000,000,000
- 1997: 1,000,000,000,000 (1 TFLOPS, 10^12)
- 2000: 10,000,000,000,000
- 2005: 131,000,000,000,000 (131 TFLOPS)
Transistors per chip double every 1.5 years (Moore's law): a growth factor of a billion in performance over a career.


Japanese "Life Simulator" Effort for a 10 PFLOPS System
From the Nikkei newspaper, May 30th morning edition:
- A collaboration of industry, academia, and government, organized by NEC, Hitachi, U. of Tokyo, Kyusyu U., and RIKEN
- Competition component similar to the DARPA HPCS program
- This year about $4M each was allocated for advanced development toward petascale
- A total of ¥100,000M ($909M) will be invested in this development
- Planned to be operational in 2011


Japan's Life Simulator: Original Concept Design (2005)
- Need for multi-scale, multi-physics simulation
- Need for multiple computation components
- Proposed architecture: integration of multiple architectures in a tightly coupled heterogeneous computer


Major Applications of the Next-Generation Supercomputer
Targeted as grand challenges


Basic Concept for Simulations in Nano-Science


Basic Concept for Simulations in Life Sciences


*Petascale Era: 2008-
NCSA Blue Waters: 1 PFLOPS, 2011


*Bell versus Moore


Last Updated: 8th March 2018
