COMP3320/COMP6464 High Performance Scientific Computing



1

COMP4300/COMP8300 Parallel Systems
Alistair Rendell and Joseph Antony
Research School of Computer Science, Australian National University

2

Concept and Rationale
The idea: split your program into bits that can be executed simultaneously.
Motivation: speed, speed, speed… at a cost-effective price. If we didn't want it to go faster we would not be bothered with the hassles of parallel programming!
Reduce the time to solution to acceptable levels:
No point waiting 1 week for tomorrow's weather forecast.
Simulations that take months to run are not useful in a design environment.
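The idea above can be sketched in a few lines: carve one job into independent pieces and hand them to a pool of workers. This is an illustrative sketch (the function names are mine, not from the slides); note that in CPython, threads show the structure but will not speed up CPU-bound work because of the GIL, so real scientific codes use processes, MPI, or OpenMP instead.

```python
# A minimal sketch of "split your program into bits that run
# simultaneously": a large sum is carved into chunks, each handled
# by a worker in a pool. Illustrative only; names are assumptions.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(bounds):
    lo, hi = bounds
    return sum(range(lo, hi))

def parallel_sum(n, workers=4):
    # Carve [0, n) into one half-open chunk per worker.
    step = n // workers
    chunks = [(i * step, (i + 1) * step if i < workers - 1 else n)
              for i in range(workers)]
    # CPython caveat: threads illustrate the decomposition but the GIL
    # serialises CPU-bound work; use processes/MPI for real speedups.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))

assert parallel_sum(1_000_000) == sum(range(1_000_000))
```

The decomposition (which chunk goes to which worker) is the programmer's job; getting it right is most of what the rest of this course is about.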

3

Sample Application Areas
Fluid flow problems: weather forecasting/climate modelling; aerodynamic modelling of cars, planes, rockets, etc.
Structural mechanics: building, bridge, car, etc. strength analysis; car crash simulation
Speech and character recognition, image processing
Visualization, virtual reality
Semiconductor design, simulation of new chips
Structural biology, molecular-level design of drugs
Human genome mapping
Financial market analysis and simulation
Data mining, machine learning
Games programming

4

World Climate Modeling
Atmosphere divided into 3D regions or cells.
Complex mathematical equations describe conditions in each cell, e.g. pressure, temperature, velocity.
Conditions change according to neighbouring cells; updates are repeated frequently as time passes.
Cells are affected by more distant cells the longer-range the forecast.
Assume cells are 1×1×1 mile to a height of 10 miles: 5×10^8 cells.
200 flops to update each cell per timestep; 10-minute timesteps for a total of 10 days.
Roughly 100 days on a 100 Mflop/s machine; roughly 10 minutes on a Tflop/s machine.
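The arithmetic behind the estimate can be worked through directly. The variable names are mine; with the exact figures on the slide the totals come out somewhat lower than the slide's round "100 days versus 10 minutes", which correspond to a finer timestep (on the order of 10^4 steps), but the conclusion is the same either way: serial hardware is far too slow.

```python
# Working the slide's climate-model arithmetic (illustrative names).
cells = 5 * 10**8                 # 1x1x1-mile cells to 10 miles altitude
flops_per_cell = 200              # flops to update one cell per timestep
steps = (10 * 24 * 60) // 10      # 10-minute steps over 10 days = 1440
total_flops = cells * flops_per_cell * steps   # 1.44e14 flops

seconds_at_100mflops = total_flops / 1e8   # 100 Mflop/s machine
seconds_at_1tflops = total_flops / 1e12    # 1 Tflop/s machine
days_at_100mflops = seconds_at_100mflops / 86400   # ~17 days
# A 10,000x faster machine turns weeks of computing into minutes.
```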

5

ParallelSystems@ANU: NCI
NCI: National Computational Infrastructure, http://nci.org.au
History: APAC established in 1998 with a $19.5M grant from the federal government; NCI created in 2007. Current NCI collaboration agreement: 2012–15.
Major collaborators: ANU, CSIRO, BoM, GA
Universities: Adelaide, Monash, UNSW, UQ, Sydney, Deakin, RMIT
University consortia: Intersect (NSW), QCIF (Queensland)
Co-investment, providing for all recurrent operations: 2007: $0M; 2008: $3.4M; 2009: $6.4M; 2011: $7.5M; 2012: $8.5M; 2013: $11M; 2014: $11+M.

6

Current infrastructure: Data Centre
New data centre: $24M (opened Nov. 2012); machine room: 920 sq. m.
Power (after 2014 upgrades): 4.5 MW raw capacity; 1 MW UPS; 2 × 1.1 MVA Cummins generators.
Cooling in two loops:
Server: 2 × 1.8 MW Carrier chillers; 3 × 0.8 MW "free cooling" heat exchangers; 18 °C; 75 l/sec pump rate.
Data: 3 × 0.5 MW Carrier chillers; 15 °C.
PUE: approx. 1.25

7

NCI: Raijin—Petascale Supercomputer
Raijin (commissioned June 2013):
57,472 cores (Intel Xeon Sandy Bridge, 2.6 GHz) in 3592 compute nodes
Approx. 160 TBytes of main memory
Mellanox InfiniBand FDR interconnect (52 km of cable)
Approx. 10 PBytes of usable fast filesystem (for short-term scratch space, apps, home directories)
Power: 1.5 MW maximum load; cooling systems: 100 tonnes of water
24th fastest in the world on debut (November 2012); first petaflop system in Australia (November 2014: #52)
Fastest filesystem in the southern hemisphere
Custom monitoring and deployment; custom kernel; highly customised PBS Pro scheduler

8

NCI’s integrated high-performance environment

9

ParallelSystems@DCS
Bunyip (tsg.anu.edu.au/Projects/Bunyip): 192-processor PC cluster; winner of the 2000 Gordon Bell prize for best price/performance.
High Performance Computing Group clusters: Jabberwocky, Saratoga, Sunnyvale.

10
11
12

The Rise of Parallel Computing
Parallelism became an issue for programmers from the late 1980s.
People began compiling lists of big parallel systems.

13

November 2014 Top500 (NCI now number 52)

14


15

Planning the Future: Top500 Supercomputers
Growth in ANU/NCI's computing performance (measured in Tflops) since 1987; architecture and capability determined by research and innovation drivers.
International Top500 supercomputer growth since 1993. Red: #1 machine each year; yellow: #500 machine each year; blue: sum of all machines.
The graphs show growth factors of between 8 and 9 times every 3 years.

16

Transitioning Australia to its HPC Future

17

Moore’s Law: ‘Transistor density will double approximately every two years.’
Dennard Scaling: ‘As MOSFET features shrink, switching time and power consumption will fall proportionately.’
Together these led to higher clock rates and faster flops. We also had increased node performance.

18

Agarwal, Hrishikesh, Keckler and Burger, ‘Clock Rate Versus IPC’, ISCA 2000.
…until the chips became too big…

19

…so multiple cores appeared on chip… until we hit a bigger problem.
In 2004 Sun released the dual-core UltraSPARC IV, heralding the start of the multicore era.

20

…the end of Dennard scaling… ushering in…

21

…a new philosophy in processor design is emerging, and a fundamentally new set of building blocks for our petascale systems.

22

Petascale and Beyond: Challenges and Opportunities
In RSCS we are working in all of these areas.

23

Other Important Parallelism
Multiple instruction units: typical processors issue ~4 instructions per cycle.
Instruction pipelining: complicated operations are broken into simple operations that can be overlapped.
Graphics engines: use multiple rendering pipes and processing elements to render millions of polygons a second.
Interleaved memory: multiple paths to memory that can be used at the same time.
Input/output: disks are striped, with different blocks of data written to different disks at the same time.
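The payoff from pipelining can be seen with a little cycle counting. This is a sketch under textbook assumptions (a k-stage pipeline that, once full, retires one result per cycle; the function names are mine, not from the slides).

```python
# Why overlapping pipeline stages helps (illustrative sketch).
def unpipelined_cycles(n_ops, k_stages):
    # Without overlap, each operation runs all k stages to completion
    # before the next operation starts.
    return n_ops * k_stages

def pipelined_cycles(n_ops, k_stages):
    # With overlap, the first result appears after k cycles (filling
    # the pipe), then one result completes every cycle.
    return k_stages + (n_ops - 1)

# For 1000 operations on a 5-stage pipeline:
assert unpipelined_cycles(1000, 5) == 5000
assert pipelined_cycles(1000, 5) == 1004
```

For long streams of operations the speedup approaches the number of stages, which is why deep pipelines drove clock-rate gains for so long.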

24

Parallelisation
Split the program up and run parts simultaneously on different processors.
On N computers the time to solution should (ideally!) be 1/N.
Parallel programming: the art of writing the parallel code!
Parallel computer: the hardware on which we run our parallel code!
COMP4300 will discuss both.
Beyond raw compute power, other motivations include:
Enabling more accurate simulations in the same time (finer grids)
Providing access to huge aggregate memories
Providing more and/or better input/output capacity
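The "(ideally!)" caveat deserves a number. A standard way to quantify it is Amdahl's law: if a fraction s of the work is inherently serial, N processors can give at most 1 / (s + (1 - s)/N) speedup. The sketch below uses my own illustrative function names; Amdahl's law itself is not on the slide but is the usual next step in this argument.

```python
# Ideal 1/N scaling versus the Amdahl's-law bound (illustrative names).
def ideal_time(t_serial, n_procs):
    # The slide's ideal: time drops to 1/N of the serial time.
    return t_serial / n_procs

def amdahl_speedup(serial_fraction, n_procs):
    # If a fraction s of the work cannot be parallelised, speedup is
    # capped at 1 / (s + (1 - s)/N), however many processors you add.
    s = serial_fraction
    return 1.0 / (s + (1.0 - s) / n_procs)

assert ideal_time(100.0, 4) == 25.0
# With just 5% serial work, 100 processors give about 17x, not 100x:
assert 16 < amdahl_speedup(0.05, 100) < 18
```

Even tiny serial fractions dominate at scale, which is why parallel programming is an art and not just a matter of buying more processors.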

25

Parallelism in a Single “CPU” Box
Multiple instruction units: typical processors issue ~4 instructions per cycle.
Instruction pipelining: complicated operations are broken into simple operations that can be overlapped.
Graphics engines: use multiple rendering pipes and processing elements to render millions of polygons a second.
Interleaved memory: multiple paths to memory that can be used at the same time.
Input/output: disks are striped, with different blocks of data written to different disks at the same time.
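The disk-striping item can be made concrete with a round-robin layout: block i lands on disk i mod n, so a run of consecutive blocks spreads across all disks and can be written concurrently. This is a simplified sketch (RAID-0-style, no parity; the function name is mine).

```python
# A sketch of disk striping: consecutive blocks are scattered
# round-robin across n disks so they can be written in parallel.
def stripe(blocks, n_disks):
    layout = [[] for _ in range(n_disks)]
    for i, block in enumerate(blocks):
        layout[i % n_disks].append(block)   # block i -> disk i mod n
    return layout

# Eight blocks over four disks: each disk gets every 4th block,
# so a sequential write touches all four disks at once.
assert stripe(list(range(8)), 4) == [[0, 4], [1, 5], [2, 6], [3, 7]]
```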

26

Health Warning!
The course is run every other year: drop out this year and it won't be repeated until 2017.
It's a 4000/8000-level course, so it's supposed to:
Be more challenging than a 3000-level course!
Be less well structured
Place greater expectations on you
Have more student participation
Be fun!
Nathan Robertson, 2002 honours student: “Parallel systems and thread safety at Medicare: 2/16 understood it - the other guy was a $70/hr contractor”

Last Updated: 8th March 2018