Introduction to High Performance Computing with UCL Cluster




Introduction to High Performance Cluster Computing
Courseware Module H.1.a
August 2008


What is HPC
HPC = High Performance Computing
  Includes supercomputing
HPCC = High Performance Cluster Computing
  Note: these are NOT High Availability clusters
HPTC = High Performance Technical Computing
The ultimate aim of HPC users is to max out the CPUs!


Agenda
Parallel Computing Concepts
Clusters
Cluster Usage


Concurrency and Parallel Computing
A central concept in computer science is concurrency.
Concurrency: computing in which multiple tasks are active at the same time.
There are many ways to use concurrency:
  Concurrency is key to all modern operating systems as a way to hide latencies.
  Concurrency can be used together with redundancy to provide high availability.
  Parallel computing uses concurrency to decrease program runtimes.
HPC systems are based on parallel computing.


Hardware for Parallel Computing
Parallel computers are classified in terms of streams of data and streams of instructions:
  MIMD computers: multiple streams of instructions acting on multiple streams of data.
  SIMD computers: a single stream of instructions acting on multiple streams of data.
Parallel hardware comes in many forms:
  On chip: instruction-level parallelism (e.g. IPF)
  Multi-core: multiple execution cores inside a single CPU
  Multiprocessor: multiple processors inside a single computer
  Multi-computer: networks of computers working together


Hardware for Parallel Computing
Parallel Computers
  Single Instruction Multiple Data (SIMD)*
  Multiple Instruction Multiple Data (MIMD)
    Shared Address Space
      Symmetric Multiprocessor (SMP)
      Non-uniform Memory Architecture (NUMA)
    Disjoint Address Space
      Massively Parallel Processor (MPP)
      Cluster
      Distributed Computing


HPC Platform Generations
In the 1980s, it was a vector SMP: custom components throughout.
Commodity off-the-shelf (COTS) CPUs, everything else custom…
… but today, it is a cluster: COTS components everywhere.
Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries.


What is an HPC Cluster
A cluster is a type of parallel or distributed processing system that consists of a collection of interconnected stand-alone computers working cooperatively as a single, integrated computing resource.
A typical cluster uses:
  Commodity off-the-shelf parts
  Low-latency communication protocols


What is HPCC?
[Diagram labels: Cluster Management Tools; Master Node; File Server / Gateway; Compute Nodes; LAN/WAN; Interconnect]


A Sample Cluster Design


Cluster Architecture View


Cluster Hardware
The Node
  A single element within the cluster
Compute Node
  Just computes – little else
  Private IP address – no user access
Master/Head/Front End Node
  User login
  Job scheduler
  Public IP address – connects to external network
Management/Administrator Node
  Systems/cluster management functions
  Secure administrator address
I/O Node
  Access to data
  Generally internal to cluster or to data centre




Agenda
Parallel Computing Concepts
Clusters
Cluster Usage


Cluster Usage
Performance Measurements
Usage Model
Application Classification
Application Behaviour


The Mysterious FLOPS
1 GFlops = 1 billion floating point operations per second
Theoretical v Real GFlops
Xeon processor:
  1 core theoretical peak = 4 x clock speed (double precision), i.e. GFlops when the clock speed is in GHz
  Xeons have 128-bit SSE registers, which allow the processor to carry out 2 double precision floating point add and 2 multiply operations per clock cycle
  2 computational cores per processor
  2 processors per node (4 cores per node)
Sustained (Rmax) = ~35-80% of theoretical peak (interconnect dependent)
You'll NEVER hit peak!
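The peak arithmetic on this slide can be sketched in a few lines of Python. The function names and the 3.0 GHz clock speed are assumptions chosen for illustration; the 4 flops per cycle, the core and processor counts, and the 35-80% sustained range are taken from the slide.

```python
def peak_gflops(clock_ghz, flops_per_cycle=4, cores_per_processor=2,
                processors_per_node=2, nodes=1):
    """Theoretical double precision peak in GFlops.

    peak = clock (GHz) x flops per cycle per core x total number of cores
    """
    cores = cores_per_processor * processors_per_node * nodes
    return clock_ghz * flops_per_cycle * cores


def sustained_range(peak, low=0.35, high=0.80):
    """Rough sustained (Rmax) bounds, using the slide's 35-80% of peak."""
    return peak * low, peak * high


# Assumed example: one node with a 3.0 GHz clock (not a figure from the slides).
node_peak = peak_gflops(clock_ghz=3.0)   # 3.0 x 4 x 4 cores = 48.0 GFlops
low, high = sustained_range(node_peak)
print(f"peak {node_peak:.1f} GFlops, sustained roughly {low:.1f} to {high:.1f} GFlops")
```

Passing nodes=N scales the same estimate to a whole cluster, where the interconnect decides where in the sustained range a real run lands.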


Other Measures of CPU Performance
SPEC
  SPEC CPU2000/2006 Speed – single core performance indicator
  SPEC CPU2000/2006 Rate – node performance indicator
  SPECfp – floating point performance
  SPECint – integer performance
Many other performance metrics may be required:
  STREAM – memory bandwidth
  HPL – High Performance Linpack
  NPB – NAS Parallel Benchmarks, NASA's suite of performance tests
  Pallas Parallel Benchmark – another suite
  IOzone – file system throughput


Technology Advancements in 5 Years
Example: [comparison table]
* From November 2001 Top500 supercomputer list (cluster of Dell Precision 530)
** Intel internal cluster built in 2006


Usage Model
Many Serial Jobs (Capacity)
  Batch usage
  Load balancing more important
  Electronic design, Monte Carlo, design optimisation, parallel search
Normal Mixed Usage
  Many users
  Mixed size parallel/serial jobs
  Ability to partition and allocate jobs to nodes for best performance
  Job scheduling very important
One Big Parallel Job (Capability)
  Appliance usage
  Interconnect more important
  Meteorology, seismic analysis, fluid dynamics, molecular chemistry


Application and Usage Model
HPC clusters run parallel applications, and applications in parallel!
One single application that takes advantage of multiple computing platforms
Fine-Grained Application
  Uses many systems to run one application
  Shares data heavily across systems
  Example: PDVR3D (eigenvalues and eigenstates of a matrix)
Coarse-Grained Application
  Uses many systems to run one application
  Infrequent data sharing among systems
  Example: Casino (Monte Carlo stochastic methods)
Embarrassingly Parallel Application
  An instance of the entire application runs on each node
  Little or no data sharing among compute nodes
  Example: BLAST (pattern matching)
A shared memory machine will run all sorts of applications
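As a minimal sketch of the "little or no data sharing" pattern, the Python below runs several independent Monte Carlo tasks and combines them only at the very end. The task (estimating pi), the function names, and the use of a thread pool are all assumptions for illustration. On a real cluster each task would be a separate process or batch job on its own node, and in standard CPython threads show the structure rather than a speed-up.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(seed, samples):
    """One independent task: count random points that fall inside the unit circle."""
    rng = random.Random(seed)          # private generator, nothing shared with other tasks
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(num_tasks=4, samples_per_task=50_000):
    """Run the tasks side by side; the only communication is the final sum."""
    seeds = range(num_tasks)
    sizes = [samples_per_task] * num_tasks
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        hits = list(pool.map(count_hits, seeds, sizes))
    return 4.0 * sum(hits) / (num_tasks * samples_per_task)


print(estimate_pi())
```

A fine-grained application would instead need to exchange data between tasks inside the loop, which is where interconnect latency starts to dominate.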


Types of Applications
Forward Modelling
Inversion
Signal Processing
Searching/Comparing


Forward Modelling
Solving linear equations
Grid based
Parallelization by domain decomposition (split and distribute the data)
Finite element/finite difference
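Domain decomposition can be illustrated with a small finite-difference sketch in Python. Everything here is an assumed toy setup: a 1D grid, an explicit diffusion-style update, and hypothetical function names. Each subdomain is updated using one extra "halo" cell copied from each neighbour; on a real cluster that copy would be a message over the interconnect (for example via MPI) rather than a list slice.

```python
def step(u, alpha=0.25):
    """One explicit finite-difference update; the two end values are held fixed."""
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new


def split_ranges(n, parts):
    """Split indices 0..n-1 into contiguous, near-equal (start, stop) ranges."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def decomposed_step(u, parts, alpha=0.25):
    """Same update, computed one subdomain at a time with halo cells."""
    n = len(u)
    result = list(u)
    for lo, hi in split_ranges(n, parts):
        h_lo, h_hi = max(lo - 1, 0), min(hi + 1, n)   # extend by one halo cell per side
        local = step(u[h_lo:h_hi], alpha)             # work a single node would do
        for i in range(lo, hi):                       # keep only the cells this part owns
            result[i] = local[i - h_lo]
    return result


grid = [0.0, 0.0, 1.0, 4.0, 9.0, 4.0, 1.0, 0.0, 0.0, 0.0]
assert decomposed_step(grid, parts=3) == step(grid)   # decomposition does not change the answer
```

The amount of halo data grows with the subdomain surface while the work grows with its volume, which is why larger subdomains per node tend to be less sensitive to the interconnect.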


Inversion
From measurements (F), compute models (M) representing properties (d) of the measured object(s).
Deterministic
  Matrix inversions
  Conjugate gradient
Stochastic
  Monte Carlo, Markov chain
  Genetic algorithms
Generally large amounts of shared memory
Parallelism through multiple runs with different models
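The conjugate gradient method named above can be sketched in plain Python for a small linear system A x = b. This is a textbook version under the assumption that A is symmetric positive definite; the helper names and the 2 x 2 example system are illustrative and do not come from the slides. Production codes would use an optimised linear algebra library instead.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [dot(row, x) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b iteratively, assuming A is symmetric positive definite."""
    x = [0.0] * len(b)
    r = list(b)                 # residual b - A x for the starting guess x = 0
    p = list(r)                 # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:             # residual norm small enough: stop
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)         # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Assumed example system; its exact solution is x = [1/11, 7/11].
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
print(conjugate_gradient(A, b))
```

The matrix-vector product and the dot products are the parts that get distributed in a parallel implementation, and each dot product needs a global reduction, so the method is sensitive to interconnect latency.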


Signal Processing/Quantum Mechanics
Convolution model (stencil)
Matrix computations (eigenvalues…)
Conjugate gradient methods
Normally not very demanding on latency and bandwidth
Some algorithms are embarrassingly parallel
Examples: seismic migration/processing, medical imaging, SETI@Home
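A minimal sketch of the convolution (stencil) model, written in plain Python with an assumed function name and toy inputs. Each output sample depends only on a short window of the input, so separate traces, or separate stretches of one long trace, can be processed independently.

```python
def convolve_valid(signal, kernel):
    """Direct 1D convolution, keeping only positions where the kernel fully overlaps."""
    k = len(kernel)
    flipped = kernel[::-1]      # convolution applies the kernel reversed
    return [
        sum(signal[i + j] * flipped[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


# Assumed toy inputs: a difference stencil applied to a linear ramp.
print(convolve_valid([1, 2, 3, 4, 5], [1, 0, -1]))   # [2, 2, 2]
```

Splitting a long signal between nodes only requires overlapping the chunks by len(kernel) - 1 samples, which is a small amount of shared data compared with the compute per chunk.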


Signal Processing Example


Searching/Comparing
Integer operations are more dominant than floating point
I/O intensive
Pattern matching
Embarrassingly parallel – very suitable for grid computing
Application areas: encryption/decryption, message interception, bio-informatics, data mining
Example software: BLAST, HMMER


Last Updated: 8th March 2018
