Description

Book Synopsis

With recent changes in multicore and general-purpose computing on graphics processing units, the way parallel computers are used and programmed has changed drastically. This makes a comprehensive study of how to use such machines, written by specialists in the domain, all the more important. The book provides recent research results in high-performance computing on complex environments, information on how to efficiently exploit heterogeneous and hierarchical architectures and distributed systems, detailed studies on the impact of applying heterogeneous computing practices to real problems, and applications ranging from remote sensing to tomography. The content spans topics such as Numerical Analysis for Heterogeneous and Multicore Systems; Optimization of Communication for High Performance Heterogeneous and Hierarchical Platforms; Efficient Exploitation of Heterogeneous Architectures, Hybrid CPU+GPU, and Distributed Systems; Energy Awareness in High-Performance Computing; and Applications of

Table of Contents

Contributors xxiii

Preface xxvii

Part I Introduction 1

1. Summary of the Open European Network for High-Performance Computing in Complex Environments 3
Emmanuel Jeannot and Julius Žilinskas

1.1 Introduction and Vision 4

1.2 Scientific Organization 6

1.2.1 Scientific Focus 6

1.2.2 Working Groups 6

1.3 Activities of the Project 6

1.3.1 Spring Schools 6

1.3.2 International Workshops 7

1.3.3 Working Groups Meetings 7

1.3.4 Management Committee Meetings 7

1.3.5 Short-Term Scientific Missions 7

1.4 Main Outcomes of the Action 7

1.5 Contents of the Book 8

Acknowledgment 10

Part II Numerical Analysis for Heterogeneous and Multicore Systems 11

2. On the Impact of the Heterogeneous Multicore and Many-Core Platforms on Iterative Solution Methods and Preconditioning Techniques 13
Dimitar Lukarski and Maya Neytcheva

2.1 Introduction 14

2.2 General Description of Iterative Methods and Preconditioning 16

2.2.1 Basic Iterative Methods 16

2.2.2 Projection Methods: CG and GMRES 18

2.3 Preconditioning Techniques 20

2.4 Defect-Correction Technique 21

2.5 Multigrid Method 22

2.6 Parallelization of Iterative Methods 22

2.7 Heterogeneous Systems 23

2.7.1 Heterogeneous Computing 24

2.7.2 Algorithm Characteristics and Resource Utilization 25

2.7.3 Exposing Parallelism 26

2.7.4 Heterogeneity in Matrix Computation 26

2.7.5 Setup of Heterogeneous Iterative Solvers 27

2.8 Maintenance and Portability 29

2.9 Conclusion 30

Acknowledgments 31

References 31

3. Efficient Numerical Solution of 2D Diffusion Equation on Multicore Computers 33
Matjaž Depolli, Gregor Kosec, and Roman Trobec

3.1 Introduction 34

3.2 Test Case 35

3.2.1 Governing Equations 35

3.2.2 Solution Procedure 36

3.3 Parallel Implementation 39

3.3.1 Intel PCM Library 39

3.3.2 OpenMP 40

3.4 Results 41

3.4.1 Results of Numerical Integration 41

3.4.2 Parallel Efficiency 42

3.5 Discussion 45

3.6 Conclusion 47

Acknowledgment 47

References 47

4. Parallel Algorithms for Parabolic Problems on Graphs in Neuroscience 51
Natalija Tumanova and Raimondas Ciegis

4.1 Introduction 51

4.2 Formulation of the Discrete Model 53

4.2.1 The 𝜃-Implicit Discrete Scheme 55

4.2.2 The Predictor–Corrector Algorithm I 57

4.2.3 The Predictor–Corrector Algorithm II 58

4.3 Parallel Algorithms 59

4.3.1 Parallel 𝜃-Implicit Algorithm 59

4.3.2 Parallel Predictor–Corrector Algorithm I 62

4.3.3 Parallel Predictor–Corrector Algorithm II 63

4.4 Computational Results 63

4.4.1 Experimental Comparison of Predictor–Corrector Algorithms 66

4.4.2 Numerical Experiment of Neuron Excitation 68

4.5 Conclusions 69

Acknowledgments 70

References 70

Part III Communication and Storage Considerations in High-Performance Computing 73

5. An Overview of Topology Mapping Algorithms and Techniques in High-Performance Computing 75
Torsten Hoefler, Emmanuel Jeannot, and Guillaume Mercier

5.1 Introduction 76

5.2 General Overview 76

5.2.1 A Key to Scalability: Data Locality 77

5.2.2 Data Locality Management in Parallel Programming Models 77

5.2.3 Virtual Topology: Definition and Characteristics 78

5.2.4 Understanding the Hardware 79

5.3 Formalization of the Problem 79

5.4 Algorithmic Strategies for Topology Mapping 81

5.4.1 Greedy Algorithm Variants 81

5.4.2 Graph Partitioning 82

5.4.3 Schemes Based on Graph Similarity 82

5.4.4 Schemes Based on Subgraph Isomorphism 82

5.5 Mapping Enforcement Techniques 82

5.5.1 Resource Binding 83

5.5.2 Rank Reordering 83

5.5.3 Other Techniques 84

5.6 Survey of Solutions 85

5.6.1 Algorithmic Solutions 85

5.6.2 Existing Implementations 85

5.7 Conclusion and Open Problems 89

Acknowledgment 90

References 90

6. Optimization of Collective Communication for Heterogeneous HPC Platforms 95
Kiril Dichev and Alexey Lastovetsky

6.1 Introduction 95

6.2 Overview of Optimized Collectives and Topology-Aware Collectives 97

6.3 Optimizations of Collectives on Homogeneous Clusters 98

6.4 Heterogeneous Networks 99

6.4.1 Comparison to Homogeneous Clusters 99

6.5 Topology- and Performance-Aware Collectives 100

6.6 Topology as Input 101

6.7 Performance as Input 102

6.7.1 Homogeneous Performance Models 103

6.7.2 Heterogeneous Performance Models 105

6.7.3 Estimation of Parameters of Heterogeneous Performance Models 106

6.7.4 Other Performance Models 106

6.8 Non-MPI Collective Algorithms for Heterogeneous Networks 106

6.8.1 Optimal Solutions with Multiple Spanning Trees 107

6.8.2 Adaptive Algorithms for Efficient Large-Message Transfer 107

6.8.3 Network Models Inspired by BitTorrent 108

6.9 Conclusion 111

Acknowledgments 111

References 111

7. Effective Data Access Patterns on Massively Parallel Processors 115
Gabriele Capannini, Ranieri Baraglia, Fabrizio Silvestri, and Franco Maria Nardini

7.1 Introduction 115

7.2 Architectural Details 116

7.3 K-Model 117

7.3.1 The Architecture 117

7.3.2 Cost and Complexity Evaluation 118

7.3.3 Efficiency Evaluation 119

7.4 Parallel Prefix Sum 120

7.4.1 Experiments 125

7.5 Bitonic Sorting Networks 126

7.5.1 Experiments 131

7.6 Final Remarks 132

Acknowledgments 133

References 133

8. Scalable Storage I/O Software for Blue Gene Architectures 135
Florin Isaila, Javier Garcia, and Jesús Carretero

8.1 Introduction 135

8.2 Blue Gene System Overview 136

8.2.1 Blue Gene Architecture 136

8.2.2 Operating System Architecture 136

8.3 Design and Implementation 138

8.3.1 The Client Module 139

8.3.2 The I/O Module 141

8.4 Conclusions and Future Work 142

Acknowledgments 142

References 142

Part IV Efficient Exploitation of Heterogeneous Architectures 145

9. Fair Resource Sharing for Dynamic Scheduling of Workflows on Heterogeneous Systems 147
Hamid Arabnejad, Jorge G. Barbosa, and Frédéric Suter

9.1 Introduction 148

9.1.1 Application Model 148

9.1.2 System Model 151

9.1.3 Performance Metrics 152

9.2 Concurrent Workflow Scheduling 153

9.2.1 Offline Scheduling of Concurrent Workflows 154

9.2.2 Online Scheduling of Concurrent Workflows 155

9.3 Experimental Results and Discussion 160

9.3.1 DAG Structure 160

9.3.2 Simulated Platforms 160

9.3.3 Results and Discussion 162

9.4 Conclusions 165

Acknowledgments 166

References 166

10. Systematic Mapping of Reed–Solomon Erasure Codes on Heterogeneous Multicore Architectures 169
Roman Wyrzykowski, Marcin Wozniak, and Lukasz Kuczynski

10.1 Introduction 169

10.2 Related Works 171

10.3 Reed–Solomon Codes and Linear Algebra Algorithms 172

10.4 Mapping Reed–Solomon Codes on Cell/B.E. Architecture 173

10.4.1 Cell/B.E. Architecture 173

10.4.2 Basic Assumptions for Mapping 174

10.4.3 Vectorization Algorithm and Increasing its Efficiency 175

10.4.4 Performance Results 177

10.5 Mapping Reed–Solomon Codes on Multicore GPU Architectures 178

10.5.1 Parallelization of Reed–Solomon Codes on GPU Architectures 178

10.5.2 Organization of GPU Threads 180

10.6 Methods of Increasing the Algorithm Performance on GPUs 181

10.6.1 Basic Modifications 181

10.6.2 Stream Processing 182

10.6.3 Using Shared Memory 184

10.7 GPU Performance Evaluation 185

10.7.1 Experimental Results 185

10.7.2 Performance Analysis using the Roofline Model 187

10.8 Conclusions and Future Works 190

Acknowledgments 191

References 191

11. Heterogeneous Parallel Computing Platforms and Tools for Compute-Intensive Algorithms: A Case Study 193
Daniele D’Agostino, Andrea Clematis, and Emanuele Danovaro

11.1 Introduction 194

11.2 A Low-Cost Heterogeneous Computing Environment 196

11.2.1 Adopted Computing Environment 199

11.3 First Case Study: The N-Body Problem 200

11.3.1 The Sequential N-Body Algorithm 201

11.3.2 The Parallel N-Body Algorithm for Multicore Architectures 203

11.3.3 The Parallel N-Body Algorithm for CUDA Architectures 204

11.4 Second Case Study: The Convolution Algorithm 206

11.4.1 The Sequential Convolver Algorithm 206

11.4.2 The Parallel Convolver Algorithm for Multicore Architectures 207

11.4.3 The Parallel Convolver Algorithm for GPU Architectures 208

11.5 Conclusions 211

Acknowledgments 212

References 212

12. Efficient Application of Hybrid Parallelism in Electromagnetism Problems 215
Alejandro Álvarez-Melcón, Fernando D. Quesada, Domingo Giménez, Carlos Pérez-Alcaraz, José-Ginés Picón, and Tomás Ramírez

12.1 Introduction 215

12.2 Computation of Green’s Functions in Hybrid Systems 216

12.2.1 Computation in a Heterogeneous Cluster 217

12.2.2 Experiments 218

12.3 Parallelization in NUMA Systems of a Volume Integral Equation Technique 222

12.3.1 Experiments 222

12.4 Autotuning Parallel Codes 226

12.4.1 Empirical Autotuning 227

12.4.2 Modeling the Linear Algebra Routines 229

12.5 Conclusions and Future Research 230

Acknowledgments 231

References 232

Part V CPU + GPU Coprocessing 235

13. Design and Optimization of Scientific Applications for Highly Heterogeneous and Hierarchical HPC Platforms Using Functional Computation Performance Models 237
David Clarke, Aleksandar Ilic, Alexey Lastovetsky, Vladimir Rychkov, Leonel Sousa, and Ziming Zhong

13.1 Introduction 238

13.2 Related Work 241

13.3 Data Partitioning Based on Functional Performance Model 243

13.4 Example Application: Heterogeneous Parallel Matrix Multiplication 245

13.5 Performance Measurement on CPUs/GPUs System 247

13.6 Functional Performance Models of Multiple Cores and GPUs 248

13.7 FPM-Based Data Partitioning on CPUs/GPUs System 250

13.8 Efficient Building of Functional Performance Models 251

13.9 FPM-Based Data Partitioning on Hierarchical Platforms 253

13.10 Conclusion 257

Acknowledgments 259

References 259

14. Efficient Multilevel Load Balancing on Heterogeneous CPU + GPU Systems 261
Aleksandar Ilic and Leonel Sousa

14.1 Introduction: Heterogeneous CPU + GPU Systems 262

14.1.1 Open Problems and Specific Contributions 263

14.2 Background and Related Work 265

14.2.1 Divisible Load Scheduling in Distributed CPU-Only Systems 265

14.2.2 Scheduling in Multicore CPU and Multi-GPU Environments 268

14.3 Load Balancing Algorithms for Heterogeneous CPU + GPU Systems 269

14.3.1 Multilevel Simultaneous Load Balancing Algorithm 270

14.3.2 Algorithm for Multi-Installment Processing with Multidistributions 273

14.4 Experimental Results 275

14.4.1 MSLBA Evaluation: Dense Matrix Multiplication Case Study 275

14.4.2 AMPMD Evaluation: 2D FFT Case Study 277

14.5 Conclusions 279

Acknowledgments 280

References 280

15. The All-Pair Shortest-Path Problem in Shared-Memory Heterogeneous Systems 283
Hector Ortega-Arranz, Yuri Torres, Diego R. Llanos, and Arturo Gonzalez-Escribano

15.1 Introduction 283

15.2 Algorithmic Overview 285

15.2.1 Graph Theory Notation 285

15.2.2 Dijkstra’s Algorithm 286

15.2.3 Parallel Version of Dijkstra’s Algorithm 287

15.3 CUDA Overview 287

15.4 Heterogeneous Systems and Load Balancing 288

15.5 Parallel Solutions to the APSP 289

15.5.1 GPU Implementation 289

15.5.2 Heterogeneous Implementation 290

15.6 Experimental Setup 291

15.6.1 Methodology 291

15.6.2 Target Architectures 292

15.6.3 Input Set Characteristics 292

15.6.4 Load-Balancing Techniques Evaluated 292

15.7 Experimental Results 293

15.7.1 Complete APSP 293

15.7.2 512-Source-Node-to-All Shortest Path 295

15.7.3 Experimental Conclusions 296

15.8 Conclusions 297

Acknowledgments 297

References 297

Part VI Efficient Exploitation of Distributed Systems 301

16. Resource Management for HPC on the Cloud 303
Marc E. Frincu and Dana Petcu

16.1 Introduction 303

16.2 On the Type of Applications for HPC and HPC2 305

16.3 HPC on the Cloud 306

16.3.1 General PaaS Solutions 306

16.3.2 On-Demand Platforms for HPC 310

16.4 Scheduling Algorithms for HPC2 311

16.5 Toward an Autonomous Scheduling Framework 312

16.5.1 Autonomous Framework for RMS 313

16.5.2 Self-Management 315

16.5.3 Use Cases 317

16.6 Conclusions 319

Acknowledgment 320

References 320

17. Resource Discovery in Large-Scale Grid Systems 323
Konstantinos Karaoglanoglou and Helen Karatza

17.1 Introduction and Background 323

17.1.1 Introduction 323

17.1.2 Resource Discovery in Grids 324

17.1.3 Background 325

17.2 The Semantic Communities Approach 325

17.2.1 Grid Resource Discovery Using Semantic Communities 325

17.2.2 Grid Resource Discovery Based on Semantically Linked Virtual Organizations 327

17.3 The P2P Approach 329

17.3.1 On Fully Decentralized Resource Discovery in Grid Environments Using a P2P Architecture 329

17.3.2 P2P Protocols for Resource Discovery in the Grid 330

17.4 The Grid-Routing Transferring Approach 333

17.4.1 Resource Discovery Based on Matchmaking Routers 333

17.4.2 Acquiring Knowledge in a Large-Scale Grid System 335

17.5 Conclusions 337

Acknowledgment 338

References 338

Part VII Energy Awareness in High-Performance Computing 341

18. Energy-Aware Approaches for HPC Systems 343
Robert Basmadjian, Georges Da Costa, Ghislain Landry Tsafack Chetsa, Laurent Lefevre, Ariel Oleksiak, and Jean-Marc Pierson

18.1 Introduction 344

18.2 Power Consumption of Servers 345

18.2.1 Server Modeling 346

18.2.2 Power Prediction Models 347

18.3 Classification and Energy Profiles of HPC Applications 354

18.3.1 Phase Detection 356

18.3.2 Phase Identification 358

18.4 Policies and Leverages 359

18.5 Conclusion 360

Acknowledgements 361

References 361

19. Strategies for Increased Energy Awareness in Cloud Federations 365
Gabor Kecskemeti, Attila Kertesz, Attila Cs. Marosi, and Zsolt Nemeth

19.1 Introduction 365

19.2 Related Work 367

19.3 Scenarios 369

19.3.1 Increased Energy Awareness Across Multiple Data Centers within a Single Administrative Domain 369

19.3.2 Energy Considerations in Commercial Cloud Federations 372

19.3.3 Reduced Energy Footprint of Academic Cloud Federations 374

19.4 Energy-Aware Cloud Federations 374

19.4.1 Availability of Energy-Consumption-Related Information 375

19.4.2 Service Call Scheduling at the Meta-Brokering Level of FCM 376

19.4.3 Service Call Scheduling and VM Management at the Cloud-Brokering Level of FCM 377

19.5 Conclusions 379

Acknowledgments 380

References 380

20. Enabling Network Security in HPC Systems Using Heterogeneous CMPs 383
Ozcan Ozturk and Suleyman Tosun

20.1 Introduction 384

20.2 Related Work 386

20.3 Overview of Our Approach 387

20.3.1 Heterogeneous CMP Architecture 387

20.3.2 Network Security Application Behavior 388

20.3.3 High-Level View 389

20.4 Heterogeneous CMP Design for Network Security Processors 390

20.4.1 Task Assignment 390

20.4.2 ILP Formulation 391

20.4.3 Discussion 393

20.5 Experimental Evaluation 394

20.5.1 Setup 394

20.5.2 Results 395

20.6 Concluding Remarks 397

Acknowledgments 397

References 397

Part VIII Applications of Heterogeneous High-Performance Computing 401

21. Toward a High-Performance Distributed CBIR System for Hyperspectral Remote Sensing Data: A Case Study in Jungle Computing 403
Timo van Kessel, Niels Drost, Jason Maassen, Henri E. Bal, Frank J. Seinstra, and Antonio J. Plaza

21.1 Introduction 404

21.2 CBIR for Hyperspectral Imaging Data 407

21.2.1 Spectral Unmixing 407

21.2.2 Proposed CBIR System 409

21.3 Jungle Computing 410

21.3.1 Jungle Computing: Requirements 411

21.4 IBIS and Constellation 412

21.5 System Design and Implementation 415

21.5.1 Endmember Extraction 418

21.5.2 Query Execution 418

21.5.3 Equi-Kernels 419

21.5.4 Matchmaking 420

21.6 Evaluation 420

21.6.1 Performance Evaluation 421

21.7 Conclusions 426

Acknowledgments 426

References 426

22. Taking Advantage of Heterogeneous Platforms in Image and Video Processing 429
Sidi A. Mahmoudi, Erencan Ozkan, Pierre Manneback, and Suleyman Tosun

22.1 Introduction 430

22.2 Related Work 431

22.2.1 Image Processing on GPU 431

22.2.2 Video Processing on GPU 432

22.2.3 Contribution 433

22.3 Parallel Image Processing on GPU 433

22.3.1 Development Scheme for Image Processing on GPU 433

22.3.2 GPU Optimization 434

22.3.3 GPU Implementation of Edge and Corner Detection 434

22.3.4 Performance Analysis and Evaluation 434

22.4 Image Processing on Heterogeneous Architectures 437

22.4.1 Development Scheme for Multiple Image Processing 437

22.4.2 Task Scheduling within Heterogeneous Architectures 438

22.4.3 Optimization Within Heterogeneous Architectures 438

22.5 Video Processing on GPU 438

22.5.1 Development Scheme for Video Processing on GPU 439

22.5.2 GPU Optimizations 440

22.5.3 GPU Implementations 440

22.5.4 GPU-Based Silhouette Extraction 440

22.5.5 GPU-Based Optical Flow Estimation 440

22.5.6 Result Analysis 443

22.6 Experimental Results 444

22.6.1 Heterogeneous Computing for Vertebra Segmentation 444

22.6.2 GPU Computing for Motion Detection Using a Moving Camera 445

22.7 Conclusion 447

Acknowledgment 448

References 448

23. Real-Time Tomographic Reconstruction Through CPU + GPU Coprocessing 451
José Ignacio Agulleiro, Francisco Vazquez, Ester M. Garzon, and Jose J. Fernandez

23.1 Introduction 452

23.2 Tomographic Reconstruction 453

23.3 Optimization of Tomographic Reconstruction for CPUs and for GPUs 455

23.4 Hybrid CPU + GPU Tomographic Reconstruction 457

23.5 Results 459

23.6 Discussion and Conclusion 461

Acknowledgments 463

References 463

Index 467

High Performance Computing


    A Hardback by Emmanuel Jeannot, Julius Žilinskas


      Publisher: John Wiley & Sons Inc
      Publication Date: 01/07/2014
      ISBN13: 978-1118712054
      ISBN10: 1118712056
      Also in:
      Computer science

      Description

      Book Synopsis

      With recent changes in multicore and general-purpose computing on graphics processing units, the way parallel computers are used and programmed has drastically changed. It is important to provide a comprehensive study on how to use such machines written by specialists of the domain. The book provides recent research results in high-performance computing on complex environments, information on how to efficiently exploit heterogeneous and hierarchical architectures and distributed systems, detailed studies on the impact of applying heterogeneous computing practices to real problems, and applications varying from remote sensing to tomography. The content spans topics such as Numerical Analysis for Heterogeneous and Multicore Systems; Optimization of Communication for High Performance Heterogeneous and Hierarchical Platforms; Efficient Exploitation of Heterogeneous Architectures, Hybrid CPU+GPU, and Distributed Systems; Energy Awareness in High-Performance Computing; and Applications of

      Table of Contents

      Contributors xxiii

      Preface xxvii

      Part I Introduction 1

      1. Summary of the Open European Network for High-Performance Computing in Complex Environments 3
      Emmanuel Jeannot and Julius Žilinskas

      1.1 Introduction and Vision 4

      1.2 Scientific Organization 6

      1.2.1 Scientific Focus 6

      1.2.2 Working Groups 6

      1.3 Activities of the Project 6

      1.3.1 Spring Schools 6

      1.3.2 International Workshops 7

      1.3.3 Working Groups Meetings 7

      1.3.4 Management Committee Meetings 7

      1.3.5 Short-Term Scientific Missions 7

      1.4 Main Outcomes of the Action 7

      1.5 Contents of the Book 8

      Acknowledgment 10

      Part II Numerical Analysis for Heterogeneous and Multicore Systems 11

      2. On the Impact of the Heterogeneous Multicore and Many-Core Platforms on Iterative Solution Methods and Preconditioning Techniques 13
      Dimitar Lukarski and Maya Neytcheva

      2.1 Introduction 14

      2.2 General Description of Iterative Methods and Preconditioning 16

      2.2.1 Basic Iterative Methods 16

      2.2.2 Projection Methods: CG and GMRES 18

      2.3 Preconditioning Techniques 20

      2.4 Defect-Correction Technique 21

      2.5 Multigrid Method 22

      2.6 Parallelization of Iterative Methods 22

      2.7 Heterogeneous Systems 23

      2.7.1 Heterogeneous Computing 24

      2.7.2 Algorithm Characteristics and Resource Utilization 25

      2.7.3 Exposing Parallelism 26

      2.7.4 Heterogeneity in Matrix Computation 26

      2.7.5 Setup of Heterogeneous Iterative Solvers 27

      2.8 Maintenance and Portability 29

      2.9 Conclusion 30

      Acknowledgments 31

      References 31

      3. Efficient Numerical Solution of 2D Diffusion Equation on Multicore Computers 33
      Matjaž Depolli, Gregor Kosec, and Roman Trobec

      3.1 Introduction 34

      3.2 Test Case 35

      3.2.1 Governing Equations 35

      3.2.2 Solution Procedure 36

      3.3 Parallel Implementation 39

      3.3.1 Intel PCM Library 39

      3.3.2 OpenMP 40

      3.4 Results 41

      3.4.1 Results of Numerical Integration 41

      3.4.2 Parallel Efficiency 42

      3.5 Discussion 45

      3.6 Conclusion 47

      Acknowledgment 47

      References 47

      4. Parallel Algorithms for Parabolic Problems on Graphs in Neuroscience 51
      Natalija Tumanova and Raimondas Ciegis

      4.1 Introduction 51

      4.2 Formulation of the Discrete Model 53

      4.2.1 The 𝜃-Implicit Discrete Scheme 55

      4.2.2 The Predictor–Corrector Algorithm I 57

      4.2.3 The Predictor–Corrector Algorithm II 58

      4.3 Parallel Algorithms 59

      4.3.1 Parallel 𝜃-Implicit Algorithm 59

      4.3.2 Parallel Predictor–Corrector Algorithm I 62

      4.3.3 Parallel Predictor–Corrector Algorithm II 63

      4.4 Computational Results 63

      4.4.1 Experimental Comparison of Predictor–Corrector Algorithms 66

      4.4.2 Numerical Experiment of Neuron Excitation 68

      4.5 Conclusions 69

      Acknowledgments 70

      References 70

      Part III Communication and Storage Considerations in High-Performance Computing 73

      5. An Overview of Topology Mapping Algorithms and Techniques in High-Performance Computing 75
      Torsten Hoefler, Emmanuel Jeannot, and Guillaume Mercier

      5.1 Introduction 76

      5.2 General Overview 76

      5.2.1 A Key to Scalability: Data Locality 77

      5.2.2 Data Locality Management in Parallel Programming Models 77

      5.2.3 Virtual Topology: Definition and Characteristics 78

      5.2.4 Understanding the Hardware 79

      5.3 Formalization of the Problem 79

      5.4 Algorithmic Strategies for Topology Mapping 81

      5.4.1 Greedy Algorithm Variants 81

      5.4.2 Graph Partitioning 82

      5.4.3 Schemes Based on Graph Similarity 82

      5.4.4 Schemes Based on Subgraph Isomorphism 82

      5.5 Mapping Enforcement Techniques 82

      5.5.1 Resource Binding 83

      5.5.2 Rank Reordering 83

      5.5.3 Other Techniques 84

      5.6 Survey of Solutions 85

      5.6.1 Algorithmic Solutions 85

      5.6.2 Existing Implementations 85

      5.7 Conclusion and Open Problems 89

      Acknowledgment 90

      References 90

      6. Optimization of Collective Communication for Heterogeneous HPC Platforms 95
      Kiril Dichev and Alexey Lastovetsky

      6.1 Introduction 95

      6.2 Overview of Optimized Collectives and Topology-Aware Collectives 97

      6.3 Optimizations of Collectives on Homogeneous Clusters 98

      6.4 Heterogeneous Networks 99

      6.4.1 Comparison to Homogeneous Clusters 99

      6.5 Topology- and Performance-Aware Collectives 100

      6.6 Topology as Input 101

      6.7 Performance as Input 102

      6.7.1 Homogeneous Performance Models 103

      6.7.2 Heterogeneous Performance Models 105

      6.7.3 Estimation of Parameters of Heterogeneous Performance Models 106

      6.7.4 Other Performance Models 106

      6.8 Non-MPI Collective Algorithms for Heterogeneous Networks 106

      6.8.1 Optimal Solutions with Multiple Spanning Trees 107

      6.8.2 Adaptive Algorithms for Efficient Large-Message Transfer 107

      6.8.3 Network Models Inspired by BitTorrent 108

      6.9 Conclusion 111

      Acknowledgments 111

      References 111

      7. Effective Data Access Patterns on Massively Parallel Processors 115
      Gabriele Capannini, Ranieri Baraglia, Fabrizio Silvestri, and Franco Maria Nardini

      7.1 Introduction 115

      7.2 Architectural Details 116

      7.3 K-Model 117

      7.3.1 The Architecture 117

      7.3.2 Cost and Complexity Evaluation 118

      7.3.3 Efficiency Evaluation 119

      7.4 Parallel Prefix Sum 120

      7.4.1 Experiments 125

      7.5 Bitonic Sorting Networks 126

      7.5.1 Experiments 131

      7.6 Final Remarks 132

      Acknowledgments 133

      References 133

      8. Scalable Storage I/O Software for Blue Gene Architectures 135
      Florin Isaila, Javier Garcia, and Jesús Carretero

      8.1 Introduction 135

      8.2 Blue Gene System Overview 136

      8.2.1 Blue Gene Architecture 136

      8.2.2 Operating System Architecture 136

      8.3 Design and Implementation 138

      8.3.1 The Client Module 139

      8.3.2 The I/O Module 141

      8.4 Conclusions and Future Work 142

      Acknowledgments 142

      References 142

      Part IV Efficient Exploitation af Heterogeneous Architectures 145

      9. Fair Resource Sharing for Dynamic Scheduling of Workflows on Heterogeneous Systems 147
      Hamid Arabnejad, Jorge G. Barbosa, and Frédéric Suter

      9.1 Introduction 148

      9.1.1 Application Model 148

      9.1.2 System Model 151

      9.1.3 Performance Metrics 152

      9.2 Concurrent Workflow Scheduling 153

      9.2.1 Offline Scheduling of Concurrent Workflows 154

      9.2.2 Online Scheduling of Concurrent Workflows 155

      9.3 Experimental Results and Discussion 160

      9.3.1 DAG Structure 160

      9.3.2 Simulated Platforms 160

      9.3.3 Results and Discussion 162

      9.4 Conclusions 165

      Acknowledgments 166

      References 166

      10. Systematic Mapping of Reed–Solomon Erasure Codes on Heterogeneous Multicore Architectures 169
      Roman Wyrzykowski, Marcin Wozniak, and Lukasz Kuczynski

      10.1 Introduction 169

      10.2 Related Works 171

      10.3 Reed–Solomon Codes and Linear Algebra Algorithms 172

      10.4 Mapping Reed–Solomon Codes on Cell/B.E. Architecture 173

      10.4.1 Cell/B.E. Architecture 173

      10.4.2 Basic Assumptions for Mapping 174

      10.4.3 Vectorization Algorithm and Increasing its Efficiency 175

      10.4.4 Performance Results 177

      10.5 Mapping Reed–Solomon Codes on Multicore GPU Architectures 178

      10.5.1 Parallelization of Reed–Solomon Codes on GPU Architectures 178

      10.5.2 Organization of GPU Threads 180

      10.6 Methods of Increasing the Algorithm Performance on GPUs 181

      10.6.1 Basic Modifications 181

      10.6.2 Stream Processing 182

      10.6.3 Using Shared Memory 184

      10.7 GPU Performance Evaluation 185

      10.7.1 Experimental Results 185

      10.7.2 Performance Analysis using the Roofline Model 187

      10.8 Conclusions and Future Works 190

      Acknowledgments 191

      References 191

      11. Heterogeneous Parallel Computing Platforms and Tools for Compute-Intensive Algorithms: A Case Study 193
      Daniele D’Agostino, Andrea Clematis, and Emanuele Danovaro

      11.1 Introduction 194

      11.2 A Low-Cost Heterogeneous Computing Environment 196

      11.2.1 Adopted Computing Environment 199

      11.3 First Case Study: The N-Body Problem 200

      11.3.1 The Sequential N-Body Algorithm 201

      11.3.2 The Parallel N-Body Algorithm for Multicore Architectures 203

      11.3.3 The Parallel N-Body Algorithm for CUDA Architectures 204

      11.4 Second Case Study: The Convolution Algorithm 206

      11.4.1 The Sequential Convolver Algorithm 206

      11.4.2 The Parallel Convolver Algorithm for Multicore Architectures 207

      11.4.3 The Parallel Convolver Algorithm for GPU Architectures 208

      11.5 Conclusions 211

      Acknowledgments 212

      References 212

      12. Efficient Application of Hybrid Parallelism in Electromagnetism Problems 215
      Alejandro Álvarez-Melcón, Fernando D. Quesada, Domingo Giménez, Carlos Pérez-Alcaraz, José-Ginés Picón, and Tomás Ramírez

      12.1 Introduction 215

      12.2 Computation of Green’s functions in Hybrid Systems 216

      12.2.1 Computation in a Heterogeneous Cluster 217

      12.2.2 Experiments 218

      12.3 Parallelization in Numa Systems of a Volume Integral Equation Technique 222

      12.3.1 Experiments 222

      12.4 Autotuning Parallel Codes 226

      12.4.1 Empirical Autotuning 227

      12.4.2 Modeling the Linear Algebra Routines 229

      12.5 Conclusions and Future Research 230

      Acknowledgments 231

      References 232

      Part V CPU + GPU Coprocessing 235

      13. Design and Optimization of Scientific Applications for Highly Heterogeneous and Hierarchical HPC Platforms Using Functional Computation Performance Models 237
      David Clarke, Aleksandar Ilic, Alexey Lastovetsky, Vladimir Rychkov, Leonel Sousa, and Ziming Zhong

      13.1 Introduction 238

      13.2 Related Work 241

      13.3 Data Partitioning Based on Functional Performance Model 243

      13.4 Example Application: Heterogeneous Parallel Matrix Multiplication 245

      13.5 Performance Measurement on CPUs/GPUs System 247

      13.6 Functional Performance Models of Multiple Cores and GPUs 248

      13.7 FPM-Based Data Partitioning on CPUs/GPUs System 250

      13.8 Efficient Building of Functional Performance Models 251

      13.9 FPM-Based Data Partitioning on Hierarchical Platforms 253

      13.10 Conclusion 257

      Acknowledgments 259

      References 259

      14. Efficient Multilevel Load Balancing on Heterogeneous CPU + GPU Systems 261
      Aleksandar Ilic and Leonel Sousa

      14.1 Introduction: Heterogeneous CPU + GPU Systems 262

      14.1.1 Open Problems and Specific Contributions 263

      14.2 Background and Related Work 265

      14.2.1 Divisible Load Scheduling in Distributed CPU-Only Systems 265

      14.2.2 Scheduling in Multicore CPU and Multi-GPU Environments 268

      14.3 Load Balancing Algorithms for Heterogeneous CPU + GPU Systems 269

      14.3.1 Multilevel Simultaneous Load Balancing Algorithm 270

      14.3.2 Algorithm for Multi-Installment Processing with Multidistributions 273

      14.4 Experimental Results 275

      14.4.1 MSLBA Evaluation: Dense Matrix Multiplication Case Study 275

      14.4.2 AMPMD Evaluation: 2D FFT Case Study 277

      14.5 Conclusions 279

      Acknowledgments 280

      References 280

      15. The All-Pair Shortest-Path Problem in Shared-Memory Heterogeneous Systems 283
      Hector Ortega-Arranz, Yuri Torres, Diego R. Llanos, and Arturo Gonzalez-Escribano

      15.1 Introduction 283

      15.2 Algorithmic Overview 285

      15.2.1 Graph Theory Notation 285

      15.2.2 Dijkstra’s Algorithm 286

      15.2.3 Parallel Version of Dijkstra’s Algorithm 287

      15.3 CUDA Overview 287

      15.4 Heterogeneous Systems and Load Balancing 288

      15.5 Parallel Solutions to the APSP 289

      15.5.1 GPU Implementation 289

      15.5.2 Heterogeneous Implementation 290

      15.6 Experimental Setup 291

      15.6.1 Methodology 291

      15.6.2 Target Architectures 292

      15.6.3 Input Set Characteristics 292

      15.6.4 Load-Balancing Techniques Evaluated 292

      15.7 Experimental Results 293

      15.7.1 Complete APSP 293

      15.7.2 512-Source-Node-to-All Shortest Path 295

      15.7.3 Experimental Conclusions 296

      15.8 Conclusions 297

      Acknowledgments 297

      References 297

      Part VI Efficient Exploitation of Distributed Systems 301

      16. Resource Management for HPC on the Cloud 303
      Marc E. Frincu and Dana Petcu

      16.1 Introduction 303

      16.2 On the Type of Applications for HPC and HPC2 305

      16.3 HPC on the Cloud 306

      16.3.1 General PaaS Solutions 306

      16.3.2 On-Demand Platforms for HPC 310

      16.4 Scheduling Algorithms for HPC2 311

      16.5 Toward an Autonomous Scheduling Framework 312

      16.5.1 Autonomous Framework for RMS 313

      16.5.2 Self-Management 315

      16.5.3 Use Cases 317

      16.6 Conclusions 319

      Acknowledgment 320

      References 320

      17. Resource Discovery in Large-Scale Grid Systems 323
      Konstantinos Karaoglanoglou and Helen Karatza

      17.1 Introduction and Background 323

      17.1.1 Introduction 323

      17.1.2 Resource Discovery in Grids 324

      17.1.3 Background 325

      17.2 The Semantic Communities Approach 325

      17.2.1 Grid Resource Discovery Using Semantic Communities 325

      17.2.2 Grid Resource Discovery Based on Semantically Linked Virtual Organizations 327

      17.3 The P2P Approach 329

      17.3.1 On Fully Decentralized Resource Discovery in Grid Environments Using a P2P Architecture 329

      17.3.2 P2P Protocols for Resource Discovery in the Grid 330

      17.4 The Grid-Routing Transferring Approach 333

      17.4.1 Resource Discovery Based on Matchmaking Routers 333

      17.4.2 Acquiring Knowledge in a Large-Scale Grid System 335

      17.5 Conclusions 337

      Acknowledgment 338

      References 338

      Part VII Energy Awareness in High-Performance Computing 341

      18. Energy-Aware Approaches for HPC Systems 343
      Robert Basmadjian, Georges Da Costa, Ghislain Landry Tsafack Chetsa, Laurent Lefevre, Ariel Oleksiak, and Jean-Marc Pierson

      18.1 Introduction 344

      18.2 Power Consumption of Servers 345

      18.2.1 Server Modeling 346

      18.2.2 Power Prediction Models 347

      18.3 Classification and Energy Profiles of HPC Applications 354

      18.3.1 Phase Detection 356

      18.3.2 Phase Identification 358

      18.4 Policies and Leverages 359

      18.5 Conclusion 360

      Acknowledgements 361

      References 361

      19. Strategies for Increased Energy Awareness in Cloud Federations 365
      Gabor Kecskemeti, Attila Kertesz, Attila Cs. Marosi, and Zsolt Nemeth

      19.1 Introduction 365

      19.2 Related Work 367

      19.3 Scenarios 369

      19.3.1 Increased Energy Awareness Across Multiple Data Centers within a Single Administrative Domain 369

      19.3.2 Energy Considerations in Commercial Cloud Federations 372

      19.3.3 Reduced Energy Footprint of Academic Cloud Federations 374

      19.4 Energy-Aware Cloud Federations 374

      19.4.1 Availability of Energy-Consumption-Related Information 375

      19.4.2 Service Call Scheduling at the Meta-Brokering Level of FCM 376

      19.4.3 Service Call Scheduling and VM Management at the Cloud-Brokering Level of FCM 377

      19.5 Conclusions 379

      Acknowledgments 380

      References 380

      20. Enabling Network Security in HPC Systems Using Heterogeneous CMPs 383
      Ozcan Ozturk and Suleyman Tosun

      20.1 Introduction 384

      20.2 Related Work 386

      20.3 Overview of Our Approach 387

      20.3.1 Heterogeneous CMP Architecture 387

      20.3.2 Network Security Application Behavior 388

      20.3.3 High-Level View 389

      20.4 Heterogeneous CMP Design for Network Security Processors 390

      20.4.1 Task Assignment 390

      20.4.2 ILP Formulation 391

      20.4.3 Discussion 393

      20.5 Experimental Evaluation 394

      20.5.1 Setup 394

      20.5.2 Results 395

      20.6 Concluding Remarks 397

      Acknowledgments 397

      References 397

      Part VIII Applications of Heterogeneous High-Performance Computing 401

      21. Toward a High-Performance Distributed CBIR System for Hyperspectral Remote Sensing Data: A Case Study in Jungle Computing 403
      Timo van Kessel, Niels Drost, Jason Maassen, Henri E. Bal, Frank J. Seinstra, and Antonio J. Plaza

      21.1 Introduction 404

      21.2 CBIR for Hyperspectral Imaging Data 407

      21.2.1 Spectral Unmixing 407

      21.2.2 Proposed CBIR System 409

      21.3 Jungle Computing 410

      21.3.1 Jungle Computing: Requirements 411

      21.4 IBIS and Constellation 412

      21.5 System Design and Implementation 415

      21.5.1 Endmember Extraction 418

      21.5.2 Query Execution 418

      21.5.3 Equi-Kernels 419

      21.5.4 Matchmaking 420

      21.6 Evaluation 420

      21.6.1 Performance Evaluation 421

      21.7 Conclusions 426

      Acknowledgments 426

      References 426

      22. Taking Advantage of Heterogeneous Platforms in Image and Video Processing 429
      Sidi A. Mahmoudi, Erencan Ozkan, Pierre Manneback, and Suleyman Tosun

      22.1 Introduction 430

      22.2 Related Work 431

      22.2.1 Image Processing on GPU 431

      22.2.2 Video Processing on GPU 432

      22.2.3 Contribution 433

      22.3 Parallel Image Processing on GPU 433

      22.3.1 Development Scheme for Image Processing on GPU 433

      22.3.2 GPU Optimization 434

      22.3.3 GPU Implementation of Edge and Corner Detection 434

      22.3.4 Performance Analysis and Evaluation 434

      22.4 Image Processing on Heterogeneous Architectures 437

      22.4.1 Development Scheme for Multiple Image Processing 437

      22.4.2 Task Scheduling within Heterogeneous Architectures 438

      22.4.3 Optimization Within Heterogeneous Architectures 438

      22.5 Video Processing on GPU 438

      22.5.1 Development Scheme for Video Processing on GPU 439

      22.5.2 GPU Optimizations 440

      22.5.3 GPU Implementations 440

      22.5.4 GPU-Based Silhouette Extraction 440

      22.5.5 GPU-Based Optical Flow Estimation 440

      22.5.6 Result Analysis 443

      22.6 Experimental Results 444

      22.6.1 Heterogeneous Computing for Vertebra Segmentation 444

      22.6.2 GPU Computing for Motion Detection Using a Moving Camera 445

      22.7 Conclusion 447

      Acknowledgment 448

      References 448

      23. Real-Time Tomographic Reconstruction Through CPU + GPU Coprocessing 451
      José Ignacio Agulleiro, Francisco Vazquez, Ester M. Garzon, and Jose J. Fernandez

      23.1 Introduction 452

      23.2 Tomographic Reconstruction 453

      23.3 Optimization of Tomographic Reconstruction for CPUs and for GPUs 455

      23.4 Hybrid CPU + GPU Tomographic Reconstruction 457

      23.5 Results 459

      23.6 Discussion and Conclusion 461

      Acknowledgments 463

      References 463

      Index 467


      © 2026 Book Curl
