{"product_id":"high-performance-computing-9781118712054","title":"High Performance Computing","description":"\u003cb\u003eBook Synopsis\u003c\/b\u003e\u003cbr\u003e\u003cp\u003eWith recent changes in multicore and general-purpose computing on graphics processing units, the way parallel computers are used and programmed has drastically changed. It is important to provide a comprehensive study on how to use such machines written by specialists of the domain. The book provides recent research results in high-performance computing on complex environments, information on how to efficiently exploit heterogeneous and hierarchical architectures and distributed systems, detailed studies on the impact of applying heterogeneous computing practices to real problems, and applications varying from remote sensing to tomography. The content spans topics such as Numerical Analysis for Heterogeneous and Multicore Systems; Optimization of Communication for High Performance Heterogeneous and Hierarchical Platforms; Efficient Exploitation of Heterogeneous Architectures, Hybrid CPU+GPU, and Distributed Systems; Energy Awareness in High-Performance Computing; and Applications of\u003cbr\u003e\u003cbr\u003e\u003cb\u003eTable of Contents\u003c\/b\u003e\u003cbr\u003e\u003c\/p\u003e\u003cp\u003eContributors xxiii\u003c\/p\u003e \u003cp\u003ePreface xxvii\u003c\/p\u003e \u003cp\u003e\u003cb\u003ePart I Introduction 1\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e\u003cb\u003e1. 
Summary of the Open European Network for High-Performance Computing in Complex Environments 3\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eEmmanuel Jeannot and Julius Žilinskas\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e1.1 Introduction and Vision 4\u003c\/p\u003e \u003cp\u003e1.2 Scientific Organization 6\u003c\/p\u003e \u003cp\u003e1.2.1 Scientific Focus 6\u003c\/p\u003e \u003cp\u003e1.2.2 Working Groups 6\u003c\/p\u003e \u003cp\u003e1.3 Activities of the Project 6\u003c\/p\u003e \u003cp\u003e1.3.1 Spring Schools 6\u003c\/p\u003e \u003cp\u003e1.3.2 International Workshops 7\u003c\/p\u003e \u003cp\u003e1.3.3 Working Groups Meetings 7\u003c\/p\u003e \u003cp\u003e1.3.4 Management Committee Meetings 7\u003c\/p\u003e \u003cp\u003e1.3.5 Short-Term Scientific Missions 7\u003c\/p\u003e \u003cp\u003e1.4 Main Outcomes of the Action 7\u003c\/p\u003e \u003cp\u003e1.5 Contents of the Book 8\u003c\/p\u003e \u003cp\u003eAcknowledgment 10\u003c\/p\u003e \u003cp\u003e\u003cb\u003ePart II Numerical Analysis for Heterogeneous and Multicore Systems 11\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e\u003cb\u003e2. 
On the Impact of the Heterogeneous Multicore and Many-Core Platforms on Iterative Solution Methods and Preconditioning Techniques 13\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eDimitar Lukarski and Maya Neytcheva\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e2.1 Introduction 14\u003c\/p\u003e \u003cp\u003e2.2 General Description of Iterative Methods and Preconditioning 16\u003c\/p\u003e \u003cp\u003e2.2.1 Basic Iterative Methods 16\u003c\/p\u003e \u003cp\u003e2.2.2 Projection Methods: CG and GMRES 18\u003c\/p\u003e \u003cp\u003e2.3 Preconditioning Techniques 20\u003c\/p\u003e \u003cp\u003e2.4 Defect-Correction Technique 21\u003c\/p\u003e \u003cp\u003e2.5 Multigrid Method 22\u003c\/p\u003e \u003cp\u003e2.6 Parallelization of Iterative Methods 22\u003c\/p\u003e \u003cp\u003e2.7 Heterogeneous Systems 23\u003c\/p\u003e \u003cp\u003e2.7.1 Heterogeneous Computing 24\u003c\/p\u003e \u003cp\u003e2.7.2 Algorithm Characteristics and Resource Utilization 25\u003c\/p\u003e \u003cp\u003e2.7.3 Exposing Parallelism 26\u003c\/p\u003e \u003cp\u003e2.7.4 Heterogeneity in Matrix Computation 26\u003c\/p\u003e \u003cp\u003e2.7.5 Setup of Heterogeneous Iterative Solvers 27\u003c\/p\u003e \u003cp\u003e2.8 Maintenance and Portability 29\u003c\/p\u003e \u003cp\u003e2.9 Conclusion 30\u003c\/p\u003e \u003cp\u003eAcknowledgments 31\u003c\/p\u003e \u003cp\u003eReferences 31\u003c\/p\u003e \u003cp\u003e\u003cb\u003e3. 
Efficient Numerical Solution of 2D Diffusion Equation on Multicore Computers 33\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eMatjaž Depolli, Gregor Kosec, and Roman Trobec\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e3.1 Introduction 34\u003c\/p\u003e \u003cp\u003e3.2 Test Case 35\u003c\/p\u003e \u003cp\u003e3.2.1 Governing Equations 35\u003c\/p\u003e \u003cp\u003e3.2.2 Solution Procedure 36\u003c\/p\u003e \u003cp\u003e3.3 Parallel Implementation 39\u003c\/p\u003e \u003cp\u003e3.3.1 Intel PCM Library 39\u003c\/p\u003e \u003cp\u003e3.3.2 OpenMP 40\u003c\/p\u003e \u003cp\u003e3.4 Results 41\u003c\/p\u003e \u003cp\u003e3.4.1 Results of Numerical Integration 41\u003c\/p\u003e \u003cp\u003e3.4.2 Parallel Efficiency 42\u003c\/p\u003e \u003cp\u003e3.5 Discussion 45\u003c\/p\u003e \u003cp\u003e3.6 Conclusion 47\u003c\/p\u003e \u003cp\u003eAcknowledgment 47\u003c\/p\u003e \u003cp\u003eReferences 47\u003c\/p\u003e \u003cp\u003e\u003cb\u003e4. Parallel Algorithms for Parabolic Problems on Graphs in Neuroscience 51\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eNatalija Tumanova and Raimondas Ciegis\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e4.1 Introduction 51\u003c\/p\u003e \u003cp\u003e4.2 Formulation of the Discrete Model 53\u003c\/p\u003e \u003cp\u003e4.2.1 The \u003ci\u003e𝜃\u003c\/i\u003e-Implicit Discrete Scheme 55\u003c\/p\u003e \u003cp\u003e4.2.2 The Predictor–Corrector Algorithm I 57\u003c\/p\u003e \u003cp\u003e4.2.3 The Predictor–Corrector Algorithm II 58\u003c\/p\u003e \u003cp\u003e4.3 Parallel Algorithms 59\u003c\/p\u003e \u003cp\u003e4.3.1 Parallel \u003ci\u003e𝜃\u003c\/i\u003e-Implicit Algorithm 59\u003c\/p\u003e \u003cp\u003e4.3.2 Parallel Predictor–Corrector Algorithm I 62\u003c\/p\u003e \u003cp\u003e4.3.3 Parallel Predictor–Corrector Algorithm II 63\u003c\/p\u003e \u003cp\u003e4.4 Computational Results 63\u003c\/p\u003e \u003cp\u003e4.4.1 Experimental Comparison of Predictor–Corrector Algorithms 66\u003c\/p\u003e \u003cp\u003e4.4.2 Numerical Experiment of Neuron 
Excitation 68\u003c\/p\u003e \u003cp\u003e4.5 Conclusions 69\u003c\/p\u003e \u003cp\u003eAcknowledgments 70\u003c\/p\u003e \u003cp\u003eReferences 70\u003c\/p\u003e \u003cp\u003e\u003cb\u003ePart III Communication and Storage Considerations in High-Performance Computing 73\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e\u003cb\u003e5. An Overview of Topology Mapping Algorithms and Techniques in High-Performance Computing 75\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eTorsten Hoefler, Emmanuel Jeannot, and Guillaume Mercier\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e5.1 Introduction 76\u003c\/p\u003e \u003cp\u003e5.2 General Overview 76\u003c\/p\u003e \u003cp\u003e5.2.1 A Key to Scalability: Data Locality 77\u003c\/p\u003e \u003cp\u003e5.2.2 Data Locality Management in Parallel Programming Models 77\u003c\/p\u003e \u003cp\u003e5.2.3 Virtual Topology: Definition and Characteristics 78\u003c\/p\u003e \u003cp\u003e5.2.4 Understanding the Hardware 79\u003c\/p\u003e \u003cp\u003e5.3 Formalization of the Problem 79\u003c\/p\u003e \u003cp\u003e5.4 Algorithmic Strategies for Topology Mapping 81\u003c\/p\u003e \u003cp\u003e5.4.1 Greedy Algorithm Variants 81\u003c\/p\u003e \u003cp\u003e5.4.2 Graph Partitioning 82\u003c\/p\u003e \u003cp\u003e5.4.3 Schemes Based on Graph Similarity 82\u003c\/p\u003e \u003cp\u003e5.4.4 Schemes Based on Subgraph Isomorphism 82\u003c\/p\u003e \u003cp\u003e5.5 Mapping Enforcement Techniques 82\u003c\/p\u003e \u003cp\u003e5.5.1 Resource Binding 83\u003c\/p\u003e \u003cp\u003e5.5.2 Rank Reordering 83\u003c\/p\u003e \u003cp\u003e5.5.3 Other Techniques 84\u003c\/p\u003e \u003cp\u003e5.6 Survey of Solutions 85\u003c\/p\u003e \u003cp\u003e5.6.1 Algorithmic Solutions 85\u003c\/p\u003e \u003cp\u003e5.6.2 Existing Implementations 85\u003c\/p\u003e \u003cp\u003e5.7 Conclusion and Open Problems 89\u003c\/p\u003e \u003cp\u003eAcknowledgment 90\u003c\/p\u003e \u003cp\u003eReferences 90\u003c\/p\u003e \u003cp\u003e\u003cb\u003e6. 
Optimization of Collective Communication for Heterogeneous HPC Platforms 95\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eKiril Dichev and Alexey Lastovetsky\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e6.1 Introduction 95\u003c\/p\u003e \u003cp\u003e6.2 Overview of Optimized Collectives and Topology-Aware Collectives 97\u003c\/p\u003e \u003cp\u003e6.3 Optimizations of Collectives on Homogeneous Clusters 98\u003c\/p\u003e \u003cp\u003e6.4 Heterogeneous Networks 99\u003c\/p\u003e \u003cp\u003e6.4.1 Comparison to Homogeneous Clusters 99\u003c\/p\u003e \u003cp\u003e6.5 Topology- and Performance-Aware Collectives 100\u003c\/p\u003e \u003cp\u003e6.6 Topology as Input 101\u003c\/p\u003e \u003cp\u003e6.7 Performance as Input 102\u003c\/p\u003e \u003cp\u003e6.7.1 Homogeneous Performance Models 103\u003c\/p\u003e \u003cp\u003e6.7.2 Heterogeneous Performance Models 105\u003c\/p\u003e \u003cp\u003e6.7.3 Estimation of Parameters of Heterogeneous Performance Models 106\u003c\/p\u003e \u003cp\u003e6.7.4 Other Performance Models 106\u003c\/p\u003e \u003cp\u003e6.8 Non-MPI Collective Algorithms for Heterogeneous Networks 106\u003c\/p\u003e \u003cp\u003e6.8.1 Optimal Solutions with Multiple Spanning Trees 107\u003c\/p\u003e \u003cp\u003e6.8.2 Adaptive Algorithms for Efficient Large-Message Transfer 107\u003c\/p\u003e \u003cp\u003e6.8.3 Network Models Inspired by BitTorrent 108\u003c\/p\u003e \u003cp\u003e6.9 Conclusion 111\u003c\/p\u003e \u003cp\u003eAcknowledgments 111\u003c\/p\u003e \u003cp\u003eReferences 111\u003c\/p\u003e \u003cp\u003e\u003cb\u003e7. 
Effective Data Access Patterns on Massively Parallel Processors 115\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eGabriele Capannini, Ranieri Baraglia, Fabrizio Silvestri, and Franco Maria Nardini\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e7.1 Introduction 115\u003c\/p\u003e \u003cp\u003e7.2 Architectural Details 116\u003c\/p\u003e \u003cp\u003e7.3 \u003ci\u003eK\u003c\/i\u003e-Model 117\u003c\/p\u003e \u003cp\u003e7.3.1 The Architecture 117\u003c\/p\u003e \u003cp\u003e7.3.2 Cost and Complexity Evaluation 118\u003c\/p\u003e \u003cp\u003e7.3.3 Efficiency Evaluation 119\u003c\/p\u003e \u003cp\u003e7.4 Parallel Prefix Sum 120\u003c\/p\u003e \u003cp\u003e7.4.1 Experiments 125\u003c\/p\u003e \u003cp\u003e7.5 Bitonic Sorting Networks 126\u003c\/p\u003e \u003cp\u003e7.5.1 Experiments 131\u003c\/p\u003e \u003cp\u003e7.6 Final Remarks 132\u003c\/p\u003e \u003cp\u003eAcknowledgments 133\u003c\/p\u003e \u003cp\u003eReferences 133\u003c\/p\u003e \u003cp\u003e\u003cb\u003e8. Scalable Storage I\/O Software for Blue Gene Architectures 135\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eFlorin Isaila, Javier Garcia, and Jesús Carretero\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e8.1 Introduction 135\u003c\/p\u003e \u003cp\u003e8.2 Blue Gene System Overview 136\u003c\/p\u003e \u003cp\u003e8.2.1 Blue Gene Architecture 136\u003c\/p\u003e \u003cp\u003e8.2.2 Operating System Architecture 136\u003c\/p\u003e \u003cp\u003e8.3 Design and Implementation 138\u003c\/p\u003e \u003cp\u003e8.3.1 The Client Module 139\u003c\/p\u003e \u003cp\u003e8.3.2 The I\/O Module 141\u003c\/p\u003e \u003cp\u003e8.4 Conclusions and Future Work 142\u003c\/p\u003e \u003cp\u003eAcknowledgments 142\u003c\/p\u003e \u003cp\u003eReferences 142\u003c\/p\u003e \u003cp\u003e\u003cb\u003ePart IV Efficient Exploitation of Heterogeneous Architectures 145\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e\u003cb\u003e9. 
Fair Resource Sharing for Dynamic Scheduling of Workflows on Heterogeneous Systems 147\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eHamid Arabnejad, Jorge G. Barbosa, and Frédéric Suter\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e9.1 Introduction 148\u003c\/p\u003e \u003cp\u003e9.1.1 Application Model 148\u003c\/p\u003e \u003cp\u003e9.1.2 System Model 151\u003c\/p\u003e \u003cp\u003e9.1.3 Performance Metrics 152\u003c\/p\u003e \u003cp\u003e9.2 Concurrent Workflow Scheduling 153\u003c\/p\u003e \u003cp\u003e9.2.1 Offline Scheduling of Concurrent Workflows 154\u003c\/p\u003e \u003cp\u003e9.2.2 Online Scheduling of Concurrent Workflows 155\u003c\/p\u003e \u003cp\u003e9.3 Experimental Results and Discussion 160\u003c\/p\u003e \u003cp\u003e9.3.1 DAG Structure 160\u003c\/p\u003e \u003cp\u003e9.3.2 Simulated Platforms 160\u003c\/p\u003e \u003cp\u003e9.3.3 Results and Discussion 162\u003c\/p\u003e \u003cp\u003e9.4 Conclusions 165\u003c\/p\u003e \u003cp\u003eAcknowledgments 166\u003c\/p\u003e \u003cp\u003eReferences 166\u003c\/p\u003e \u003cp\u003e\u003cb\u003e10. Systematic Mapping of Reed–Solomon Erasure Codes on Heterogeneous Multicore Architectures 169\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eRoman Wyrzykowski, Marcin Wozniak, and Lukasz Kuczynski\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e10.1 Introduction 169\u003c\/p\u003e \u003cp\u003e10.2 Related Works 171\u003c\/p\u003e \u003cp\u003e10.3 Reed–Solomon Codes and Linear Algebra Algorithms 172\u003c\/p\u003e \u003cp\u003e10.4 Mapping Reed–Solomon Codes on Cell\/B.E. Architecture 173\u003c\/p\u003e \u003cp\u003e10.4.1 Cell\/B.E. 
Architecture 173\u003c\/p\u003e \u003cp\u003e10.4.2 Basic Assumptions for Mapping 174\u003c\/p\u003e \u003cp\u003e10.4.3 Vectorization Algorithm and Increasing its Efficiency 175\u003c\/p\u003e \u003cp\u003e10.4.4 Performance Results 177\u003c\/p\u003e \u003cp\u003e10.5 Mapping Reed–Solomon Codes on Multicore GPU Architectures 178\u003c\/p\u003e \u003cp\u003e10.5.1 Parallelization of Reed–Solomon Codes on GPU Architectures 178\u003c\/p\u003e \u003cp\u003e10.5.2 Organization of GPU Threads 180\u003c\/p\u003e \u003cp\u003e10.6 Methods of Increasing the Algorithm Performance on GPUs 181\u003c\/p\u003e \u003cp\u003e10.6.1 Basic Modifications 181\u003c\/p\u003e \u003cp\u003e10.6.2 Stream Processing 182\u003c\/p\u003e \u003cp\u003e10.6.3 Using Shared Memory 184\u003c\/p\u003e \u003cp\u003e10.7 GPU Performance Evaluation 185\u003c\/p\u003e \u003cp\u003e10.7.1 Experimental Results 185\u003c\/p\u003e \u003cp\u003e10.7.2 Performance Analysis using the Roofline Model 187\u003c\/p\u003e \u003cp\u003e10.8 Conclusions and Future Works 190\u003c\/p\u003e \u003cp\u003eAcknowledgments 191\u003c\/p\u003e \u003cp\u003eReferences 191\u003c\/p\u003e \u003cp\u003e\u003cb\u003e11. 
Heterogeneous Parallel Computing Platforms and Tools for Compute-Intensive Algorithms: A Case Study 193\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eDaniele D’Agostino, Andrea Clematis, and Emanuele Danovaro\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e11.1 Introduction 194\u003c\/p\u003e \u003cp\u003e11.2 A Low-Cost Heterogeneous Computing Environment 196\u003c\/p\u003e \u003cp\u003e11.2.1 Adopted Computing Environment 199\u003c\/p\u003e \u003cp\u003e11.3 First Case Study: The \u003ci\u003eN\u003c\/i\u003e-Body Problem 200\u003c\/p\u003e \u003cp\u003e11.3.1 The Sequential \u003ci\u003eN\u003c\/i\u003e-Body Algorithm 201\u003c\/p\u003e \u003cp\u003e11.3.2 The Parallel \u003ci\u003eN\u003c\/i\u003e-Body Algorithm for Multicore Architectures 203\u003c\/p\u003e \u003cp\u003e11.3.3 The Parallel \u003ci\u003eN\u003c\/i\u003e-Body Algorithm for CUDA Architectures 204\u003c\/p\u003e \u003cp\u003e11.4 Second Case Study: The Convolution Algorithm 206\u003c\/p\u003e \u003cp\u003e11.4.1 The Sequential Convolver Algorithm 206\u003c\/p\u003e \u003cp\u003e11.4.2 The Parallel Convolver Algorithm for Multicore Architectures 207\u003c\/p\u003e \u003cp\u003e11.4.3 The Parallel Convolver Algorithm for GPU Architectures 208\u003c\/p\u003e \u003cp\u003e11.5 Conclusions 211\u003c\/p\u003e \u003cp\u003eAcknowledgments 212\u003c\/p\u003e \u003cp\u003eReferences 212\u003c\/p\u003e \u003cp\u003e\u003cb\u003e12. Efficient Application of Hybrid Parallelism in Electromagnetism Problems 215\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eAlejandro Álvarez-Melcón, Fernando D. 
Quesada, Domingo Giménez, Carlos Pérez-Alcaraz, José-Ginés Picón, and Tomás Ramírez\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e12.1 Introduction 215\u003c\/p\u003e \u003cp\u003e12.2 Computation of Green’s functions in Hybrid Systems 216\u003c\/p\u003e \u003cp\u003e12.2.1 Computation in a Heterogeneous Cluster 217\u003c\/p\u003e \u003cp\u003e12.2.2 Experiments 218\u003c\/p\u003e \u003cp\u003e12.3 Parallelization in NUMA Systems of a Volume Integral Equation Technique 222\u003c\/p\u003e \u003cp\u003e12.3.1 Experiments 222\u003c\/p\u003e \u003cp\u003e12.4 Autotuning Parallel Codes 226\u003c\/p\u003e \u003cp\u003e12.4.1 Empirical Autotuning 227\u003c\/p\u003e \u003cp\u003e12.4.2 Modeling the Linear Algebra Routines 229\u003c\/p\u003e \u003cp\u003e12.5 Conclusions and Future Research 230\u003c\/p\u003e \u003cp\u003eAcknowledgments 231\u003c\/p\u003e \u003cp\u003eReferences 232\u003c\/p\u003e \u003cp\u003e\u003cb\u003ePart V CPU + GPU Coprocessing 235\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e\u003cb\u003e13. 
Design and Optimization of Scientific Applications for Highly Heterogeneous and Hierarchical HPC Platforms Using Functional Computation Performance Models 237\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eDavid Clarke, Aleksandar Ilic, Alexey Lastovetsky, Vladimir Rychkov, Leonel Sousa, and Ziming Zhong\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e13.1 Introduction 238\u003c\/p\u003e \u003cp\u003e13.2 Related Work 241\u003c\/p\u003e \u003cp\u003e13.3 Data Partitioning Based on Functional Performance Model 243\u003c\/p\u003e \u003cp\u003e13.4 Example Application: Heterogeneous Parallel Matrix Multiplication 245\u003c\/p\u003e \u003cp\u003e13.5 Performance Measurement on CPUs\/GPUs System 247\u003c\/p\u003e \u003cp\u003e13.6 Functional Performance Models of Multiple Cores and GPUs 248\u003c\/p\u003e \u003cp\u003e13.7 FPM-Based Data Partitioning on CPUs\/GPUs System 250\u003c\/p\u003e \u003cp\u003e13.8 Efficient Building of Functional Performance Models 251\u003c\/p\u003e \u003cp\u003e13.9 FPM-Based Data Partitioning on Hierarchical Platforms 253\u003c\/p\u003e \u003cp\u003e13.10 Conclusion 257\u003c\/p\u003e \u003cp\u003eAcknowledgments 259\u003c\/p\u003e \u003cp\u003eReferences 259\u003c\/p\u003e \u003cp\u003e\u003cb\u003e14. 
Efficient Multilevel Load Balancing on Heterogeneous CPU + GPU Systems 261\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eAleksandar Ilic and Leonel Sousa\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e14.1 Introduction: Heterogeneous CPU + GPU Systems 262\u003c\/p\u003e \u003cp\u003e14.1.1 Open Problems and Specific Contributions 263\u003c\/p\u003e \u003cp\u003e14.2 Background and Related Work 265\u003c\/p\u003e \u003cp\u003e14.2.1 Divisible Load Scheduling in Distributed CPU-Only Systems 265\u003c\/p\u003e \u003cp\u003e14.2.2 Scheduling in Multicore CPU and Multi-GPU Environments 268\u003c\/p\u003e \u003cp\u003e14.3 Load Balancing Algorithms for Heterogeneous CPU + GPU Systems 269\u003c\/p\u003e \u003cp\u003e14.3.1 Multilevel Simultaneous Load Balancing Algorithm 270\u003c\/p\u003e \u003cp\u003e14.3.2 Algorithm for Multi-Installment Processing with Multidistributions 273\u003c\/p\u003e \u003cp\u003e14.4 Experimental Results 275\u003c\/p\u003e \u003cp\u003e14.4.1 MSLBA Evaluation: Dense Matrix Multiplication Case Study 275\u003c\/p\u003e \u003cp\u003e14.4.2 AMPMD Evaluation: 2D FFT Case Study 277\u003c\/p\u003e \u003cp\u003e14.5 Conclusions 279\u003c\/p\u003e \u003cp\u003eAcknowledgments 280\u003c\/p\u003e \u003cp\u003eReferences 280\u003c\/p\u003e \u003cp\u003e\u003cb\u003e15. The All-Pair Shortest-Path Problem in Shared-Memory Heterogeneous Systems 283\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eHector Ortega-Arranz, Yuri Torres, Diego R. 
Llanos, and Arturo Gonzalez-Escribano\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e15.1 Introduction 283\u003c\/p\u003e \u003cp\u003e15.2 Algorithmic Overview 285\u003c\/p\u003e \u003cp\u003e15.2.1 Graph Theory Notation 285\u003c\/p\u003e \u003cp\u003e15.2.2 Dijkstra’s Algorithm 286\u003c\/p\u003e \u003cp\u003e15.2.3 Parallel Version of Dijkstra’s Algorithm 287\u003c\/p\u003e \u003cp\u003e15.3 CUDA Overview 287\u003c\/p\u003e \u003cp\u003e15.4 Heterogeneous Systems and Load Balancing 288\u003c\/p\u003e \u003cp\u003e15.5 Parallel Solutions to The APSP 289\u003c\/p\u003e \u003cp\u003e15.5.1 GPU Implementation 289\u003c\/p\u003e \u003cp\u003e15.5.2 Heterogeneous Implementation 290\u003c\/p\u003e \u003cp\u003e15.6 Experimental Setup 291\u003c\/p\u003e \u003cp\u003e15.6.1 Methodology 291\u003c\/p\u003e \u003cp\u003e15.6.2 Target Architectures 292\u003c\/p\u003e \u003cp\u003e15.6.3 Input Set Characteristics 292\u003c\/p\u003e \u003cp\u003e15.6.4 Load-Balancing Techniques Evaluated 292\u003c\/p\u003e \u003cp\u003e15.7 Experimental Results 293\u003c\/p\u003e \u003cp\u003e15.7.1 Complete APSP 293\u003c\/p\u003e \u003cp\u003e15.7.2 512-Source-Node-to-All Shortest Path 295\u003c\/p\u003e \u003cp\u003e15.7.3 Experimental Conclusions 296\u003c\/p\u003e \u003cp\u003e15.8 Conclusions 297\u003c\/p\u003e \u003cp\u003eAcknowledgments 297\u003c\/p\u003e \u003cp\u003eReferences 297\u003c\/p\u003e \u003cp\u003e\u003cb\u003ePart VI Efficient Exploitation of Distributed Systems 301\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e\u003cb\u003e16. Resource Management for HPC on the Cloud 303\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eMarc E. 
Frincu and Dana Petcu\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e16.1 Introduction 303\u003c\/p\u003e \u003cp\u003e16.2 On the Type of Applications for HPC and HPC2 305\u003c\/p\u003e \u003cp\u003e16.3 HPC on the Cloud 306\u003c\/p\u003e \u003cp\u003e16.3.1 General PaaS Solutions 306\u003c\/p\u003e \u003cp\u003e16.3.2 On-Demand Platforms for HPC 310\u003c\/p\u003e \u003cp\u003e16.4 Scheduling Algorithms for HPC2 311\u003c\/p\u003e \u003cp\u003e16.5 Toward an Autonomous Scheduling Framework 312\u003c\/p\u003e \u003cp\u003e16.5.1 Autonomous Framework for RMS 313\u003c\/p\u003e \u003cp\u003e16.5.2 Self-Management 315\u003c\/p\u003e \u003cp\u003e16.5.3 Use Cases 317\u003c\/p\u003e \u003cp\u003e16.6 Conclusions 319\u003c\/p\u003e \u003cp\u003eAcknowledgment 320\u003c\/p\u003e \u003cp\u003eReferences 320\u003c\/p\u003e \u003cp\u003e\u003cb\u003e17. Resource Discovery in Large-Scale Grid Systems 323\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eKonstantinos Karaoglanoglou and Helen Karatza\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e17.1 Introduction and Background 323\u003c\/p\u003e \u003cp\u003e17.1.1 Introduction 323\u003c\/p\u003e \u003cp\u003e17.1.2 Resource Discovery in Grids 324\u003c\/p\u003e \u003cp\u003e17.1.3 Background 325\u003c\/p\u003e \u003cp\u003e17.2 The Semantic Communities Approach 325\u003c\/p\u003e \u003cp\u003e17.2.1 Grid Resource Discovery Using Semantic Communities 325\u003c\/p\u003e \u003cp\u003e17.2.2 Grid Resource Discovery Based on Semantically Linked Virtual Organizations 327\u003c\/p\u003e \u003cp\u003e17.3 The P2P Approach 329\u003c\/p\u003e \u003cp\u003e17.3.1 On Fully Decentralized Resource Discovery in Grid Environments Using a P2P Architecture 329\u003c\/p\u003e \u003cp\u003e17.3.2 P2P Protocols for Resource Discovery in the Grid 330\u003c\/p\u003e \u003cp\u003e17.4 The Grid-Routing Transferring Approach 333\u003c\/p\u003e \u003cp\u003e17.4.1 Resource Discovery Based on Matchmaking Routers 333\u003c\/p\u003e \u003cp\u003e17.4.2 Acquiring 
Knowledge in a Large-Scale Grid System 335\u003c\/p\u003e \u003cp\u003e17.5 Conclusions 337\u003c\/p\u003e \u003cp\u003eAcknowledgment 338\u003c\/p\u003e \u003cp\u003eReferences 338\u003c\/p\u003e \u003cp\u003e\u003cb\u003ePart VII Energy Awareness in High-Performance Computing 341\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e\u003cb\u003e18. Energy-Aware Approaches for HPC Systems 343\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eRobert Basmadjian, Georges Da Costa, Ghislain Landry Tsafack Chetsa, Laurent Lefevre, Ariel Oleksiak, and Jean-Marc Pierson\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e18.1 Introduction 344\u003c\/p\u003e \u003cp\u003e18.2 Power Consumption of Servers 345\u003c\/p\u003e \u003cp\u003e18.2.1 Server Modeling 346\u003c\/p\u003e \u003cp\u003e18.2.2 Power Prediction Models 347\u003c\/p\u003e \u003cp\u003e18.3 Classification and Energy Profiles of HPC Applications 354\u003c\/p\u003e \u003cp\u003e18.3.1 Phase Detection 356\u003c\/p\u003e \u003cp\u003e18.3.2 Phase Identification 358\u003c\/p\u003e \u003cp\u003e18.4 Policies and Leverages 359\u003c\/p\u003e \u003cp\u003e18.5 Conclusion 360\u003c\/p\u003e \u003cp\u003eAcknowledgements 361\u003c\/p\u003e \u003cp\u003eReferences 361\u003c\/p\u003e \u003cp\u003e\u003cb\u003e19. Strategies for Increased Energy Awareness in Cloud Federations 365\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eGabor Kecskemeti, Attila Kertesz, Attila Cs. 
Marosi, and Zsolt Nemeth\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e19.1 Introduction 365\u003c\/p\u003e \u003cp\u003e19.2 Related Work 367\u003c\/p\u003e \u003cp\u003e19.3 Scenarios 369\u003c\/p\u003e \u003cp\u003e19.3.1 Increased Energy Awareness Across Multiple Data Centers within a Single Administrative Domain 369\u003c\/p\u003e \u003cp\u003e19.3.2 Energy Considerations in Commercial Cloud Federations 372\u003c\/p\u003e \u003cp\u003e19.3.3 Reduced Energy Footprint of Academic Cloud Federations 374\u003c\/p\u003e \u003cp\u003e19.4 Energy-Aware Cloud Federations 374\u003c\/p\u003e \u003cp\u003e19.4.1 Availability of Energy-Consumption-Related Information 375\u003c\/p\u003e \u003cp\u003e19.4.2 Service Call Scheduling at the Meta-Brokering Level of FCM 376\u003c\/p\u003e \u003cp\u003e19.4.3 Service Call Scheduling and VM Management at the Cloud-Brokering Level of FCM 377\u003c\/p\u003e \u003cp\u003e19.5 Conclusions 379\u003c\/p\u003e \u003cp\u003eAcknowledgments 380\u003c\/p\u003e \u003cp\u003eReferences 380\u003c\/p\u003e \u003cp\u003e\u003cb\u003e20. 
Enabling Network Security in HPC Systems Using Heterogeneous CMPs 383\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eOzcan Ozturk and Suleyman Tosun\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e20.1 Introduction 384\u003c\/p\u003e \u003cp\u003e20.2 Related Work 386\u003c\/p\u003e \u003cp\u003e20.3 Overview of Our Approach 387\u003c\/p\u003e \u003cp\u003e20.3.1 Heterogeneous CMP Architecture 387\u003c\/p\u003e \u003cp\u003e20.3.2 Network Security Application Behavior 388\u003c\/p\u003e \u003cp\u003e20.3.3 High-Level View 389\u003c\/p\u003e \u003cp\u003e20.4 Heterogeneous CMP Design for Network Security Processors 390\u003c\/p\u003e \u003cp\u003e20.4.1 Task Assignment 390\u003c\/p\u003e \u003cp\u003e20.4.2 ILP Formulation 391\u003c\/p\u003e \u003cp\u003e20.4.3 Discussion 393\u003c\/p\u003e \u003cp\u003e20.5 Experimental Evaluation 394\u003c\/p\u003e \u003cp\u003e20.5.1 Setup 394\u003c\/p\u003e \u003cp\u003e20.5.2 Results 395\u003c\/p\u003e \u003cp\u003e20.6 Concluding Remarks 397\u003c\/p\u003e \u003cp\u003eAcknowledgments 397\u003c\/p\u003e \u003cp\u003eReferences 397\u003c\/p\u003e \u003cp\u003e\u003cb\u003ePart VIII Applications of Heterogeneous High-Performance Computing 401\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e\u003cb\u003e21. Toward a High-Performance Distributed CBIR System for Hyperspectral Remote Sensing Data: A Case Study in Jungle Computing 403\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eTimo van Kessel, Niels Drost, Jason Maassen, Henri E. Bal, Frank J. Seinstra, and Antonio J. 
Plaza\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e21.1 Introduction 404\u003c\/p\u003e \u003cp\u003e21.2 CBIR For Hyperspectral Imaging Data 407\u003c\/p\u003e \u003cp\u003e21.2.1 Spectral Unmixing 407\u003c\/p\u003e \u003cp\u003e21.2.2 Proposed CBIR System 409\u003c\/p\u003e \u003cp\u003e21.3 Jungle Computing 410\u003c\/p\u003e \u003cp\u003e21.3.1 Jungle Computing: Requirements 411\u003c\/p\u003e \u003cp\u003e21.4 IBIS and Constellation 412\u003c\/p\u003e \u003cp\u003e21.5 System Design and Implementation 415\u003c\/p\u003e \u003cp\u003e21.5.1 Endmember Extraction 418\u003c\/p\u003e \u003cp\u003e21.5.2 Query Execution 418\u003c\/p\u003e \u003cp\u003e21.5.3 Equi-Kernels 419\u003c\/p\u003e \u003cp\u003e21.5.4 Matchmaking 420\u003c\/p\u003e \u003cp\u003e21.6 Evaluation 420\u003c\/p\u003e \u003cp\u003e21.6.1 Performance Evaluation 421\u003c\/p\u003e \u003cp\u003e21.7 Conclusions 426\u003c\/p\u003e \u003cp\u003eAcknowledgments 426\u003c\/p\u003e \u003cp\u003eReferences 426\u003c\/p\u003e \u003cp\u003e\u003cb\u003e22. Taking Advantage of Heterogeneous Platforms in Image and Video Processing 429\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eSidi A. 
Mahmoudi, Erencan Ozkan, Pierre Manneback, and Suleyman Tosun\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e22.1 Introduction 430\u003c\/p\u003e \u003cp\u003e22.2 Related Work 431\u003c\/p\u003e \u003cp\u003e22.2.1 Image Processing on GPU 431\u003c\/p\u003e \u003cp\u003e22.2.2 Video Processing on GPU 432\u003c\/p\u003e \u003cp\u003e22.2.3 Contribution 433\u003c\/p\u003e \u003cp\u003e22.3 Parallel Image Processing on GPU 433\u003c\/p\u003e \u003cp\u003e22.3.1 Development Scheme for Image Processing on GPU 433\u003c\/p\u003e \u003cp\u003e22.3.2 GPU Optimization 434\u003c\/p\u003e \u003cp\u003e22.3.3 GPU Implementation of Edge and Corner Detection 434\u003c\/p\u003e \u003cp\u003e22.3.4 Performance Analysis and Evaluation 434\u003c\/p\u003e \u003cp\u003e22.4 Image Processing on Heterogeneous Architectures 437\u003c\/p\u003e \u003cp\u003e22.4.1 Development Scheme for Multiple Image Processing 437\u003c\/p\u003e \u003cp\u003e22.4.2 Task Scheduling within Heterogeneous Architectures 438\u003c\/p\u003e \u003cp\u003e22.4.3 Optimization Within Heterogeneous Architectures 438\u003c\/p\u003e \u003cp\u003e22.5 Video Processing on GPU 438\u003c\/p\u003e \u003cp\u003e22.5.1 Development Scheme for Video Processing on GPU 439\u003c\/p\u003e \u003cp\u003e22.5.2 GPU Optimizations 440\u003c\/p\u003e \u003cp\u003e22.5.3 GPU Implementations 440\u003c\/p\u003e \u003cp\u003e22.5.4 GPU-Based Silhouette Extraction 440\u003c\/p\u003e \u003cp\u003e22.5.5 GPU-Based Optical Flow Estimation 440\u003c\/p\u003e \u003cp\u003e22.5.6 Result Analysis 443\u003c\/p\u003e \u003cp\u003e22.6 Experimental Results 444\u003c\/p\u003e \u003cp\u003e22.6.1 Heterogeneous Computing for Vertebra Segmentation 444\u003c\/p\u003e \u003cp\u003e22.6.2 GPU Computing for Motion Detection Using a Moving Camera 445\u003c\/p\u003e \u003cp\u003e22.7 Conclusion 447\u003c\/p\u003e \u003cp\u003eAcknowledgment 448\u003c\/p\u003e \u003cp\u003eReferences 448\u003c\/p\u003e \u003cp\u003e\u003cb\u003e23. 
Real-Time Tomographic Reconstruction Through CPU + GPU Coprocessing 451\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eJosé Ignacio Agulleiro, Francisco Vazquez, Ester M. Garzon, and Jose J. Fernandez\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e23.1 Introduction 452\u003c\/p\u003e \u003cp\u003e23.2 Tomographic Reconstruction 453\u003c\/p\u003e \u003cp\u003e23.3 Optimization of Tomographic Reconstruction for CPUs and for GPUs 455\u003c\/p\u003e \u003cp\u003e23.4 Hybrid CPU + GPU Tomographic Reconstruction 457\u003c\/p\u003e \u003cp\u003e23.5 Results 459\u003c\/p\u003e \u003cp\u003e23.6 Discussion and Conclusion 461\u003c\/p\u003e \u003cp\u003eAcknowledgments 463\u003c\/p\u003e \u003cp\u003eReferences 463\u003c\/p\u003e \u003cp\u003eIndex 467\u003c\/p\u003e","brand":"John Wiley \u0026 Sons Inc","offers":[{"title":"Default Title","offer_id":49406909645143,"sku":"9781118712054","price":92.66,"currency_code":"GBP","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0817\/1739\/5799\/files\/9781118712054.jpg?v=1730497526","url":"https:\/\/bookcurl.com\/products\/high-performance-computing-9781118712054","provider":"Book Curl","version":"1.0","type":"link"}