{"product_id":"multiagent-coordination-a-reinforcement-learning-approach-wiley-ieee-9781119699033","title":"MultiAgent Coordination A Reinforcement Learning","description":"\u003cb\u003eTable of Contents\u003c\/b\u003e\u003cbr\u003e\u003cp\u003ePreface\u003c\/p\u003e \u003cp\u003eAcknowledgments\u003c\/p\u003e \u003cp\u003eAbout the Authors\u003c\/p\u003e \u003cp\u003e\u003cb\u003e1 Introduction: Multi-agent Coordination by Reinforcement Learning and Evolutionary Algorithms\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e1.1 Introduction\u003c\/p\u003e \u003cp\u003e1.2 Single Agent Planning\u003c\/p\u003e \u003cp\u003e1.2.1 Terminologies Used in Single Agent Planning\u003c\/p\u003e \u003cp\u003e1.2.2 Single Agent Search-Based Planning Algorithms\u003c\/p\u003e \u003cp\u003e1.2.2.1 Dijkstra’s Algorithm\u003c\/p\u003e \u003cp\u003e1.2.2.2 A∗ (A-star) Algorithm\u003c\/p\u003e \u003cp\u003e1.2.2.3 D∗ (D-star) Algorithm\u003c\/p\u003e \u003cp\u003e1.2.2.4 Planning by STRIPS-Like Language\u003c\/p\u003e \u003cp\u003e1.2.3 Single Agent RL\u003c\/p\u003e \u003cp\u003e1.2.3.1 Multiarmed Bandit Problem\u003c\/p\u003e \u003cp\u003e1.2.3.2 DP and Bellman Equation\u003c\/p\u003e \u003cp\u003e1.2.3.3 Correlation Between RL and DP\u003c\/p\u003e \u003cp\u003e1.2.3.4 Single Agent Q-Learning\u003c\/p\u003e \u003cp\u003e1.2.3.5 Single Agent Planning Using Q-Learning\u003c\/p\u003e \u003cp\u003e1.3 Multi-agent Planning and Coordination\u003c\/p\u003e \u003cp\u003e1.3.1 Terminologies Related to Multi-agent Coordination\u003c\/p\u003e \u003cp\u003e1.3.2 Classification of MAS\u003c\/p\u003e \u003cp\u003e1.3.3 Game Theory for Multi-agent Coordination\u003c\/p\u003e \u003cp\u003e1.3.3.1 Nash Equilibrium\u003c\/p\u003e \u003cp\u003e1.3.3.2 Correlated Equilibrium\u003c\/p\u003e \u003cp\u003e1.3.3.3 Static Game 
Examples\u003c\/p\u003e \u003cp\u003e1.3.4 Correlation Among RL, DP, and GT\u003c\/p\u003e \u003cp\u003e1.3.5 Classification of MARL\u003c\/p\u003e \u003cp\u003e1.3.5.1 Cooperative MARL\u003c\/p\u003e \u003cp\u003e1.3.5.2 Competitive MARL\u003c\/p\u003e \u003cp\u003e1.3.5.3 Mixed MARL\u003c\/p\u003e \u003cp\u003e1.3.6 Coordination and Planning by MAQL\u003c\/p\u003e \u003cp\u003e1.3.7 Performance Analysis of MAQL and MAQL-Based Coordination\u003c\/p\u003e \u003cp\u003e1.4 Coordination by Optimization Algorithm\u003c\/p\u003e \u003cp\u003e1.4.1 PSO Algorithm\u003c\/p\u003e \u003cp\u003e1.4.2 Firefly Algorithm\u003c\/p\u003e \u003cp\u003e1.4.2.1 Initialization\u003c\/p\u003e \u003cp\u003e1.4.2.2 Attraction to Brighter Fireflies\u003c\/p\u003e \u003cp\u003e1.4.2.3 Movement of Fireflies\u003c\/p\u003e \u003cp\u003e1.4.3 Imperialist Competitive Algorithm\u003c\/p\u003e \u003cp\u003e1.4.3.1 Initialization\u003c\/p\u003e \u003cp\u003e1.4.3.2 Selection of Imperialists and Colonies\u003c\/p\u003e \u003cp\u003e1.4.3.3 Formation of Empires\u003c\/p\u003e \u003cp\u003e1.4.3.4 Assimilation of Colonies\u003c\/p\u003e \u003cp\u003e1.4.3.5 Revolution\u003c\/p\u003e \u003cp\u003e1.4.3.6 Imperialistic Competition\u003c\/p\u003e \u003cp\u003e1.4.4 Differential Evolution Algorithm\u003c\/p\u003e \u003cp\u003e1.4.4.1 Initialization\u003c\/p\u003e \u003cp\u003e1.4.4.2 Mutation\u003c\/p\u003e \u003cp\u003e1.4.4.3 Recombination\u003c\/p\u003e \u003cp\u003e1.4.4.4 Selection\u003c\/p\u003e \u003cp\u003e1.4.5 Off-line Optimization\u003c\/p\u003e \u003cp\u003e1.4.6 Performance Analysis of Optimization Algorithms\u003c\/p\u003e \u003cp\u003e1.4.6.1 Friedman Test\u003c\/p\u003e \u003cp\u003e1.4.6.2 Iman–Davenport Test\u003c\/p\u003e \u003cp\u003e1.5 Summary\u003c\/p\u003e \u003cp\u003eReferences\u003c\/p\u003e \u003cp\u003e\u003cb\u003e2 Improve Convergence Speed of Multi-Agent Q-Learning for 
Cooperative Task Planning\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e2.1 Introduction\u003c\/p\u003e \u003cp\u003e2.2 Literature Review\u003c\/p\u003e \u003cp\u003e2.3 Preliminaries\u003c\/p\u003e \u003cp\u003e2.3.1 Single Agent Q-learning\u003c\/p\u003e \u003cp\u003e2.3.2 Multi-agent Q-learning\u003c\/p\u003e \u003cp\u003e2.4 Proposed MAQL\u003c\/p\u003e \u003cp\u003e2.4.1 Two Useful Properties\u003c\/p\u003e \u003cp\u003e2.5 Proposed FCMQL Algorithms and Their Convergence Analysis\u003c\/p\u003e \u003cp\u003e2.5.1 Proposed FCMQL Algorithms\u003c\/p\u003e \u003cp\u003e2.5.2 Convergence Analysis of the Proposed FCMQL Algorithms\u003c\/p\u003e \u003cp\u003e2.6 FCMQL-Based Cooperative Multi-agent Planning\u003c\/p\u003e \u003cp\u003e2.7 Experiments and Results\u003c\/p\u003e \u003cp\u003e2.8 Conclusions\u003c\/p\u003e \u003cp\u003e2.9 Summary\u003c\/p\u003e \u003cp\u003e2.A More Details on Experimental Results\u003c\/p\u003e \u003cp\u003e2.A.1 Additional Details of Experiment 2.1\u003c\/p\u003e \u003cp\u003e2.A.2 Additional Details of Experiment 2.2\u003c\/p\u003e \u003cp\u003e2.A.3 Additional Details of Experiment 2.4\u003c\/p\u003e \u003cp\u003eReferences\u003c\/p\u003e \u003cp\u003e\u003cb\u003e3 Consensus Q-Learning for Multi-agent Cooperative Planning\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e3.1 Introduction\u003c\/p\u003e \u003cp\u003e3.2 Preliminaries\u003c\/p\u003e \u003cp\u003e3.2.1 Single Agent Q-Learning\u003c\/p\u003e \u003cp\u003e3.2.2 Equilibrium-Based Multi-agent Q-Learning\u003c\/p\u003e \u003cp\u003e3.3 Consensus\u003c\/p\u003e \u003cp\u003e3.4 Proposed CoQL and Planning\u003c\/p\u003e \u003cp\u003e3.4.1 Consensus Q-Learning\u003c\/p\u003e \u003cp\u003e3.4.2 Consensus-Based Multi-robot Planning\u003c\/p\u003e \u003cp\u003e3.5 Experiments and Results\u003c\/p\u003e 
\u003cp\u003e3.5.1 Experimental Setup\u003c\/p\u003e \u003cp\u003e3.5.2 Experiments for CoQL\u003c\/p\u003e \u003cp\u003e3.5.3 Experiments for Consensus-Based Planning\u003c\/p\u003e \u003cp\u003e3.6 Conclusions\u003c\/p\u003e \u003cp\u003e3.7 Summary\u003c\/p\u003e \u003cp\u003eReferences\u003c\/p\u003e \u003cp\u003e\u003cb\u003e4 An Efficient Computing of Correlated Equilibrium for Cooperative Q-Learning-Based Multi-Robot Planning\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e4.1 Introduction\u003c\/p\u003e \u003cp\u003e4.2 Single-Agent Q-Learning and Equilibrium-Based MAQL\u003c\/p\u003e \u003cp\u003e4.2.1 Single Agent Q-Learning\u003c\/p\u003e \u003cp\u003e4.2.2 Equilibrium-Based MAQL\u003c\/p\u003e \u003cp\u003e4.3 Proposed Cooperative MAQL and Planning\u003c\/p\u003e \u003cp\u003e4.3.1 Proposed Schemes with Their Applicability\u003c\/p\u003e \u003cp\u003e4.3.2 Immediate Rewards in Scheme-I and -II\u003c\/p\u003e \u003cp\u003e4.3.3 Scheme-I-Induced MAQL\u003c\/p\u003e \u003cp\u003e4.3.4 Scheme-II-Induced MAQL\u003c\/p\u003e \u003cp\u003e4.3.5 Algorithms for Scheme-I and II\u003c\/p\u003e \u003cp\u003e4.3.6 Constraint ΩQL-I\/ΩQL-II(CΩQL-I\/CΩQL-II)\u003c\/p\u003e \u003cp\u003e4.3.7 Convergence\u003c\/p\u003e \u003cp\u003e4.3.8 Multi-agent Planning\u003c\/p\u003e \u003cp\u003e4.4 Complexity Analysis\u003c\/p\u003e \u003cp\u003e4.4.1 Complexity of CQL\u003c\/p\u003e \u003cp\u003e4.4.1.1 Space Complexity\u003c\/p\u003e \u003cp\u003e4.4.1.2 Time Complexity\u003c\/p\u003e \u003cp\u003e4.4.2 Complexity of the Proposed Algorithms\u003c\/p\u003e \u003cp\u003e4.4.2.1 Space Complexity\u003c\/p\u003e \u003cp\u003e4.4.2.2 Time Complexity\u003c\/p\u003e \u003cp\u003e4.4.3 Complexity Comparison\u003c\/p\u003e \u003cp\u003e4.4.3.1 Space Complexity\u003c\/p\u003e \u003cp\u003e4.4.3.2 Time Complexity\u003c\/p\u003e \u003cp\u003e4.5 
Simulation and Experimental Results\u003c\/p\u003e \u003cp\u003e4.5.1 Experimental Platform\u003c\/p\u003e \u003cp\u003e4.5.1.1 Simulation\u003c\/p\u003e \u003cp\u003e4.5.1.2 Hardware\u003c\/p\u003e \u003cp\u003e4.5.2 Experimental Approach\u003c\/p\u003e \u003cp\u003e4.5.2.1 Learning Phase\u003c\/p\u003e \u003cp\u003e4.5.2.2 Planning Phase\u003c\/p\u003e \u003cp\u003e4.5.3 Experimental Results\u003c\/p\u003e \u003cp\u003e4.6 Conclusion\u003c\/p\u003e \u003cp\u003e4.7 Summary\u003c\/p\u003e \u003cp\u003e4.A Supporting Algorithm and Mathematical Analysis\u003c\/p\u003e \u003cp\u003eReferences\u003c\/p\u003e \u003cp\u003e\u003cb\u003e5 A Modified Imperialist Competitive Algorithm for Multi-Robot Stick-Carrying Application\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e5.1 Introduction\u003c\/p\u003e \u003cp\u003e5.2 Problem Formulation for Multi-Robot Stick-Carrying\u003c\/p\u003e \u003cp\u003e5.3 Proposed Hybrid Algorithm\u003c\/p\u003e \u003cp\u003e5.3.1 An Overview of ICA\u003c\/p\u003e \u003cp\u003e5.3.1.1 Initialization\u003c\/p\u003e \u003cp\u003e5.3.1.2 Selection of Imperialists and Colonies\u003c\/p\u003e \u003cp\u003e5.3.1.3 Formation of Empires\u003c\/p\u003e \u003cp\u003e5.3.1.4 Assimilation of Colonies\u003c\/p\u003e \u003cp\u003e5.3.1.5 Revolution\u003c\/p\u003e \u003cp\u003e5.3.1.6 Imperialistic Competition\u003c\/p\u003e \u003cp\u003e5.4 An Overview of FA\u003c\/p\u003e \u003cp\u003e5.4.1 Initialization\u003c\/p\u003e \u003cp\u003e5.4.2 Attraction to Brighter Fireflies\u003c\/p\u003e \u003cp\u003e5.4.3 Movement of Fireflies\u003c\/p\u003e \u003cp\u003e5.5 Proposed ICFA\u003c\/p\u003e \u003cp\u003e5.5.1 Assimilation of Colonies\u003c\/p\u003e \u003cp\u003e5.5.1.1 Attraction to Powerful Colonies\u003c\/p\u003e \u003cp\u003e5.5.1.2 Modification of Empire Behavior\u003c\/p\u003e \u003cp\u003e5.5.1.3 Union of 
Empires\u003c\/p\u003e \u003cp\u003e5.6 Simulation Results\u003c\/p\u003e \u003cp\u003e5.6.1 Comparative Framework\u003c\/p\u003e \u003cp\u003e5.6.2 Parameter Settings\u003c\/p\u003e \u003cp\u003e5.6.3 Analysis on Explorative Power of ICFA\u003c\/p\u003e \u003cp\u003e5.6.4 Comparison of Quality of the Final Solution\u003c\/p\u003e \u003cp\u003e5.6.5 Performance Analysis\u003c\/p\u003e \u003cp\u003e5.7 Computer Simulation and Experiment\u003c\/p\u003e \u003cp\u003e5.7.1 Average Total Path Deviation (ATPD)\u003c\/p\u003e \u003cp\u003e5.7.2 Average Uncovered Target Distance (AUTD)\u003c\/p\u003e \u003cp\u003e5.7.3 Experimental Setup in Simulation Environment\u003c\/p\u003e \u003cp\u003e5.7.4 Experimental Results in Simulation Environment\u003c\/p\u003e \u003cp\u003e5.7.5 Experimental Setup with Khepera Robots\u003c\/p\u003e \u003cp\u003e5.7.6 Experimental Results with Khepera Robots\u003c\/p\u003e \u003cp\u003e5.8 Conclusion\u003c\/p\u003e \u003cp\u003e5.9 Summary\u003c\/p\u003e \u003cp\u003e5.A Additional Comparison of ICFA\u003c\/p\u003e \u003cp\u003eReferences\u003c\/p\u003e \u003cp\u003e\u003cb\u003e6 Conclusions and Future Directions\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e6.1 Conclusions\u003c\/p\u003e \u003cp\u003e6.2 Future Directions\u003c\/p\u003e \u003cp\u003eIndex\u003c\/p\u003e","brand":"John Wiley \u0026 Sons Inc","offers":[{"title":"Default Title","offer_id":48738364358999,"sku":"9781119699033","price":98.06,"currency_code":"GBP","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0817\/1739\/5799\/files\/9781119699033.jpg?v=1723811979","url":"https:\/\/bookcurl.com\/products\/multiagent-coordination-a-reinforcement-learning-approach-wiley-ieee-9781119699033","provider":"Book Curl","version":"1.0","type":"link"}