{"product_id":"reinforcement-learning-and-stochastic-optimization-9781119815037","title":"Reinforcement Learning and Stochastic Optimization","description":"\u003cb\u003eBook Synopsis\u003c\/b\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cb\u003eTable of Contents\u003c\/b\u003e\u003cbr\u003e\u003cp\u003ePreface xxv\u003c\/p\u003e \u003cp\u003eAcknowledgments xxxi\u003c\/p\u003e \u003cp\u003e\u003cb\u003ePart I – Introduction \u003c\/b\u003e\u003cb\u003e1\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e\u003cb\u003e1 Sequential Decision Problems \u003c\/b\u003e\u003cb\u003e3\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e1.1 The Audience 7\u003c\/p\u003e \u003cp\u003e1.2 The Communities of Sequential Decision Problems 8\u003c\/p\u003e \u003cp\u003e1.3 Our Universal Modeling Framework 10\u003c\/p\u003e \u003cp\u003e1.4 Designing Policies for Sequential Decision Problems 15\u003c\/p\u003e \u003cp\u003e1.5 Learning 20\u003c\/p\u003e \u003cp\u003e1.6 Themes 21\u003c\/p\u003e \u003cp\u003e1.7 Our Modeling Approach 27\u003c\/p\u003e \u003cp\u003e1.8 How to Read this Book 27\u003c\/p\u003e \u003cp\u003e1.9 Bibliographic Notes 33\u003c\/p\u003e \u003cp\u003eExercises 34\u003c\/p\u003e \u003cp\u003eBibliography 38\u003c\/p\u003e \u003cp\u003e\u003cb\u003e2 Canonical Problems and Applications \u003c\/b\u003e\u003cb\u003e39\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e2.1 Canonical Problems 39\u003c\/p\u003e \u003cp\u003e2.2 A Universal Modeling Framework for Sequential Decision Problems 64\u003c\/p\u003e \u003cp\u003e2.3 Applications 69\u003c\/p\u003e \u003cp\u003e2.4 Bibliographic Notes 85\u003c\/p\u003e \u003cp\u003eExercises 90\u003c\/p\u003e \u003cp\u003eBibliography 93\u003c\/p\u003e \u003cp\u003e\u003cb\u003e3 Online Learning \u003c\/b\u003e\u003cb\u003e101\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e3.1 Machine Learning for Sequential Decisions 102\u003c\/p\u003e \u003cp\u003e3.2 Adaptive Learning Using Exponential Smoothing 110\u003c\/p\u003e \u003cp\u003e3.3 Lookup Tables with 
Frequentist Updating 111\u003c\/p\u003e \u003cp\u003e3.4 Lookup Tables with Bayesian Updating 112\u003c\/p\u003e \u003cp\u003e3.5 Computing Bias and Variance* 118\u003c\/p\u003e \u003cp\u003e3.6 Lookup Tables and Aggregation* 121\u003c\/p\u003e \u003cp\u003e3.7 Linear Parametric Models 131\u003c\/p\u003e \u003cp\u003e3.8 Recursive Least Squares for Linear Models 136\u003c\/p\u003e \u003cp\u003e3.9 Nonlinear Parametric Models 140\u003c\/p\u003e \u003cp\u003e3.10 Nonparametric Models* 149\u003c\/p\u003e \u003cp\u003e3.11 Nonstationary Learning* 159\u003c\/p\u003e \u003cp\u003e3.12 The Curse of Dimensionality 162\u003c\/p\u003e \u003cp\u003e3.13 Designing Approximation Architectures in Adaptive Learning 165\u003c\/p\u003e \u003cp\u003e3.14 Why Does It Work?** 166\u003c\/p\u003e \u003cp\u003e3.15 Bibliographic Notes 174\u003c\/p\u003e \u003cp\u003eExercises 176\u003c\/p\u003e \u003cp\u003eBibliography 180\u003c\/p\u003e \u003cp\u003e\u003cb\u003e4 Introduction to Stochastic Search \u003c\/b\u003e\u003cb\u003e183\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e4.1 Illustrations of the Basic Stochastic Optimization Problem 185\u003c\/p\u003e \u003cp\u003e4.2 Deterministic Methods 188\u003c\/p\u003e \u003cp\u003e4.3 Sampled Models 193\u003c\/p\u003e \u003cp\u003e4.4 Adaptive Learning Algorithms 202\u003c\/p\u003e \u003cp\u003e4.5 Closing Remarks 210\u003c\/p\u003e \u003cp\u003e4.6 Bibliographic Notes 210\u003c\/p\u003e \u003cp\u003eExercises 212\u003c\/p\u003e \u003cp\u003eBibliography 218\u003c\/p\u003e \u003cp\u003e\u003cb\u003ePart II – Stochastic Search \u003c\/b\u003e\u003cb\u003e221\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e\u003cb\u003e5 Derivative-Based Stochastic Search \u003c\/b\u003e\u003cb\u003e223\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e5.1 Some Sample Applications 225\u003c\/p\u003e \u003cp\u003e5.2 Modeling Uncertainty 228\u003c\/p\u003e \u003cp\u003e5.3 Stochastic Gradient Methods 231\u003c\/p\u003e \u003cp\u003e5.4 Styles of Gradients 237\u003c\/p\u003e 
\u003cp\u003e5.5 Parameter Optimization for Neural Networks* 242\u003c\/p\u003e \u003cp\u003e5.6 Stochastic Gradient Algorithm as a Sequential Decision Problem 247\u003c\/p\u003e \u003cp\u003e5.7 Empirical Issues 248\u003c\/p\u003e \u003cp\u003e5.8 Transient Problems* 249\u003c\/p\u003e \u003cp\u003e5.9 Theoretical Performance* 250\u003c\/p\u003e \u003cp\u003e5.10 Why Does it Work? 250\u003c\/p\u003e \u003cp\u003e5.11 Bibliographic Notes 263\u003c\/p\u003e \u003cp\u003eExercises 264\u003c\/p\u003e \u003cp\u003eBibliography 270\u003c\/p\u003e \u003cp\u003e\u003cb\u003e6 Stepsize Policies \u003c\/b\u003e\u003cb\u003e273\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e6.1 Deterministic Stepsize Policies 276\u003c\/p\u003e \u003cp\u003e6.2 Adaptive Stepsize Policies 282\u003c\/p\u003e \u003cp\u003e6.3 Optimal Stepsize Policies* 289\u003c\/p\u003e \u003cp\u003e6.4 Optimal Stepsizes for Approximate Value Iteration* 297\u003c\/p\u003e \u003cp\u003e6.5 Convergence 300\u003c\/p\u003e \u003cp\u003e6.6 Guidelines for Choosing Stepsize Policies 301\u003c\/p\u003e \u003cp\u003e6.7 Why Does it Work?* 303\u003c\/p\u003e \u003cp\u003e6.8 Bibliographic Notes 306\u003c\/p\u003e \u003cp\u003eExercises 307\u003c\/p\u003e \u003cp\u003eBibliography 314\u003c\/p\u003e \u003cp\u003e\u003cb\u003e7 Derivative-Free Stochastic Search \u003c\/b\u003e\u003cb\u003e317\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e7.1 Overview of Derivative-free Stochastic Search 319\u003c\/p\u003e \u003cp\u003e7.2 Modeling Derivative-free Stochastic Search 325\u003c\/p\u003e \u003cp\u003e7.3 Designing Policies 330\u003c\/p\u003e \u003cp\u003e7.4 Policy Function Approximations 333\u003c\/p\u003e \u003cp\u003e7.5 Cost Function Approximations 335\u003c\/p\u003e \u003cp\u003e7.6 VFA-based Policies 338\u003c\/p\u003e \u003cp\u003e7.7 Direct Lookahead Policies 348\u003c\/p\u003e \u003cp\u003e7.8 The Knowledge Gradient (Continued)* 362\u003c\/p\u003e \u003cp\u003e7.9 Learning in Batches 380\u003c\/p\u003e \u003cp\u003e7.10 
Simulation Optimization* 382\u003c\/p\u003e \u003cp\u003e7.11 Evaluating Policies 385\u003c\/p\u003e \u003cp\u003e7.12 Designing Policies 394\u003c\/p\u003e \u003cp\u003e7.13 Extensions* 398\u003c\/p\u003e \u003cp\u003e7.14 Bibliographic Notes 409\u003c\/p\u003e \u003cp\u003eExercises 412\u003c\/p\u003e \u003cp\u003eBibliography 424\u003c\/p\u003e \u003cp\u003e\u003cb\u003ePart III – State-dependent Problems \u003c\/b\u003e\u003cb\u003e429\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e\u003cb\u003e8 State-dependent Problems \u003c\/b\u003e\u003cb\u003e431\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e8.1 Graph Problems 433\u003c\/p\u003e \u003cp\u003e8.2 Inventory Problems 439\u003c\/p\u003e \u003cp\u003e8.3 Complex Resource Allocation Problems 446\u003c\/p\u003e \u003cp\u003e8.4 State-dependent Learning Problems 456\u003c\/p\u003e \u003cp\u003e8.5 A Sequence of Problem Classes 460\u003c\/p\u003e \u003cp\u003e8.6 Bibliographic Notes 461\u003c\/p\u003e \u003cp\u003eExercises 462\u003c\/p\u003e \u003cp\u003eBibliography 466\u003c\/p\u003e \u003cp\u003e\u003cb\u003e9 Modeling Sequential Decision Problems \u003c\/b\u003e\u003cb\u003e467\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e9.1 A Simple Modeling Illustration 471\u003c\/p\u003e \u003cp\u003e9.2 Notational Style 476\u003c\/p\u003e \u003cp\u003e9.3 Modeling Time 478\u003c\/p\u003e \u003cp\u003e9.4 The States of Our System 481\u003c\/p\u003e \u003cp\u003e9.5 Modeling Decisions 500\u003c\/p\u003e \u003cp\u003e9.6 The Exogenous Information Process 506\u003c\/p\u003e \u003cp\u003e9.7 The Transition Function 515\u003c\/p\u003e \u003cp\u003e9.8 The Objective Function 518\u003c\/p\u003e \u003cp\u003e9.9 Illustration: An Energy Storage Model 523\u003c\/p\u003e \u003cp\u003e9.10 Base Models and Lookahead Models 528\u003c\/p\u003e \u003cp\u003e9.11 A Classification of Problems* 529\u003c\/p\u003e \u003cp\u003e9.12 Policy Evaluation* 532\u003c\/p\u003e \u003cp\u003e9.13 Advanced Probabilistic Modeling Concepts** 534\u003c\/p\u003e 
\u003cp\u003e9.14 Looking Forward 540\u003c\/p\u003e \u003cp\u003e9.15 Bibliographic Notes 542\u003c\/p\u003e \u003cp\u003eExercises 544\u003c\/p\u003e \u003cp\u003eBibliography 557\u003c\/p\u003e \u003cp\u003e\u003cb\u003e10 Uncertainty Modeling \u003c\/b\u003e\u003cb\u003e559\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e10.1 Sources of Uncertainty 560\u003c\/p\u003e \u003cp\u003e10.2 A Modeling Case Study: The COVID Pandemic 575\u003c\/p\u003e \u003cp\u003e10.3 Stochastic Modeling 575\u003c\/p\u003e \u003cp\u003e10.4 Monte Carlo Simulation 581\u003c\/p\u003e \u003cp\u003e10.5 Case Study: Modeling Electricity Prices 589\u003c\/p\u003e \u003cp\u003e10.6 Sampling vs. Sampled Models 595\u003c\/p\u003e \u003cp\u003e10.7 Closing Notes 597\u003c\/p\u003e \u003cp\u003e10.8 Bibliographic Notes 597\u003c\/p\u003e \u003cp\u003eExercises 598\u003c\/p\u003e \u003cp\u003eBibliography 601\u003c\/p\u003e \u003cp\u003e\u003cb\u003e11 Designing Policies \u003c\/b\u003e\u003cb\u003e603\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e11.1 From Optimization to Machine Learning to Sequential Decision Problems 605\u003c\/p\u003e \u003cp\u003e11.2 The Classes of Policies 606\u003c\/p\u003e \u003cp\u003e11.3 Policy Function Approximations 610\u003c\/p\u003e \u003cp\u003e11.4 Cost Function Approximations 613\u003c\/p\u003e \u003cp\u003e11.5 Value Function Approximations 614\u003c\/p\u003e \u003cp\u003e11.6 Direct Lookahead Approximations 616\u003c\/p\u003e \u003cp\u003e11.7 Hybrid Strategies 620\u003c\/p\u003e \u003cp\u003e11.8 Randomized Policies 626\u003c\/p\u003e \u003cp\u003e11.9 Illustration: An Energy Storage Model Revisited 627\u003c\/p\u003e \u003cp\u003e11.10 Choosing the Policy Class 631\u003c\/p\u003e \u003cp\u003e11.11 Policy Evaluation 641\u003c\/p\u003e \u003cp\u003e11.12 Parameter Tuning 642\u003c\/p\u003e \u003cp\u003e11.13 Bibliographic Notes 646\u003c\/p\u003e \u003cp\u003eExercises 646\u003c\/p\u003e \u003cp\u003eBibliography 651\u003c\/p\u003e \u003cp\u003e\u003cb\u003ePart 
IV – Policy Search \u003c\/b\u003e\u003cb\u003e653\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e\u003cb\u003e12 Policy Function Approximations and Policy Search \u003c\/b\u003e\u003cb\u003e655\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e12.1 Policy Search as a Sequential Decision Problem 657\u003c\/p\u003e \u003cp\u003e12.2 Classes of Policy Function Approximations 658\u003c\/p\u003e \u003cp\u003e12.3 Problem Characteristics 665\u003c\/p\u003e \u003cp\u003e12.4 Flavors of Policy Search 666\u003c\/p\u003e \u003cp\u003e12.5 Policy Search with Numerical Derivatives 669\u003c\/p\u003e \u003cp\u003e12.6 Derivative-Free Methods for Policy Search 670\u003c\/p\u003e \u003cp\u003e12.7 Exact Derivatives for Continuous Sequential Problems* 677\u003c\/p\u003e \u003cp\u003e12.8 Exact Derivatives for Discrete Dynamic Programs** 680\u003c\/p\u003e \u003cp\u003e12.9 Supervised Learning 686\u003c\/p\u003e \u003cp\u003e12.10 Why Does it Work? 687\u003c\/p\u003e \u003cp\u003e12.11 Bibliographic Notes 690\u003c\/p\u003e \u003cp\u003eExercises 691\u003c\/p\u003e \u003cp\u003eBibliography 698\u003c\/p\u003e \u003cp\u003e\u003cb\u003e13 Cost Function Approximations \u003c\/b\u003e\u003cb\u003e701\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e13.1 General Formulation for Parametric CFA 703\u003c\/p\u003e \u003cp\u003e13.2 Objective-Modified CFAs 704\u003c\/p\u003e \u003cp\u003e13.3 Constraint-Modified CFAs 714\u003c\/p\u003e \u003cp\u003e13.4 Bibliographic Notes 725\u003c\/p\u003e \u003cp\u003eExercises 726\u003c\/p\u003e \u003cp\u003eBibliography 729\u003c\/p\u003e \u003cp\u003e\u003cb\u003ePart V – Lookahead Policies \u003c\/b\u003e\u003cb\u003e731\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e\u003cb\u003e14 Exact Dynamic Programming \u003c\/b\u003e\u003cb\u003e737\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e14.1 Discrete Dynamic Programming 738\u003c\/p\u003e \u003cp\u003e14.2 The Optimality Equations 740\u003c\/p\u003e \u003cp\u003e14.3 Finite Horizon Problems 747\u003c\/p\u003e 
\u003cp\u003e14.4 Continuous Problems with Exact Solutions 750\u003c\/p\u003e \u003cp\u003e14.5 Infinite Horizon Problems* 755\u003c\/p\u003e \u003cp\u003e14.6 Value Iteration for Infinite Horizon Problems* 757\u003c\/p\u003e \u003cp\u003e14.7 Policy Iteration for Infinite Horizon Problems* 762\u003c\/p\u003e \u003cp\u003e14.8 Hybrid Value-Policy Iteration* 764\u003c\/p\u003e \u003cp\u003e14.9 Average Reward Dynamic Programming* 765\u003c\/p\u003e \u003cp\u003e14.10 The Linear Programming Method for Dynamic Programs** 766\u003c\/p\u003e \u003cp\u003e14.11 Linear Quadratic Regulation 767\u003c\/p\u003e \u003cp\u003e14.12 Why Does it Work?** 770\u003c\/p\u003e \u003cp\u003e14.13 Bibliographic Notes 783\u003c\/p\u003e \u003cp\u003eExercises 783\u003c\/p\u003e \u003cp\u003eBibliography 793\u003c\/p\u003e \u003cp\u003e\u003cb\u003e15 Backward Approximate Dynamic Programming \u003c\/b\u003e\u003cb\u003e795\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e15.1 Backward Approximate Dynamic Programming for Finite Horizon Problems 797\u003c\/p\u003e \u003cp\u003e15.2 Fitted Value Iteration for Infinite Horizon Problems 804\u003c\/p\u003e \u003cp\u003e15.3 Value Function Approximation Strategies 805\u003c\/p\u003e \u003cp\u003e15.4 Computational Observations 810\u003c\/p\u003e \u003cp\u003e15.5 Bibliographic Notes 816\u003c\/p\u003e \u003cp\u003eExercises 816\u003c\/p\u003e \u003cp\u003eBibliography 821\u003c\/p\u003e \u003cp\u003e\u003cb\u003e16 Forward ADP I: The Value of a Policy \u003c\/b\u003e\u003cb\u003e823\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e16.1 Sampling the Value of a Policy 824\u003c\/p\u003e \u003cp\u003e16.2 Stochastic Approximation Methods 835\u003c\/p\u003e \u003cp\u003e16.3 Bellman’s Equation Using a Linear Model* 837\u003c\/p\u003e \u003cp\u003e16.4 Analysis of TD(0), LSTD, and LSPE Using a Single State* 842\u003c\/p\u003e \u003cp\u003e16.5 Gradient-based Methods for Approximate Value Iteration* 845\u003c\/p\u003e \u003cp\u003e16.6 Value Function 
Approximations Based on Bayesian Learning* 852\u003c\/p\u003e \u003cp\u003e16.7 Learning Algorithms and Stepsizes 855\u003c\/p\u003e \u003cp\u003e16.8 Bibliographic Notes 860\u003c\/p\u003e \u003cp\u003eExercises 862\u003c\/p\u003e \u003cp\u003eBibliography 864\u003c\/p\u003e \u003cp\u003e\u003cb\u003e17 Forward ADP II: Policy Optimization \u003c\/b\u003e\u003cb\u003e867\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e17.1 Overview of Algorithmic Strategies 869\u003c\/p\u003e \u003cp\u003e17.2 Approximate Value Iteration and \u003ci\u003eQ\u003c\/i\u003e-Learning Using Lookup Tables 871\u003c\/p\u003e \u003cp\u003e17.3 Styles of Learning 881\u003c\/p\u003e \u003cp\u003e17.4 Approximate Value Iteration Using Linear Models 886\u003c\/p\u003e \u003cp\u003e17.5 On-Policy vs. Off-Policy Learning and the Exploration–Exploitation Problem 888\u003c\/p\u003e \u003cp\u003e17.6 Applications 894\u003c\/p\u003e \u003cp\u003e17.7 Approximate Policy Iteration 900\u003c\/p\u003e \u003cp\u003e17.8 The Actor–Critic Paradigm 907\u003c\/p\u003e \u003cp\u003e17.9 Statistical Bias in the Max Operator* 909\u003c\/p\u003e \u003cp\u003e17.10 The Linear Programming Method Using Linear Models* 912\u003c\/p\u003e \u003cp\u003e17.11 Finite Horizon Approximations for Steady-State Applications 915\u003c\/p\u003e \u003cp\u003e17.12 Bibliographic Notes 917\u003c\/p\u003e \u003cp\u003eExercises 918\u003c\/p\u003e \u003cp\u003eBibliography 924\u003c\/p\u003e \u003cp\u003e\u003cb\u003e18 Forward ADP III: Convex Resource Allocation Problems \u003c\/b\u003e\u003cb\u003e927\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e18.1 Resource Allocation Problems 930\u003c\/p\u003e \u003cp\u003e18.2 Values Versus Marginal Values 937\u003c\/p\u003e \u003cp\u003e18.3 Piecewise Linear Approximations for Scalar Functions 938\u003c\/p\u003e \u003cp\u003e18.4 Regression Methods 941\u003c\/p\u003e \u003cp\u003e18.5 Separable Piecewise Linear Approximations 944\u003c\/p\u003e \u003cp\u003e18.6 Benders Decomposition for 
Nonseparable Approximations** 946\u003c\/p\u003e \u003cp\u003e18.7 Linear Approximations for High-Dimensional Applications 956\u003c\/p\u003e \u003cp\u003e18.8 Resource Allocation with Exogenous Information State 958\u003c\/p\u003e \u003cp\u003e18.9 Closing Notes 959\u003c\/p\u003e \u003cp\u003e18.10 Bibliographic Notes 960\u003c\/p\u003e \u003cp\u003eExercises 962\u003c\/p\u003e \u003cp\u003eBibliography 967\u003c\/p\u003e \u003cp\u003e\u003cb\u003e19 Direct Lookahead Policies \u003c\/b\u003e\u003cb\u003e971\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e19.1 Optimal Policies Using Lookahead Models 974\u003c\/p\u003e \u003cp\u003e19.2 Creating an Approximate Lookahead Model 978\u003c\/p\u003e \u003cp\u003e19.3 Modified Objectives in Lookahead Models 985\u003c\/p\u003e \u003cp\u003e19.4 Evaluating DLA Policies 992\u003c\/p\u003e \u003cp\u003e19.5 Why Use a DLA? 997\u003c\/p\u003e \u003cp\u003e19.6 Deterministic Lookaheads 999\u003c\/p\u003e \u003cp\u003e19.7 A Tour of Stochastic Lookahead Policies 1005\u003c\/p\u003e \u003cp\u003e19.8 Monte Carlo Tree Search for Discrete Decisions 1009\u003c\/p\u003e \u003cp\u003e19.9 Two-Stage Stochastic Programming for Vector Decisions* 1018\u003c\/p\u003e \u003cp\u003e19.10 Observations on DLA Policies 1024\u003c\/p\u003e \u003cp\u003e19.11 Bibliographic Notes 1025\u003c\/p\u003e \u003cp\u003eExercises 1027\u003c\/p\u003e \u003cp\u003eBibliography 1031\u003c\/p\u003e \u003cp\u003e\u003cb\u003ePart VI – Multiagent Systems \u003c\/b\u003e\u003cb\u003e1033\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e\u003cb\u003e20 Multiagent Modeling and Learning \u003c\/b\u003e\u003cb\u003e1035\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e20.1 Overview of Multiagent Systems 1036\u003c\/p\u003e \u003cp\u003e20.2 A Learning Problem – Flu Mitigation 1044\u003c\/p\u003e \u003cp\u003e20.3 The POMDP Perspective* 1059\u003c\/p\u003e \u003cp\u003e20.4 The Two-Agent Newsvendor Problem 1062\u003c\/p\u003e \u003cp\u003e20.5 Multiple Independent Agents – An HVAC 
Controller Model 1067\u003c\/p\u003e \u003cp\u003e20.6 Cooperative Agents – A Spatially Distributed Blood Management Problem 1070\u003c\/p\u003e \u003cp\u003e20.7 Closing Notes 1074\u003c\/p\u003e \u003cp\u003e20.8 Why Does it Work? 1074\u003c\/p\u003e \u003cp\u003e20.9 Bibliographic Notes 1076\u003c\/p\u003e \u003cp\u003eExercises 1077\u003c\/p\u003e \u003cp\u003eBibliography 1083\u003c\/p\u003e \u003cp\u003eIndex 1085\u003c\/p\u003e","brand":"John Wiley \u0026 Sons Inc","offers":[{"title":"Default Title","offer_id":48866417705303,"sku":"9781119815037","price":108.86,"currency_code":"GBP","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0817\/1739\/5799\/files\/9781119815037.jpg?v=1722278547","url":"https:\/\/bookcurl.com\/products\/reinforcement-learning-and-stochastic-optimization-9781119815037","provider":"Book Curl","version":"1.0","type":"link"}