This book provides a unified approach for the study of constrained Markov decision processes with a finite state space and unbounded costs. Unlike the single controller case considered in many other books, the author considers a single controller with several objectives, such as minimizing delays and loss, probabilities, and maximization of throughputs. It is desirable to design a controller that minimizes one cost objective, subject to inequality constraints on other cost objectives. This framework describes dynamic decision problems arising frequently in many engineering fields. A thorough overview of these applications is presented in the introduction.
The book is then divided into three sections that build upon each other.
The first part explains the theory for the finite state space. The author characterizes the set of achievable expected occupation measures as well as performance vectors, and identifies simple classes of policies among which optimal policies exist. This allows the reduction of the original dynamic into a linear program. A Lagranian approach is then used to derive the dual linear program using dynamic programming techniques.
In the second part, these results are extended to the infinite state space and action spaces. The author provides two frameworks: the case where costs are bounded below and the contracting framework.
The third part builds upon the results of the first two parts and examines asymptotical results of the convergence of both the value and the policies in the time horizon and in the discount factor. Finally, several state truncation algorithms that enable the approximation of the solution of the original control problem via finite linear programs are given.
INTRODUCTION Examples of Constrained Dynamic Control Problems On Solution Approaches for CMDPs with Expected Costs Other Types of CMDPs Cost Criteria and Assumptions The Convex Analytical Approach and Occupation Measures Linear Programming and Lagrangian Approach for CMDPs About the Methodology The Structure of the Book PART ONE: FINITE MDPS MARKOV DECISION PROCESSES The Model Cost Criteria and the Constrained Problem Some Notation The Dominance of Markov Policies THE DISCOUNTED COST Occupation Measure and the Primal LP Dynamic Programming and Dual LP: the Unconstrained Case Constrained Control: Lagrangian Approach The Dual LP Number of Randomizations THE EXPECTED AVERAGE COST Occupation Measure and the Primal LP Equivalent Linear Program The Dual Program Number of Randomizations FLOW AND SERVICE CONTROL IN A SINGLE-SERVER QUEUE The Model The Lagrangian The Original Constrained Problem Structure of Randomization and Implementation Issues On Coordination Between Controllers Open Questions PART TWO: INFINITE MDPS MDPS WITH INFINITE STATE AND ACTION SPACES The Model Cost Criteria Mixed Policies, and Topologic Structures The Dominance of Markov Policies Aggregation of States Extra Randomization in the Policies Equivalent Quasi-Markov Model and Quasi-Markov Policies THE TOTAL COST: CLASSIFICATION OF MDPS Transient and Absorbing MDPs MDPs With Uniform Lyapunov Functions Equivalence of MDP With Unbounded and bounded costs Properties of MDPs With Uniform Lyapunov Functions Properties for Fixed Initial Distribution Examples of Uniform Lyapunov Functions Contracting MDPs THE TOTAL COST: OCCUPATION MEASURES AND THE PRIMAL LP Occupation Measure Continuity of Occupation Measures More Properties of MDPs Characterization of Achievable Sets of Occupation Measure Relation Between Cost and Occupation Measure Dominating Classes of Policies Equivalent Linear Program The Dual Program THE TOTAL COST: DYNAMIC AND LINEAR PROGRAMMING Non-Constrained Control: Dynamic and Linear Programming Superharmonic Functions and Linear Programming Set of Achievable Costs Constrained Control: Lagrangian Approach The Dual LP State Truncation A Second LP Approach for Optimal Mixed Policies More on Unbound Costs THE DISCOUNTED COST The Equivalent Total Cost Model Occupation Measure and LP Non-negative Immediate Cost Weak Contracting Assumptions and Lyapunov Functions Example: Flow and Service Control THE EXPECTED AVERAGE COST Occupation Measures Completeness Properties of Stationary Policies Relation Between Cost and Occupation Measure Dominating Classes of Policies Equivalent Linear Program The Dual Program The Contracting Framework Other Conditions for the Uniform Integrability The Case of Uniform Lyapunov Conditions EXPECTED AVERAGE COST: DYNAMIC PROGRAMMING AND LP The Non-Constrained Case: Optimality Inequality Non-Constrained Control: Cost Bounded Below Dynamic Programming and Uniform Lyapunov Function Super-Harmonic Functions and Linear Programming Set of Achievable Costs Constrained Control: Lagrangian Approach The Dual LP A Second LP Approach for Optimal Mixed Policies PART THREE: ASYMPTOTIC METHODS AND APPROXIMATIONS SENSITIVITY ANALYSIS Introduction Approximation of the Values Approximation and Robustness of the Policies CONVERGENCE OF DISCOUNTED CONSTRAINED MDPS Convergence in the Discount Factor Convergence to the Expected Average Cost The Case of Uniform Lyapunov Function CONVERGENCE AS THE HORIZON TENDS TO INFINITY The Discounted Cost The Expected Average Cost: Stationary Policies The Expected Average Cost: General Policies STATE TRUNCATION AND APPROXIMATION The Approximating sets of States Scheme I: the Total Cost Scheme II: the Total Cost Scheme III: the Total Cost The Expected Average Cost Infinite MDPs: on the Number of Randomizations APPENDIX: CONVERGENCE OF PROBABILITY MEASURES REFERENCES LIST OF SYMBOLS AND NOTATION INDEX