Abstract: We consider the classical policy iteration method of dynamic programming (DP), where approximations and simulation are used to deal with the curse of dimensionality. We survey a number of issues: convergence and rate of convergence of approximate policy evaluation methods, singularity and susceptibility to simulation noise of policy evaluation, exploration issues, constrained and enhanced policy iteration, policy oscillation and chattering, and optimistic and distributed policy iteration. Our discussion of policy eva...
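For orientation, here is a minimal sketch of the classical (exact) policy iteration loop on a tiny finite, discounted-cost problem. It shows the evaluate/improve cycle that the approximate and simulation-based variants modify. The MDP encoding, the function names (`q_value`, `evaluate_policy`, `policy_iteration`), and the example numbers are hypothetical illustrations. Policy evaluation here iterates the fixed-policy Bellman operator exactly, with no simulation or cost approximation.

```python
from typing import List, Tuple

# Hypothetical MDP encoding (an assumption for illustration):
# mdp[state][action] = list of (probability, next_state, stage_cost) transitions.
Transition = Tuple[float, int, float]
MDP = List[List[List[Transition]]]


def q_value(mdp: MDP, J: List[float], s: int, a: int, alpha: float) -> float:
    """Expected stage cost plus discounted cost-to-go of taking action a in state s."""
    return sum(p * (g + alpha * J[t]) for p, t, g in mdp[s][a])


def evaluate_policy(mdp: MDP, mu: List[int], alpha: float, tol: float = 1e-10) -> List[float]:
    """Policy evaluation: iterate J <- T_mu J until successive iterates agree within tol.

    T_mu is a contraction with modulus alpha < 1, so the iterates converge to J_mu.
    """
    J = [0.0] * len(mdp)
    while True:
        J_new = [q_value(mdp, J, s, mu[s], alpha) for s in range(len(mdp))]
        if max(abs(x - y) for x, y in zip(J_new, J)) < tol:
            return J_new
        J = J_new


def policy_iteration(mdp: MDP, alpha: float, mu: List[int]) -> Tuple[List[int], List[float]]:
    """Alternate policy evaluation and policy improvement until the policy stops changing."""
    while True:
        J = evaluate_policy(mdp, mu, alpha)
        # Policy improvement: choose a cost-minimizing action in each state
        # (min() keeps the first, i.e. lowest-index, action on ties).
        new_mu = [
            min(range(len(mdp[s])), key=lambda a: q_value(mdp, J, s, a, alpha))
            for s in range(len(mdp))
        ]
        if new_mu == mu:
            return mu, J
        mu = new_mu


# Two-state, two-action example with made-up costs: action 0 = "stay", action 1 = "switch".
example: MDP = [
    [[(1.0, 0, 2.0)], [(1.0, 1, 0.5)]],  # state 0
    [[(1.0, 1, 1.0)], [(1.0, 0, 3.0)]],  # state 1
]
mu, J = policy_iteration(example, alpha=0.9, mu=[0, 0])
print(mu, [round(x, 6) for x in J])  # [1, 0] [9.5, 10.0]
```

Starting from "always stay," one improvement step switches state 0 to action 1, and the next evaluation confirms that no further change lowers the cost. In the approximate setting the abstract surveys, `evaluate_policy` would typically be replaced by a simulation-based, approximate evaluation, which is where the convergence, noise, exploration, and oscillation issues arise.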