Course fee

7800.00 per person

Course duration

3

Course overview

Deep Reinforcement Learning: Principles, Algorithms, and Applications

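Learning outcomes

- Slide-based walkthroughs of the algorithms, paired with code analysis
- In-depth coverage of the design, characteristics, and differences of the main reinforcement learning algorithms
- Examples from real applications and analysis of industry trends
- Analysis of demo implementations of reinforcement learning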

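Target audience

1. Technical staff interested in the principles and applications of reinforcement learning algorithms, with some programming (Python) and mathematics (linear algebra, probability theory) background.
2. Some familiarity with deep learning models is preferred.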

Course content

Environment requirements:
- Python 3.5 or later
- GPU: a machine with an Nvidia GTX 960 or better
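
A machine can be checked against these requirements with a short script. The following is a minimal sketch and not part of the course materials: the function name `check_environment` is hypothetical, and finding `nvidia-smi` on the PATH only suggests that an Nvidia driver is installed. It does not confirm the GPU model.

```python
import shutil
import sys

# Minimum interpreter version taken from the requirements listed above.
MIN_PYTHON = (3, 5)


def check_environment():
    """Return a small report on the Python version and Nvidia tooling."""
    python_ok = sys.version_info >= MIN_PYTHON
    # shutil.which returns the executable's path, or None if it is not on PATH.
    nvidia_smi_path = shutil.which("nvidia-smi")
    return {"python_ok": python_ok, "nvidia_smi_path": nvidia_smi_path}


report = check_environment()
print(report)
```

If `nvidia_smi_path` is `None`, the driver tools are probably missing or not on the PATH, and the GPU model still has to be verified by hand.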

Course outline

1. Introduction to Reinforcement Learning
- Characteristics of reinforcement learning
- Reinforcement learning examples
- Components of reinforcement learning
- Rewards
- Environment
- History and State
- Observation
- Agent: Policy, Value, Model
- Case study: maze learning
- Categories of reinforcement learning
- Value Based
- Policy Based
- Actor-Critic
- Model Free vs Model Based
- Sequential decision making in reinforcement learning
- Learning and Planning
- Case study: Atari video games
- Exploration and Exploitation
- Prediction and Control
2. Markov Decision Processes (MDP)
- Markov Processes
- Markov Reward Processes
- Markov Decision Processes
- Extensions to MDPs
3. Planning by Dynamic Programming
- Policy Evaluation
- Policy Iteration
- Value Iteration
- Extensions to Dynamic Programming
- Contraction Mapping
4. Model-Free Prediction
- Monte-Carlo Learning
- Temporal-Difference Learning
- TD(λ) Learning
5. Model-Free Control
- On-Policy Monte-Carlo Control
- On-Policy Temporal-Difference Learning
- Off-Policy Learning
6. Value Function Approximation
- Incremental Methods
- Batch Methods
7. Policy Gradient
- Finite Difference Policy Gradient
- Monte-Carlo Policy Gradient
- Actor-Critic Policy Gradient

* Proximal Policy Optimization (PPO)
- the default reinforcement learning algorithm at OpenAI

* On-Policy vs. Off-Policy: Importance Sampling
- Issue of Importance Sampling
- On-Policy -> Off-policy
- Add Constraint

* PPO / TRPO

* Q-Learning
- Critic
- Target Network
- Replay Buffer
- Tips of Q-Learning
- Double DQN
- Dueling DQN
- Prioritized Replay
- Noisy Net
- Distributional Q-function
- Rainbow
- Q-Learning for Continuous Actions

* Actor-Critic
- A3C
- Advantage Actor-Critic
- Path-wise Derivative Policy Gradient

* Imitation Learning
- Behavior Cloning

* Inverse Reinforcement Learning (IRL)
- Framework of IRL
- IRL and GAN

* Sparse Reward
- Curiosity
- Curriculum Learning
- Hierarchical Reinforcement Learning
8. Integrating Learning and Planning
- Model-Based Reinforcement Learning
- Integrated Architectures
- Simulation-Based Search
9. Exploration and Exploitation
- Multi-Armed Bandits
- Contextual Bandits
- MDPs
10. Reinforcement Learning in Games
- Overview of game theory
- Minimax Search
- Self-Play Reinforcement Learning
- Combining reinforcement learning with Minimax search
- RL in Imperfect-Information Games
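
To give a flavour of the Model-Free Control and Q-Learning topics in the outline, here is a minimal sketch of tabular Q-learning. It is not taken from the course materials. The five-cell corridor task, the `step` and `q_learning` helpers, and the hyperparameter values are all illustrative assumptions. The behaviour policy is uniformly random, which is allowed because Q-learning is off-policy.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal (hypothetical toy task)
ACTIONS = (-1, +1)    # action index 0 moves left, index 1 moves right
GAMMA, ALPHA = 0.9, 0.5  # discount factor and learning rate (illustrative values)


def step(state, action_idx):
    """Deterministic transition; reward 1.0 only when the goal cell is reached."""
    next_state = min(max(state + ACTIONS[action_idx], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, seed=0):
    """Learn a Q-table while acting with a uniformly random behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.randrange(2)
            next_state, reward, done = step(state, action)
            # Off-policy target: bootstrap from the best action in the next state.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
# Greedy action for each non-terminal cell, read from the learned Q-table.
greedy_policy = [max((0, 1), key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

Although the agent never acts greedily during training, the greedy policy read from the learned table should move right towards the goal from every cell. This is the on-policy versus off-policy distinction listed in the outline.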