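(Deep) Reinforcement Learning: Explanation and Demo

This video explains (deep) reinforcement learning with the help of a demo. The Python code used in the demo is on the shortest path page of "100+ Optimization Problems": https://scmopt.github.io/opt100/03sp.html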

Video playlists:
How to Build Mathematical Optimization Models in Python https://www.youtube.com/playlist?list=PLz8sHu_CzBwNi2aJcJNicOmsWHdTvF4Ka
Combinatorial Optimization and Algorithms https://www.youtube.com/playlist?list=PLz8sHu_CzBwPVWVwsnKVjjSqZyZiwnj5y
100 Practical Optimization Problems in Python https://www.youtube.com/playlist?list=PLz8sHu_CzBwPMIdyL_WMEVUw-GOSL-J6w
How to Become an Analytics Expert for Free https://www.youtube.com/playlist?list=PLz8sHu_CzBwNTHy3GDouxNPI0QmdACDKZ
Data Science Practice Problems https://www.youtube.com/playlist?list=PLz8sHu_CzBwPt3BPmwYjseKbQoIsqAO4T
Data Science Lectures
First Steps with NumPy
Python for Absolute Beginners https://www.youtube.com/playlist?list=PLz8sHu_CzBwNHOgeAha17VVOvFh7okuZg
Metaheuristics https://www.youtube.com/playlist?list=PLz8sHu_CzBwNJt9a0P50hlDfL9P3RcJwS
SCMOPT Supply Chain Optimization Project https://www.youtube.com/playlist?list=PLz8sHu_CzBwNLQJeRjadZcSvXG-LmEIGP
MIT Deep Learning Lectures Explained in Japanese https://www.youtube.com/playlist?list=PLz8sHu_CzBwMRsto31_ddblF82Y4qI03_
Supply Chain Optimization Lectures https://www.youtube.com/playlist?list=PLz8sHu_CzBwN2b-9Wo2RqzMXdnPVf4Lz8
Advanced Topics in Supply Chain Optimization https://www.youtube.com/playlist?list=PLz8sHu_CzBwO0CePoT8KG2j5tB0SL102z
Constraint Optimization Solver SCOP https://www.youtube.com/playlist?list=PLz8sHu_CzBwOxfNC2f5vL0n1QAa073AkI
Scheduling Optimization Solver OptSeq https://www.youtube.com/playlist?list=PLz8sHu_CzBwOcukYoz2PzoNXAqY5awFRc

Table of Contents:
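00:05 – What is reinforcement learning?
01:28 – Example: a robot in a grid world
03:30 – Markov decision process (MDP)
05:32 – Policy
07:05 – Value function
08:22 – Optimal value function
10:54 – Dynamic programming
12:06 – Policy evaluation and policy improvement
14:10 – Policy iteration / value iteration
15:02 – Monte Carlo (MC) policy evaluation
16:45 – Monte Carlo (MC) control
17:28 – Exploration
18:26 – Advantages of Monte Carlo methods
20:14 – Temporal difference (TD) learning
22:27 – Monte Carlo vs. TD
23:53 – Sarsa
24:52 – Demo (episode 1). The source code is on the shortest path page of "100+ Optimization Problems": https://mikiokubo.github.io/opt100/
26:09 – Demo (episode 2)
26:50 – Demo (after 5000 episodes)
27:14 – Deep reinforcement learning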