Mathematical Foundations of Reinforcement Learning (English edition) (強化學習的數學原理)

Author: Shiyu Zhao (趙世鈺)

Publisher: Tsinghua University Press

Publication date: 2024-07-01

Format: 16-kai (16開)

Pages: 312

List price: ¥118.0. 中圖網 price: ¥87.3 (26% off).

Mathematical Foundations of Reinforcement Learning (English edition): Publication Details

ISBN: 9787302658528
Barcode: 9787302658528; 978-7-302-65852-8
Binding: standard offset paper
Volumes: not available
Weight: not available
Category: Computers/Networking > Computer Theory

Mathematical Foundations of Reinforcement Learning (English edition): Highlights

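· Goes from the very beginning to a thorough understanding, covering not only what works but why it works.
· The book has earned 2000 stars on GitHub.
· The course videos have been played more than 800,000 times across platforms.
· Highly praised by readers in China and abroad.
· Textbook, videos, and slides form one integrated package.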

Mathematical Foundations of Reinforcement Learning (English edition): Summary

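This book starts from the most basic concepts of reinforcement learning and introduces the fundamental analysis tools, including the Bellman equation and the Bellman optimality equation. It then extends to model-based and model-free reinforcement learning algorithms, and finally to reinforcement learning methods based on function approximation. The book emphasizes introducing concepts, analyzing problems, and analyzing algorithms from a mathematical perspective, and does not emphasize programming implementations. It requires no prior background in reinforcement learning, only some knowledge of probability theory and linear algebra. For readers who already have a foundation in reinforcement learning, the book can deepen their understanding of certain issues and offer new perspectives. It is intended for undergraduates, graduate students, researchers, and practitioners in industry or research institutes who are interested in reinforcement learning.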

Mathematical Foundations of Reinforcement Learning (English edition): Table of Contents

Overview of this Book

Chapter 1 Basic Concepts
1.1 A grid world example
1.2 State and action
1.3 State transition
1.4 Policy
1.5 Reward
1.6 Trajectories, returns, and episodes
1.7 Markov decision processes
1.8 Summary
1.9 Q&A

Chapter 2 State Values and the Bellman Equation
2.1 Motivating example 1: Why are returns important?
2.2 Motivating example 2: How to calculate returns?
2.3 State values
2.4 The Bellman equation
2.5 Examples for illustrating the Bellman equation
2.6 Matrix-vector form of the Bellman equation
2.7 Solving state values from the Bellman equation
2.7.1 Closed-form solution
2.7.2 Iterative solution
2.7.3 Illustrative examples
2.8 From state value to action value
2.8.1 Illustrative examples
2.8.2 The Bellman equation in terms of action values
2.9 Summary
2.10 Q&A

Chapter 3 Optimal State Values and the Bellman Optimality Equation
3.1 Motivating example: How to improve policies?
3.2 Optimal state values and optimal policies
3.3 The Bellman optimality equation
3.3.1 Maximization of the right-hand side of the BOE
3.3.2 Matrix-vector form of the BOE
3.3.3 Contraction mapping theorem
3.3.4 Contraction property of the right-hand side of the BOE
3.4 Solving an optimal policy from the BOE
3.5 Factors that influence optimal policies
3.6 Summary
3.7 Q&A

Chapter 4 Value Iteration and Policy Iteration
4.1 Value iteration
4.1.1 Elementwise form and implementation
4.1.2 Illustrative examples
4.2 Policy iteration
4.2.1 Algorithm analysis
4.2.2 Elementwise form and implementation
4.2.3 Illustrative examples
4.3 Truncated policy iteration
4.3.1 Comparing value iteration and policy iteration
4.3.2 Truncated policy iteration algorithm
4.4 Summary
4.5 Q&A

Chapter 5 Monte Carlo Methods
5.1 Motivating example: Mean estimation
5.2 MC Basic: The simplest MC-based algorithm
5.2.1 Converting policy iteration to be model-free
5.2.2 The MC Basic algorithm
5.2.3 Illustrative examples
5.3 MC Exploring Starts
5.3.1 Utilizing samples more efficiently
5.3.2 Updating policies more efficiently
5.3.3 Algorithm description
5.4 MC ε-Greedy: Learning without exploring starts
5.4.1 ε-greedy policies
5.4.2 Algorithm description
5.4.3 Illustrative examples
5.5 Exploration and exploitation of ε-greedy policies
5.6 Summary
5.7 Q&A

Chapter 6 Stochastic Approximation
6.1 Motivating example: Mean estimation
6.2 Robbins-Monro algorithm
6.2.1 Convergence properties
6.2.2 Application to mean estimation
6.3 Dvoretzky's convergence theorem
6.3.1 Proof of Dvoretzky's theorem
6.3.2 Application to mean estimation
6.3.3 Application to the Robbins-Monro theorem
6.3.4 An extension of Dvoretzky's theorem
6.4 Stochastic gradient descent
6.4.1 Application to mean estimation
6.4.2 Convergence pattern of SGD
6.4.3 A deterministic formulation of SGD
6.4.4 BGD, SGD, and mini-batch GD
6.4.5 Convergence of SGD
6.5 Summary
6.6 Q&A

Chapter 7 Temporal-Difference Methods
7.1 TD learning of state values
7.1.1 Algorithm description
7.1.2 Property analysis
7.1.3 Convergence analysis
7.2 TD learning of action values: Sarsa
7.2.1 Algorithm description
7.2.2 Optimal policy learning via Sarsa
7.3 TD learning of action values: n-step Sarsa
7.4 TD learning of optimal action values: Q-learning
7.4.1 Algorithm description
7.4.2 Off-policy vs. on-policy
7.4.3 Implementation
7.4.4 Illustrative examples
7.5 A unified viewpoint
7.6 Summary
7.7 Q&A

Chapter 8 Value Function Approximation
8.1 Value representation: From table to function
8.2 TD learning of state values with function approximation
8.2.1 O


Mathematical Foundations of Reinforcement Learning (English edition): About the Author

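Shiyu Zhao (趙世鈺) is a specially appointed researcher in the AI branch of the School of Engineering at Westlake University, head of the Intelligent Unmanned Systems Laboratory, and a recipient of the youth project of the national program for recruiting high-level overseas talent. Zhao received bachelor's and master's degrees from Beihang University and a PhD from the National University of Singapore, and previously served as a Lecturer in the Department of Automatic Control and Systems Engineering at the University of Sheffield, UK. Zhao's work aims to develop interesting, useful, and challenging next-generation robotic systems, with a focus on control, decision-making, and perception in multi-robot systems.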
