Multi-Armed Bandits: Theory and Applications to Online Learning in Networks
Zhao, Qing; Srikant, R.
- Publisher: Morgan & Claypool
- Publication date: 2019-11-21
- List price: $3,050
- VIP price: $2,898 (5% off)
- Language: English
- Pages: 147
- Binding: Hardcover
- ISBN: 1681736373
- ISBN-13: 9781681736372
Overseas import title (requires separate checkout)
Description
Multi-armed bandit problems pertain to optimal sequential decision making and learning in unknown environments.
Since the first bandit problem, posed by Thompson in 1933 in the context of clinical trials, bandit problems have enjoyed lasting attention from multiple research communities and have found a wide range of applications across diverse domains. This book covers classic results and recent developments on both Bayesian and frequentist bandit problems. We start in Chapter 1 with a brief overview of the history of bandit problems, contrasting the two schools of approaches, Bayesian and frequentist, and highlighting foundational results and key applications. Chapters 2 and 4 cover, respectively, the canonical Bayesian and frequentist bandit models. In Chapters 3 and 5, we discuss major variants of the canonical bandit models that lead to new directions, bring in new techniques, and broaden the applications of this classical problem. In Chapter 6, we present several representative application examples in communication networks and social-economic systems, aiming to illuminate the connections between the Bayesian and the frequentist formulations of bandit problems and how structural results pertaining to one may be leveraged to obtain solutions under the other.
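As a rough illustration of the frequentist formulation the book describes, the following is a minimal sketch of the well-known UCB1 index policy applied to Bernoulli-reward arms. The arm means, horizon, and function name here are hypothetical choices for demonstration, not taken from the book:

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Minimal UCB1 sketch: pull each arm once, then repeatedly pull the arm
    maximizing (empirical mean + sqrt(2 ln t / n_i)). Rewards are simulated
    as Bernoulli draws with the given (hypothetical) arm means."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k        # number of pulls per arm
    sums = [0.0] * k        # cumulative reward per arm
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1     # initialization: pull each arm once
        else:
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
    return counts, total_reward

counts, reward = ucb1([0.3, 0.5, 0.8], horizon=5000)
# Over time, pulls concentrate on the best arm (mean 0.8).
```

The exploration bonus shrinks as an arm accumulates pulls, which is what drives the logarithmic regret guarantees studied in the frequentist chapters.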