Deep Reinforcement Learning with Python: RLHF for Chatbots and Large Language Models

Sanghi, Nimish

Product Description

Gain a theoretical understanding of deep reinforcement learning (deep RL) along with hands-on experience of its most popular libraries. This new edition focuses on the latest advances in deep RL using a learn-by-coding approach, allowing readers to assimilate and replicate the latest research in this field.

New agent environments, ranging from games and robotics to finance, are explained to help you try different ways of applying reinforcement learning. A chapter on multi-agent reinforcement learning covers how multiple agents compete, while another chapter focuses on the widely used deep RL algorithm proximal policy optimization (PPO). You'll see how reinforcement learning from human feedback (RLHF) has been used to improve the conversational capabilities of chatbots built on large language models, such as ChatGPT.

You'll also review the steps for using the code on multiple cloud systems and deploying models on platforms such as Hugging Face Hub. The code is provided as Jupyter notebooks, which can be run on Google Colab and similar deep learning cloud platforms, allowing you to tailor the code to your own needs.

Whether it's for applications in gaming, robotics, or Generative AI, Deep Reinforcement Learning with Python will help keep you ahead of the curve.


What You'll Learn

  • Explore Python-based RL libraries, including StableBaselines3 and CleanRL
  • Work with diverse RL environments like Gymnasium, PyBullet, and Unity ML
  • Understand instruction fine-tuning of large language models using RLHF and PPO
  • Study training and optimization techniques using Hugging Face, Weights and Biases, and Optuna

Who This Book Is For

Software engineers and machine learning developers eager to sharpen their understanding of deep RL and acquire practical skills in implementing RL algorithms from scratch.

About the Author

Nimish is a seasoned entrepreneur and angel investor with a rich portfolio of tech ventures in SaaS software and AI-driven automation across India, the US, and Singapore. He has over 30 years of work experience. Nimish ventured into entrepreneurship in 2006 after holding leadership roles at global corporations such as PwC, IBM, and Oracle.

Nimish holds an MBA from the Indian Institute of Management, Ahmedabad, India (IIMA), and a Bachelor of Technology in Electrical Engineering from the Indian Institute of Technology, Kanpur, India (IITK).
