Building Large Language Models from Scratch: Design, Train, and Deploy LLMs with PyTorch

Name: Building Large Language Models from Scratch: Design, Train, and Deploy LLMs with PyTorch
Price: 2289 TWD
Availability: OnlineOnly
Author: Grigorov, Dilyan
ISBN: 9798868822964

Publisher: Apress
Publication date: 2026-04-28
List price: $2,410
VIP price: 5% off, $2,289
Language: English
Pages: 530
Binding: Quality Paper - also called trade paper
ISBN: 9798868822964
ISBN-13: 9798868822964
Related category: Large language model

Overseas special-order book (requires separate checkout)

Product Description

This book is a complete, hands-on guide to designing, training, and deploying your own Large Language Models (LLMs), from the foundations of tokenization to the advanced stages of fine-tuning and reinforcement learning. Written for developers, data scientists, and AI practitioners, it bridges core principles and state-of-the-art techniques, offering a transparent look at how modern transformers work beneath the surface.

Starting from the essentials, you'll learn how to set up your environment with Python and PyTorch, manage datasets, and implement critical fundamentals such as tensors, embeddings, and gradient descent. You'll then progress through the architectural heart of modern models, covering RMS normalization, rotary positional embeddings (RoPE), scaled dot-product attention, Grouped Query Attention (GQA), Mixture of Experts (MoE), and SwiGLU activations, each explored in depth and built step by step in code. As you advance, the book introduces custom CUDA kernel integration, teaching you how to optimize key components for speed and memory efficiency at the GPU level--an essential skill for scaling real-world LLMs. You'll also gain mastery over the phases of training that define today's leading models:

- Pretraining: Building general linguistic and semantic understanding.
- Midtraining: Expanding domain-specific capabilities and adaptability.
- Supervised Fine-Tuning (SFT): Aligning behavior with curated, task-driven data.
- Reinforcement Learning from Human Feedback (RLHF): Refining responses through reward-based optimization for human alignment.

The final chapters guide you through dataset preparation, filtering, deduplication, and training optimization, culminating in model evaluation and real-world prompting with a custom TokenGenerator for text generation and inference.

By the end of this book, you'll have the knowledge and confidence to architect, train, and deploy your own transformer-based models, equipped with both the theoretical depth and practical expertise to innovate in the rapidly evolving world of AI.

What You'll Learn

- How to configure and optimize your development environment using PyTorch.
- The mechanics of tokenization, embeddings, normalization, and attention mechanisms.
- How to implement transformer components like RMSNorm, RoPE, GQA, MoE, and SwiGLU from scratch.
- How to integrate custom CUDA kernels to accelerate transformer computations.
- The full LLM training pipeline: pretraining, midtraining, supervised fine-tuning, and RLHF.
- Techniques for dataset preparation, deduplication, model debugging, and GPU memory management.
- How to train, evaluate, and deploy a complete GPT-like architecture for real-world tasks.

Who This Book Is For

Software developers, data scientists, machine learning engineers and AI enthusiasts looking to build their models from scratch.

About the Author

Dilyan Grigorov is a software developer with a passion for Python software development, generative deep learning & machine learning, data structures, and algorithms. He is an advocate for open source and the Python language itself. He has 16 years of industry experience programming in Python and has spent 5 of those years researching and testing Generative AI solutions. His passion for them stems from his background as an SEO specialist dealing with search engine algorithms daily. He enjoys engaging with the software community, often giving talks at local meetups and larger conferences. In his spare time, he enjoys reading books, hiking in the mountains, taking long walks, playing with his son, and playing the piano.
