Architecting an Apache Iceberg Lakehouse: A Scalable, Open-Source Data Platform
Tentative translated title: Designing an Apache Iceberg Lakehouse: A Scalable Open-Source Data Platform
Merced, Alex
- Publisher: Manning
- Publication date: 2026-05-19
- List price: $2,110
- Member price: 5% off, $2,004
- Language: English
- Pages: 408
- Binding: Quality Paper (also called trade paper)
- ISBN: 1633435105
- ISBN-13: 9781633435100
Related categories:
Big Data
Not yet released; not available for order.
Product description
Get the eBook free when you register your print book at Manning.
Design an Apache Iceberg lakehouse from scratch! The "lakehouse" data architecture is a powerful way to combine the flexibility of data lakes with the management features of data warehouses. The open-source Apache Iceberg framework delivers the scalability, reliability, and performance you want from a lakehouse without the expense and vendor lock-in of platforms such as Snowflake, BigQuery, and Redshift.
In Architecting an Apache Iceberg Lakehouse, data guru Alex Merced shows you:
- How to create a modular, scalable Iceberg lakehouse architecture
- Where Spark, Flink, Dremio, and Polaris fit into your design
- How to build reliable batch and streaming ingestion pipelines
- Strategies for governance, security, and performance at scale
Apache Iceberg is an open-source table format designed for massive analytic datasets. It brings ACID transactions, schema evolution, and high-performance queries to data lakes, and it works with multiple compute engines, including Spark, Trino, Flink, Presto, and Hive. An Iceberg lakehouse delivers fast, reliable analytics at scale while retaining the observability you need for compliance audits, governance, and provable data security.
Foreword by Tim Berglund. Afterword by Adi Polak.
About the technology
Apache Iceberg is an open table format that lets files in a data lake behave like database tables, turning a data lake into a more reliable and capable lakehouse.
About the book
Architecting an Apache Iceberg Lakehouse shows you how to design an open, scalable, and cost-effective lakehouse platform with Apache Iceberg. More than a set of blueprints, the book explains the reasoning behind the architecture. You'll build a mini lakehouse by ingesting sales and marketing data from PostgreSQL into Iceberg tables with Apache Spark, then create interactive dashboards in Apache Superset. Along the way, you'll benefit from Alex Merced's real-world insights into operating an Iceberg lakehouse.
What's inside
- Create a modular, scalable Iceberg lakehouse architecture
- Fit Spark, Flink, Dremio, Polaris, and more into your design
- Batch and streaming ingestion pipelines
- Governance, security, and performance at scale
About the reader
For data architects familiar with the basics of a data lakehouse.
About the author
Alex Merced is Head of Developer Relations at Dremio. He shares his expertise through videos, podcasts, and articles, and leads the DataLakehouseHub.com community.
Table of Contents
Part 1
1 The world of the data lakehouse
2 Apache Iceberg and the lakehouse
3 Hands-on with Apache Iceberg
Part 2
4 Preparing for your move to Apache Iceberg
5 Selecting the storage layer
6 Architecting the ingestion layer
7 Implementing the catalog layer
8 Designing the federation layer
9 Understanding the consumption layer
Part 3 Operating your Apache Iceberg lakehouse
10 Maintaining an Iceberg lakehouse
11 Operationalizing Apache Iceberg
A The metadata tables
B Python for Apache Iceberg
C The Apache Iceberg specification
About the author
Alex Merced is Head of Developer Relations at Dremio, where he helps developers navigate modern data architectures. He shares his expertise through videos, podcasts, and articles, and leads the DataLakehouseHub.com community. He is the co-author of Apache Iceberg: The Definitive Guide.