Building the Data Lakehouse
By Bill Inmon (US), Mary Levins (US), and Ranjeet Srivastava (US); Jing'an District, Shanghai
Table of Contents
Introduction
Chapter 1: Evolution to the Data Lakehouse
1. The evolution of technology
2. All the data in the organization
3. Where is the business value?
4. The data lake
5. Challenges of current data architectures
6. The emergence of the data lakehouse
Chapter 2: Data Scientists and End Users
1. The data lake
2. The analytical infrastructure
3. Different audiences
4. Different analysis tools
5. Different purposes of analysis
6. Different analytical approaches
7. Different types of data
Chapter 3: Different Types of Data in the Data Lakehouse
1. Types of data
2. Different volumes of data
3. Relating data across the different types of data
4. Segmenting data based on probability of access
5. Relating data in the analog and IoT environments
6. The analytical infrastructure
Chapter 4: The Open Lakehouse Environment
1. The evolution of open systems
2. Innovation that keeps pace with the times
3. The unstructured lakehouse built on open, standard file formats
4. Open source data lakehouse software
5. The data lakehouse provides open APIs beyond SQL
6. The data lakehouse enables open data sharing
7. The data lakehouse supports open data exploration
8. The data lakehouse simplifies data discovery with open data catalogs
9. The data lakehouse leveraging cloud-native architecture
10. An evolution to the open data lakehouse
Chapter 5: Machine Learning and the Data Lakehouse
1. Machine learning
2. What does machine learning need from a lakehouse?
3. Extracting new value from data
4. Resolving the dilemma
5. The problem of unstructured data
6. The importance of open source
7. Taking advantage of cloud elasticity
8. Designing "MLOps" for a data platform
9. Example: using machine learning to classify chest X-rays
10. The evolution of the unstructured component of the data lakehouse
Chapter 6: The Analytical Infrastructure in the Data Lakehouse
1. Metadata
2. The data model
3. Data quality
4. ETL
5. Textual ETL
6. Taxonomies
7. Volume of data
8. Data lineage
9. KPIs
10. Granularity of data
11. Transactions
12. Keys
13. Schedule of processing
14. Summarized data
15. Minimum requirements
Chapter 7: Blending Data in the Data Lakehouse
1. The lakehouse and the data lakehouse
2. The origins of data
3. Different types of analysis
4. Common identifiers
5. Structured identifiers
6. Repetitive data
7. Identifiers in the textual environment
8. Combining textual data and structured data
9. The importance of matching
Chapter 8: Types of Analysis Across the Data Lakehouse Architecture
1. Known queries
2. Heuristic analysis
Chapter 9: Data Lakehouse Housekeeping
1. Data integration and interoperability
2. Master data and reference data for the data lakehouse
3. Data lakehouse privacy, confidentiality, and data protection
4. Future-proofing data in the data lakehouse
5. The five phases of future-proofing data
6. Routine maintenance of the data lakehouse
Chapter 10: Visualization
1. Turning data into information
2. What is data visualization, and why is it important?
3. The differences between data visualization, data analysis, and data interpretation
4. The advantages of data visualization
Chapter 11: Data Lineage in the Data Lakehouse Architecture
1. The chain of calculations
2. Selection of data
3. Algorithmic differences
4. Lineage of textual data
5. Data lineage in other unstructured environments
6. Data lineage
Chapter 12: Probability of Access in the Data Lakehouse Architecture
1. Efficient arrangement of data
2. Probability of access of data
3. Different types of data in the data lakehouse
4. Relative differences in data volume
5. The advantages of data segmentation
6. Using bulk storage
7. Additional indexes
Chapter 13: Crossing the Chasm
1. Merging data
2. Different kinds of data
3. Different business needs
4. Crossing the chasm
Chapter 14: Massive Volumes of Data in the Data Lakehouse
1. Distribution of the volumes of data
2. High-performance and bulk storage of data
3. Additional indexes and summarization
4. Periodic filtering of data
5. Tokenization of data
6. Separating text and databases
7. Archival storage
8. Monitoring activity
9. Parallel processing
Chapter 15: Data Governance and the Data Lakehouse
1. The purpose of data governance
2. Data lifecycle management
3. Data quality management
4. The importance of metadata management
5. Data governance over time
6. Types of data governance
7. Data governance across the data lakehouse
8. Data governance considerations
Chapter 16: The Modern Data Warehouse
1. The spread of applications
2. Information silos
3. Complex network environments
4. The data warehouse
5. Definition of the data warehouse
6. Historical data
7. The relational model
8. The native form of data
9. The need for integrated data
10. Times have changed
11. Today's world
12. Different volumes of data
13. The relationship between data and the business
14. Getting data into the data warehouse
15. The modern data warehouse
16. When do we no longer need a data warehouse?
17. The data lake
18. The data warehouse as a foundation
19. The data stack