Big Data Integration (Synthesis Lectures on Data Management)
Tentative Chinese title: 大數據整合 (數據管理綜合講座)
Xin Luna Dong, Divesh Srivastava
- Publisher: Morgan & Claypool
- Publication date: 2014-01-01
- Price: $2,410
- VIP price: 5% off, $2,290
- Language: English
- Pages: 178
- Binding: Paperback
- ISBN: 1627052232
- ISBN-13: 9781627052238
Related categories:
Big Data
Overseas special-order books (separate checkout required)
Product description
The big data era is upon us: data are being generated, analyzed, and used at an unprecedented scale, and data-driven decision making is sweeping through all aspects of society. Since the value of data explodes when it can be linked and fused with other data, addressing the big data integration (BDI) challenge is critical to realizing the promise of big data. BDI differs from traditional data integration along the dimensions of volume, velocity, variety, and veracity. First, not only can data sources contain a huge volume of data, but also the number of data sources is now in the millions. Second, because of the rate at which newly collected data are made available, many of the data sources are very dynamic, and the number of data sources is also rapidly exploding. Third, data sources are extremely heterogeneous in their structure and content, exhibiting considerable variety even for substantially similar entities. Fourth, the data sources are of widely differing qualities, with significant differences in the coverage, accuracy, and timeliness of data provided. This book explores the progress that has been made by the data integration community on the topics of schema alignment, record linkage, and data fusion in addressing these novel challenges faced by big data integration. Each of these topics is covered in a systematic way: starting with a quick tour of the topic in the context of traditional data integration, followed by a detailed, example-driven exposition of recent innovative techniques that have been proposed to address the BDI challenges of volume, velocity, variety, and veracity. Finally, it presents emerging topics and opportunities that are specific to BDI, identifying promising directions for the data integration community.