R Language Data Analysis in Simple Terms (深入淺出 R語言數據分析)

Mi Lin (米霖)

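  • Publisher: Tsinghua University (清華大學)
  • Publication date: 2020-09-01
  • List price: $414
  • Sale price: $352 (15% off)
  • Language: Simplified Chinese
  • ISBN: 7302543887
  • ISBN-13: 9787302543886
  • Related categories: R language, Data Science
  • Ships immediately (fewer than 3 in stock)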

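Product description

This book first introduces the methodology of data analysis, then the relevant modeling methods, and goes on to use data analysis case studies to explain the thinking, methods, and model implementation process behind data analysis. It focuses on applying R to data analysis so that readers can quickly use R to analyze data and build models.

The book has 17 chapters covering: obtaining data with R, data processing and data exploration, survival analysis, principal component analysis, multidimensional scaling, linear regression models, logistic regression models, clustering models, association rules, random forests, support vector machines, neural networks, text mining, and social network analysis, plus two extended topics on R data analysis: H2O machine learning and web scraping with R.

The content is accessible, rich in case studies, and practical. It is especially suited to beginner and intermediate R readers, and also to data analysts, data miners, and other data science practitioners. It is also suitable for undergraduate and graduate students in statistics, computer science, machine learning, mathematics, and related fields.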

Table of contents

Chapter 1 The Data Analysis Project Workflow

1.1 Roles in a Data Analysis Project

1.2 Stages of a Data Analysis Project

1.2.1 Setting Goals

1.2.2 Collecting Data

1.2.3 Data Processing and Analysis

1.2.4 Building Models

1.2.5 Evaluating Models

1.2.6 Presenting Results

1.2.7 Deploying and Maintaining Models

1.3 Summary

Chapter 2 Reading Data

2.1 RData Data

2.2 Reading Data Efficiently with readr

2.3 Reading Excel Data

2.4 Reading SPSS, SAS, and STATA Data

2.5 Working with Databases in R

2.6 Summary

Chapter 3 Data Exploration

3.1 Identifying and Handling Missing Values

3.1.1 Identifying Missing Values and Descriptive Statistics

3.1.2 Visualizing Missing Values

3.1.3 Methods for Handling Missing Values

3.2 Outliers

3.3 The dlookr Data Processing Package

3.3.1 General Diagnosis of All Variables

3.3.2 Diagnosis of Numeric Variables

3.3.3 Diagnosis of Categorical Variables

3.3.4 Diagnosis of Outliers

3.3.5 Creating a Diagnostic Report

3.3.6 Data Processing

3.3.7 Handling Missing Values

3.3.8 Handling Outliers

3.3.9 Data Transformation

3.3.10 Data Binning

3.3.11 Creating a Data Transformation Report

3.4 Data Correlation

3.5 Automatically Creating a Data Exploration Report

3.6 Summary

Chapter 4 Survival Analysis

4.1 Basics of Survival Analysis

4.2 Survival Analysis in R

4.3 Nonparametric Models

4.3.1 Fitting Data with the Kaplan-Meier Method

4.3.2 Visualizing the Kaplan-Meier Method

4.4 Semiparametric Models for Survival Analysis

4.4.1 Building a Cox Model

4.4.2 Checking Assumptions

4.4.3 Visualizing the Coxph Model

4.4.4 Prediction

4.4.5 Stratification

4.5 Parametric Models

4.6 Random Survival Forest Models

4.7 Summary

Chapter 5 Principal Component Analysis

5.1 Overview

5.1.1 Dimensionality-Related Problems

5.1.2 Detecting Multicollinearity

5.1.3 Variance Inflation Factor

5.2 Principal Component Analysis in Detail

5.2.1 Definition of Principal Component Analysis

5.2.2 The Basic Principle of Principal Component Analysis

5.2.3 The Principal Component Analysis Algorithm

5.3 Principal Component Analysis in R

5.3.1 Implementing Principal Component Analysis

5.3.2 A Principal Component Analysis Case Study

5.4 Summary

Chapter 6 Multidimensional Scaling

6.1 How MDS Works

6.2 Implementing MDS in R

6.3 Advantages of MDS

6.4 Summary

Chapter 7 Linear Regression Models

7.1 Overview of Linear Regression Models

7.2 Implementing Regression Models in R

7.2.1 Graphical Analysis

7.2.2 Building a Linear Model

7.2.3 Graphical Diagnostics for Regression Models

7.2.4 Predictive Models

7.2.5 Sampling Methods

7.3 Summary

Chapter 8 Logistic Regression Models

8.1 Principles of Logistic Regression

8.2 Implementing Logistic Regression Models in R

8.2.1 Data Exploration

8.2.2 Building a Logistic Regression Model

8.2.3 Prediction with Logistic Regression

8.2.4 Evaluating the Logistic Regression Model

8.3 Summary

Chapter 9 Clustering Models

9.1 Overview

9.1.1 Clustering Algorithms

9.1.2 Principles of K-Means Clustering

9.2 Implementing Clustering Models in R

9.2.1 K-Means Clustering

9.2.2 Hierarchical Clustering

9.2.3 Medoids Clustering (PAM)

9.3 Summary

Chapter 10 Association Rules

10.1 Overview of Association Rules

10.2 Basic Concepts of Association Rules

10.3 Implementing Association Rules in R

10.3.1 Training the Model

10.3.2 Evaluating the Model

10.3.3 Improving the Performance of Association Rules

10.3.4 Visualizing Association Rules

10.4 Summary

Chapter 11 Random Forests

11.1 Basic Concepts of Random Forests

11.2 Implementing Random Forests in R

11.3 Summary

Chapter 12 Support Vector Machines

12.1 Overview

12.2 Implementing Support Vector Machines in R

12.3 Summary

Chapter 13 Neural Networks

13.1 Overview

13.2 Implementing Neural Networks in R

13.2.1 Building a Neural Network Model

13.2.2 Evaluating Model Performance

13.3 Summary

Chapter 14 Text Mining

14.1 Overview

14.2 Background and Basic Principles of text2vec

14.3 Principles and Implementation of DTM and TFIDF

14.3.1 Principles of DTM and TFIDF

14.3.2 Implementing DTM

14.3.3 Implementing TFIDF

14.4 Sentiment Analysis

14.5 The LDA Topic Model and Its Implementation

14.6 Building an Automatic Question-Answering System

14.7 Summary

Chapter 15 Social Network Analysis

15.1 Overview of Social Networks

15.2 Introduction to igraph

15.2.1 Preparation

15.2.2 Computing Graph Metrics

15.3 Common Structures of Social Networks

15.4 Social Network Analysis Algorithms

15.4.1 Girvan-Newman

15.4.2 Community Detection Based on Label Propagation

15.4.3 Community Detection Based on Greedy Modularity Optimization

15.4.4 Spin-Glass Communities

15.5 Analysis of Weibo Social Groups

15.5.1 Spin-Glass Communities

15.5.2 Community Detection

15.6 Summary

Chapter 16 H2O Machine Learning

16.1 The H2O Machine Learning Platform

16.2 Using H2O in R

16.2.1 Installing H2O

16.2.2 Case Study

16.2.3 Common H2O APIs

16.2.4 Common Model Parameters

16.2.5 Parameter Tuning

16.3 H2O Flow

16.3.1 Installing H2O Flow

16.3.2 Basic Usage of H2O Flow

16.4 Summary

Chapter 17 Web Scraping with R

17.1 Quickly Scraping Web Page Data

17.2 Introduction to rvest

17.2.1 rvest API

17.2.2 rvest API in Detail

17.3 Scraping BOSS 直聘 Data

17.4 Simulating Login