R for Data Science Cookbook

Yu-Wei, Chiu (David Chiu)

Product Description

Key Features

  • Gain insight into how data scientists collect, process, analyze, and visualize data using some of the most popular R packages
  • Understand how to apply useful data analysis techniques in R for real-world applications
  • An easy-to-follow guide that makes a data scientist's work easier by addressing the problems commonly faced during data analysis

Book Description

This cookbook offers a range of data analysis samples in simple and straightforward R code, providing step-by-step resources and time-saving methods to help you solve data problems efficiently.

The first section covers how to create R functions to avoid unnecessary duplication of code. You will learn how to prepare, process, and perform sophisticated ETL on heterogeneous data sources with R packages. A data manipulation example shows how to use the “dplyr” and “data.table” packages to process larger data structures efficiently. The book also covers “ggplot2” and shows you how to create advanced figures for data exploration.

In addition, you will learn how to build an interactive report using the “ggvis” package. Later chapters cover time series analysis of financial data and treat machine learning in detail, including data classification, regression, clustering, association rule mining, and dimension reduction.

By the end of this book, you will be able to work through and solve the problems commonly encountered while performing data analysis.

What you will learn

  • Get to know the functional characteristics of the R language
  • Extract, transform, and load data from heterogeneous sources
  • Understand how R can be used to tackle probability and statistics problems
  • Get simple R instructions to quickly organize and manipulate large datasets
  • Create professional data visualizations and interactive reports
  • Predict user purchase behavior by adopting a classification approach
  • Implement data mining techniques to discover items that are frequently purchased together
  • Group similar text documents by using various clustering methods

About the Author

Yu-Wei, Chiu (David Chiu) is the founder of LargitData (www.LargitData.com), a startup that focuses mainly on providing big data and machine learning products. He previously worked for Trend Micro as a software engineer, where he was responsible for building big data platforms for business intelligence and customer relationship management systems. In addition to being a startup entrepreneur and data scientist, he specializes in using Spark and Hadoop to process big data and in applying data mining techniques to data analysis. Yu-Wei is also a professional lecturer who has delivered lectures on big data and machine learning in R and Python and has given tech talks at a variety of conferences.

In 2015, Yu-Wei wrote Machine Learning with R Cookbook, Packt Publishing. In 2013, Yu-Wei reviewed Bioinformatics with R Cookbook, Packt Publishing. For more information, visit his personal website at www.ywchiu.com.

Table of Contents

  1. Functions in R
  2. Data Extracting, Transforming, and Loading
  3. Data Preprocessing and Preparation
  4. Data Manipulation
  5. Visualizing Data with ggplot2
  6. Making Interactive Reports
  7. Simulation from Probability Distributions
  8. Statistical Inference in R
  9. Rule and Pattern Mining with R
  10. Time Series Mining with R
  11. Supervised Machine Learning
  12. Unsupervised Machine Learning
