The Unsupervised Learning Workshop: Get started with unsupervised learning algorithms and simplify your unorganized data to help make future predictio
Jones, Aaron, Kruger, Christopher, Johnston, Benjamin
- Publisher: Packt Publishing
- Publication date: 2020-07-28
- Price: $1,840
- VIP price: 5% off, $1,748
- Language: English
- Pages: 550
- Binding: Quality Paper - also called trade paper
- ISBN: 1800200706
- ISBN-13: 9781800200708
Related categories:
Algorithms-data-structures
Overseas special-order books (separate checkout required)
Product Description
Learning how to apply unsupervised algorithms on unlabeled datasets from scratch can be easier than you thought with this beginner's workshop, featuring interesting examples and activities
Key Features
- Get familiar with the ecosystem of unsupervised algorithms
- Learn interesting methods to simplify large amounts of unorganized data
- Tackle real-world challenges, such as estimating the population density of a geographical area
Book Description
Do you find it difficult to understand how popular companies like WhatsApp and Amazon find valuable insights from large amounts of unorganized data? The Unsupervised Learning Workshop will give you the confidence to deal with cluttered and unlabeled datasets, using unsupervised algorithms in an easy and interactive manner.
The book starts by introducing the most popular clustering algorithms of unsupervised learning. You'll find out how hierarchical clustering differs from k-means, along with understanding how to apply DBSCAN to highly complex and noisy data. Moving ahead, you'll use autoencoders for efficient data encoding.
As you progress, you'll use t-SNE models to project high-dimensional data into a lower-dimensional space for better visualization, and you'll work with topic modeling as an application of natural language processing (NLP). In later chapters, you'll find key relationships between customers and businesses using Market Basket Analysis, before going on to use Hotspot Analysis for estimating the population density of an area.
By the end of this book, you'll be equipped with the skills you need to apply unsupervised algorithms on cluttered datasets to find useful patterns and insights.
What you will learn
- Distinguish between hierarchical clustering and the k-means algorithm
- Understand the process of finding clusters in data
- Grasp interesting techniques to reduce the size of data
- Use autoencoders to decode data
- Extract topics from a large collection of documents using topic modeling
- Create a bag-of-words model using the CountVectorizer
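To make the last item concrete, here is a minimal sketch of what a bag-of-words model looks like. It is not taken from the book and does not use scikit-learn's CountVectorizer; it is a hand-rolled stand-in using only the Python standard library. The function name `bag_of_words`, the sample sentences, and the tokenization rule (lowercase, tokens of two or more word characters, chosen to resemble CountVectorizer's defaults) are assumptions for illustration.

```python
import re
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count_matrix) for a list of strings.

    Hypothetical helper: a simplified stand-in for a bag-of-words
    vectorizer, not the scikit-learn implementation.
    """
    # Lowercase each document and keep tokens of two or more word
    # characters (an assumption made for this sketch).
    tokenized = [re.findall(r"\b\w\w+\b", doc.lower()) for doc in docs]

    # The vocabulary is the sorted set of all tokens seen in any document.
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocabulary)}

    # One row per document, one column per vocabulary term.
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocabulary)
        for tok, n in Counter(toks).items():
            row[index[tok]] = n
        matrix.append(row)
    return vocabulary, matrix


docs = ["the cat sat on the mat", "the dog sat"]
vocab, counts = bag_of_words(docs)
print(vocab)
for row in counts:
    print(row)
```

Each row counts how often every vocabulary term occurs in one document; word order is discarded, which is the defining trait of the bag-of-words representation that topic models typically take as input.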
Who this book is for
If you are a data scientist who is just getting started and wants to learn how to implement machine learning algorithms to build predictive models, then this book is for you. To expedite the learning process, a solid understanding of the Python programming language is recommended, as you'll be editing classes and functions instead of creating them from scratch.
About the Authors
Aaron Jones is a full-time senior data scientist and consultant. He has built models and data products while working in retail, media, and environmental science. Aaron is based in Seattle, Washington and has a particular interest in clustering algorithms, natural language processing, and Bayesian statistics.
Christopher Kruger is a practicing data scientist and AI researcher. He has managed applied machine learning projects across multiple industries while mentoring junior team members on best practices. His primary focus is on pushing both business practicality and academic rigor in every project. Chris is currently developing research in the computer vision space.
Benjamin Johnston is a senior data scientist for one of the world's leading data-driven medtech companies and is involved in the development of innovative digital solutions throughout the entire product development pathway, from problem definition to solution research and development, through to final deployment. He is currently completing his PhD in machine learning, specializing in image processing and deep convolutional neural networks. He has more than 10 years' experience in medical device design and development, working in a variety of technical roles, and holds first-class honors bachelor's degrees in both engineering and medical science from the University of Sydney, Australia.
Table of Contents
- Introduction to Clustering
- Hierarchical Clustering
- Neighborhood Approaches and DBSCAN
- Dimensionality Reduction Techniques and PCA
- Autoencoders
- t-Distributed Stochastic Neighbor Embedding
- Topic Modeling
- Market Basket Analysis
- Hotspot Analysis