Natural Language Processing Projects: Build Next-Generation NLP Applications Using AI Techniques
Kulkarni, Akshay, Shivananda, Adarsha, Kulkarni, Anoosh
- Publisher: Apress
- Publication date: 2021-12-04
- List price: $1,800
- Sale price: 5% off, $1,710
- VIP price: 10% off, $1,620
- Language: English
- Pages: 336
- Binding: Quality Paper - also called trade paper
- ISBN: 1484273850
- ISBN-13: 9781484273852
Related categories:
Artificial Intelligence, Text-mining
Ships immediately (stock = 1)
Product description
Chapter 2: Product360 - Sentiment, Emotion & Trend Capturing System
Chapter Goal: Sentiment analysis involves finding the polarity of a sentence and labeling it as positive, negative, or neutral. Emotion detection involves identifying emotions (sadness, anger, happiness, etc.) in sentences. Data extracted from social media such as Twitter and Facebook and from e-commerce websites, then processed and analyzed using different NLP techniques, provides a 360-degree view of a product, which enables better decision making. This chapter introduces sentiment analysis and the various techniques that can be used to analyze text. We will apply sentiment, emotion, and trend analysis to review data from e-commerce websites such as Amazon, Zomato, and IMDb, which contain millions of customer reviews and star ratings. For this task, we will use Python libraries such as Vader and Textblob.
No. of pages: 30
Sub-Topics:
1. Text mining and various available libraries
2. Data preprocessing
3. Data cleaning tricks, optimized feature engineering
4. EDA
5. Sentiment analysis
6. Emotion and trend analysis
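The polarity labeling described in the chapter goal can be sketched without any third-party library. The snippet below is only a toy, assuming a tiny hand-made lexicon (`TOY_LEXICON`) and a hypothetical helper `polarity_label`. It is not the book's code and does not use Vader or Textblob, which rely on much larger, weighted lexicons and additional rules.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration only).
TOY_LEXICON = {
    "good": 1, "great": 2, "love": 2,
    "bad": -1, "terrible": -2, "hate": -2,
}

def polarity_label(sentence: str) -> str:
    """Label a sentence as positive, negative, or neutral by summing word scores."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score = sum(TOY_LEXICON.get(tok, 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

reviews = [
    "I love this phone, great battery",
    "Terrible screen",
    "It arrived on Monday",
]
for review in reviews:
    print(review, "->", polarity_label(review))
```

A real pipeline would replace the lexicon lookup with a library scorer and keep the same thresholding idea: a compound score above, below, or at zero maps to the three labels.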
Chapter 3: TED Talks Segmentation & Topics Extraction Using Machine Learning
Chapter Goal: Document clustering is an unsupervised learning process for grouping documents. For example, a large collection of e-books can be grouped to give it structure, which saves time when looking for a book. Article grouping and product clustering are a few other examples. Once we identify the clusters, it is important to understand their properties. Topic modeling is therefore performed to extract topics from a set of documents and articles, to understand their content through keywords, and to tag the articles or documents with those topics. In this chapter, we will see how to group TED talks based on their descriptions using clustering techniques such as K-Means and hierarchical clustering. Then we will perform topic modeling using Latent Dirichlet Allocation (LDA) to understand what defines each cluster. Important libraries for this problem include Gensim, NLTK, Scikit-learn, and word2vec. We will use over 100k articles from different American publications.
No. of pages: 30
Sub-Topics:
1. Data understanding and pre-processing
2. Computing TF-IDF
3. K-Means and hierarchical clustering
4. Evaluation and visualization
5. Topic modeling using Latent Dirichlet Allocation
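The first steps listed above, computing TF-IDF and then running K-Means, can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not the book's implementation: the helper names `tfidf_vectors` and `kmeans`, the four sample descriptions, the smoothed IDF variant, and the fixed starting centroids are all chosen for illustration. A real project would more likely use Scikit-learn or Gensim, and LDA is not covered here.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors) for tokenised docs."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # One common smoothed IDF variant (an assumption; other variants exist).
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1 for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vec = [counts[tok] / len(doc) * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors

def kmeans(vectors, seed_indices, n_iter=10):
    """Plain Lloyd's k-means; centroids start at the given document indices."""
    centroids = [list(vectors[i]) for i in seed_indices]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(vec, centroids[c])))
            for vec in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical talk descriptions, invented for this sketch.
talks = [
    "climate change and ocean science",
    "ocean science and climate policy",
    "music and creativity in education",
    "creativity and music for children",
]
docs = [t.split() for t in talks]
vocab, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, seed_indices=[0, 2])
print(labels)
```

Descriptions that share vocabulary end up with the same label. Fixing the starting centroids keeps the toy deterministic; library implementations usually pick them randomly and rerun several times.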
Chapter 4: Enhancing E-commerce Through Advanced Search Engine and Recommendation System
Chapter Goal: An information retrieval system searches product descriptions based on the text of a search query and returns the results. Search engines are the most common and best use case of information retrieval models. The concept of information retrieval started from a string or word comp
Product description (translated from the Chinese version)
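Chapter 1: Overview of Natural Language Processing and Artificial Intelligence
Chapter Goal: This is an introductory chapter. It provides a quick refresher on the topics covered in this book. Because the book teaches projects built around specific technical areas, we briefly introduce the key concepts those projects require. Rather than focusing on any particular project, we discuss the important concepts without going into detail; each topic is explored in depth in its own chapter.
No. of pages: 25
Sub-Topics:
1. AI paradigms
2. NLP and AI life cycle
3. NLP concepts (TF-IDF, word embeddings, etc.)
4. Machine learning concepts (supervised learning, classification, unsupervised learning)
5. Deep learning concepts (CNN, RNN, LSTM)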
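Chapter 2: Product360 - Sentiment, Emotion & Trend Capturing System
Chapter Goal: Sentiment analysis involves finding the polarity of a sentence and labeling it as positive, negative, or neutral. Emotion detection involves identifying emotions (sadness, anger, happiness, etc.) in sentences. Data extracted from social media (such as Twitter and Facebook) and from e-commerce websites, then processed and analyzed using different NLP techniques, provides a 360-degree view of a product, which enables better decision making. This chapter introduces sentiment analysis and the various techniques that can be used to analyze text. We will apply sentiment, emotion, and trend analysis to review data from e-commerce websites such as Amazon, Zomato, and IMDb, which contain millions of customer reviews and star ratings. For this task, we will use Python libraries such as Vader and Textblob.
No. of pages: 30
Sub-Topics:
1. Text mining and various available libraries
2. Data preprocessing
3. Data cleaning tricks, optimized feature engineering
4. EDA (exploratory data analysis)
5. Sentiment analysis
6. Emotion and trend analysis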
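Chapter 3: TED Talks Segmentation & Topics Extraction Using Machine Learning
Chapter Goal: Document clustering is an unsupervised learning process for grouping documents. For example, a large collection of e-books can be grouped to give it structure, which saves time when looking for a book. Article grouping and product grouping are a few other examples. Once we identify the clusters, it becomes important to understand their properties. Topic modeling is therefore used to extract topics from a set of documents and articles, to understand their content through keywords, and to tag the articles or documents with those topics. In this chapter, we will see how to group TED talks based on their descriptions using clustering techniques such as K-Means and hierarchical clustering. Then we will perform topic modeling using Latent Dirichlet Allocation (LDA) to understand what defines each cluster. Important libraries include Gensim, NLTK, Scikit-learn, and word2vec. We will use over 100k articles from different American publications.
No. of pages: 30
Sub-Topics:
1. Data understanding and pre-processing
2. Computing TF-IDF
3. K-Means and hierarchical clustering
4. Evaluation and visualization
5. Topic modeling using Latent Dirichlet Allocation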
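Chapter 4: Enhancing E-commerce Through Advanced Search Engine and Recommendation System
Chapter Goal: An information retrieval system searches product descriptions based on the text of a search query and returns the results. Search engines are the most common and best use case of information retrieval models. The concept of information retrieval started from string or word comparison.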
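The word-matching idea behind this kind of search can be sketched with a small inverted index. The snippet below is an illustration only, assuming hypothetical helpers `build_index` and `search` and three made-up product descriptions; it ranks products by how many query words they contain and is not the search engine built in the book.

```python
from collections import defaultdict

def build_index(descriptions):
    """Map each lowercase word to the set of product ids whose description contains it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(descriptions):
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, descriptions, query):
    """Return descriptions ranked by the number of query words they match."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for doc_id in index.get(word, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by original order.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return [descriptions[d] for d in ranked]

# Hypothetical product descriptions, invented for this sketch.
products = ["red cotton shirt", "blue denim jeans", "red leather shoes"]
index = build_index(products)
results = search(index, products, "red shirt")
print(results)
```

Exact word matching misses synonyms and misspellings, which is one reason practical systems move on to weighted or embedding-based retrieval.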