Apache Solr: A Practical Approach to Enterprise Search

Dikshant Shahi

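  • Publisher: Apress
  • Publication date: 2015-12-19
  • List price: $2,010
  • VIP price: $1,910 (5% off)
  • Language: English
  • Pages: 328
  • Binding: Paperback
  • ISBN: 1484210719
  • ISBN-13: 9781484210710
  • Related category: Full-text search engines
  • Overseas special-order title (requires separate checkout)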

Product description

Build an enterprise search engine using Apache Solr: index and search documents; ingest data from varied sources; apply various text processing techniques; utilize different search capabilities; and customize Solr to retrieve the desired results. Apache Solr: A Practical Approach to Enterprise Search explains each essential concept, backed by practical and industry examples, to help you attain expert-level knowledge.

The book, which assumes a basic knowledge of Java, starts with an introduction to Solr, followed by the steps for setting it up, indexing your first set of documents, and searching them. It then introduces you to information retrieval and its implementation in Apache Solr; this will help you understand your search problem, decide on an approach for building an effective solution, and use various metrics to evaluate the results.

The book next covers schema design and techniques for building a text analysis chain that cleanses, normalizes, and enriches your documents and addresses different types of search queries. It describes popular matching techniques that are generally applied to improve the precision and recall of searches.

You will learn the end-to-end process of data ingestion from varied sources, metadata extraction, pre-processing and transformation of content, various search components, query parsers and other advanced search capabilities.

After covering out-of-the-box features, Solr expert Dikshant Shahi dives into ways you can customize Solr for your business and its specific requirements, along with ways to plug in your own components. Most importantly, you will learn how Solr scoring is implemented, which factors affect the document score, and how to tune the score for the application at hand. The book explains why textual scoring alone is not sufficient for practical ranking of documents and shows ways to integrate real-world factors into the document ranking.

You'll see how to influence user experience by providing suggestions and recommendations. You'll also see integration of Solr with important related technologies such as OpenNLP and Tika. Additionally, you will learn about scaling Solr using SolrCloud.

The book concludes with coverage of semantic search capabilities, which are crucial for taking the search experience to the next level. By the end of Apache Solr, you will be proficient in designing and developing your search engine.
