Lucene in Action
Provisional Chinese title: Lucene 實戰

Name: Lucene in Action
Price: 1691 TWD
Availability: Discontinued
Author: Erik Hatcher, Otis Gospodnetic
ISBN: 1932394281

Publisher: Manning
Publication date: 2004-12-01
List price: $1,780
Member price: 5% off, $1,691
Language: English
Pages: 456
Binding: Paperback
ISBN-13: 9781932394283
Category: Full-text search engines

Status: Out of print

Product description:

Lucene is a gem in the open-source world--a highly scalable, fast search engine. It delivers performance and is disarmingly easy to use. Lucene in Action is the authoritative guide to Lucene. It describes how to index your data, including types you definitely need to know such as MS Word, PDF, HTML, and XML. It introduces you to searching, sorting, filtering, and highlighting search results.

Lucene powers search in surprising places--in discussion groups at Fortune 100 companies, in commercial issue trackers, in email search from Microsoft, in the Nutch web search engine (that scales to billions of pages). It is used by diverse companies including Akamai, Overture, Technorati, HotJobs, Epiphany, FedEx, Mayo Clinic, MIT, New Scientist Magazine, and many others.

Adding search to your application can be easy. With many reusable examples and good advice on best practices, Lucene in Action shows you how. And if you would like to search through Lucene in Action over the Web, you can do so using Lucene itself as the search engine--take a look at the authors' awesome Search Inside solution. Its results page resembles Google's and provides a novel yet familiar interface to the entire book and book blog.

Table of Contents:

foreword xvii
preface xix
acknowledgments xxii
about this book xxv

Part 1 Core Lucene 1

1 Meet Lucene 3
1.1 Evolution of information organization and access 4
1.2 Understanding Lucene 6
    What Lucene is 7
    What Lucene can do for you 7
    History of Lucene 9
    Who uses Lucene 10
    Lucene ports: Perl, Python, C++, .NET, Ruby 10
1.3 Indexing and searching 10
    What is indexing, and why is it important? 10
    What is searching? 11
1.4 Lucene in action: a sample application 11
    Creating an index 12
    Searching an index 15
1.5 Understanding the core indexing classes 18
    IndexWriter 19
    Directory 19
    Analyzer 19
    Document 20
    Field 20
1.6 Understanding the core searching classes 22
    IndexSearcher 23
    Term 23
    Query 23
    TermQuery 24
    Hits 24
1.7 Review of alternate search products 24
    IR libraries 24
    Indexing and searching applications 26
    Online resources 27
1.8 Summary 27

2 Indexing 28
2.1 Understanding the indexing process 29
    Conversion to text 29
    Analysis 30
    Index writing 31
2.2 Basic index operations 31
    Adding documents to an index 31
    Removing Documents from an index 33
    Undeleting Documents 36
    Updating Documents in an index 36
2.3 Boosting Documents and Fields 38
2.4 Indexing dates 39
2.5 Indexing numbers 40
2.6 Indexing Fields used for sorting 41
2.7 Controlling the indexing process 42
    Tuning indexing performance 42
    In-memory indexing: RAMDirectory 48
    Limiting Field sizes: maxFieldLength 54
2.8 Optimizing an index 56
2.9 Concurrency, thread-safety, and locking issues 59
    Concurrency rules 59
    Thread-safety 60
    Index locking 62
    Disabling index locking 66
2.10 Debugging indexing 66
2.11 Summary 67

3 Adding search to your application 68
3.1 Implementing a simple search feature 69
    Searching for a specific term 70
    Parsing a user-entered query expression: QueryParser 72
3.2 Using IndexSearcher 75
    Working with Hits 76
    Paging through Hits 77
    Reading indexes into memory 77
3.3 Understanding Lucene scoring 78
    Lucene, you got a lot of ‘splainin’ to do! 80
3.4 Creating queries programmatically 81
    Searching by term: TermQuery 82
    Searching within a range: RangeQuery 83
    Searching on a string: PrefixQuery 84
    Combining queries: BooleanQuery 85
    Searching by phrase: PhraseQuery 87
    Searching by wildcard: WildcardQuery 90
    Searching for similar terms: FuzzyQuery 92
3.5 Parsing query expressions: QueryParser 93
    Query.toString 94
    Boolean operators 94
    Grouping 95
    Field selection 95
    Range searches 96
    Phrase queries 98
    Wildcard and prefix queries 99
    Fuzzy queries 99
    Boosting queries 99
    To QueryParse or not to QueryParse? 100
3.6 Summary 100

4 Analysis 102
4.1 Using analyzers 104
    Indexing analysis 105
    QueryParser analysis 106
    Parsing versus analysis: when an analyzer isn’t appropriate 107
4.2 Analyzing the analyzer 107
    What’s in a token? 108
    TokenStreams uncensored 109
    Visualizing analyzers 112
    Filtering order can be important 116
4.3 Using the built-in analyzers 119
    StopAnalyzer 119
    StandardAnalyzer 120
4.4 Dealing with keyword fields 121
    Alternate keyword analyzer 125
4.5 “Sounds like” querying 125
4.6 Synonyms, aliases, and words that mean the same 128
    Visualizing token positions 134
4.7 Stemming analysis 136
    Leaving holes 136
    Putting it together 137
    Hole lot of trouble 138
4.8 Language analysis issues 140
    Unicode and encodings 140
    Analyzing non-English languages 141
    Analyzing Asian languages 142
    Zaijian 145
4.9 Nutch analysis 145
4.10 Summary 147

5 Advanced search techniques 149
5.1 Sorting search results 150
    Using a sort 150
    Sorting by relevance 152
    Sorting by index order 153
    Sorting by a field 154
    Reversing sort order 154
    Sorting by multiple fields 155
    Selecting a sorting field type 156
    Using a nondefault locale for sorting 157
    Performance effect of sorting 157
5.2 Using PhrasePrefixQuery 157
5.3 Querying on multiple fields at once 159
5.4 Span queries: Lucene’s new hidden gem 161
    Building block of spanning, SpanTermQuery 163
    Finding spans at the beginning of a field 165
    Spans near one another 166
    Excluding span overlap from matches 168
    Spanning the globe 169
    SpanQuery and QueryParser 170
5.5 Filtering a search 171
    Using DateFilter 171
    Using QueryFilter 173
    Security filters 174
    A QueryFilter alternative 176
    Caching filter results 177
    Beyond the built-in filters 177
5.6 Searching across multiple Lucene indexes 178
    Using MultiSearcher 178
    Multithreaded searching using ParallelMultiSearcher 180
5.7 Leveraging term vectors 185
    Books like this 186
    What category? 189
5.8 Summary 193

6 Extending search 194
6.1 Using a custom sort method 195
    Accessing values used in custom sorting 200
6.2 Developing a custom HitCollector 201
    About BookLinkCollector 202
    Using BookLinkCollector 202
6.3 Extending QueryParser 203
    Customizing QueryParser’s behavior 203
    Prohibiting fuzzy and wildcard queries 204
    Handling numeric field-range queries 205
    Allowing ordered phrase queries 208
6.4 Using a custom filter 209
    Using a filtered query 212
6.5 Performance testing 213
    Testing the speed of a search 213
    Load testing 217
    QueryParser again! 218
    Morals of performance testing 220
6.6 Summary 220

Part 2 Applied Lucene 221

7 Parsing common document formats 223
7.1 Handling rich-text documents 224
    Creating a common DocumentHandler interface 225
7.2 Indexing XML 226
    Parsing and indexing using SAX 227
    Parsing and indexing using Digester 230
7.3 Indexing a PDF document 235
    Extracting text and indexing using PDFBox 236
    Built-in Lucene support 239
7.4 Indexing an HTML document 241
    Getting the HTML source data 242
    Using JTidy 242
    Using NekoHTML 245
7.5 Indexing a Microsoft Word document 248
    Using POI 249
    Using TextMining.org’s API 250
7.6 Indexing an RTF document 252
7.7 Indexing a plain-text document 253
7.8 Creating a document-handling framework 254
    FileHandler interface 255
    ExtensionFileHandler 257
    FileIndexer application 260
    Using FileIndexer 262
    FileIndexer drawbacks, and how to extend the framework 263
7.9 Other text-extraction tools 264
    Document-management systems and services 264
7.10 Summary 265

8 Tools and extensions 267
8.1 Playing in Lucene’s Sandbox 268
8.2 Interacting with an index 269
    lucli: a command-line interface 269
    Luke: the Lucene Index Toolbox 271
    LIMO: Lucene Index Monitor 279
8.3 Analyzers, tokenizers, and TokenFilters, oh my 282
    SnowballAnalyzer 283
    Obtaining the Sandbox analyzers 284
8.4 Java Development with Ant and Lucene 284
    Using the <index> task 285
    Creating a custom document handler 286
    Installation 290
8.5 JavaScript browser utilities 290
    JavaScript query construction and validation 291
    Escaping special characters 292
    Using JavaScript support 292
8.6 Synonyms from WordNet 292
    Building the synonym index 294
    Tying WordNet synonyms into an analyzer 296
    Calling on Lucene 297
8.7 Highlighting query terms 300
    Highlighting with CSS 301
    Highlighting Hits 303
8.8 Chaining filters 304
8.9 Storing an index in Berkeley DB 307
    Coding to DbDirectory 308
    Installing DbDirectory 309
8.10 Building the Sandbox 309
    Check it out 310
    Ant in the Sandbox 310
8.11 Summary 311

9 Lucene ports 312
9.1 Ports’ relation to Lucene 313
9.2 CLucene 314
    Supported platforms 314
    API compatibility 314
    Unicode support 316
    Performance 317
    Users 317
9.3 dotLucene 317
    API compatibility 317
    Index compatibility 318
    Performance 318
    Users 318
9.4 Plucene 318
    API compatibility 319
    Index compatibility 320
    Performance 320
    Users 320
9.5 Lupy 320
    API compatibility 320
    Index compatibility 322
    Performance 322
    Users 322
9.6 PyLucene 322
    API compatibility 323
    Index compatibility 323
    Performance 323
    Users 323
9.7 Summary 324

10 Case studies 325
10.1 Nutch: “The NPR of search engines” 326
    More in depth 327
    Other Nutch features 328
10.2 Using Lucene at jGuru 329
    Topic lexicons and document categorization 330
    Search database structure 331
    Index fields 332
    Indexing and content preparation 333
    Queries 335
    JGuruMultiSearcher 339
    Miscellaneous 340
10.3 Using Lucene in SearchBlox 341
    Why choose Lucene? 341
    SearchBlox architecture 342
    Search results 343
    Language support 343
    Reporting Engine 344
    Summary 344
10.4 Competitive intelligence with Lucene in XtraMind’s XM-InformationMinder™ 344
    The system architecture 347
    How Lucene has helped us 350
10.5 Alias-i: orthographic variation with Lucene 351
    Alias-i application architecture 352
    Orthographic variation 354
    The noisy channel model of spelling correction 355
    The vector comparison model of spelling variation 356
    A subword Lucene analyzer 357
    Accuracy, efficiency, and other applications 360
    Mixing in context 360
    References 361
10.6 Artful searching at Michaels.com 361
    Indexing content 362
    Searching content 367
    Search statistics 370
    Summary 371
10.7 I love Lucene: TheServerSide 371
    Building better search capability 371
    High-level infrastructure 373
    Building the index 374
    Searching the index 377
    Configuration: one place to rule them all 379
    Web tier: TheSeeeeeeeeeeeerverSide? 383
    Summary 385
10.8 Conclusion 385

appendix A Installing Lucene 387
appendix B Lucene index format 393
appendix C Resources 408
index 415
