Data Mining with Microsoft SQL Server 2000 Technical Reference
Claude Seidman
- 出版商: MicroSoft
- 出版日期: 2001-06-09
- 售價: $2,010
- 貴賓價: 9.5 折 $1,910
- 語言: 英文
- 頁數: 400
- 裝訂: Hardcover
- ISBN: 0735612714
- ISBN-13: 9780735612716
-
相關分類:
MSSQL、SQL、Data-mining
已過版
買這商品的人也買了...
-
$2,500$2,375 -
$1,029Neural Networks: A Comprehensive Foundation, 2/e (精裝)
-
$680$537 -
$2,620$2,489 -
$970Introduction to Algorithms, 2/e
-
$1,150$1,127 -
$880$695 -
$1,274Computer Architecture: A Quantitative Approach, 3/e(精裝本)
-
$490$417 -
$1,860$1,767 -
$1,650$1,568 -
$1,900$1,805 -
$690$587 -
$650$553 -
$590$466 -
$380$342 -
$720$569 -
$1,270$1,207 -
$690$587 -
$280$266 -
$580$493 -
$750$675 -
$560$504 -
$880$695 -
$750$593
相關主題
商品描述
Description:
Learn how to turn mountains of raw data into useful information with this guide.
The amount of information stored in corporate databases is exploding exponentially. Data mining—finding meaningful patterns in all that data—can give any organization a competitive advantage. This book is the in-depth reference from Microsoft® for anyone who wants to take full advantage of the powerful data-mining features in SQL Server™ 2000. It examines the SQL Server 2000 Analysis Services architecture and shows how data mining fits into its complete suite of information-extraction technologies. Then it demonstrates how to structure and mine large databases with the algorithms included with SQL Server 2000 to find nuggets of useful information. It even shows how to create a practice data-mining model using data downloaded from a database. Coverage includes:
• INTRODUCTION TO DATA MINING: What data mining is and isn’t, plus important principles and definitions behind data-mining methodologies, including the role of data-mining models, statistics, and algorithms
• SQL SERVER 2000 ARCHITECTURE: How data mining fits into the SQL Server 2000 Analysis Services architecture and how it builds on the SQL Server 2000 relational database and its embedded online analytical processing (OLAP) engine
• DATA-MINING METHODS: How to choose the best data-mining method for the job—decision trees or clustering
• EASE OF USE FEATURES: How to use the Mining Model Wizard and the OLAP Mining Model Editor to simplify creating, training, and processing a model
• PROGRAMMING THE DATA-MINING SERVICES: How to use data-mining models and Data Transformation Services, PivotTable® Services, decision-support objects (DSO), PERL, Visual Basic®, Scripting Edition, XML, and other tools and languages to work with the data-mining engine
Table of Contents:
Acknowledgments xi | xi |
Introduction xiii | xiii |
PART I INTRODUCING DATA MINING | |
1 Understanding Data Mining | 3 |
What Is Data Mining? | 3 |
Why Use Data Mining? | 4 |
How Data Mining Is Currently Used | 6 |
Defining the Terms | 7 |
Data Mining Methodology | 9 |
Analyzing the Problem | 10 |
Extracting and Cleansing the Data | 10 |
Validating the Data | 10 |
Creating and Training the Model | 10 |
Querying the Data Mining Model Data | 10 |
Maintaining the Validity of the Data-Mining Model | 10 |
Overview of Microsoft Data Mining | 11 |
Data Mining vs. OLAP | 11 |
Data-Mining Models | 11 |
Data-Mining Algorithms | 12 |
Using SQL Server Syntax to Data Mine | 14 |
Summary | 14 |
2 Microsoft SQL Server Analysis Services Architecture | 15 |
Introduction to OLAP | 16 |
MOLAP | 18 |
ROLAP | 18 |
HOLAP | 19 |
Server Architecture | 20 |
Data Mining Services Within Analysis Services | 20 |
Client Architecture | 21 |
PivotTable Service | 22 |
OLE DB | 23 |
Decision Support Objects (DSO) | 24 |
Multidimensional Expressions (MDX) | 25 |
Prediction Joins | 25 |
Summary | 26 |
3 Data Storage Models | 27 |
Why Data Mining Needs a Data Warehouse | 27 |
Maintaining Data Integrity | 28 |
Reporting Against OLTP Data Can Be Hazardous to Your Performance | 31 |
Data Warehousing Architecture for Data Mining | 33 |
Creating the Warehouse from OLTP Data | 33 |
Optimizing Data for Mining | 36 |
Physical Data Mining Structure | 42 |
Three-Tier Architecture | 43 |
Relational Data Warehouse | 43 |
Advantages of Relational Data Storage | 44 |
Building Supporting Tables for Data Mining | 45 |
OLAP cubes | 46 |
How Data Mining Uses OLAP Structures | 46 |
Advantages of OLAP Storage | 47 |
When OLAP Is Not Appropriate for Data Mining | 49 |
Summary | 49 |
4 Approaches to Data Mining | 51 |
Directed Data Mining | 51 |
Undirected Data Mining | 52 |
Data Mining vs. Statistics | 52 |
Learning from Historical Data | 57 |
Predicting the Future | 59 |
Training Data-Mining Models | 61 |
Evaluating the Models and Avoiding Errors | 62 |
Summary | 65 |
PART II DATA-MINING METHODS | |
5 Microsoft Decision Trees | 69 |
Creating the Model | 69 |
Analysis Manager | 70 |
Visualizing the Model | 87 |
Dependency Network Browser | 94 |
Inside the Decision Tree Algorithm | 97 |
How Predictions Are Derived | 109 |
Navigating the Tree | 109 |
Navigation vs. Rules | 112 |
When to Use Decision Trees | 113 |
Summary | 114 |
6 Creating Decision Trees with OLAP | 115 |
Creating the Model | 115 |
Select Source Type | 116 |
Select Source Cube and Data-Mining Technique | 116 |
Select Case | 118 |
Select Predicted Entity | 119 |
Select Training Data | 121 |
Select Dimension and Virtual Cube | 121 |
Completing the Data-Mining Model | 123 |
OLAP Mining Model Editor | 125 |
Content Detail Pane | 126 |
Structure Panel | 126 |
Prediction Tree List | 126 |
Analyzing Data with the OLAP Data-Mining Model | 126 |
Using the Generated Virtual Cube | 128 |
Using the Generated Dimension | 129 |
Summary | 133 |
7 Microsoft Clustering | 135 |
The Search for Order | 136 |
Looking for Ways to Understand Data | 136 |
Clustering as an Undirected Data-Mining Technique | 137 |
How Clustering Works | 138 |
Overview of the Algorithm | 138 |
The K-Means Method Clustering Algorithm | 138 |
What Is Being Measured Exactly? | 142 |
Clustering Factors | 142 |
Measuring "Closeness" | 143 |
When to Use Clustering | 146 |
Visualize Relationships | 146 |
Highlight Anomalies | 146 |
Create Samples for Other Data-Mining Efforts | 148 |
Weaknesses of Clustering | 148 |
Creating a Data-Mining Model Using Clustering | 149 |
Select Source Type | 150 |
Select the Table or Tables for Your Mining Model | 150 |
Select the Data-Mining Technique | 151 |
Edit Joins | 152 |
Select the Case Key Column for Your Mining Model | 152 |
Select the Input and Predictable Columns | 152 |
Viewing the Model | 154 |
Organization of the Cluster Nodes | 154 |
Order of the Cluster Nodes | 156 |
Analyzing the Data | 156 |
Summary | 158 |
PART III CREATING DATA–MINING APPLICATIONS WITH CODE | |
8 Using Microsoft Data Transformation Services (DTS) | 161 |
What Is DTS? | 162 |
DTS Tasks | 162 |
Transform | 162 |
Bulk Insert | 163 |
Data Driven Query | 163 |
Execute Package | 164 |
Connections | 167 |
Sources | 167 |
Configuring a Connection | 168 |
DTS Package Workflow | 169 |
DTS Package Steps | 169 |
Precedence Constraints | 170 |
DTS Designer | 171 |
Opening the DTS Designer | 171 |
Saving a DTS Package | 172 |
dtsrun Utility | 174 |
Using DTS to Create a Data-Mining Model | 177 |
Preparing the SQL Server Environment | 178 |
Creating the Package | 182 |
Summary | 208 |
9 Using Decision Support Objects (DSO) | 209 |
Scripting vs. Visual Basic | 210 |
The Server Object | 211 |
The Database Object | 219 |
Creating the Relational Data-Mining Model Using DSO | 221 |
Creating the OLAP Data-Mining Model Using DSO | 230 |
The DataSource Object | 232 |
Data-Mining Model (Decision Support Objects) | 233 |
Adding a New Data Source | 233 |
Analysis Server Roles | 234 |
Data-Mining Model Roles | 235 |
Summary | 236 |
10 Understanding Data-Mining Structures | 237 |
The Structure of the Data-Mining Model Case | 237 |
Data-Mining Models Look Like Tables | 237 |
Using Code to Browse Data-Mining Models | 238 |
Using the Schema Rowsets | 243 |
MINING_MODELS Schema Rowset | 243 |
MINING_COLUMNS Schema Rowset | 249 |
MINING_MODEL_CONTENT Schema Rowset | 259 |
MINING_SERVICES Schema Rowset | 262 |
SERVICE_PARAMETERS Schema Rowset | 266 |
MODEL_CONTENT_PMML Schema Rowset | 268 |
Summary | 269 |
11 Data Mining Using PivotTable Service | 271 |
Redistributing Components | 272 |
Installing and Registering Components | 273 |
File Locations | 274 |
Installation Registry Settings | 275 |
Redistribution Setup Programs | 275 |
Connecting to the PivotTable Service | 276 |
Connect to Analysis Services Using PivotTable Service | 276 |
Connect to Analysis Services Using HTTP | 280 |
Building a Local Data-Mining Model | 280 |
Storage of Local Mining Models | 284 |
SELECT INTO Statement | 286 |
INSERT INTO Statement | 286 |
OPENROWSET Syntax | 287 |
Nested Tables and the SHAPE Statement | 289 |
Using XML in Data Mining | 290 |
The PMML Standard | 290 |
Summary | 296 |
12 Data-Mining Queries | 297 |
Components of a Prediction Query | 297 |
The Basic Prediction Query | 298 |
Specifying the Test Case Source | 298 |
Specifying Columns | 300 |
The PREDICTION JOIN Clause | 300 |
Using Functions as Columns | 304 |
Using Tabular Values as Columns | 304 |
The WHERE Clause | 306 |
Prediction Functions | 307 |
Predict | 307 |
PredictProbability | 308 |
PredictSupport | 308 |
PredictVariance | 309 |
PredictStdev | 310 |
PredictProbabilityVariance | 310 |
PredictProbabilityStdev | 310 |
PredictHistogram | 310 |
TopCount | 313 |
TopSum | 313 |
TopPercent | 314 |
RangeMin | 314 |
RangeMid | 314 |
RangeMax | 314 |
PredictScore | 314 |
PredictNodeId | 315 |
Prediction Queries with Clustering Models | 315 |
Cluster | 315 |
ClusterProbability | 316 |
ClusterDistance | 316 |
Using DTS to Run Prediction Queries | 317 |
Summary | 322 |
APPENDIX | 325 |
GLOSSARY | 349 |
INDEX | 359 |