Book

Title Contrast data mining : concepts, algorithms, and applications / edited by Guozhu Dong and James Bailey

Published Boca Raton, Fla. ; London : Chapman & Hall/CRC, 2013

Boca Raton, FL : CRC Press, [2013]

©2013

Copies

Location Call no. Vol. Availability

MELB 006.312 Don/Cdm AVAILABLE

Description xxiv, 410 pages : illustrations ; 24 cm

Series Chapman & Hall/CRC data mining and knowledge discovery series

Contents Machine generated contents note: I.Preliminaries and Statistical Contrast Measures -- 1.Preliminaries / Guozhu Dong -- 1.1.Datasets of Various Data Types -- 1.2.Data Preprocessing -- 1.3.Patterns and Models -- 1.4.Contrast Patterns and Models -- 2.Statistical Measures for Contrast Patterns / James Bailey -- 2.1.Introduction -- 2.1.1.Terminology -- 2.2.Measures for Assessing Quality of Discrete Contrast Patterns -- 2.3.Measures for Assessing Quality of Continuous Valued Contrast Patterns -- 2.4.Feature Construction and Selection: PCA and Discriminative Methods -- 2.5.Summary -- II.Contrast Mining Algorithms -- 3.Mining Emerging Patterns Using Tree Structures or Tree Based Searches / Kotagiri Ramamohanarao -- 3.1.Introduction -- 3.1.1.Terminology -- 3.2.Ratio Tree Structure for Mining Jumping Emerging Patterns -- 3.3.Contrast Pattern Tree Structure -- 3.4.Tree Based Contrast Pattern Mining with Equivalence Classes -- 3.5.Summary and Conclusion --

Contents note continued: 4.Mining Emerging Patterns Using Zero-Suppressed Binary Decision Diagrams / Elsa Loekito -- 4.1.Introduction -- 4.2.Background on Binary Decision Diagrams and ZBDDs -- 4.3.Mining Emerging Patterns Using ZBDDs -- 4.4.Discussion and Summary -- 5.Efficient Direct Mining of Selective Discriminative Patterns for Classification / Philip S. Yu -- 5.1.Introduction -- 5.2.DDPMine: Direct Discriminative Pattern Mining -- 5.2.1.Branch-and-Bound Search -- 5.2.2.Training Instance Elimination -- 5.2.2.1.Progressively Shrinking FP-Tree -- 5.2.2.2.Feature Coverage -- 5.2.3.Efficiency Analysis -- 5.2.4.Summary -- 5.3.Harmony: Efficiently Mining The Best Rules For Classification -- 5.3.1.Rule Enumeration -- 5.3.2.Ordering of the Local Items -- 5.3.3.Search Space Pruning -- 5.3.4.Summary -- 5.4.Performance Comparison Between DDPMine and Harmony -- 5.5.Related Work -- 5.5.1.MbT: Direct Mining Discriminative Patterns via Model-based Search Tree --

Contents note continued: 5.5.2.NDPMine: Direct Mining Discriminative Numerical Features -- 5.5.3.uHarmony: Mining Discriminative Patterns from Uncertain Data -- 5.5.4.Applications of Discriminative Pattern Based Classification -- 5.5.5.Discriminative Frequent Pattern Based Classification vs. Traditional Classification -- 5.6.Conclusions -- 6.Mining Emerging Patterns from Structured Data / James Bailey -- 6.1.Introduction -- 6.2.Contrasts in Sequence Data: Distinguishing Sequence Patterns -- 6.2.1.Definitions -- 6.2.2.Mining Approach -- 6.3.Contrasts in Graph Datasets: Minimal Contrast Subgraph Patterns -- 6.3.1.Terminology and Definitions for Contrast Subgraphs -- 6.3.2.Mining Algorithms for Minimal Contrast Subgraphs -- 6.4.Summary -- 7.Incremental Maintenance of Emerging Patterns / Guozhu Dong -- 7.1.Background & Potential Applications -- 7.2.Problem Definition & Challenges -- 7.2.1.Potential Challenges -- 7.3.Concise Representation of Pattern Space: The Border --

Contents note continued: 7.4.Maintenance of Border -- 7.4.1.Basic Border Operations -- 7.4.2.Insertion of New Instances -- 7.4.3.Removal of Existing Instances -- 7.4.4.Expansion of Query Item Space -- 7.4.5.Shrinkage of Query Item Space -- 7.5.Related Work -- 7.6.Closing Remarks -- III.Generalized Contrasts, Emerging Data Cubes, and Rough Sets -- 8.More Expressive Contrast Patterns and Their Mining / Guozhu Dong -- 8.1.Introduction -- 8.2.Disjunctive Emerging Pattern Mining -- 8.2.1.Basic Definitions -- 8.2.2.ZBDD Based Approach to Disjunctive EP Mining -- 8.3.Fuzzy Emerging Pattern Mining -- 8.3.1.Advantages of Fuzzy Logic -- 8.3.2.Fuzzy Emerging Patterns Defined -- 8.3.3.Mining Fuzzy Emerging Patterns -- 8.3.4.Using Fuzzy Emerging Patterns in Classification -- 8.4.Contrast Inequality Discovery -- 8.4.1.Basic Definitions -- 8.4.2.Brief Introduction to GEP -- 8.4.3.GEP Algorithm for Mining Contrast Inequalities -- 8.4.4.Experimental Evaluation of GEPCIM -- 8.4.5.Future Work --

Contents note continued: 8.5.Contrast Equation Mining -- 8.6.Discussion -- 9.Emerging Data Cube Representations for OLAP Database Mining / Rosine Cicchetti -- 9.1.Introduction -- 9.2.Emerging Cube -- 9.3.Representations of the Emerging Cube -- 9.3.1.Representations for OLAP Classification -- 9.3.1.1.Borders [L; U] -- 9.3.1.2.Borders [U#; U] -- 9.3.2.Representations for OLAP Querying -- 9.3.2.1.L-Emerging Closed Cubes -- 9.3.2.2.U#-Emerging Closed Cubes -- 9.3.2.3.Reduced U#-Emerging Closed Cubes -- 9.3.3.Representation for OLAP Navigation -- 9.4.Discussion -- 9.5.Conclusion -- 10.Relation Between Jumping Emerging Patterns and Rough Set Theory / Krzysztof Walczak -- 10.1.Introduction -- 10.2.Theoretical Foundations -- 10.3.JEPs with Negation -- 10.3.1.Negative Knowledge in Transaction Databases -- 10.3.2.Transformation to Decision Table -- 10.3.3.Properties -- 10.3.4.Mining Approaches -- 10.4.JEP Mining by Means of Local Reducts -- 10.4.1.Global Condensation --

Contents note continued: 10.4.1.1.Condensed Decision Table -- 10.4.1.2.Proper Partition Finding as Graph Coloring -- 10.4.1.3.Discovery Method -- 10.4.2.Local Projection -- 10.4.2.1.Locally Projected Decision Table -- 10.4.2.2.Discovery Method -- IV.Contrast Mining for Classification & Clustering -- 11.Overview and Analysis of Contrast Pattern Based Classification / Guozhu Dong -- 11.1.Introduction -- 11.2.Main Issues in Contrast Pattern Based Classification -- 11.3.Representative Approaches -- 11.3.1.Contrast Pattern Mining and Selection -- 11.3.2.Classification Strategy -- 11.3.3.Summary -- 11.4.Bias Variance Analysis of iCAEP and Others -- 11.5.Overfitting Avoidance by CP-Based Approaches -- 11.6.Solving the Imbalanced Classification Problem -- 11.6.1.Advantages of Contrast Pattern Based Classification -- 11.6.2.Performance Results of iCAEP -- 11.7.Conclusion and Discussion -- 12.Using Emerging Patterns in Outlier and Rare-Class Prediction / Guozhu Dong -- 12.1.Introduction --

Contents note continued: 12.2.EP-length Statistic Based Outlier Detection -- 12.2.1.EP Based Discriminative Information for One Class -- 12.2.2.Mining EPs From One-class Data -- 12.2.3.Defining the Length Statistics of EPs -- 12.2.4.Using Average Length Statistics for Classification -- 12.2.5.The Complete OCLEP Classifier -- 12.3.Experiments on OCLEP on Masquerader Detection -- 12.3.1.Masquerader Detection -- 12.3.2.Data Used and Evaluation Settings -- 12.3.3.Data Preprocessing and Feature Construction -- 12.3.4.One-class Support Vector Machine (ocSVM) -- 12.3.5.Experiment Results Using OCLEP -- 12.3.5.1.SEA Experiment -- 12.3.5.2.1v49' Experiment -- 12.3.5.3.Situations When OCLEP is Better -- 12.3.5.4.Feature Based OCLEP Ensemble -- 12.4.Rare-class Classification Using EPs -- 12.5.Advantages of EP-based Rare-class Instance Creation -- 12.6.Related Work and Discussion -- 13.Enhancing Traditional Classifiers Using Emerging Patterns / Kotagiri Ramamohanarao -- 13.1.Introduction --

Contents note continued: 13.2.Emerging Pattern Based Class Membership Score -- 13.3.Emerging Pattern Enhanced Weighted/Fuzzy SVM -- 13.3.1.Determining Instance Relevance Weight -- 13.3.2.Constructing Weighted SVM -- 13.3.3.Performance Evaluation -- 13.4.Emerging Pattern Based Weighted Decision Trees -- 13.4.1.Determining Class Membership Weight -- 13.4.2.Constructing Weighted Decision Trees -- 13.4.3.Performance Evaluation -- 13.4.4.Discussion -- 13.5.Related Work -- 14.CPC: A Contrast Pattern Based Clustering Algorithm / Guozhu Dong -- 14.1.Introduction -- 14.2.Related Work -- 14.3.Preliminaries -- 14.3.1.Equivalence Classes of Frequent Itemsets -- 14.3.2.CPCQ: Contrast Pattern Based Clustering Quality Index -- 14.4.CPC Design and Rationale -- 14.4.1.Overview -- 14.4.2.MPQ -- 14.4.3.The CPC Algorithm -- 14.4.4.CPC Illustration -- 14.4.5.Optimization and Implementation Details -- 14.5.Experimental Evaluation -- 14.5.1.Datasets and Clustering Algorithms -- 14.5.2.CPC Parameters --

Contents note continued: 14.5.3.Experiment Settings -- 14.5.4.Categorical Datasets -- 14.5.5.Numerical Dataset -- 14.5.6.Document Clustering -- 14.5.7.CPC Execution Time and Memory Use -- 14.5.8.Effect of Pattern Limit on Clustering Quality -- 14.6.Discussion and Future Work -- 14.6.1.Alternate MPQ Definition -- 14.6.2.Future Work -- V.Contrast Mining for Bioinformatics and Chemoinformatics -- 15.Emerging Pattern Based Rules Characterizing Subtypes of Leukemia / Limsoon Wong -- 15.1.Introduction -- 15.2.Motivation and Overview of PCL -- 15.3.Data Used in the Study -- 15.4.Discovery of Emerging Patterns -- 15.4.1.Step 1: Gene Selection and Discretization -- 15.4.2.Step 2: Discovering EPs -- 15.5.Deriving Rules from Tree-Structured Leukemia Datasets -- 15.5.1.Rules for T-All vs Others1 -- 15.5.2.Rules for E2A-PBX1 vs Others2 -- 15.5.3.Rules through Level 3 to Level 6 -- 15.6.Classification by PCL on the Tree-Structured Data --

Contents note continued: 15.6.1.PCL: Prediction by Collective Likelihood of Emerging Patterns -- 15.6.2.Strengthening the Prediction Method at Levels 1 & 2 -- 15.6.3.Comparison with Other Methods -- 15.7.Generalized PCL for Parallel Multi-Class Classification -- 15.8.Performance Using Randomly Selected Genes -- 15.9.Summary -- 16.Discriminating Gene Transfer and Microarray Concordance Analysis / Guozhu Dong -- 16.1.Introduction -- 16.2.Datasets Used in Experiments and Preprocessing -- 16.3.Discriminating Genes and Associated Classifiers -- 16.4.Measures for Transferability -- 16.4.1.Measures for Discriminative Gene Transferability -- 16.4.2.Measures for Classifier Transferability -- 16.5.Findings on Microarray Concordance -- 16.5.1.Concordance Test by Classifier Transferability -- 16.5.2.Split Value Consistency Rate Analysis -- 16.5.3.Shared Discriminating Gene Based P-Value -- 16.6.Discussion -- 17.Towards Mining Optimal Emerging Patterns Amidst 1000s of Genes / Guozhu Dong --

Contents note continued: 17.1.Introduction -- 17.2.Gene Club Formation Methods -- 17.2.1.The Independent Gene Club Formation Method -- 17.2.2.The Iterative Gene Club Formation Method -- 17.2.3.Two Divisive Gene Club Formation Methods -- 17.3.Interaction Based Importance Index of Genes -- 17.4.Computing IBIG and Highest Support EPs for Top IBIG Genes -- 17.5.Experimental Evaluation of Gene Club Methods -- 17.5.1.Ability to Find Top Quality EPs from 75 Genes -- 17.5.2.Ability to Discover High Support EPs and Signature EPs, Possibly Involving Lowly Ranked Genes -- 17.5.3.High Support Emerging Patterns Mined -- 17.5.4.Comparison of the Four Gene Club Methods -- 17.5.5.IBIG vs Information Gain Based Ranking -- 17.6.Discussion -- 18.Emerging Chemical Patterns - Theory and Applications / Jurgen Bajorath -- 18.1.Introduction -- 18.2.Theory -- 18.3.Compound Classification -- 18.4.Computational Medicinal Chemistry Applications -- 18.4.1.Simulated Lead Optimization --

Contents note continued: 18.4.2.Simulated Sequential Screening -- 18.4.3.Bioactive Conformation Analysis -- 18.5.Chemoinformatics Glossary -- 19.Emerging Patterns as Structural Alerts for Computational Toxicology / Ronan Bureau -- 19.1.Introduction -- 19.2.Frequent Emerging Molecular Patterns as Potential Structural Alerts -- 19.2.1.Definition of Frequent Emerging Molecular Pattern -- 19.2.2.Using RPMPs as Condensed Representation of FEMPs -- 19.2.3.Notes on the Computation -- 19.2.4.Related Work -- 19.3.Experiments in Predictive Toxicology -- 19.3.1.Materials and Experimental Setup -- 19.3.2.Generalization of the RPMPs -- 19.4.A Chemical Analysis of RPMPs -- 19.5.Conclusion -- VI.Contrast Mining for Special Domains -- 20.Emerging Patterns and Classification for Spatial and Image Data / Krzysztof Walczak -- 20.1.Introduction -- 20.2.Previous Work -- 20.3.Image Representation -- 20.4.Jumping Emerging Patterns with Occurrence Counts -- 20.4.1.Formal Definition --

Contents note continued: 20.4.2.Mining Algorithm -- 20.4.3.Use in Classification -- 20.5.Spatial Emerging Patterns -- 20.6.Jumping Emerging Substrings -- 20.7.Experimental Results -- 20.8.Conclusions -- 21.Geospatial Contrast Mining with Applications on Labeled Spatial Data / Josue Salazar -- 21.1.Introduction -- 21.2.Related Work -- 21.3.Problem Formulation -- 21.4.Identification of Geospatial Discriminative Patterns and Discovery of Optimal Boundary -- 21.5.Pattern Summarization -- 21.6.Application on Vegetation Analysis -- 21.7.Application on Presidential Election Data Analysis -- 21.8.Application on Biodiversity Analysis of Bird Species -- 21.9.Conclusion -- 22.Mining Emerging Patterns for Activity Recognition / Jian Lu -- 22.1.Introduction -- 22.2.Data Preprocessing -- 22.3.Mining Emerging Patterns For Activity Recognition -- 22.3.1.Problem Statement -- 22.3.2.Mining Emerging Patterns from Sequential Activity Instances -- 22.4.The epSICAR Algorithm --

Contents note continued: 22.4.1.Score Function for Sequential Activity -- 22.4.1.1.EP Score -- 22.4.1.2.Coverage Score -- 22.4.1.3.Correlation Score -- 22.4.2.Score Function for Interleaved and Concurrent Activities -- 22.4.3.The epSICAR Algorithm -- 22.5.Empirical Studies -- 22.5.1.Trace Collection and Evaluation Methodology -- 22.5.2.Experiment 1: Accuracy Performance -- 22.5.3.Experiment 2: Model Analysis -- 22.6.Conclusion -- 23.Emerging Pattern Based Prediction of Heart Diseases and Powerline Safety / Minghao Piao -- 23.1.Introduction -- 23.2.Prediction of Myocardial Ischemia -- 23.3.Coronary Artery Disease Diagnosis -- 23.4.Classification of Powerline Safety -- 23.5.Conclusion -- 24.Emerging Pattern Based Crime Spots Analysis and Rental Price Prediction / Atsushi Takizawa -- 24.1.Introduction -- 24.2.Street Crime Analysis -- 24.2.1.Studied Area and Databases -- 24.2.2.Attributes on Visibility -- 24.2.3.Preparation of the Analysis -- 24.2.4.Result --

Contents note continued: 24.3.Prediction of Apartment Rental Price -- 24.3.1.Background and Motivation -- 24.3.2.Data -- 24.3.3.Extracting Frequent Subgraphs -- 24.3.4.Discovering Primary Subgraphs by Emerging Patterns -- 24.3.5.Rent Price Prediction Model -- VII.Survey of Other Papers -- 25.Overview of Results on Contrast Mining and Applications / Guozhu Dong -- 25.1.General Papers, Events, PhD Dissertations -- 25.2.Analysis and Measures on Contrasts and Similarity -- 25.3.Contrast Mining Algorithms -- 25.3.1.Mining Contrasts and Changes in General Data -- 25.3.2.Mining Contrasts in Stream, Temporal, Sequence Data -- 25.3.3.Mining Contrasts in Spatial, Image, and Graph Data -- 25.3.4.Unusual Subgroup Discovery and Description -- 25.3.5.Mining Conditional Contrasts and Gradients -- 25.4.Contrast Pattern Based Classification -- 25.5.Contrast Pattern Based Clustering -- 25.6.Contrast Mining and Bioinformatics and Chemoinformatics --

Contents note continued: 25.7.Contrast Mining Applications in Various Domains -- 25.7.1.Medicine, Environment, Security, Privacy, Activity Recognition -- 25.7.2.Business, Customer Behavior, Music, Video, Blog -- 25.7.3.Model Error Analysis, and Genetic Algorithm Improvement

Summary "Preface Contrasting is one of the most basic types of analysis. Contrasting based analysis is routinely employed, often subconsciously, by all types of people. People use contrasting to better understand the world around them and the challenging problems they want to solve. People use contrasting to accurately assess the desirability of important situations, and to help them better avoid potentially harmful situations and embrace potentially beneficial ones. Contrasting involves the comparison of one dataset against another. The datasets may represent data of different time periods, spatial locations, or classes, or they may represent data satisfying different conditions. Contrasting is often employed to compare cases with a desirable outcome against cases with an undesirable one, for example comparing the benign and diseased tissue classes of a cancer, or comparing students who graduate with university degrees against those who do not. Contrasting can identify patterns that capture changes and trends over time or space, or identify discriminative patterns that capture differences among contrasting classes or conditions. Traditional methods for contrasting multiple datasets were often very simple so that they could be performed by hand. For example, one could compare the respective feature means, compare the respective attribute-value distributions, or compare the respective probabilities of simple patterns, in the datasets being contrasted. However, the simplicity of such approaches has limitations, as it is difficult to use them to identify specific patterns that offer novel and actionable insights, and identify desirable sets of discriminative patterns for building accurate and explainable classifiers"--Publisher

Notes Formerly CIP. Uk

Bibliography Includes bibliographical references and index

Subject Contrast data mining.

Author Dong, Guozhu, 1957- editor of compilation

Bailey, James, 1971 June 30- editor of compilation

LC no. 2012025085

ISBN 9781439854327 (hardback) (acid-free paper)

1439854327 (hardback) (acid-free paper)