# (Data Mining)

Instructor: Jugal Kalita

### Class Material

• Syllabus: Textbook, Grading Scheme

### Homework Assignments

I will give you 2 or 3 homework assignments. They will involve programs that learn from real or imagined data. Please make sure you finish all assignments before the final and demo them to me.

### Lecture Schedule

Here is the list of topics discussed in class.

• Two lectures: Chapter 1 of Data Mining by Margaret Dunham, Introduction: Basic Data Mining Tasks, Data Mining vs. Knowledge Discovery in Databases, Data Mining Metrics, Data Mining Issues, etc.
• Two lectures: Chapter 2 of Data Mining by Margaret Dunham, Related Concepts: Information Retrieval, Web Search Engines, Machine Learning, Pattern Matching, Fuzzy Sets and Fuzzy Logic, etc.
• Two lectures: Chapter 2 of Neuro-Fuzzy and Soft Computing by Jang, Sun and Mizutani, Fuzzy Sets: Introduction, Basic Definitions and Terminology; Set-Theoretic Operations, Membership Functions and Parameterization; Other Ways of Implementing Fuzzy Union, Intersection and Complement
• Two lectures: Chapter 3 of Neuro-Fuzzy and Soft Computing by Jang, Sun and Mizutani, Fuzzy Rules and Fuzzy Reasoning: Extension Principle, Fuzzy If-Then Rules, Fuzzy Reasoning
• One lecture: Section 5.5 of Expert Systems by Giarratano and Riley, Fuzzy Rules: Max-min Composition, Moment Methods
• Two lectures: Chapter 5 of Introduction to Modern Information Retrieval by Salton and McGill: Retrieval Evaluation: Recall and Precision, Fallout, Generality, Single Value Measures
• One lecture: Chapter 17 of Numerical Methods by Chapra and Canale, Least-Squares Regression: Linear Regression, Straight Line Fitting, Errors, Linearization of Non-linear Relationships, Polynomial Regression, Multiple Linear Regression, General Linear Least Squares, Non-linear Regression
• Two lectures: Chapter 3 of Machine Learning by Mitchell, Decision Tree Learning: Introduction, Entropy Measure, Information Gain, Inductive Biases, Avoiding Overfitting, Continuous-valued Attributes, Differing Cost Attributes
• One lecture: Chapter 5 of an old KEE Manual from Intellicorp, Understanding Rule Based Reasoning: Forward Chaining, Backward Chaining, Choosing Between the Two, Using Rules with Variables; Section 5.3, Production Systems, from Artificial Intelligence by Luger
• One lecture: Chapter 4 of Machine Learning by Mitchell: Artificial Neural Networks: Introduction, Perceptrons, Perceptron Training Rule, Gradient Descent and Delta Rule, Backpropagation Algorithm, Convergence Issues (with help from Tony Anzelmo)
• One lecture: Chapter 9 of Machine Learning by Mitchell: Genetic Algorithms: Genetic Operators, Fitness Function, Example
• Two lectures: Chapter 5 of Data Mining by Margaret Dunham, Clustering: Similarity and Distance Measures, Outliers, Hierarchical Algorithms, Partitional Algorithms, Clustering Large Databases
• Two lectures: Chapter 6 of Data Mining by Margaret Dunham, Association Rules: Definitions, Apriori Algorithm, Sampling, Partitioning Algorithm
• One lecture (by Jeff Schott): Chapter 9 of The Handbook of Data Mining by Ye, Psychometric Methods of Latent Variable Modeling
• One lecture (by Ankur Deshmukh): Chapter 5 of The Handbook of Data Mining by Ye, Bayesian Data Analysis
• One lecture: Naive Bayes Classifiers for Spam Detection by Jugal Kalita, MXLogic, Inc., from Summer 2002
• One lecture (by Steve Boone): Chapter 9 of Data Mining by Margaret Dunham, Temporal Mining
• One lecture (by Ankur Patwa): Chapter 21 of The Handbook of Data Mining by Ye, Text Mining
• One lecture (by Jaya Potharaju): Chapter 27 of The Handbook of Data Mining by Ye, Mining Image Data
• One lecture (by Tony Anzelmo): Chapter 14 of The Handbook of Data Mining by Ye, Data Collection, Preparation, Quality and Visualization
• One lecture (by Priyadarshini Selvam): Chapter 25 of The Handbook of Data Mining by Ye, Mining Customer Relationship Management (CRM) Data