# Data Mining for Beginners: A Gentle Introduction

##### Basic Introduction to Data Mining

Welcome to our new tutorial, Introduction to Data Mining. Data mining refers to discovering meaningful patterns and trends by applying mathematical algorithms to large amounts of stored data. It is the process of analyzing hidden patterns in data from different perspectives and categorizing them into useful information. The data is typically collected and assembled in common areas, such as data warehouses, for efficient analysis. The results facilitate business decision making and other information requirements, ultimately helping to cut costs and increase revenue.

Data mining is also known as data discovery and knowledge discovery. Data mining is a five-step process:

• Problem Definition

First, we have to understand the requirements. This is the most important KDD step, even though it has no technical aspects. We need to focus on one question: what problem are we trying to solve? Customer acquisition? Retention?

Note: Knowledge Discovery in Databases (KDD) is an iterative process where evaluation measures can be enhanced, mining can be refined, and new data can be integrated and transformed in order to get different and more appropriate results.

• Identifying the source information

This phase starts with the collection of data. Different datasets tend to expose new issues and challenges, so you need to identify the data, or the sources of information, and from that determine which information you should study and retrieve data from.

• Picking the data points that need to be analyzed

The data from different sources should be selected, cleaned, transformed, formatted, anonymized, and constructed (if required). Data cleaning is the process of “cleaning” the data by smoothing noisy data and filling in missing values.
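
To make the two cleaning operations concrete, here is a minimal sketch using only the Python standard library. The function names `fill_missing` and `smooth`, the sample `readings`, and the choice of mean imputation and a moving average are illustrative assumptions; they are one simple way to fill missing values and smooth noise, not the only one.

```python
from statistics import mean

def fill_missing(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def smooth(values, window=3):
    """Smooth noisy data with a centered moving average.

    Near the edges the window is truncated to the values that exist.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(mean(values[lo:hi]))
    return smoothed

# Hypothetical sensor readings with two gaps and one noisy spike (50.0).
readings = [10.0, None, 12.0, 50.0, 11.0, None, 13.0]

filled = fill_missing(readings)   # gaps replaced by the mean of observed values
smoothed = smooth(filled)         # spike is averaged with its neighbors

print(filled)
print(smoothed)
```

In practice the right strategy depends on the data: a median is often preferred over a mean when outliers are present, and some missing values are better dropped than filled.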

• Extracting the relevant information from the data

Common data mining methods used in this step include:

1. Classification assigns data items to predefined classes, which helps to retrieve important and relevant information about the data and its metadata.
2. Clustering identifies data items that are similar to each other, without predefined classes. This helps to understand the differences and similarities within the data.
3. Regression analysis identifies and analyzes the relationship between variables, typically to predict a numeric value.
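
The three methods above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions: the data is one-dimensional and made up, classification is done with a nearest-neighbor rule, clustering with a simple k-means loop, and regression with a least-squares line. The function names (`classify_nearest`, `kmeans_1d`, `fit_line`) are hypothetical; real projects would normally use a library rather than hand-written versions.

```python
from statistics import mean

def classify_nearest(labeled, value):
    """Classification: return the label of the closest labeled example."""
    closest = min(labeled, key=lambda example: abs(example[0] - value))
    return closest[1]

def assign(values, centers):
    """Put each value into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for v in values:
        idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
        clusters[idx].append(v)
    return clusters

def kmeans_1d(values, centers, iterations=10):
    """Clustering: simple 1-D k-means starting from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = assign(values, centers)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, assign(values, centers)

def fit_line(xs, ys):
    """Regression: least-squares slope and intercept for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

# Hypothetical monthly spend per customer.
monthly_spend = [12, 15, 14, 80, 85, 90]

# Classification needs examples that already carry a class label.
labeled = [(12, "low"), (15, "low"), (85, "high"), (90, "high")]
label_20 = classify_nearest(labeled, 20)
label_70 = classify_nearest(labeled, 70)

# Clustering finds the groups without any labels.
centers, clusters = kmeans_1d(monthly_spend, centers=[0, 100])

# Regression relates two variables, here visits (x) and purchases (y).
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])

print(label_20, label_70)
print(clusters)
print(slope, intercept)
```

The key difference to notice is the input: classification starts from labeled examples, clustering starts from unlabeled values, and regression starts from pairs of variables.
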
• Interpreting and reporting the results

The information that you extracted in earlier stages can be combined into the final result. It has to be presented in such a way that stakeholders can use it whenever they need it. We use visualization tools to present data mining results, for example in the form of reports and tables.
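
As a small illustration of reporting results as a table, the sketch below renders a list of rows as fixed-width plain text. The helper `format_report` and the `segments` figures are made up for this example; dedicated reporting or visualization tools would normally take this role.

```python
def format_report(rows, headers):
    """Render rows as a fixed-width plain-text table with a header rule."""
    table = [list(headers)] + [[str(cell) for cell in row] for row in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in table) for i in range(len(headers))]
    lines = [
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in table
    ]
    lines.insert(1, "  ".join("-" * w for w in widths))
    return "\n".join(lines)

# Hypothetical mining results: customer segments with size and average spend.
segments = [("low spenders", 3, 13.7), ("high spenders", 3, 85.0)]

report = format_report(segments, ["Segment", "Customers", "Avg spend"])
print(report)
```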

##### Sources of Data
• Supermarket scanners
• Network data
• Credit card transactions
• Demographic data
• Sensor network data
• Camera feeds
• Server logs
• Call center records

That is all for this Introduction to Data Mining. We will see you in the next tutorial soon. Keep loving us.

