Basic Introduction to Data Mining
Welcome to our new tutorial, Introduction to Data Mining. Data mining refers to the discovery of meaningful patterns and trends by applying mathematical algorithms to large amounts of stored data. It is the process of analyzing data from different perspectives to uncover hidden patterns and categorize them into useful information. The data is typically collected and assembled in common areas, such as data warehouses, for efficient analysis. Data mining algorithms then support business decision making and other information requirements, ultimately helping to cut costs and increase revenue.
Data mining is also known as data discovery and knowledge discovery. Data mining is a five-step process:
- Problem Definition
First, we have to understand the requirements. This is the most important KDD step, even though it has no technical aspects. We need to focus on one question: what problem are we trying to solve? Customer acquisition? Retention?
Note: Knowledge Discovery in Databases (KDD) is an iterative process in which evaluation measures can be enhanced, mining can be refined, and new data can be integrated and transformed in order to get different and more appropriate results.
- Identifying the source information
This phase starts with the collection of data. Different datasets tend to expose new issues and challenges, so you need to identify the data, or the sources of information, and from that determine which sources you should study and retrieve data from.
- Picking the data points that need to be analyzed
The data from different sources should be selected, cleaned, transformed, formatted, anonymized, and constructed (if required). Data cleaning is the process of “cleaning” the data by smoothing noisy data and filling in missing values.
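To make the cleaning step concrete, here is a minimal sketch in Python. It assumes missing values are marked as None and uses two simple strategies chosen for illustration only: filling gaps with the mean of the observed values, and smoothing with a centered moving average. The function names and the sample readings are hypothetical, and real projects often need more careful strategies.

```python
from statistics import mean


def fill_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def smooth(values, window=3):
    """Smooth noisy data with a centered moving average.

    Near the edges the window is simply shortened.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(mean(values[lo:hi]))
    return smoothed


# Hypothetical sensor readings: two gaps and one noisy spike (50).
readings = [10, 12, None, 11, 50, 12, None, 13]

filled = fill_missing(readings)   # gaps become 18, the mean of the rest
cleaned = smooth(filled)          # the spike is averaged with its neighbors
```

Note that the spike also pulls the fill value upward, which is one reason the order of the cleaning steps matters in practice.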
- Extracting the relevant information from the data
Common techniques used in this step include:
  - Classification is used to retrieve important and relevant information about data and metadata. This data mining method helps to classify data into different classes.
  - Clustering is used to identify data that are similar to each other. This process helps to understand the differences and similarities between the data.
  - Regression analysis is the data mining method of identifying and analyzing the relationship between variables.
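The three techniques can be illustrated with small, deliberately simplified Python sketches. The specific algorithms here (1-nearest-neighbor for classification, a one-dimensional k-means for clustering, and an ordinary least-squares line for regression) are assumptions chosen for brevity, not the only way to do each task, and all function names and sample values are hypothetical.

```python
import math
from statistics import mean


def classify_nearest(point, labeled_points):
    """Classification: give a point the label of its closest labeled example."""
    nearest = min(labeled_points, key=lambda item: math.dist(point, item[0]))
    return nearest[1]


def kmeans_1d(values, centers, iterations=10):
    """Clustering: group values around the nearest center, then re-center."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[idx].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


# Classification: label a new customer from two labeled examples.
labeled = [((1, 1), "low spender"), ((8, 9), "high spender")]
label = classify_nearest((7, 8), labeled)

# Clustering: split purchase amounts into two groups.
centers, groups = kmeans_1d([1, 2, 3, 20, 21, 22], centers=[0, 10])

# Regression: relate advertising spend (xs) to sales (ys).
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

With these sample values the new customer is labeled "high spender", the amounts split into a low group and a high group, and the fitted line is y = 2x + 1.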
- Interpreting and reporting the results
The data that you extracted in earlier stages can be combined into the final result. The information has to be presented in such a way that stakeholders can use it whenever they want. We use visualization tools to present data mining results in the form of reports, tables, etc.
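As a small illustration of the reporting step, the sketch below renders results as a plain-text table. It is only one possible approach: the helper name format_report and the customer segments are hypothetical, and a real project would more likely use a dedicated visualization or reporting tool.

```python
def format_report(rows, headers):
    """Render rows as a plain-text table with aligned columns."""
    table = [list(headers)] + [[str(cell) for cell in row] for row in rows]
    widths = [max(len(row[i]) for row in table) for i in range(len(headers))]
    lines = []
    for n, row in enumerate(table):
        line = "  ".join(cell.ljust(w) for cell, w in zip(row, widths))
        lines.append(line.rstrip())
        if n == 0:
            # Underline the header row.
            lines.append("  ".join("-" * w for w in widths))
    return "\n".join(lines)


# Hypothetical mining results: customer segments with size and average spend.
segments = [("Occasional", 120, 18.5), ("Regular", 64, 52.0)]
report = format_report(segments, ["Segment", "Customers", "Avg. spend"])
print(report)
```

The output is a header row, a separator line, and one row per segment, which can be pasted into an email or a text report.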
Sources of Data
- Supermarket scanners
- Network data
- Credit card transactions
- Demographic data
- Sensor network data
- Camera feeds
- Server logs
- Call center records
That is all for this Introduction to Data Mining. We will see you in the next tutorial soon. Keep loving us.