Basic Introduction to Data Mining
Welcome to our new tutorial, Introduction to Data Mining. Data mining is the discovery of meaningful patterns and trends by applying mathematical algorithms to large amounts of stored data. Put another way, it is the process of analyzing data from different perspectives to uncover hidden patterns and turn them into useful information. The data is usually collected and assembled in common areas, such as data warehouses, so that it can be analyzed efficiently. The results support business decision making and other information requirements, with the ultimate goal of cutting costs and increasing revenue.
Data mining is also known as data discovery and knowledge discovery. Data mining is a five-step process:
- Problem Definition
First, we have to understand the requirements. This is the most important KDD step, even though it has no technical aspects. We need to ask: what problem are we trying to solve? Customer acquisition? Retention?
Note: Knowledge Discovery in Databases (KDD) is an iterative process in which evaluation measures can be enhanced, mining can be refined, and new data can be integrated and transformed in order to get different and more appropriate results.
- Identifying the source information
This phase starts with the collection of data. Different datasets tend to expose new issues and challenges, so you need to identify the data, or the sources of information, and from those determine which information is worth studying.
- Picking the data points that need to be analyzed
The data from different sources should be selected, cleaned, transformed, formatted, anonymized, and constructed (if required). Data cleaning “cleans” the data by smoothing noisy data and filling in missing values.
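To make the cleaning step concrete, here is a minimal sketch of filling in missing values and smoothing noisy data. It assumes missing readings are marked with `None`, fills them with the mean of the observed values, and smooths with a simple moving average. The function names and the sample readings are made up for illustration, and real projects often use other fill and smoothing strategies.

```python
from statistics import mean


def fill_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def smooth(values, window=3):
    """Smooth noisy data with a centered moving average.

    Near the edges the window is truncated to the available neighbours.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        neighbours = values[max(0, i - half): i + half + 1]
        smoothed.append(mean(neighbours))
    return smoothed


# Hypothetical daily readings; None marks a missing value, 50 is a noisy spike.
raw = [10, 12, None, 11, 50, 13, None, 12]
filled = fill_missing(raw)
cleaned = smooth(filled)
print(filled)
print(cleaned)
```

After cleaning, the list has no gaps and the spike at 50 is pulled closer to its neighbours.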
- Extracting the relevant information from the data
- Classification is used to retrieve important and relevant information about data and metadata. This data mining method assigns data to different classes.
- Clustering is used to identify data items that are similar to each other. This process helps to understand the differences and similarities between the data.
- Regression analysis is the data mining method of identifying and analyzing the relationship between variables.
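The three methods above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: the data is one-dimensional, the regression is an ordinary least-squares line, the clustering is a basic k-means with starting centroids supplied by the caller, and the classifier picks the class whose examples have the closest mean. All function names and numbers are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Regression: least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    variance = sum((x - mx) ** 2 for x in xs)
    slope = covariance / variance
    return slope, my - slope * mx


def kmeans_1d(points, centroids, iterations=10):
    """Clustering: group 1-D points around the nearest centroid (basic k-means)."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


def classify(value, labelled):
    """Classification: return the label whose examples have the closest mean."""
    return min(labelled, key=lambda label: abs(value - mean(labelled[label])))


# Hypothetical data: monthly spend per customer.
spend = [5, 6, 7, 60, 62, 65]
centroids, clusters = kmeans_1d(spend, centroids=[0, 100])
print(clusters)

# Use the discovered groups as labelled examples for classification.
labelled = {"low": clusters[0], "high": clusters[1]}
print(classify(10, labelled), classify(58, labelled))

# Regression on a made-up relationship between visits and purchases.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
```

In practice you would usually reach for a dedicated library rather than hand-written versions, but the underlying ideas are the same.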
- Interpreting and reporting the results
The information extracted in the earlier stages is combined into the final result. It has to be presented in such a way that stakeholders can use it whenever they need it. We use visualization tools to present data mining results, typically in the form of reports, tables, and so on.
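As a small illustration of the reporting step, the sketch below renders results as a plain-text table. The `format_report` helper and the segment figures are hypothetical. A real project would more likely use a charting or dashboard tool.

```python
def format_report(rows, headers):
    """Render rows as a fixed-width plain-text table."""
    table = [list(headers)] + [[str(cell) for cell in row] for row in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in table) for i in range(len(headers))]
    lines = ["  ".join(cell.ljust(w) for cell, w in zip(row, widths)) for row in table]
    # Separator line between the header and the data rows.
    lines.insert(1, "  ".join("-" * w for w in widths))
    return "\n".join(lines)


# Hypothetical summary of two customer segments.
rows = [("low spenders", 3, 6.0), ("high spenders", 3, 62.3)]
report = format_report(rows, ["Segment", "Customers", "Avg spend"])
print(report)
```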
Sources of Data
- Supermarket scanners
- Network data
- Credit card transactions
- Demographic data
- Sensor network data
- Camera feeds
- Server logs
- Call center records
That's all for this Introduction to Data Mining. We will see you in the next tutorial soon. Keep loving us.