Data Mining for Beginners: A Gentle Introduction

Basic Introduction to Data Mining

Welcome to our new tutorial, Introduction to Data Mining. Data mining is the discovery of meaningful patterns and trends in large amounts of stored data using mathematical algorithms. More precisely, data mining is the process of analyzing data from different perspectives to uncover hidden patterns and turn them into useful information. The data is usually collected and assembled in common repositories, such as data warehouses, where it can be analyzed efficiently. Data mining algorithms then support business decision-making and other information needs, ultimately helping to cut costs and increase revenue.
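To make "discovering patterns" concrete, here is a minimal Python sketch of one simple pattern-discovery technique. It counts which items are bought together in supermarket-scanner baskets. It then keeps the pairs whose support meets a threshold, where support is the fraction of baskets that contain the pair. The basket data and the `frequent_pairs` helper are hypothetical illustrations, not a specific library's API.

```python
from collections import Counter
from itertools import combinations

# Hypothetical supermarket-scanner baskets (made-up illustration data).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"milk", "bread", "eggs"},
]

def frequent_pairs(baskets, min_support):
    """Return item pairs found in at least `min_support` (a fraction) of the baskets."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair a canonical order, e.g. ("bread", "milk").
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

print(frequent_pairs(baskets, min_support=0.4))
# {('bread', 'milk'): 0.6, ('bread', 'butter'): 0.4}
```

Full association-rule algorithms such as Apriori extend this idea to larger itemsets. The underlying principle is the same: count co-occurrences and keep the ones that are frequent enough to matter.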

Data mining is also known as data discovery or knowledge discovery. It can be described as a five-step process:

Stages of data mining
  • Problem Definition

First, we have to understand the requirements. This is the most important KDD step, even though it has no technical aspects. We need to focus on one question: what problem are we trying to solve? Customer acquisition? Retention?

Note: Knowledge Discovery in Databases (KDD) is an iterative process. Evaluation measures can be enhanced, mining can be refined, and new data can be integrated and transformed to obtain different and more appropriate results.

  • Identifying the source information

This phase starts with the collection of data. Different datasets tend to expose new issues and challenges. You therefore need to identify the data, or the sources of information, and from them determine which information you should study and retrieve.

  • Picking the data points that need to be analyzed

The data from different sources should be selected, cleaned, transformed, formatted, anonymized, and constructed (if required). Data cleaning is the process of “cleaning” the data by smoothing noisy data and filling in missing values.
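The two cleaning operations named above can be sketched in a few lines of Python. This sketch assumes numeric readings stored in a list, with `None` marking missing values. Mean imputation and a centered moving average are just one common choice for each step. The helper names and the readings are hypothetical.

```python
def fill_missing(values):
    """Replace None entries with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def smooth(values, window=3):
    """Centered moving average; the window is truncated at both ends of the list."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        segment = values[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed

# Hypothetical sensor readings: one missing value and one noisy spike (40.0).
readings = [10.0, 12.0, None, 11.0, 40.0, 12.0]
filled = fill_missing(readings)   # [10.0, 12.0, 17.0, 11.0, 40.0, 12.0]
cleaned = smooth(filled)
print(cleaned)
```

Filling happens before smoothing so that the moving average never has to deal with gaps. The spike at 40.0 is damped to 21.0, though it still pulls up its neighbours.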
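Transformation often means putting numeric attributes on a common scale. One hypothetical example is min-max normalization, which rescales each value x to (x - min) / (max - min) so that the column spans [0, 1]. The function name and the data are illustrative assumptions.

```python
def min_max_normalize(values):
    """Linearly rescale values into [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Avoid division by zero when every value is the same.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical customer ages.
ages = [18, 30, 42, 66]
print(min_max_normalize(ages))  # [0.0, 0.25, 0.5, 1.0]
```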

  • Extracting the relevant information from the data
  1. Classification is used to retrieve important and relevant information about data and metadata. This data mining method helps sort data into different classes.
  2. Clustering is used to identify data items that are similar to each other. This process helps to understand the differences and similarities within the data.
  3. Regression analysis is the data mining method of identifying and analyzing the relationships between variables.
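For classification, here is a minimal sketch using k-nearest neighbours. It is one of many classification methods, chosen here because it fits in a few lines of standard-library Python (3.8+ for `math.dist`). The customer features and segment labels are made up. In practice, features are usually rescaled first so that one attribute does not dominate the distance.

```python
import math
from collections import Counter

# Hypothetical labelled customers: (monthly_visits, avg_spend) -> segment label.
training = [
    ((2, 15.0), "occasional"),
    ((3, 20.0), "occasional"),
    ((1, 10.0), "occasional"),
    ((12, 80.0), "loyal"),
    ((15, 95.0), "loyal"),
    ((10, 70.0), "loyal"),
]

def knn_classify(point, training, k=3):
    """Label `point` by majority vote among its k nearest training examples."""
    by_distance = sorted(training, key=lambda item: math.dist(point, item[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(knn_classify((11, 75.0), training))  # loyal
print(knn_classify((2, 12.0), training))   # occasional
```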
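For clustering, here is a sketch of plain k-means, which is again one technique among several. It repeatedly assigns each point to its nearest centroid and then moves each centroid to the mean of its assigned points. The points and the choice of initial centroids are hypothetical. Real implementations also pick starting centroids more carefully and stop when assignments no longer change.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means with a fixed number of assign/update rounds."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assignment step: index of the closest centroid.
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: mean of each cluster; keep the old centroid if a cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical 2-D points forming two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(centroids)
print(clusters)
```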
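For regression, here is a sketch of ordinary least squares for a single input variable. It fits y = slope * x + intercept using slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). The advertising-spend and sales figures are hypothetical and were chosen to lie exactly on y = 2x + 1, so the result is easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sums of co-deviations and squared deviations (the 1/n factors cancel in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising spend (x) vs. sales (y).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]
slope, intercept = fit_line(spend, sales)
print(slope, intercept)  # 2.0 1.0
```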
  • Interpreting and reporting the results

The information extracted in the earlier stages is combined into the final result. This result has to be presented in such a way that stakeholders can use it whenever they need it. Visualization tools are used to present data mining results in the form of reports, tables, etc.
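As a minimal illustration of the reporting step, the sketch below renders mining results as a plain-text table. The segment names and figures are invented placeholders, and the `format_report` helper is hypothetical. A real project would typically use a dedicated reporting or visualization tool instead.

```python
def format_report(headers, rows):
    """Render rows as a fixed-width plain-text table, one column per header."""
    # Column width = longest cell in that column, header included.
    widths = [max(len(str(cell)) for cell in column) for column in zip(headers, *rows)]

    def render(cells):
        return "  ".join(str(cell).ljust(w) for cell, w in zip(cells, widths)).rstrip()

    lines = [render(headers), "  ".join("-" * w for w in widths)]
    lines.extend(render(row) for row in rows)
    return "\n".join(lines)

# Hypothetical mining results: customer segments with their size and average spend.
headers = ("segment", "customers", "avg_spend")
rows = [("loyal", 120, 82.5), ("occasional", 310, 14.0)]
print(format_report(headers, rows))
```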

Sources of Data
  • Supermarket scanners
  • Network data
  • Credit card transactions
  • Demographic data
  • Sensor network data
  • Camera feeds
  • Server logs
  • Call center records

That's all for this Introduction to Data Mining. We will see you in the next tutorial soon. Keep loving us.


About Diwas Pandey

Highly motivated, with a strong drive and excellent interpersonal, communication, and team-building skills. Motivated to learn, grow, and excel in Data Science, Artificial Intelligence, SEO, and Digital Marketing.
