What is Data Mining?

  • Put simply, data mining is the process of looking for hidden patterns and trends in data that are not apparent from merely summarizing the data.
  • It can be regarded as a sub-domain of databases, but it involves ‘no querying’.
  • It is based on an ‘interestingness criterion’.
  • In other words, it takes data and an interestingness criterion as input and produces hidden patterns as output.

  • Here the data may be tabular data, spatial data, tree data, sequence data, text, or multimedia data.
  • The interestingness criterion could be frequency, rarity, correlation, or consistency.
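
As a rough illustration of this input/output view, the sketch below treats the interestingness criterion as a plain function handed to a toy miner. The names `mine`, `is_frequent`, and `is_rare`, the thresholds, and the sample records are all assumptions made up for illustration; a real data miner is far more elaborate.

```python
from collections import Counter

def mine(records, criterion):
    """Return the items whose share of records satisfies the criterion."""
    # Count each item at most once per record.
    counts = Counter(item for record in records for item in set(record))
    total = len(records)
    return sorted(item for item, n in counts.items() if criterion(n / total))

def is_frequent(share, threshold=0.5):
    # Hypothetical criterion: the item appears in at least half of the records.
    return share >= threshold

def is_rare(share, threshold=0.25):
    # Hypothetical criterion: the item appears in at most a quarter of the records.
    return share <= threshold

# Made-up sample data: each record is one list of items.
records = [
    ["tea", "sugar", "milk"],
    ["tea", "milk"],
    ["tea", "sugar"],
    ["tea", "saffron"],
]

frequent = mine(records, is_frequent)
rare = mine(records, is_rare)
print("frequent:", frequent)
print("rare:", rare)
```

The same data yields different patterns depending on which criterion is passed in, which is the point of keeping the criterion separate from the data.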


  • People often confuse data mining with statistical inference, but the two are different.
  • In data mining, we build up a hypothesis by passing the given data and criteria to the data miner.
  • In statistical inference, by contrast, we start with a hypothesis and then use the data to test whether it holds.

Where It Is Used (A Real-World Example)

  • Consider a supermarket that sells, for example, bags, crayons, and uniforms. The owner may record every customer purchase in a database table; this forms our data.
  • Here our interestingness criterion is frequency.
  • We feed both into the data miner and get likely patterns as output. For example, the data may support the pattern that customers who buy a bag and a uniform also tend to buy crayons.
  • Based on this analysis, the owner can place bags, uniforms, and crayons together so that they are more likely to be bought together.
  • Data mining tools such as Statistica take the data and criteria supplied by the user and report the patterns they find.
  • Google also uses data mining techniques in its search engine to identify patterns in how users search and to return results optimised for them.
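
The supermarket example can be sketched in a few lines of Python. This is a minimal sketch, not how any particular data miner works: the transactions are invented, the helper names `support` and `confidence` are assumptions, and "confidence" is a standard association-rule measure that this article has not introduced yet.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    base = support(transactions, antecedent)
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base if base else 0.0

# Invented purchase records: one set of items per customer visit.
transactions = [
    {"bag", "uniform", "crayons"},
    {"bag", "uniform", "crayons"},
    {"bag", "uniform"},
    {"bag"},
    {"crayons"},
]

rule_support = support(transactions, {"bag", "uniform", "crayons"})
rule_confidence = confidence(transactions, {"bag", "uniform"}, {"crayons"})
print(f"support: {rule_support:.2f}, confidence: {rule_confidence:.2f}")
```

With this made-up data, 2 of the 5 transactions contain all three items (support 0.40), and 2 of the 3 transactions with a bag and a uniform also contain crayons (confidence of about 0.67). An owner would pick thresholds for these numbers before deciding to shelve the items together.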

The basic terminology and further details on data mining will be covered in subsequent articles.

