Finding patterns in data manually is tedious, since data is ubiquitous, and the strategies for doing so have changed dramatically over the years. Whether we refer to hunters seeking to understand animal migration patterns, farmers attempting to model harvest evolution, or more current concerns such as sales trend analysis, assisted medical diagnosis, or building models of the surrounding world from scientific data, we reach the same conclusion: hidden within raw data we can find important new pieces of information and knowledge. Traditional approaches for deriving knowledge from data rely strongly on manual analysis and interpretation. In any domain (science, marketing, finance, health, business, etc.), the success of a traditional analysis depends on the ability of one or more specialists to read the data. For example, scientists go through remote images of planets and asteroids to mark objects of interest, such as impact craters.

Data mining tasks can be classified into two categories: descriptive data mining and predictive data mining. Descriptive data mining tasks describe the general characteristics of the data in the database. Clustering, summarization, association rule mining and sequence discovery are the tasks of descriptive modeling. Predictive mining tasks make inferences from the current data to estimate unknown or future values of interest. Classification, regression, time series analysis and prediction are the tasks of predictive modeling. The relative significance of prediction and description differs across applications and techniques. The following data mining techniques fulfil the objectives discussed above: Classification, Regression, Time Series Analysis, Prediction, Clustering, Summarization, Association Rules and Sequence Discovery.

Classification

Classification is the process of assigning data items to one of a set of predefined categorical classes. It proceeds in two steps. In the first step, a model describing a predetermined set of data classes or concepts is built from a training data set. In the second step, the model is used for classification: the features of each newly arrived object are examined and the object is assigned to a predefined class. Popular classification techniques include Bayesian classifiers, support vector machines, k-nearest neighbors, decision trees and neural networks.
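
As a rough illustration of the two steps, the following is a minimal sketch of a k-nearest-neighbor classifier in plain Python. The function name knn_classify, the toy training points and the class labels "A" and "B" are all assumptions made for this example; they do not come from any particular data set.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Assign `query` to the majority class among its k nearest training points."""
    # Step 1 (model building) is trivial for k-NN: the "model" is simply
    # the stored, labelled training set.
    # Step 2: rank the training points by Euclidean distance to the query
    # and let the k closest ones vote on the class.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical labelled training data: (features, class label).
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]

label_near_a = knn_classify(train, (1.1, 1.0))
label_near_b = knn_classify(train, (5.1, 5.0))
print(label_near_a, label_near_b)
```

With this toy data, the first query lies next to the points labelled "A" and the second next to those labelled "B", so each is assigned to the class of its neighbors.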

Regression

Regression maps data items to a real-valued prediction variable. It is the traditional and best-known statistical method for numeric prediction. Regression analysis is used extensively for predicting and forecasting values that are unknown. It is also used to understand the relationship between the independent variables and the dependent variable. Under restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables.
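
A minimal sketch of numeric prediction with regression is shown below, assuming the simplest case: one independent variable and an ordinary least-squares line. The function name fit_line and the sample values are hypothetical and chosen so that the data lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations of an independent variable x and a dependent variable y.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
predicted = slope * 5.0 + intercept  # estimate y for the unseen value x = 5
print(slope, intercept, predicted)
```

The fitted slope describes how the dependent variable changes with the independent one, and the same line is then used to estimate a value that was not observed.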

Time Series Analysis

Time series analysis comprises methods and techniques for analyzing time series data to extract meaningful statistics and other characteristics of the data. The values are usually obtained at evenly spaced time points (hourly, daily, weekly, etc.).
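
One simple statistic that can be extracted from an evenly spaced series is a moving average, which smooths short-term fluctuations. The sketch below is only an illustration; the function name moving_average, the window size and the daily values are assumptions made for this example.

```python
def moving_average(values, window):
    """Return the simple moving average of `values` over `window` consecutive points."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical daily measurements taken at evenly spaced time points.
daily = [10, 12, 11, 15, 14, 18, 17]
smoothed = moving_average(daily, 3)
print(smoothed)
```

A series of seven values with a window of three yields five smoothed values, each summarizing three consecutive days.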

Prediction

Prediction estimates future data states based on past and current data. It is similar to classification; the primary difference is that prediction estimates a future state rather than the current state. Applications of prediction include flood forecasting, speech recognition, machine learning and pattern recognition.

Clustering

Clustering is the process of grouping a set of objects into clusters so that objects within a cluster are similar to one another and objects from different clusters are dissimilar. Clustering is also called unsupervised classification. It is similar to classification, with one difference: the groups are not predefined. The similarity among objects is measured using a measure such as Euclidean distance. The main types of clustering are hierarchical, partitional, categorical, density-based and grid-based clustering.
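
As an illustration of partitional clustering with Euclidean distance, the following is a minimal k-means sketch. k-means is used here only as one well-known example of the partitional family; the function name kmeans, the fixed starting centroids and the toy points are assumptions made for this example.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Very small k-means: alternate assignment and centroid update steps."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: put every point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabelled points forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
print(centroids)
```

No class labels are supplied; the two groups emerge only from the distances between the points, which is what distinguishes clustering from classification.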

Summarization

Summarization is the process of providing a compact description for a subset of data. It is also known as characterization or generalization. It derives higher-level information about the database, which may be achieved by actually retrieving portions of the data.
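
A minimal sketch of summarization is given below: a subset of records is retrieved and described by a few aggregate values. The record layout (region, amount), the function name summarize and the figures are hypothetical and serve only to illustrate the idea.

```python
import statistics


def summarize(values):
    """Describe a list of numbers by a handful of aggregate statistics."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


# Hypothetical sales records: (region, amount).
records = [("north", 120), ("south", 80), ("north", 100),
           ("south", 95), ("north", 140)]

# Retrieve the portion of the data of interest, then summarize it.
north_sales = [amount for region, amount in records if region == "north"]
summary = summarize(north_sales)
print(summary)
```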

Association

Association rule mining discovers associations between attributes in a given transactional database and expresses them as association rules. Association rules identify items that frequently occur together. The extracted rules are used to support more effective decision making. Rules are produced based on user-defined minimum support and confidence values. Association rule mining is mainly used in market analysis. Algorithms used in this technique include the Apriori algorithm, the FP-growth algorithm, the Partition algorithm, the Pincer-search algorithm and the Dynamic Itemset Counting algorithm.
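
The roles of minimum support and minimum confidence can be sketched as follows. This example only evaluates a single candidate rule against user-defined thresholds; it is not an implementation of Apriori or any of the other algorithms named above. The transactions, item names and threshold values are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) over the transactions."""
    antecedent_support = support(transactions, antecedent)
    if antecedent_support == 0:
        return 0.0  # the rule never applies, so treat its confidence as zero
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / antecedent_support


# Hypothetical market transactions, each a set of purchased items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Candidate rule: bread -> milk, checked against user-defined thresholds.
min_support, min_confidence = 0.5, 0.6
rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
rule_accepted = rule_support >= min_support and rule_confidence >= min_confidence
print(rule_support, rule_confidence, rule_accepted)
```

Here bread and milk appear together in two of the four transactions, and milk appears in two of the three transactions that contain bread, so the rule passes both thresholds.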

Moreover, the volume of data being generated is increasing dramatically, which makes traditional approaches impractical in most domains. Hidden within these large volumes of data are strategic pieces of information for fields such as science, health or business. Besides the possibility of collecting and storing large volumes of data, the information era has also provided us with increased computational power. The natural response is to employ this power to automate the process of discovering interesting models and patterns in raw data. Thus, the purpose of knowledge discovery methods is to provide solutions to one of the problems triggered by the information era: “data overload” Fay96.

A formal definition of data mining (DM), also known historically as data fishing, data dredging or knowledge discovery in databases or, depending on the domain, as business intelligence, information discovery, information harvesting or data pattern processing, is Fay96