Data mining is the process of extracting patterns from large data sets by combining statistics and artificial intelligence with database management. Before mining can begin, the data must be cleaned so that omissions and incorrect values do not distort the interpretation of large quantities of data. The data is then segmented into feature vectors and analyzed one vector or set at a time.
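As a rough illustration of the cleaning and feature-vector steps, the sketch below filters out incomplete or invalid records and converts the rest into numeric vectors. The records, the field names (`company`, `employees`, `country`) and the helper functions are all assumptions made up for this example, not part of any particular tool.

```python
# Hypothetical raw records; the field names are illustrative only.
raw_records = [
    {"company": "Acme Corp", "employees": "120", "country": "US"},
    {"company": "", "employees": "15", "country": "DE"},         # omission: no company
    {"company": "Globex", "employees": "n/a", "country": "US"},  # wrong information
    {"company": "Initech", "employees": "42", "country": "US"},
]


def is_clean(record):
    """Keep only records with a company name and a numeric employee count."""
    return bool(record["company"].strip()) and record["employees"].isdigit()


def to_feature_vector(record):
    """Turn one cleaned record into a numeric feature vector."""
    return [int(record["employees"]), 1 if record["country"] == "US" else 0]


# Cleaning comes first, then segmentation into feature vectors.
clean_records = [r for r in raw_records if is_clean(r)]
feature_vectors = [to_feature_vector(r) for r in clean_records]

print(feature_vectors)  # [[120, 1], [42, 1]]
```

Each vector can then be analyzed on its own or together with others. A real pipeline would likely repair or impute bad values where possible instead of simply dropping the record.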
Data mining comprises several sub-disciplines. Clustering involves discovering groups and structures in the data whose members are relatively similar to one another.
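One common way to find such groups is k-means, which is used here only as an example of a clustering technique. The sketch below is a minimal one-dimensional version; the function name `kmeans_1d`, the sample values and the starting centroids are assumptions chosen for illustration.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Minimal k-means on 1-D data with caller-supplied starting centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical order values that fall into two obvious groups.
order_values = [10, 12, 11, 95, 100, 105]
centroids, clusters = kmeans_1d(order_values, centroids=[0.0, 50.0])

print(clusters)   # [[10, 12, 11], [95, 100, 105]]
print(centroids)  # [11.0, 100.0]
```

The starting centroids are fixed so that the result is repeatable. Practical implementations usually pick them at random and work on multi-dimensional feature vectors.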
Classification involves generalizing structures that have already been identified so that they can be applied to new data. For example, someone who attempts to sign up on our directory without a company name, or who misspells the name of a very large multinational corporation, would be classified as a false signup.
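The signup example can be sketched as a small rule-based classifier. This is only an illustration: the hand-written rules stand in for whatever model is actually learned from past signups, and the company list, the function name `classify_signup` and the 0.8 similarity cutoff are assumptions. The fuzzy matching uses `difflib.get_close_matches` from the Python standard library.

```python
import difflib

# Hypothetical list of very large companies; a real directory would use a far longer one.
KNOWN_LARGE_COMPANIES = ["Microsoft", "Toyota", "Siemens"]


def classify_signup(company_name):
    """Label a new signup as 'valid' or 'false signup' using two simple rules."""
    name = company_name.strip()
    # Rule 1: a signup without a company name is a false signup.
    if not name:
        return "false signup"
    # An exact match with a known company is accepted.
    if name in KNOWN_LARGE_COMPANIES:
        return "valid"
    # Rule 2: a near miss on a known company name is treated as a misspelling.
    near_misses = difflib.get_close_matches(name, KNOWN_LARGE_COMPANIES, n=1, cutoff=0.8)
    if near_misses:
        return "false signup"
    return "valid"


print(classify_signup(""))         # false signup
print(classify_signup("Toyotta"))  # false signup
print(classify_signup("Toyota"))   # valid
```

The comparison is case-sensitive, so a production version would probably normalize names before matching.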
Regression tries to find a function that models the data with the least amount of error. Association rule learning looks for relationships between variables. Amazon.com uses association to study book-buying patterns. If a customer who buys Toyota Kaizen also frequently buys another title about business philosophy, then Amazon can use this information to recommend the other book to whoever shows interest in or purchases a book about Toyota Kaizen. In business, the bottom line in data mining is to identify patterns in customer behavior and to learn how to target specific individuals or groups with marketing campaigns.
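The book recommendation example can be made concrete with the two standard measures for an association rule: support (how often both books appear together across all baskets) and confidence (how often the second book appears in baskets that contain the first). The baskets below are invented for illustration, "Lean Basics" and "Cookbook" are made-up titles, and this sketch says nothing about how Amazon actually computes recommendations.

```python
# Hypothetical purchase baskets; only "Toyota Kaizen" comes from the text above.
baskets = [
    {"Toyota Kaizen", "Lean Basics"},
    {"Toyota Kaizen", "Lean Basics", "Cookbook"},
    {"Toyota Kaizen"},
    {"Cookbook"},
]


def rule_stats(baskets, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    with_antecedent = [b for b in baskets if antecedent in b]
    with_both = [b for b in with_antecedent if consequent in b]
    # Support: share of all baskets that contain both items.
    support = len(with_both) / len(baskets)
    # Confidence: share of antecedent baskets that also contain the consequent.
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence


support, confidence = rule_stats(baskets, "Toyota Kaizen", "Lean Basics")
print(support)               # 0.5
print(round(confidence, 2))  # 0.67
```

A rule with high support and high confidence is a reasonable candidate for a recommendation; the thresholds that count as "high" depend on the data set.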