Data Mining, Machine Learning, and Pattern Recognition


There is considerable confusion about the terms data mining, machine learning, and pattern recognition among beginning researchers and practitioners, because the three fields overlap significantly in their aims and methods. Explaining the difference between them is always a challenge. Pattern recognition is the oldest of the three, dating back to the early 1950s, when researchers were trying to develop machines for OCR and speech recognition. My take on pattern recognition is that it is a field concerned with the design and development of systems that recognize or group patterns (objects, signals, and processes) captured through some sensing mechanism; it has somewhat of an engineering flavor. The term machine learning came out of the artificial intelligence community, and its focus is on learning relationships present in data to build classification models. The emphasis in machine learning is on algorithmic models for learning and their properties. The term data mining appeared late in the game and was used to designate activities that had a strong application focus and were aimed at extracting useful patterns from data, mostly business data. Both pattern recognition and machine learning methods form an important component of any data mining effort.

One can get an interesting picture of how these three fields have emerged by going to Google’s Ngram Viewer. The viewer is an outcome of the massive digitization effort undertaken by Google. You can use it to chart how frequently phrases of interest have been used over the years in the corpus of books digitized by Google. The graph below was generated by Ngram Viewer for three phrases, “Data Mining + data mining”, “Pattern Recognition + pattern recognition”, and “Machine Learning + machine learning”. Since Ngram Viewer is case-sensitive, I included both the lowercase and capitalized forms of each phrase. As the graph shows, the term pattern recognition started to appear in the early fifties. The terms machine learning and data mining came much later. Beginning in the nineties, machine learning and data mining have grown in popularity, while the term pattern recognition has become slightly less fashionable.

[Figure: Ngram Viewer chart of the phrases “data mining”, “pattern recognition”, and “machine learning”; screenshot taken 2013-06-19]