December 7, 2006

CRISP-DM: What is it?

People new to data mining often ask me about CRISP-DM. They may have heard about it at conferences or from peers. However, we statisticians often assume that everyone already knows all of our acronyms. In the interest of demystifying data mining for the novice, I will try to provide some basic definitions and resources for common terminology, and CRISP-DM is a great place to start. Put simply, it is a widely used standard process for carrying out a data mining or predictive modeling project. The letters stand for "CRoss-Industry Standard Process for Data Mining."

"CRISP-DM has not been built in a theoretical, academic manner working from technical principles, nor did elite committees of gurus create it behind closed doors. Both these approaches to developing methodologies have been tried in the past, but have seldom led to practical, successful and widely-adopted standards. CRISP-DM succeeds because it is soundly based on the practical, real-world experience of how people do data mining projects. And in that respect, we are overwhelmingly indebted to the many practitioners who contributed their efforts and their ideas throughout the project."

