January 24, 2011

CRISP-DM: does this really capture our work?

The Cross Industry Standard Process for Data Mining (CRISP-DM) is the time-honored road map for building data-mining models, evaluating and deploying them, and ultimately building a self-sustaining cycle of new information, new insight and new analysis. As a construct it is both intuitive and transformational, identifying small steps as processes and linking them into a larger approach to successful predictive analysis.

However, I question how often it is ever truly realized. Is it merely an aspirational blueprint? The white whale of our data-mining efforts? In the non-profit space I have a difficult time thinking of organizations where this has become organic.

I reviewed this after reading a recent posting suggesting that many projects, and even organizations, start at very different points in this process. Sometimes the same organization may start at a different place depending on the project design, data available, timeline, resources available, etc.

Just curious what others think of CRISP-DM. Is it a firm road map to successful data-mining, or does it suggest merely an outline of processes that is malleable?

Doing Data Mining Out of Order
I like the CRISP-DM process model for data mining, teach from it, and use it on my projects. I commend it to practitioners and managers routinely as an aid during any data mining project. However, while the process sequence is generally the one I use, I don't always follow it. Data mining often requires more creativity and "art" to re-work the data than we would like. It would be very nice if we could create a checklist and just run through the list on every project! Unfortunately, data doesn't always cooperate in this way, and we therefore need to adapt to the specific data problems so that the data is better prepared.


Dean Abbott said...

Great questions raised in your post. I think most organizations roughly follow the sequence in CRISP-DM. The biggest deviations I think are these:
1) some start at data understanding and work back to business understanding. In other words, "we have some data, what can we do with it?"
2) others work outside in: "we want to make better decisions in our real-time system. Let's mock up how a solution could be used, then go back and put the models in place that can achieve this." Technically, you could say this is all part of Business Understanding, but when they mock up what they will do in deployment, I think that is outside the written scope of CRISP-DM.
3) the example I gave of building models before cleaning up data to see what you've got to work with, especially with the target variable. Again, technically, a CRISP-DM advocate would say this is part of the data design, or that this is part of the feedback loop between Modeling and Data Preparation, but I don't think this was intended.

I don't have any problem at all with CRISP-DM, mind you. I use it and recommend it. It just isn't a recipe. We still need to use our brains and understand why we do which steps at which stages!

January 31, 2011 at 10:46 AM  
