January 29, 2011

RFM: Not a substitute for predictive modeling

Just a note to thank Kevin MacDonell for the mention on his CoolData blog. If by chance there are still a few of you who don't follow his blog, please do so. You will be smarter.

One encouragement about software in response: If you are able to explore the world of statistics software (SPSS, SAS, others), you may find that it makes many things much easier than using Excel or other more generic products. These packages have come a long way from the syntax-only days. Some of us still like the syntax (we may even dream about it--did I just write that?!?). But, really, you can point and click through seemingly complex calculations. DIY learning may seem daunting, but the Prospect-DMM and Fundraising Analytics Forum communities are a very sharing bunch.

I've worked with several clients who were able to build regression models for major and planned giving by day two of working together. We could usually get to producing an RFM score in a morning.
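For anyone who has not built one, here is a minimal sketch of an RFM (recency, frequency, monetary) score in Python. Everything in it is an assumption for illustration: the gift records are made up, the function names are hypothetical, and the 1-to-5 rank-based scoring is just one common convention, not the procedure I use with any particular client or package.

```python
from datetime import date

# Hypothetical gift records: (donor_id, gift_date, amount). Illustrative only.
gifts = [
    ("A", date(2010, 12, 15), 500.0),
    ("A", date(2010, 6, 1), 250.0),
    ("A", date(2009, 12, 10), 250.0),
    ("B", date(2008, 3, 20), 50.0),
    ("C", date(2010, 11, 5), 100.0),
    ("C", date(2010, 1, 12), 100.0),
    ("D", date(2006, 7, 4), 25.0),
    ("E", date(2010, 9, 30), 1000.0),
]


def summarize(gifts, as_of):
    """Collapse gift rows into per-donor recency (days), frequency, and monetary total."""
    summary = {}
    for donor, when, amount in gifts:
        days = (as_of - when).days
        rec = summary.setdefault(
            donor, {"recency": days, "frequency": 0, "monetary": 0.0}
        )
        rec["recency"] = min(rec["recency"], days)  # days since most recent gift
        rec["frequency"] += 1                       # number of gifts
        rec["monetary"] += amount                   # total giving
    return summary


def rank_score(values, bins=5, lower_is_better=False):
    """Score each donor 1..bins by the share of donors they beat; ties share a score."""
    n = len(values)
    scores = {}
    for donor, v in values.items():
        if lower_is_better:
            beaten = sum(1 for other in values.values() if other > v)
        else:
            beaten = sum(1 for other in values.values() if other < v)
        scores[donor] = 1 + (beaten * bins) // n
    return scores


def rfm_scores(gifts, as_of, bins=5):
    """Return {donor: (R, F, M)} with each component scored 1..bins."""
    summary = summarize(gifts, as_of)
    r = rank_score({d: s["recency"] for d, s in summary.items()}, bins, lower_is_better=True)
    f = rank_score({d: s["frequency"] for d, s in summary.items()}, bins)
    m = rank_score({d: s["monetary"] for d, s in summary.items()}, bins)
    return {d: (r[d], f[d], m[d]) for d in summary}


scores = rfm_scores(gifts, as_of=date(2011, 1, 29))
for donor in sorted(scores):
    print(donor, scores[donor])
```

The point of the sketch is how little is involved: three summary columns per donor and a ranking. That simplicity is also why RFM describes past giving rather than predicting anything, which is the argument in the title.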

January 24, 2011

CRISP-DM: does this really capture our work?


The Cross Industry Standard Process for Data Mining is the time-honored road map for building data-mining models, evaluating and deploying them, and ultimately building a self-sustaining cycle of new information, new insight and new analysis. As a construct it is both intuitive and transformational: it identifies small steps as processes and links them into a larger approach to successful predictive analysis.

However, I question how often it is ever truly realized. Is it merely an aspirational blueprint? The white whale of our data-mining efforts? In the non-profit space I have a difficult time thinking of organizations where this has become organic.

I reviewed this after reading a recent posting suggesting that many projects, and even organizations, start at very different points in this process. Sometimes the same organization may start at a different place depending on the project design, data available, timeline, resources available, etc.

Just curious what others think of CRISP-DM. Is it a firm road map to successful data-mining, or does it suggest merely an outline of processes that is malleable?


Doing Data Mining Out of Order
I like the CRISP-DM process model for data mining, teach from it, and use it on my projects. I commend it to practitioners and managers routinely as an aid during any data mining project. However, while the process sequence is generally the one I use, I don't always; data mining often requires more creativity and "art" to re-work the data than we would like; it would be very nice if we could create a checklist and just run through the list on every project! But unfortunately data doesn't always cooperate in this way, and we therefore need to adapt to the specific data problems so that the data is better prepared.




January 19, 2011

Some great statistics reads...

If you are into analytics or even basic statistics, Andrew Gelman is a guy you should be aware of. He has some similarities to Steven Levitt of Freakonomics fame (both award-winning professors under the age of 40), but Gelman is more focused on stats than the self-professed math novice Levitt. Gelman is also a social scientist, while Levitt is proudly homo economicus. They both love to ask and answer questions using data.

Gelman has also produced what is in my mind the best definition of what statistics is: "the study of uncertainty and variation".

Here is a link to Gelman's stats reading list--a pretty broad selection of titles any quant-head should have on their shelves.

As a personal note, I would add "Against the Gods: The Remarkable Story of Risk" to this list as the DonorCast selection.

Andrew Gelman on Statistics
Award-winning statistician and political scientist Andrew Gelman says that uncertainty is an important part of life, and recognition of that uncertainty is itself an important step. This is where statistics can help us

