November 12, 2007

(Re)-emerging strategies for the “narrative” or “unstructured data” problem.

This article discusses a re-emerging field in predictive analytics called Text Analytics. I say re-emerging because, as the author points out, narrative analysis was a cornerstone of the earliest business intelligence strategies. Today the concept may be especially useful when combined with segmentation or donor-targeting strategies. From prospect management report sheets and phone-a-thon caller logs to the infamous “other” box on a simple survey question, Text Analytics can offer more nuanced insight into the “narrative” data we do have, as well as applications to the quantitative models we construct.

One of the fundamental problems of using mathematics to analyze human behavior is the unstructured, or as I like to call it, “narrative” data problem. The amount of purely numerical or quantifiable information available to those in the predictive analytics field is limited, and what that information can tell you varies as well. I consider non-profit or fundraising analytics to be more opaque than the for-profit sector in this respect. Individuals, on a basic level, need to purchase goods and services, so intent and preference are more transparent. In the for-profit world, purchasing a product can imply a variety of affinity relationships: this product is a necessity, I prefer this product to other similar products, etc.

Philanthropic giving, monetary or in-kind, is less clear about which quantifiable variables signal a specific affinity. Attitudes towards institutions or missions are often more personal than the type of soap you buy, so a donation may imply high affinity. The source of that affinity, however, can differ greatly: I am an alumnus, my child was a patient, the institution is important to the community, I like the sports teams, etc. The absence of immediately available options (there are no supermarkets in which to choose between charitable organizations) also makes comparisons difficult. Giving data, capacity ratings, and alumni classifications are all quantifiable values, but more “narrative” fields, like the answer to the basic question “why is giving to us important to you?”, are more complex.

While the technology for Text Analytics may be more complex and costly than many organizations care to absorb, I believe it represents a very exciting frontier, one that could make predictive modeling more accurate, dynamic, and relevant.

Text analytics is a new IT discipline that has already proved itself in applications ranging from pharmaceutical drug discovery to counter-terrorism to survey analysis, in science, government, and industry. It is poised to break out into the broader analytics market, in workbench form, integrated with business intelligence solutions, embedded in line-of-business applications, and enabling semantic search.

Text analytics is an answer to the “unstructured data” problem, which is best expressed by the truism that eighty percent of enterprise information originates and is locked in “unstructured” form. That problem has been recognized for decades. In fact, the first definition of business intelligence (BI) itself, in an October 1958 IBM Journal article by H.P. Luhn, A Business Intelligence System, describes a system that will:

“…utilize data-processing machines for auto-abstracting and auto-encoding of documents and for creating interest profiles for each of the ‘action points’ in an organization. Both incoming and internally generated documents are automatically abstracted, characterized by a word pattern, and sent automatically to appropriate action points.”

So we see that the earliest BI focus was on text – on extraction, categorization, and classification rather than on numerical data!
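
To make Luhn's description a little more concrete, here is a minimal Python sketch of the idea: characterize a document by its word pattern, then send it to whichever “action points” have a matching interest profile. It is only an illustration under stated assumptions, not Luhn's actual system or any vendor's method. The profile names, the keyword lists, and the functions `word_pattern` and `route` are all hypothetical, and simple keyword counting stands in for real text analytics.

```python
import re
from collections import Counter

# Hypothetical interest profiles: the words each "action point" cares about.
# These names and keyword sets are invented for illustration only.
PROFILES = {
    "major_gifts": {"endowment", "scholarship", "bequest", "estate"},
    "athletics": {"football", "basketball", "stadium", "team"},
    "patient_relations": {"hospital", "patient", "nurse", "care"},
}

def word_pattern(text):
    """Characterize a document by the counts of its lowercase words."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def route(text, profiles=PROFILES):
    """Return the action points whose profile overlaps the text, best match first."""
    pattern = word_pattern(text)
    # Score each profile by how often its keywords occur in the document.
    scores = {name: sum(pattern[w] for w in words) for name, words in profiles.items()}
    ranked = sorted(scores.items(), key=lambda item: -item[1])
    return [name for name, score in ranked if score > 0]

if __name__ == "__main__":
    comment = "My child was a patient at the hospital and the nurse was wonderful."
    print(route(comment))
```

A free-text answer to “why is giving to us important to you?” could be passed through `route` in the same way, and the resulting labels used as one more variable in a segmentation model.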

