March 30, 2007

Data Classification: Brains or Brawn

Elements of data classification may apply strongly to your data mining program. I recommend building consolidated classification coding systems for attribute, interest, and funding categories. For example, engineering graduates may have an engineering interest, which rolls up into a science interest, which in turn rolls up into the constituency pool for science and technology. When a person notes an interest in engineering on a survey, attends an engineering event, or gives to engineering, they join this pool as well.
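
For the techies, here is a minimal sketch of that roll-up in Python. The code names, the PARENT table, and both function names are my own placeholders for illustration, not codes from any real system; your mapping will have its own codes and levels.

```python
# Hypothetical parent links: each code rolls up into at most one parent.
PARENT = {
    "engineering": "science",
    "science": "science_and_technology",
}


def roll_up(code):
    """Return the code followed by every ancestor it rolls up into."""
    chain = [code]
    seen = {code}
    while chain[-1] in PARENT:
        parent = PARENT[chain[-1]]
        if parent in seen:  # guard against an accidental cycle in the mapping
            break
        chain.append(parent)
        seen.add(parent)
    return chain


def constituency_pools(signals):
    """Collect every pool a person joins.

    signals is an iterable of (source, code) pairs, for example
    ("degree", "engineering") or ("gift", "engineering").
    """
    pools = set()
    for _source, code in signals:
        pools.update(roll_up(code))
    return pools
```

Calling constituency_pools([("survey", "engineering")]) puts the person in the engineering, science, and science_and_technology pools, whichever source the signal came from.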

By "smart-coding" your entire system into these categories, you will multiply the number of independent characteristics available for predictive modeling. Similar work might be done for occupations and industries. The manual mapping is the most difficult step in these classification projects.
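
One way to picture the multiplication: every pairing of a category type with a category code can become its own yes/no variable. The sketch below assumes three category types (taken from the paragraph above) and three made-up codes; the names SOURCES, CATEGORIES, and indicator_row are hypothetical.

```python
from itertools import product

# The three category types named above; the codes are illustrative only.
SOURCES = ("attribute", "interest", "funding")
CATEGORIES = ("engineering", "science", "science_and_technology")


def indicator_row(signals):
    """Build one person's 0/1 variables for modeling.

    signals is a set of (source, category) pairs that apply to the person.
    Every combination of source and category gets its own column.
    """
    return {
        f"{source}__{category}": int((source, category) in signals)
        for source, category in product(SOURCES, CATEGORIES)
    }
```

With three types and three codes, each person gets nine candidate variables; a fuller code set grows that count quickly, which is the point of the consolidation.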

On a deeper level, here is an article on data classification for the techies on the list.

The current state of data classification is largely a byproduct of historical, hierarchical storage management (HSM) implementations where data age is the primary classification criterion. Early visions of classifying data based on business value never fully came to fruition because it required a manual, brute force approach and was too hard to automate. Age-based classification enabled automation processes to be more easily applied to data classification initiatives and became the de facto standard.

