Can We Please Stop Saying "Unstructured" Data?

Grant Ingersoll (LucidWorks), in an article on GigaOm, asks for a term other than “unstructured” data to describe natural form text:

Text is easily one of the most highly structured data types we face, filled with misspellings, misdirection, flowery language, ambiguity and implicit knowledge. Text is so often misunderstood that researchers in the field even have a metric (inter-annotator agreement) that tracks how often two people examining the same piece of text agree on the answer to some question on the text.

Some random thoughts:

  1. Unstructured data doesn’t refer only to natural form text.
  2. Speaking of text, from the four years of (Romanian) grammar I studied in school, what I remember best is the countless exceptions. To me, that sounds like a lack of structure.
  3. I’m pretty sure there are analysts out there who have come up with different terms, but sometimes having everyone understand what a term means is more important than the term itself.

