Can We Please Stop Saying "Unstructured" Data?

Grant Ingersoll (LucidWorks), in an article on GigaOm, asks for a term other than “unstructured” data to describe natural language text:

Text is easily one of the most highly structured data types we face, filled with misspellings, misdirection, flowery language, ambiguity and implicit knowledge. Text is so often misunderstood that researchers in the field even have a metric (inter-annotator agreement) that tracks how often two people examining the same piece of text agree on the answer to some question on the text.

Some random thoughts:

  1. Unstructured data doesn’t refer only to natural language text.
  2. Speaking of text: from the four years of (Romanian) grammar I studied in school, what I remember best are the countless exceptions. To me, that sounds like a lack of structure.
  3. I’m pretty sure there are analysts out there who have come up with different terms, but sometimes having everyone understand what a term means matters more than the term itself.

Original title and link: Can We Please Stop Saying “Unstructured” Data? (NoSQL database©myNoSQL)

via: http://gigaom.com/2013/03/17/can-we-please-stop-saying-unstructured-data/