


data marketplace: All content tagged as data marketplace in NoSQL databases and polyglot persistence

Licensing and Distribution Holding Back the Age of Data

Stephen O’Grady (RedMonk):

Absent a market with well understood licensing and distribution mechanisms, each data negotiation — whether the subject is attribution, exclusivity, license, price or all of the above — is a one off. Without substantial financial incentives, such as the potential returns IBM might see from its vertical Watson applications, few have the patience or resources to pursue datasets individually. We’ve experienced this firsthand, in fact; as we’ve looked for data sources to incorporate into RedMonk Analytics, conversations around licensing have been very uneven.

RedMonk is an analyst firm that needs the data to improve its services, and it is prepared to pay for access. But what about researchers or market observers like myself?

I’ve expressed concerns in the past about the direction the data market is taking, at least in terms of regulation. While I’ve tried to be optimistic about Big Data adoption, Stephen’s post made me realize that we are quickly heading into the age of data monopolies.

Welcome back data silos!

Original title and link: Licensing and Distribution Holding Back the Age of Data (NoSQL databases © myNoSQL)


How Big Data Anonymization Works

Marketers can break down and manage this information:

  • Distinguish the unique identifier across all the data sources.
  • Connect ad cookie data with web analytics cookie data to build the profile of each unique identifier.
  • Connect that profile with data already logged in from other sources, including profiles with Facebook and Twitter IDs.
  • Continue to build on this basic profile, adding new data from sources like Foursquare as they become available.
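The steps in the quote amount to a join on a shared identifier. Here is a minimal sketch of that idea, assuming every source already exposes the same `cookie_id` field; the field names, the sample records, and the `build_profiles` helper are all made up for illustration and are not taken from the quoted article.

```python
from collections import defaultdict

# Hypothetical records; every field name and value is invented for illustration.
ad_cookies = [
    {"cookie_id": "c-1", "ad_segment": "travel"},
    {"cookie_id": "c-2", "ad_segment": "finance"},
]
analytics_cookies = [
    {"cookie_id": "c-1", "pages_viewed": 12},
]
logged_in_profiles = [
    {"cookie_id": "c-1", "twitter_id": "@example"},
]


def build_profiles(key, *sources):
    """Merge records from all sources that share the same value for `key`."""
    profiles = defaultdict(dict)
    for source in sources:
        for record in source:
            # Later sources add fields to (or overwrite fields of) the profile.
            profiles[record[key]].update(record)
    return dict(profiles)


profiles = build_profiles(
    "cookie_id", ad_cookies, analytics_cookies, logged_in_profiles
)
```

Each additional source that carries the same identifier simply becomes one more argument, which is why the profile keeps growing as new data becomes available.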

Now I’ve figured out how it works:

  • tell your users all data is anonymized
  • on the backend use whatever it takes to deanonymize it

So long, privacy. And so long, “on the Internet, nobody knows you’re a dog”!


Original title and link: How Big Data Anonymization Works (NoSQL databases © myNoSQL)


How would an iTunes model for data address licensing and ownership?

Gil Elbaz (Factual):

In the case of iTunes, in a single click I purchase a track, download it, establish licensing rights on my iPhone and up to four other authorized devices, and it’s immediately integrated into my daily life. Similarly, the deepest value will come for a marketplace that, with a single click, allows a developer to license data and have it automatically integrated into their particular application development stack. That might mean having the data instantly accessible via API, automatically replicated to a MySQL server on EC2, synchronized at, or copied to Google App Engine.

An iTunes for data could be priced from a single record/entity to a complete dataset. And it could be licensed for single use, caching allowed for 24 hours, or perpetual rights for a specific application.

I don’t think that moving data around and temporary access are part of this future. Otherwise I agree the model sounds appealing.
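The licensing tiers mentioned in the quote (single use, caching allowed for 24 hours, perpetual rights) can be modeled as a small data structure. This is only a sketch under my own assumptions: the `LicenseKind` and `DataLicense` names and the rule that a single-use license is spent after one recorded use are hypothetical, not something Factual describes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class LicenseKind(Enum):
    SINGLE_USE = "single_use"
    CACHE_24H = "cache_24h"
    PERPETUAL = "perpetual"


@dataclass
class DataLicense:
    kind: LicenseKind
    granted_at: datetime
    uses: int = 0

    def may_use(self, now: datetime) -> bool:
        """Return True if the licensed data may still be used at `now`."""
        if self.kind is LicenseKind.PERPETUAL:
            return True
        if self.kind is LicenseKind.CACHE_24H:
            return now - self.granted_at <= timedelta(hours=24)
        # SINGLE_USE: valid only until the first recorded use.
        return self.uses == 0

    def record_use(self, now: datetime) -> None:
        """Count one use, refusing if the license no longer allows it."""
        if not self.may_use(now):
            raise PermissionError(f"{self.kind.value} license no longer valid")
        self.uses += 1
```

Pricing per record or per dataset would be a separate concern; the point here is only that the terms are simple enough for an application to check on its own.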

Original title and link: How would an iTunes model for data address licensing and ownership? (NoSQL databases © myNoSQL)


Stop Trying to Put a Monetary Value on Data - It's the Wrong Path

Rob Karel:

data in and of itself has no value!
The only value data/information has to offer – and the reason I do still consider it an “asset” at all – is in the context of the business processes, decisions, customer experiences, and competitive differentiators it can enable.

Just a different, and correct, way of saying that Big Data is snake oil.

Original title and link: Stop Trying to Put a Monetary Value on Data - It’s the Wrong Path (NoSQL databases © myNoSQL)


5 Criteria to Compare Data Marketplaces

Interesting way to compare data marketplaces:

  • A free level of developer access.
  • A variety of data spanning a wide range of topics.
  • Several different methods to access the data, including at least data dumps (think CSV) and a web API
  • RESTful API that returns JSON; it’s even better if it also has a YQL binding.
  • General-purpose client libraries in your language of choice

Comparing Data Marketplaces

Much as I’d like to see the five features above, I’m sticking with my own criteria.

Original title and link: 5 Criteria to Compare Data Marketplaces (NoSQL databases © myNoSQL)


Make Data Available - Open Data Manual

From the Open Data Manual:

Open data needs to be ‘technically’ open as well as legally open. Specifically the data needs to be:

  1. Available — at no more than a reasonable cost of reproduction, preferably for free download on the Internet. Summary: publish your information on the Internet wherever possible.
  2. In bulk. The data should be available as a whole (a web API or service may also be very useful but is not a substitute for bulk access)
  3. In an open, machine-readable format. Machine-readability is important because it facilitates reuse, for example, tables of figures in a PDF can be read easily by humans but are very hard for a computer to use which greatly limits the ability to reuse that data.
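Point 3 is easy to see in practice. A minimal sketch, assuming the same table of figures had been published as CSV instead of inside a PDF; the column names and numbers below are invented for illustration.

```python
import csv
import io

# Made-up figures standing in for a published table.
published_csv = """region,year,population
North,2010,1200
South,2010,800
"""

# A few lines of standard-library code turn the file into usable records.
rows = list(csv.DictReader(io.StringIO(published_csv)))
total_population = sum(int(row["population"]) for row in rows)
```

Getting the same two rows out of a PDF typically needs a dedicated extraction tool and manual checking, which is exactly the reuse barrier the manual warns about.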

Sir Tim Berners-Lee’s linked open data star scheme provides an unambiguous way to categorize open data. And while I’m on the subject of open data, there’s also the Open Data Protocol, which is meant to enable the creation of HTTP-based data services.

Original title and link: Make Data Available - Open Data Manual (NoSQL databases © myNoSQL)

Data Privacy and Data Marketplaces Future

Data gathered and sold by RapLeaf can be very specific. According to documents reviewed by the Journal, RapLeaf’s segments recently included a person’s household income range, age range, political leaning, and gender and age of children in the household, as well as interests in topics including religion, the Bible, gambling, tobacco, adult entertainment and “get rich quick” offers. In all, RapLeaf segmented people into more than 400 categories, the documents indicated.

Obscure data ownership + cryptic TOS + unregulated data marketplaces = 1984

Original title and link: Data Privacy and Data Marketplaces Future (NoSQL databases © myNoSQL)


The Market for Online Privacy Heats Up


The Wall Street Journal’s year-long What They Know investigation into online tracking has exposed a fast-growing network of hundreds of companies that collect highly personal details about Internet users—their online activities, political views, health worries, shopping habits, financial situations and even, in some cases, their real names—to feed the $26 billion U.S. online-advertising industry.

Today is harvest time. But tomorrow they’ll start using it. Then selling it. It is your data though. And nobody and nothing protects your interests today or tomorrow.

Original title and link: The Market for Online Privacy Heats Up (NoSQL databases © myNoSQL)


Google Could Make Data Marketplaces Actually Useful

Paul Miller on how big data marketplaces might evolve thanks to Google’s Public Data Explorer and Dataset Publishing Language (DSPL):

Matters become far more complex when you want to start combining different data sets, even within a single data marketplace. Typically, it’s not what these services are designed for, and typically, there is insufficient metadata to enable sensible combinations. […] Without knowledge of the units used, the newly combined data set is worthless — and, possibly, dangerously misleading.

Sir Tim Berners-Lee recognized these issues a long time ago and has been talking about the linked open data star scheme.
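The unit problem Paul Miller describes can be shown in a few lines. This is a sketch under my own assumptions: the `Series` type, the `TO_METERS` table, and the `combine` function are hypothetical and have nothing to do with how DSPL itself represents metadata.

```python
from dataclasses import dataclass


@dataclass
class Series:
    name: str
    unit: str     # unit metadata; empty when the publisher gave none
    values: dict  # year -> value


# Hypothetical conversion table for the units this sketch understands.
TO_METERS = {"m": 1.0, "km": 1000.0, "mi": 1609.344}


def combine(a: Series, b: Series) -> Series:
    """Add two series year by year, converting b into a's unit first."""
    if a.unit not in TO_METERS or b.unit not in TO_METERS:
        # Without unit metadata the sum would be meaningless, so refuse.
        raise ValueError("unknown unit, refusing to combine")
    factor = TO_METERS[b.unit] / TO_METERS[a.unit]
    shared_years = sorted(a.values.keys() & b.values.keys())
    combined = {y: a.values[y] + b.values[y] * factor for y in shared_years}
    return Series(f"{a.name}+{b.name}", a.unit, combined)
```

Combining 10 km of roads with 500 m of paths gives 10.5 km here; adding the raw numbers without the metadata would give 510 of nothing in particular.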

Original title and link: Google Could Make Data Marketplaces Actually Useful (NoSQL databases © myNoSQL)


Big Data Marketplace: Windows Azure Marketplace DataMarket

Just another big data marketplace, this time on Windows Azure:

One stop shop for Data. Get all data you need for your insights: trusted commercial and premium public domain data.

Original title and link: Big Data Marketplace: Windows Azure Marketplace DataMarket (NoSQL databases © myNoSQL)

Big Data Marketplaces: The Future?

I read stories about companies and services like Infochimps or Amazon Public Data Sets and I’m wondering if marketplaces represent the future, or at least one future, of Big Data[1].

And if that turns out to be the case, I have a few concerns about the distribution of big data:

  • who decides/regulates data ownership?

    While you might have granted one company rights to your data, I’m pretty sure that in most cases details like selling it for profit have not been agreed upon.

  • who decides/regulates the levels of privacy on the data set?

    As Facebook’s history has shown, privacy has different meanings for different entities. And while some ‘anonymization’ might seem enough at fine-grained levels, things may be completely different when talking about large data sets.

  • who can quantify and/or guarantee the quality of the data sets?

    Leaving aside the different ‘anonymization’ filters applied to ‘clean data’, other factors can lower the quality of the data. Who can clarify, detail, and measure the quality of such big data sets?

  1. A related story about Infochimps’ attempt to sell a large set of Twitter data can be read here.

Original title and link: Big Data Marketplaces: The Future? (NoSQL databases © myNoSQL)