


future: All content tagged as future in NoSQL databases and polyglot persistence

The Future of NoSQL Databases: Hybrid Tools for OLTP and OLAP

John L. Myers has an interesting hypothesis for the future of NoSQL databases based on their capability of handling “unstructured” data:

I think the future of NoSQL platforms is going to reside in the ability of those systems to apply different operational or analytical schemas to multi-structured data sets rather than letting the data reside in a schema-free format. Merely storing multi-structured data sets will not be enough to have a NoSQL platform meet business objectives. The true business value will be in the ability to apply the structures of a particular schema for analysis or for operational workloads in real-time or near real-time.

What Myers suggests here is that storing unstructured data allows an application to define different “schemas” to repurpose the way data is used. In theory this sounds quite interesting. If done dynamically, this could define a system that could provide both OLTP and OLAP features.

The structure of the data has a very important influence on the data access implementation, and simply adding structure metadata would not allow the system to continue performing optimally across different scenarios and workloads. Put differently, OLTP and OLAP systems require data to be organized (and stored) differently in order to handle their different access patterns and workloads. Switching from one to the other while maintaining the characteristics of the system (reliability, performance, stability, etc.) seems to lead to a level of complexity that would be very difficult for a single system to handle.
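To make the idea concrete, here is a minimal sketch of what "applying different schemas" to the same schema-free record could look like. All field names and views below are illustrative assumptions, not something from Myers' article:

```python
# One schema-free record; two "applied schemas" projected on demand.
# The raw data stays unstructured; each workload extracts only the
# fields and shape it needs.
raw_event = {
    "user": "u42", "item": "sku-9", "price": 19.99,
    "ts": "2013-01-25T10:00:00Z", "referrer": "email",
}

def oltp_view(event):
    # Operational schema: the full order row, suitable for point lookups.
    return (event["user"], event["item"], event["price"], event["ts"])

def olap_view(event):
    # Analytical schema: only the dimension and measure an aggregate needs.
    return {"day": event["ts"][:10], "revenue": event["price"]}

print(oltp_view(raw_event))
print(olap_view(raw_event))
```

The sketch also hints at the problem discussed below: projecting a view is cheap, but serving it *fast* requires storing the data in the layout that view expects.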

Original title and link: The Future of NoSQL Databases: Hybrid Tools for OLTP and OLAP (NoSQL database©myNoSQL)


Petabyte Reliable DNA Storage

The abstract of the report “Towards practical, high-capacity, low-maintenance information storage in synthesized DNA“:

This challenge has focused some interest on DNA as an attractive target for information storage because of its capacity for high-density information encoding, longevity under easily achieved conditions and proven track record as an information bearer. […] We encoded computer files totalling 739 kilobytes of hard-disk storage and with an estimated Shannon information of 5.2x10^6 bits into a DNA code, synthesized this DNA, sequenced it and reconstructed the original files with 100% accuracy.

The article is behind a paywall, but Gizmodo writes about the published results:

[…] they can store 2.2 petabytes of information in a single gram of DNA, and recover it with 100 percent accuracy.
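A back-of-envelope check of the numbers quoted above. The 739 KB and 5.2x10^6-bit figures come from the abstract and the 2.2 PB/gram density from the Gizmodo quote; everything else is simple arithmetic, not a claim from the paper:

```python
# Raw bits encoded vs. estimated Shannon information content.
raw_bits = 739 * 1024 * 8            # 739 KB of hard-disk storage, in bits
shannon_bits = 5.2e6                 # estimated information content
print(raw_bits)                      # 6053888
print(round(raw_bits / shannon_bits, 2))  # ~1.16: the files were somewhat compressible

# Density claim: 2.2 PB per gram of DNA.
pb_per_gram = 2.2
print(round(1000 / pb_per_gram, 1))  # ~454.5 grams of DNA for one exabyte (1000 PB)
```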

Original title and link: Petabyte Reliable DNA Storage (NoSQL database©myNoSQL)


Salmon DNA Used in Data Storage Device

Salmon … they’re good to eat, provide a livelihood for fishermen, are an important part of their ecosystem, and now it seems they can store data. More specifically, their DNA can. Scientists from National Tsing Hua University in Taiwan and the Karlsruhe Institute of Technology in Germany have created a “write-once-read-many-times” (WORM) memory device that combines electrodes, silver nanoparticles, and salmon DNA. While the current device is simply a proof-of-concept model, the researchers have stated that DNA could turn out to be a less expensive alternative to traditional inorganic materials such as silicon.

The most delicious storage solution.

Original title and link: Salmon DNA Used in Data Storage Device (NoSQL database©myNoSQL)


What Does the Future of Computing Look Like?

We’ll get long-lasting batteries and teraflop chips, Big Data on micro servers, GPU-accelerated databases, and super-speed Internet connections with gear that is already available:

Caltech and the University of Victoria have broken the world record for sustained, computer-to-computer transfer over a network. Between the SuperComputing 2011 (SC11) convention in Seattle and the University of Victoria Computer Centre, Canada — a distance of 134 miles (217km) — a transfer rate of 186 gigabits per second was achieved over a 100Gbps bidirectional fiber optic link; 98Gbps in one direction, 88Gbps in the other.

The Future of Computing

From Stanford, lightning-fast, efficient data transmission:

A team at Stanford’s School of Engineering has demonstrated an ultrafast nanoscale light-emitting diode (LED) that is orders of magnitude lower in power consumption than today’s laser-based systems—”Our device is some 2,000 times more energy efficient than best devices in use today,” said Vuckovic.— and is able to transmit data at the very rapid rate of 10 billion bits per second.

Then from Intel, the Knights Corner one teraflops chip:

The Knights Corner chip acts as a co-processor - taking over some of the most complicated tasks from the computer’s central processing unit (CPU). It packs more than 50 cores - or individual processors - onto a single piece of silicon.

And again from Stanford, a new battery electrode:

A team of researchers from Stanford have developed a new battery electrode that can survive 40,000 charge cycles. That’s about a hundred times more than a normal Lithium-Ion battery, and enough to make it usable for somewhere between 10-30 years.
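A quick sanity check of that 10–30 year figure. The daily charging rates below are my assumptions, not numbers from the article:

```python
# 40,000 charge cycles translated into years of use at various
# (assumed) daily charging rates.
cycles = 40_000
for charges_per_day in (1, 4, 10):
    years = cycles / (charges_per_day * 365)
    print(f"{charges_per_day}/day -> ~{years:.0f} years")
```

So the 10–30 year estimate corresponds to a device being cycled roughly 4 to 10 times per day; at one full cycle a day the electrode would outlive its owner.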

Last, but not least, IBM is looking into GPU-accelerated databases.

Original title and link: The Future of Computing (NoSQL database©myNoSQL)

GPU-Accelerated Databases

Wolfgang Gruener reporting on a new patent filed by IBM:

Instead of traditional disk-based queries and an approach that slows performance via memory latencies and processors waiting for data to be fetched from the memory, IBM envisions in-GPU-memory tables as technology that could, in addition to disk tables, significantly accelerate database processing. According to a patent filed by the company, “GPU enabled programs are well suited to problems that involve data-parallel computations where the same program is executed on different data with high arithmetic intensity.”

IBM GPU-accelerated databases
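To illustrate what “the same program executed on different data” means for a database, here is a toy, CPU-only sketch of a data-parallel column scan. The column and predicate are hypothetical, and this is plain Python rather than GPU code; on a GPU, each comparison would run on its own thread instead of in a loop:

```python
# An in-memory column of values, conceptually resident in GPU memory.
prices = [9.99, 120.0, 45.5, 300.0, 12.0]

# The same predicate is applied independently to every value. There are
# no data dependencies between iterations, which is exactly what makes
# a scan like this map well onto thousands of GPU threads.
mask = [p > 50.0 for p in prices]

# The resulting bitmap selects the matching row ids.
matching_rows = [i for i, hit in enumerate(mask) if hit]
print(matching_rows)  # [1, 3]
```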

Amazon has made a move in the GPU-world by offering Cluster GPU instances which can be used for quite a few interesting scenarios.

Original title and link: GPU-Accelerated Databases (NoSQL database©myNoSQL)


Big Data on Micro Servers? You Bet

Derrick Harris (GigaOm):

Online dating service eHarmony is using SeaMicro’s specialized Intel Atom-powered servers as the foundation of its Hadoop infrastructure, demonstrating that big data might be a killer app for low-powered micro servers. The general consensus is that specialized gear from startups such as SeaMicro and Calxeda—which can save money and power by using processors initially designed for netbooks and smartphones instead of servers—will need to attract both applications and big-name users before it really catches on. Big data looks like it might bring both.

Intriguing idea. Amazon already seems to be on board with this direction of specialized processing infrastructure by providing the Amazon Cluster GPU instances.

Original title and link: Big Data on Micro Servers? You Bet (NoSQL database©myNoSQL)


Big Data Marketplaces: The Future?

I read stories about companies/services like Infochimps or Amazon Public Data Sets and I’m wondering if marketplaces represent the/one future of Big Data[1].

And if that’ll be the case, then I have a couple of concerns related to distribution of big data:

  • who decides/regulates data ownership?

While you might have granted one company rights to your data, I’m pretty sure that in most cases details like selling it for profit have not been agreed upon.

  • who decides/regulates the levels of privacy on the data set?

As proven by Facebook’s history, privacy has different meanings for different entities. And while some ‘anonymization’ might seem enough at a fine-grained level, with large data sets things may be completely different.

  • who can quantify and/or guarantee the quality of the data sets?

Leaving aside the different ‘anonymization’ filters applied to ‘clean’ data, there can be other causes that lower the quality of the data. Who can clarify, detail, and measure the quality of such big data sets?

  1. A related story about Infochimps’ attempt to sell a large set of Twitter data can be read here.

Original title and link: Big Data Marketplaces: The Future? (NoSQL databases © myNoSQL)