


MongoDB is growing up

If Curt Monash says so…

With that caveat, the MongoDB rewrite story is something like:

  • Updating has been reworked. Most of the benefits are coming later.
  • Query optimization and execution have been reworked. Most of the benefits are coming later, except that …
  • … you can now directly filter on multiple indexes in one query; previously you could only simulate doing that by pre-building a compound index.
  • One of those future benefits is more index types, for example R-trees or inverted lists.
  • Concurrency improvements are down the road.
  • So are rewrites of the storage layer, including the introduction of compression.
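
To make the multiple-index point concrete, here is a minimal Python sketch of the difference between intersecting two single-field indexes at query time and pre-building a compound index. It is a toy in-memory model, not MongoDB's implementation; the `build_index` helper, the sample documents, and the field names are all hypothetical.

```python
from collections import defaultdict


def build_index(docs, field):
    """Map each value of `field` to the set of ids of documents holding it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        if field in doc:
            index[doc[field]].add(doc_id)
    return index


# Hypothetical sample data, for illustration only.
docs = {
    1: {"city": "Berlin", "status": "active"},
    2: {"city": "Berlin", "status": "closed"},
    3: {"city": "Paris", "status": "active"},
}

# Two independent single-field indexes.
by_city = build_index(docs, "city")
by_status = build_index(docs, "status")

# Index intersection: look up each predicate in its own index,
# then intersect the resulting id sets.
intersected = by_city["Berlin"] & by_status["active"]

# Compound index: a single structure keyed on both fields, built ahead of time.
compound = defaultdict(set)
for doc_id, doc in docs.items():
    compound[(doc["city"], doc["status"])].add(doc_id)

print(sorted(intersected))
print(sorted(compound[("Berlin", "active")]))
```

Both lookups return the same document ids. The difference is that the compound index has to be planned and built for that exact field combination in advance.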

Original title and link: MongoDB is growing up (NoSQL database©myNoSQL)


Eventual consistency models: release consistency

William Sharp:

In order to write data, a node must acquire a memory object. When the write is complete, the node releases the memory object.

The model is implemented by distributing all the writes across the system before another node can acquire the memory object.

Sounds like distributed locks. In a distributed system.
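
A rough Python sketch of the acquire/write/release cycle described above, assuming a single shared memory object and one replica per node. The `SharedObject` class and its method names are hypothetical; a real system would do this over a network rather than inside one process.

```python
class SharedObject:
    """Toy model: one lock-like memory object plus a replica per node."""

    def __init__(self, node_names):
        self.replicas = {name: {} for name in node_names}
        self.holder = None   # node currently holding the memory object
        self.pending = {}    # writes buffered until release

    def acquire(self, node):
        if self.holder is not None:
            raise RuntimeError(f"{self.holder} still holds the object")
        self.holder = node

    def write(self, node, key, value):
        if self.holder != node:
            raise RuntimeError("write without acquire")
        self.pending[key] = value

    def release(self, node):
        if self.holder != node:
            raise RuntimeError("release without acquire")
        # Distribute all buffered writes to every replica
        # before any other node is allowed to acquire.
        for replica in self.replicas.values():
            replica.update(self.pending)
        self.pending = {}
        self.holder = None

    def read(self, node, key):
        return self.replicas[node].get(key)


obj = SharedObject(["a", "b"])
obj.acquire("a")
obj.write("a", "x", 1)
before = obj.read("b", "x")   # node b cannot see the write yet
obj.release("a")
obj.acquire("b")
after = obj.read("b", "x")    # visible once a has released
obj.release("b")
```

The single `holder` field is what makes this look like a distributed lock: only one node can write at a time, and nobody else gets in until the writes have been propagated.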

Original title and link: Eventual consistency models: release consistency (NoSQL database©myNoSQL)


maxTimeMS in MongoDB 2.6

Jason McCay (MongoHQ) explains the new maxTimeMS API in MongoDB 2.6:

There are a number of scenarios where a flag like this can be helpful. For example, if you are in discovery mode and want to protect your database performance against unintended runaway operations, you could ensure all your queries include this flag.

Another scenario would be the batching of results, allowing you to define the amount of time/effort the database should spend returning results until it quits and moves on to the next request. In this situation, the cursor would continue to return results until the allotted amount of time has expired.
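
The time-budget idea can be simulated in a few lines of Python. This sketch does not call MongoDB or any driver API: `with_time_budget`, `TimeBudgetExceeded`, and `fake_clock` are hypothetical names, and the clock is injected only so the behavior is deterministic.

```python
import time


class TimeBudgetExceeded(Exception):
    """Raised when the simulated operation runs past its time budget."""


def with_time_budget(results, max_time_ms, clock=time.monotonic):
    """Yield items from `results` until `max_time_ms` has elapsed, then abort."""
    deadline = clock() + max_time_ms / 1000.0
    for item in results:
        if clock() > deadline:
            raise TimeBudgetExceeded(f"exceeded {max_time_ms} ms")
        yield item


def fake_clock(step=0.04):
    """Deterministic clock that advances `step` seconds on every call."""
    now = 0.0

    def tick():
        nonlocal now
        now += step
        return now

    return tick


received = []
try:
    for doc in with_time_budget(range(10), max_time_ms=100, clock=fake_clock()):
        received.append(doc)
except TimeBudgetExceeded:
    timed_out = True
else:
    timed_out = False

print(received, timed_out)
```

With a 100 ms budget and a clock that advances 40 ms per check, the consumer gets the first two results and then the operation is cut off instead of running away.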

Original title and link: maxTimeMS in MongoDB 2.6 (NoSQL database©myNoSQL)


Hadoop distro for IBM's Mainframe

IBM and its partner Veristorm are working to merge the worlds of big data and Big Iron with zDoop, a new offering unveiled last week that offers Apache Hadoop running in the mainframe’s Linux environment.

3 hip hip hoorays for Hadoop on mainframes.

Original title and link: Hadoop distro for IBM’s Mainframe (NoSQL database©myNoSQL)


Diving into H2O with R

Joseph Rickert on how the 0xdata H2O engine integrates with R:

The R H2O package communicates with the H2O JVM over a REST API. R sends RCurl commands and H2O sends back JSON responses. Data ingestion, however, does not happen via the REST API. Rather, an R user calls a function that causes the data to be directly parsed into the H2O KV store. The H2O R package provides several functions for doing this including: h2o.importFile() which imports and parses files from a local directory, h2o.importURL() which imports and parses files from a website, an

Original title and link: Diving into H2O with R (NoSQL database©myNoSQL)


Which companies produce more than 10TB of data per day?

A couple of interesting answers on Quora, but this part from Michael E. Driscoll’s answer stands out:

You could even get 100s of daily TBs of data yourself:  if you can afford the network bandwidth fees, there are ~100 marketplaces (Twitter’s MoPub, Google’s AdX, Facebook’s FBX to name a few) that surface approximately 200 Billion advertising auctions per day.  You can build a bidder, get a seat on their exchanges, and make millions of daily trades — you’ll just need to convince a brand to act as their broker, and take your 20% cut of spend.
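
A quick back-of-the-envelope check of the "100s of daily TBs" figure, in Python. The 200 billion auctions per day comes from the quote; the 1 KB per auction payload is purely an assumption made here for illustration.

```python
AUCTIONS_PER_DAY = 200e9       # figure quoted above
BYTES_PER_AUCTION = 1_000      # assumption: ~1 KB per bid request, not from the source
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_day = AUCTIONS_PER_DAY * BYTES_PER_AUCTION
terabytes_per_day = bytes_per_day / 1e12
auctions_per_second = AUCTIONS_PER_DAY / SECONDS_PER_DAY

print(f"{terabytes_per_day:.0f} TB/day")
print(f"{auctions_per_second:,.0f} auctions/second")
```

Under that assumption the stream works out to about 200 TB per day, at roughly 2.3 million auctions per second, which is consistent with the order of magnitude in the answer.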

Original title and link: Which companies produce more than 10TB of data per day? (NoSQL database©myNoSQL)


Getting started with Neo4j 2.0

Very good introductory post by Jim Webber about Neo4j and some of the new features in the 2.0 release:

In this article we’ve seen how Neo4j 2.0 and the new version of the Cypher query language can be used to store and query a range of retail data from product catalogue to customer purchases. We also saw how straightforward it was to quickly gain insight from that data, despite the domain being highly and intricately connected.

Original title and link: Getting started with Neo4j 2.0 (NoSQL database©myNoSQL)


Hadoop and big data: Where Apache Slider slots in and why it matters

Arun Murthy for ZDNet about Apache Slider:

Slider is a framework that allows you to bridge existing always-on services and makes sure they work really well on top of YARN without having to modify the application itself. That’s really important.

Right now it’s HBase and Accumulo but it could be Cassandra, it could be MongoDB, it could be anything in the world. That’s the key part.

I couldn’t find the project on the Incubator page.

Original title and link: Hadoop and big data: Where Apache Slider slots in and why it matters (NoSQL database©myNoSQL)


Price Comparison for Big Data Appliance and Hadoop

The main differences between Oracle Big Data Appliance and a DIY approach are:

  1. A DIY system - at list price with basic installation but no optimization - is a staggering $220 cheaper as an initial purchase
  2. A DIY system - at list price with basic installation but no optimization - is almost $250,000 more expensive over 3 years.
  3. The support for the DIY system includes five (5) vendors. Your hardware support vendor, the OS vendor, your Hadoop vendor, your encryption vendor as well as your database vendor. Oracle Big Data Appliance is supported end-to-end by a single vendor: Oracle
  4. Time to value. While we trust that your IT staff will get the DIY system up and running, the Oracle system allows for a much faster “loading dock to loading data” time. Typically a few days instead of a few weeks (or even months)
  5. Oracle Big Data Appliance is tuned and configured to take advantage of the software stack, the CPUs and InfiniBand network it runs on
  6. Any issue we, you or any other BDA customer finds in the system is fixed for all customers. You do not have a unique configuration, with unique issues on top of the generic issues.

This is coming from Oracle. Now, without nitpicking prices — I’m pretty sure you’ll find better numbers for the different components — how do you sell Hadoop to the potential customer that took a look at this?

Original title and link: Price Comparison for Big Data Appliance and Hadoop (NoSQL database©myNoSQL)


Hadoop analytics startup Karmasphere sells itself to FICO

Derrick Harris (GigaOm):

The Fair Isaac Corporation, better known as FICO, has acquired the intellectual property of Hadoop startup Karmasphere. Karmasphere launched in 2010, and was one of the first companies to push the idea of an easy, visual interface for analyzing Hadoop data, and even analyzing it using traditional SQL queries.

Original title and link: Hadoop analytics startup Karmasphere sells itself to FICO (NoSQL database©myNoSQL)


Why the clock is ticking for MongoDB

Robert Haas takes a comparative look at PostgreSQL and the MongoDB features emphasized by MongoDB’s CEO in an interview:

Schireson also mentions another advantage of document stores: schema flexibility. Of course, he again ignores the possible advantages, for some users, of a fixed schema, such as better validity checking. But more importantly, he ignores the fact that relational databases such as PostgreSQL have had similar capabilities since before MongoDB existed. PostgreSQL’s hstore, which provides the ability to store and index collections of key-value pairs in a fashion similar to what MongoDB provides, was first released in December of 2006, the year before MongoDB development began. True JSON capabilities were added to the PostgreSQL core as part of the 9.2 release, which went GA in September of 2012. The 9.4 release, expected later this year, will greatly expand those capabilities. In today’s era of rapid innovation, any database product whose market advantage is based on the format in which it is able to store data will not retain that advantage for very long.

It’s impossible to debate or contradict the majority of facts and arguments the author is making. But in order to understand the history and future of developer tools, it’s worth emphasizing one aspect that has been almost completely ignored for way too long, and that the author mentions only briefly.

Developers want to get things done. Fast and Easy.

For too long vendors thought that a tool that had a feature covered was enough. Even if the user had to read a book or two, hire an army of consultants, postpone the deadlines, and finally make three incantations to get it working. This strategy worked well for decades. It worked especially well in the database space, where buying decisions were made at the top level due to the humongous costs.

MySQL became one of the most popular databases because it was free and perceived to be easier than any of the alternatives. Not because it was first. Not because it was feature complete. And definitely not because it was technically superior: PostgreSQL was always technically superior, but never got the install base MySQL got.

MongoDB replays this story by the book. It’s free. It promises features that were missing or are considered complicated in the other products. And it’s perceived as the easiest database to use. A look at MongoDB’s history immediately reveals its primary focus on ease of use: great documentation, friendly setup, a fast getting-started experience. For a lot of people, it really doesn’t matter anymore that there are technically superior alternatives. They’ve got their things done. Fast and Easy. Tomorrow is another day.

Original title and link: Why the clock is ticking for MongoDB (NoSQL database©myNoSQL)


We will find the author of the Bitcoin whitepaper even if he doesn’t want us to

Nermin Hajdarbegovic (CoinDesk):

A group of forensic linguistics experts from Aston University believe the real creator of bitcoin is former law professor Nick Szabo.

Dr. Grieve explained:

The number of linguistic similarities between Szabo’s writing and the bitcoin whitepaper is uncanny, none of the other possible authors were anywhere near as good of a match.

Privacy is all gone.

Original title and link: We will find the author of the Bitcoin whitepaper even if he doesn’t want us to (NoSQL database©myNoSQL)