NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



data grid: All content tagged as data grid in NoSQL databases and polyglot persistence

GridGain vs Hadoop: GridGrain Is Web Scale

At least as funny as MongoDB is Web Scale. But it wants to be taken seriously.

Original title and link: GridGain vs Hadoop: GridGrain Is Web Scale (NoSQL database©myNoSQL)

GridGain and Hadoop: About Fundamental Flaws

Would you run your analytics today off the tape drives? That’s what you do when you use Hadoop MapReduce.

The fundamental flaw in Hadoop MapReduce is an assumption that a) storing data and b) acting upon data should be based off the same underlying storage.

What Hadoop does is offering an approach for problems where having all data in memory is almost impossible and definitely not cost effective. What GridGain data grid does is offering an approach where having data in memory is cost effective. None of these assumptions are fundamental flaws.

The only fundamental flaw is positioning a product by making the wrong assumptions about alternative solutions. Like we’ve seen it before: NoSQL Wants To Be Elastic Caching When It Grows Up… Does It Really? or In-Memory Elastic Databases.

Original title and link: GridGain and Hadoop: About Fundamental Flaws (NoSQL database©myNoSQL)


Enterprise Caches Versus Data Grids Versus NoSQL Databases

RedHat/JBoss Manik Surtani:

[…] If you want to compare distributed systems, both data grids and NoSQL have kind of come from different starting points, if you will. They solve different problems, but where they stand today they’ve kind of converged. Data grids have been primarily in-memory but now they spill off onto disk and so on and so forth and they’ve added in-query and mapreduce onto it while NoSQL have primarily been on disk, but now cache stuff in-memory anyway for performance. They are starting to look the same now, or are very similar.

One big difference though that I see between data grids and NoSQL, something that still exists today, is how you actually interact with these systems. Data grids tend to be in VM, they tend to be embedded, you tend to launch a Java or JVM program, you tend to connect to a data grid API and you work with it whereas NoSQL tends to be a little bit more client server, a bit more like old-fashion databases where you open a socket to your NoSQL database or your NoSQL grid, if you will, and start talking to it. That’s the biggest difference I see today, but even that will eventually go away.

They seem to converge, but:

  • spilling off to disk is not equivalent to optimized disk access
  • distributed, sometimes even transactional caches are not equivalent with single node caches

Original title and link: Enterprise Caches Versus Data Grids Versus NoSQL Databases (NoSQL database©myNoSQL)


An Alternative Approach for Big Data Real Time Analytics

Starting from the architecture of Facebook’s realtime analytics presented in the paper Apache Hadoop Goes Realtime at Facebook and Dhruba Borthakur’s excellent posts HDFS: Realtime Hadoop and HBase Usage at Facebook, Nati Shalom describes an alternative approach for real-time analytics using data grids making the following assumptions:

They had some assumptions in design that centered around the reliability of in-memory systems and database neutrality that affected what they did: for memory, that transactional memory was unreliable, and for the database, that HBase was the only targeted data store.

What if those assumptions are changed? We can see reliable transactional memory in the field, as a requirement for any in-memory data grid, and certainly there are more databases than HBase; given database and platform neutrality, and reliable transactional memory, how could you build a realtime analytics system?

While a great read, I get the feeling there’s something wrong. Maybe this:

There are lots of areas in which you can see potential improvements, if the assumptions are changed. As a contrast to Facebook’s working system: […] We can consolidate the analytics system so that management is easier and unified. While there are system management standards like SNMP that allow management events to be presented  in the same way no matter the source, having so many different pieces means that managing the system requires an encompassing understanding, which makes maintenance and scaling more difficult.

and then:

One other advantage of data grids is in write-through support. With write-through, updates to the data grid are written asynchronously to a backend data store – which could be HBase (as used by Facebook), Cassandra, a relational database such as MySQL, or any other data medium you choose for long-term storage, should you need that.

Original title and link: An Alternative Approach for Big Data Real Time Analytics (NoSQL database©myNoSQL)