data grid: All content tagged as data grid in NoSQL databases and polyglot persistence
Friday, 30 March 2012
GridGain vs Hadoop: GridGain Is Web Scale
At least as funny as MongoDB is Web Scale. But it wants to be taken seriously.
Original title and link: GridGain vs Hadoop: GridGain Is Web Scale (©myNoSQL)
GridGain and Hadoop: About Fundamental Flaws
Would you run your analytics today off the tape drives? That’s what you do when you use Hadoop MapReduce.
The fundamental flaw in Hadoop MapReduce is an assumption that a) storing data and b) acting upon data should be based off the same underlying storage.
What Hadoop does is offer an approach for problems where keeping all the data in memory is almost impossible and definitely not cost-effective. What the GridGain data grid does is offer an approach for cases where keeping data in memory is cost-effective. Neither of these assumptions is a fundamental flaw.
The only fundamental flaw is positioning a product by making wrong assumptions about alternative solutions. We’ve seen this before: NoSQL Wants To Be Elastic Caching When It Grows Up… Does It Really? or In-Memory Elastic Databases.
via: http://gridgaintech.wordpress.com/2012/03/28/gridgain-and-hadoop/
Monday, 12 December 2011
Enterprise Caches Versus Data Grids Versus NoSQL Databases
Red Hat/JBoss’s Manik Surtani:
[…] If you want to compare distributed systems, both data grids and NoSQL have kind of come from different starting points, if you will. They solve different problems, but where they stand today they’ve kind of converged. Data grids have been primarily in-memory but now they spill off onto disk and so on and so forth and they’ve added in-query and mapreduce onto it while NoSQL have primarily been on disk, but now cache stuff in-memory anyway for performance. They are starting to look the same now, or are very similar.
One big difference though that I see between data grids and NoSQL, something that still exists today, is how you actually interact with these systems. Data grids tend to be in VM, they tend to be embedded, you tend to launch a Java or JVM program, you tend to connect to a data grid API and you work with it whereas NoSQL tends to be a little bit more client server, a bit more like old-fashion databases where you open a socket to your NoSQL database or your NoSQL grid, if you will, and start talking to it. That’s the biggest difference I see today, but even that will eventually go away.
They seem to converge, but:
- spilling off to disk is not equivalent to optimized disk access
- distributed, sometimes even transactional, caches are not equivalent to single-node caches
Thursday, 4 August 2011
An Alternative Approach for Big Data Real Time Analytics
Starting from the architecture of Facebook’s realtime analytics, presented in the paper Apache Hadoop Goes Realtime at Facebook and in Dhruba Borthakur’s excellent posts HDFS: Realtime Hadoop and HBase Usage at Facebook, Nati Shalom describes an alternative approach to real-time analytics using data grids, based on the following assumptions:
They had some assumptions in design that centered around the reliability of in-memory systems and database neutrality that affected what they did: for memory, that transactional memory was unreliable, and for the database, that HBase was the only targeted data store.
What if those assumptions are changed? We can see reliable transactional memory in the field, as a requirement for any in-memory data grid, and certainly there are more databases than HBase; given database and platform neutrality, and reliable transactional memory, how could you build a realtime analytics system?
While it’s a great read, I get the feeling that something is off. Maybe this:
There are lots of areas in which you can see potential improvements, if the assumptions are changed. As a contrast to Facebook’s working system: […] We can consolidate the analytics system so that management is easier and unified. While there are system management standards like SNMP that allow management events to be presented in the same way no matter the source, having so many different pieces means that managing the system requires an encompassing understanding, which makes maintenance and scaling more difficult.
and then:
One other advantage of data grids is in write-through support. With write-through, updates to the data grid are written asynchronously to a backend data store – which could be HBase (as used by Facebook), Cassandra, a relational database such as MySQL, or any other data medium you choose for long-term storage, should you need that.
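A note on terminology, with a small sketch: the quote above describes updates reaching the backend asynchronously, a behavior more commonly called write-behind, while write-through usually means the backend is updated synchronously before the write returns. The Python below is only an illustration of that difference under stated assumptions, not how any particular data grid implements it. The class names and the plain dict standing in for the backend store are hypothetical.

```python
import queue
import threading


class WriteThroughCache:
    """put() updates the backing store before returning (synchronous)."""

    def __init__(self, store):
        self.memory = {}
        self.store = store  # hypothetical stand-in for HBase, Cassandra, MySQL, ...

    def put(self, key, value):
        self.store[key] = value   # the caller waits for the backend write
        self.memory[key] = value


class WriteBehindCache:
    """put() returns after updating memory; a worker thread writes to the store later."""

    def __init__(self, store):
        self.memory = {}
        self.store = store
        self._pending = queue.Queue()
        # Daemon thread so the sketch does not block interpreter exit.
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def put(self, key, value):
        self.memory[key] = value
        self._pending.put((key, value))  # queued, not yet in the backend

    def _drain(self):
        while True:
            key, value = self._pending.get()
            self.store[key] = value
            self._pending.task_done()

    def flush(self):
        """Block until every queued update has reached the store."""
        self._pending.join()


if __name__ == "__main__":
    backend = {}
    cache = WriteBehindCache(backend)
    cache.put("page_views", 42)
    cache.flush()  # without this, the backend may briefly lag behind memory
    print(backend)
```

The trade-off the sketch makes visible: the write-behind variant gives faster writes, but anything still sitting in the queue is lost if the process dies before it is flushed, which is exactly where the reliability of the in-memory layer starts to matter.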