
HBase: All content about HBase in NoSQL databases and polyglot persistence

Consensus-based replication in HBase

Konstantin Boudnik (WANdisco):

The idea behind consensus-based replication is pretty simple: instead of trying to guarantee that all replicas of a node in the system are synced post-factum to an operation, such a system will coordinate the intent of an operation. If a consensus on the feasibility of an operation is reached, it will be applied by each node independently. If consensus is not reached, the operation simply won’t happen. That’s pretty much the whole philosophy.

There aren’t many details, but doesn’t this sound like Paxos, applied before the operation rather than after it?
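To make the “agree first, then apply” idea concrete, here is a minimal Python sketch. It is not WANdisco’s implementation (the post gives no details), and it is far simpler than Paxos: a single round of majority voting with no leader, ballot numbers, or retries. `Replica`, `Coordinator`, and the acceptance rule are hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    available: bool = True
    data: dict = field(default_factory=dict)
    applied: int = 0  # how many agreed operations this replica has applied

    def vote(self, op) -> bool:
        # Hypothetical acceptance rule: an available replica accepts any proposal.
        return self.available

    def catch_up(self, log) -> None:
        # Each replica applies the agreed operations independently, in log order
        # (a replica that was down would do this once it comes back).
        while self.applied < len(log):
            key, value = log[self.applied]
            self.data[key] = value
            self.applied += 1


class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas
        self.log = []  # operations that reached agreement, in order

    def propose(self, op) -> bool:
        # Coordinate the intent first: count how many replicas accept it.
        votes = sum(r.vote(op) for r in self.replicas)
        if 2 * votes <= len(self.replicas):
            return False  # no majority: the operation simply doesn't happen
        self.log.append(op)
        return True
```

With three replicas, a proposal succeeds while at least two of them are available; with only one available, `propose` returns `False` and nothing is written anywhere. Replicas never copy each other’s state after the fact; they only replay the agreed log.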

Original title and link: Consensus-based replication in HBase (NoSQL database©myNoSQL)

via: http://blogs.wandisco.com/2014/06/16/consunsus-based-replication-hbase/


Choice of NoSQL databases from Cloudera

Adam Fowler[1] looks at the potential confusion for Cloudera’s customers when asking about NoSQL databases:

As for Cloudera customers I’m not too sure. It may confuse people asking Cloudera about NoSQL. Below is a potential conversation that, as a sales engineer for NoSQL vendor MarkLogic, I can see easily happening:

This announcement struck me as over-publicized. It’s normal for companies with similar interests to partner, but care should be taken to clear up any possible confusion, and I don’t think that happened.

Just to summarize: Cloudera provides support for HBase and Accumulo, and it has deals with MongoDB and Oracle. I assume that in the sales process Cloudera will go with “we work with whatever you already have in place.” As for recommending a NoSQL solution to their customers, it will probably go as in Adam Fowler’s post, to which we could add Oracle too.


  1. Adam Fowler works for MarkLogic. 

Original title and link: Choice of NoSQL databases from Cloudera (NoSQL database©myNoSQL)

via: http://adamfowlerml.wordpress.com/2014/05/05/choice-of-nosql-databases-from-cloudera/


OhmData C5: an improved HBase

You’ll probably recognize the names behind OhmData and their improved HBase product, C5. In their own words on Hacker News:

  • We say we can do failover in a couple of seconds. We want to make it subsecond, but we can’t do that reliably yet. In HBase this story is much more mixed.
  • We wanted to really reduce complexity, as a result, you can just apt-get install c5 on each node and you are done. It’s one daemon, one log file, and that’s it. No xmx nonsense, and almost no tuning or config files. I don’t know if you have dealt with hadoop before, but the complexity is high.
  • Finally we have a much more advanced wireformat. In fact it’s advanced by being simple (protobufs + http). As a result clients in languages other than java become very easy, without a thrift client.

Are we in a new stage of NoSQL databases: “X that doesn’t suck”?

Original title and link: OhmData C5: an improved HBase (NoSQL database©myNoSQL)


Hadoop and big data: Where Apache Slider slots in and why it matters

Arun Murthy for ZDNet about Apache Slider:

Slider is a framework that allows you to bridge existing always-on services and makes sure they work really well on top of YARN without having to modify the application itself. That’s really important.

Right now it’s HBase and Accumulo but it could be Cassandra, it could be MongoDB, it could be anything in the world. That’s the key part.

I couldn’t find the project on the Incubator page.

Original title and link: Hadoop and big data: Where Apache Slider slots in and why it matters (NoSQL database©myNoSQL)

via: http://www.zdnet.com/hadoop-and-big-data-where-apache-slider-slots-in-and-why-it-matters-7000028073/


HBase block caches - Optimizing for random reads

Great post by Nick Dimiduk[1] covering the whats, whys, and hows of caching data blocks in HBase, the mechanism HBase uses to optimize random reads[2]:

There is a single BlockCache instance in a region server, which means all data from all regions hosted by that server share the same cache pool. The BlockCache is instantiated at region server startup and is retained for the entire lifetime of the process. Traditionally, HBase provided only a single BlockCache implementation: the LruBlockCache. The 0.92 release introduced the first alternative in HBASE-4027: the SlabCache. HBase 0.96 introduced another option via HBASE-7404, called the BucketCache.
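The core idea behind the LruBlockCache (one bounded cache pool shared by every region on the server, with least-recently-used blocks evicted first) can be sketched in a few lines of Python. This is a simplified illustration, not HBase code: blocks are keyed by a hypothetical `(hfile, offset)` pair and sized by their byte length, and it leaves out the block priorities (single-access, multi-access, in-memory) of the real LruBlockCache as well as the SlabCache and BucketCache alternatives.

```python
from collections import OrderedDict


class LruBlockCacheSketch:
    """Byte-bounded LRU cache of data blocks, shared by all regions on a server.

    Simplified illustration only; HBase's LruBlockCache does considerably more.
    """

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.size = 0
        self.blocks = OrderedDict()  # (hfile, offset) -> block bytes, oldest first

    def get(self, hfile: str, offset: int):
        key = (hfile, offset)
        block = self.blocks.get(key)
        if block is not None:
            self.blocks.move_to_end(key)  # mark as most recently used
        return block  # None means a cache miss: the caller reads from the HFile

    def put(self, hfile: str, offset: int, block: bytes) -> None:
        key = (hfile, offset)
        if key in self.blocks:
            self.size -= len(self.blocks.pop(key))
        self.blocks[key] = block
        self.size += len(block)
        # Evict least recently used blocks until the cache fits its budget again.
        while self.size > self.capacity and self.blocks:
            _, evicted = self.blocks.popitem(last=False)
            self.size -= len(evicted)
```

Because the pool is shared, a scan over one region can push out hot blocks belonging to another, which is part of why HBase grew alternative cache implementations.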


  1. Nick Dimiduk works at Hortonworks and is the co-author of HBase in Action.

  2. For recent edits, HBase uses a different mechanism: the MemStore.

Original title and link: HBase block caches - Optimizing for random reads (NoSQL database©myNoSQL)

via: http://www.n10k.com/blog/blockcache-101/


MySQL is a great Open Source project. How about open source NoSQL databases?

In a post titled “Some myths on Open Source, the way I see it”, Anders Karlsson writes about MySQL:

As far as code, adoption and reaching out to create an SQL-based RDBMS that anyone can afford, MySQL / MariaDB has been immensely successful. But as an Open Source project, something being developed together with the community where everyone work on their end with their skills to create a great combined piece of work, MySQL has failed. This is sad, but on the other hand I’m not so sure that it would have as much influence and as wide adoption if the project would have been a “clean” Open Source project.

The article offers a very black-and-white perspective on open source versus commercial code. But that’s not why I’m linking to it.

The above paragraph made me think about how many of the most popular open source NoSQL databases would die without the companies (or people) that created them.

Here’s my list: MongoDB, Riak, Neo4j, Redis, Couchbase. And considering how many there are out there, I could continue for quite a while: RavenDB, RethinkDB, Voldemort, Tokyo Cabinet, Titan.

Actually, if you reverse the question, the list gets extremely short: Cassandra, CouchDB (though still struggling), and HBase. All of these were, at some point, driven by the community. LevelDB is probably the only special case.

✚ As a follow-up to Anders Karlsson’s post, Robert Hodges published “Why I Love Open Source” on The Scale-Out Blog.

Original title and link: MySQL is a great Open Source project. How about open source NoSQL databases? (NoSQL database©myNoSQL)

via: http://karlssonondatabases.blogspot.com/2014/01/some-myths-on-open-source-way-i-see-it.html


An intro to HBase’s Thrift interface

If you’ve never used Thrift (with or without HBase), the two articles authored by Jesse Anderson and posted on Cloudera’s blog will give you a quick intro:

  1. How-to: Use the HBase Thrift Interface, Part 1: setting up, getting the language bindings, and connecting;
  2. How-to: Use the HBase Thrift Interface, Part 2: Inserting/Getting Rows: using HBase’s Thrift API from Python
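The Cloudera articles use the Python bindings generated by Thrift directly. As an alternative sketch, the snippet below assumes happybase, a separate third-party Python library that wraps HBase’s Thrift interface, and shows the row layout you work with: column names are `family:qualifier` byte strings. The `users` table, the `info` column family, and the helper functions are hypothetical. The helpers only call `put` and `row`, so they work with any object exposing those methods.

```python
def encode_columns(family: str, fields: dict) -> dict:
    """Turn {"name": "Ada"} into {b"info:name": b"Ada"} (family:qualifier -> value)."""
    return {f"{family}:{q}".encode(): str(v).encode() for q, v in fields.items()}


def save_user(table, row_key: str, fields: dict, family: str = "info") -> None:
    # `table` is expected to behave like a happybase Table (hypothetical schema).
    table.put(row_key.encode(), encode_columns(family, fields))


def load_user(table, row_key: str, family: str = "info") -> dict:
    # Table.row returns a dict of b"family:qualifier" -> b"value" for the row.
    row = table.row(row_key.encode())
    prefix = f"{family}:".encode()
    return {
        k[len(prefix):].decode(): v.decode()
        for k, v in row.items()
        if k.startswith(prefix)
    }
```

Against a running HBase Thrift server you would pass in a real table, e.g. `users = happybase.Connection("localhost").table("users")` (happybase connects to port 9090 by default), then call `save_user(users, "user-42", {"name": "Ada"})` and `load_user(users, "user-42")`.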

Original title and link: An intro to HBase’s Thrift interface (NoSQL database©myNoSQL)


Approaches to Backup and Disaster Recovery in HBase

This should (actually, must) be part of your HBase operational manual:

Let’s start with the least disruptive, smallest data footprint, least performance-impactful mechanism and work our way up to the most disruptive, forklift-style tool:

  • Snapshots
  • Replication
  • Export
  • CopyTable
  • HTable API
  • Offline backup of HDFS data

[Image: HBase backup strategies]

When you return to the office after the winter holiday make sure you take a copy of this with you and pass it around.
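For the least disruptive option on the list, snapshots, a nightly job might take a snapshot from the HBase shell and then copy it to another cluster with the `ExportSnapshot` MapReduce tool; the heavier `Export` tool dumps a table’s contents to HDFS instead. The Python sketch below only builds those command lines and runs nothing. The table name, the date-based snapshot naming, and the `hdfs://backup-nn:8020` destination are hypothetical; check the flags against the reference guide for your HBase version before using them.

```python
import shlex


def snapshot_statement(table: str, snapshot: str) -> str:
    # HBase shell statement that takes a snapshot of a table.
    return f"snapshot '{table}', '{snapshot}'"


def export_snapshot_cmd(snapshot: str, dest: str, mappers: int = 16) -> list:
    # Copies a snapshot's files to another HDFS location/cluster via MapReduce.
    return ["hbase", "org.apache.hadoop.hbase.snapshot.ExportSnapshot",
            "-snapshot", snapshot, "-copy-to", dest, "-mappers", str(mappers)]


def export_table_cmd(table: str, outdir: str) -> list:
    # Export scans the live table and writes its rows to HDFS (more load on the cluster).
    return ["hbase", "org.apache.hadoop.hbase.mapreduce.Export", table, outdir]


def nightly_plan(table: str, date: str, backup_cluster: str) -> list:
    # Hypothetical plan: snapshot the table, then ship the snapshot elsewhere.
    snap = f"{table}-{date}"
    return [
        f'echo "{snapshot_statement(table, snap)}" | hbase shell',
        shlex.join(export_snapshot_cmd(snap, f"{backup_cluster}/hbase")),
    ]
```

Each entry in the plan is a shell command you could run on an edge node with the HBase client installed (for example via `subprocess.run(cmd, shell=True, check=True)`); keeping command construction separate from execution makes the plan easy to review and test.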

Original title and link: Approaches to Backup and Disaster Recovery in HBase (NoSQL database©myNoSQL)

via: http://blog.cloudera.com/blog/2013/11/approaches-to-backup-and-disaster-recovery-in-hbase/


Dropbox: Challenges in mirroring large MySQL systems to HBase

A presentation by Todd Eisenberger about Dropbox’s archival system, built on MySQL and HBase:

MySQL benefits:

  • fast queries for known keys over a (relatively) small dataset
  • high read throughput

HBase benefits:

  • high write throughput
  • large suite of pre-existing tools for distributed computation
  • easier to perform large processing tasks

✚ Both are consistent

✚ Most of the benefits listed for HBase point to data processing rather than data storage


Apache HBase 0.96.0 released after more than 2000 issues resolved

This is an important release for HBase. Both Hortonworks and Cloudera have posts covering it:

HBase 0.94 was released over a year and a half ago.

Original title and link: Apache HBase 0.96.0 released after more than 2000 issues resolved (NoSQL database©myNoSQL)


Results of collaboration on improving the Mean Time to Recovery in HBase

Hortonworks, eBay, and Scaled Risk have been collaborating on improving the mean time to recovery (MTTR) in HBase. After extensive testing at eBay, results are now available for two scenarios:

  • Node/RegionServer failures while writing
  • Node/RegionServer failures while reading

Original title and link: Results of collaboration on improving the Mean Time to Recovery in HBase (NoSQL database©myNoSQL)


A prolific season for Hadoop and its ecosystem

In four years of writing this blog, I haven’t seen such a prolific month:

  • Apache Hadoop 2.2.0
  • Apache HBase 0.96
  • Apache Hive 0.12
  • Apache Ambari 1.4.1
  • Apache Pig 0.12
  • Apache Oozie 4.0.0
  • Plus Presto.

Actually, I don’t think I’ve ever seen an ecosystem like the one that has grown around Hadoop.

Original title and link: A prolific season for Hadoop and its ecosystem (NoSQL database©myNoSQL)