HBase: All content tagged as HBase in NoSQL databases and polyglot persistence

Hadoop and big data: Where Apache Slider slots in and why it matters

Arun Murthy for ZDNet about Apache Slider:

Slider is a framework that allows you to bridge existing always-on services and makes sure they work really well on top of YARN without having to modify the application itself. That’s really important.

Right now it’s HBase and Accumulo but it could be Cassandra, it could be MongoDB, it could be anything in the world. That’s the key part.

I couldn’t find the project on the Incubator page.

Original title and link: Hadoop and big data: Where Apache Slider slots in and why it matters (NoSQL database©myNoSQL)

via: http://www.zdnet.com/hadoop-and-big-data-where-apache-slider-slots-in-and-why-it-matters-7000028073/


HBase block caches - Optimizing for random reads

Great post by Nick Dimiduk [1] covering the whats, whys, and hows of caching data blocks in HBase, the mechanism through which HBase optimizes random reads [2]:

There is a single BlockCache instance in a region server, which means all data from all regions hosted by that server share the same cache pool. The BlockCache is instantiated at region server startup and is retained for the entire lifetime of the process. Traditionally, HBase provided only a single BlockCache implementation: the LruBlockCache. The 0.92 release introduced the first alternative in HBASE-4027: the SlabCache. HBase 0.96 introduced another option via HBASE-7404, called the BucketCache.


  1. Nick Dimiduk works at Hortonworks and is the co-author of HBase in Action

  2. For optimizing recent edits, HBase has another mechanism, the MemStore
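
To make the quoted description concrete, here is a minimal sketch of a least-recently-used block cache shared by all regions on one server. It is a toy illustration and not HBase's LruBlockCache: the class layout, the `(region, offset)` key shape, and the byte-based capacity are all assumptions made for this example.

```python
from collections import OrderedDict


class LruBlockCache:
    """Toy LRU cache keyed by (region, block_offset); not HBase's implementation."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._blocks = OrderedDict()  # least recently used entry first

    def get(self, key):
        block = self._blocks.get(key)
        if block is not None:
            self._blocks.move_to_end(key)  # mark as most recently used
        return block

    def put(self, key, block):
        if key in self._blocks:
            self.used_bytes -= len(self._blocks.pop(key))
        self._blocks[key] = block
        self.used_bytes += len(block)
        # Evict least recently used blocks until the cache fits again.
        while self.used_bytes > self.capacity_bytes and self._blocks:
            _, evicted = self._blocks.popitem(last=False)
            self.used_bytes -= len(evicted)

    def __contains__(self, key):
        return key in self._blocks


# One cache instance shared by every region on the (hypothetical) server.
cache = LruBlockCache(capacity_bytes=8)
cache.put(("region-a", 0), b"aaaa")
cache.put(("region-b", 0), b"bbbb")
cache.get(("region-a", 0))            # touch region-a's block
cache.put(("region-b", 64), b"cccc")  # over capacity: evicts region-b's first block
```

Because the pool is shared, a burst of reads against one region can push another region's blocks out, which is the trade-off of having a single cache per region server.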

via: http://www.n10k.com/blog/blockcache-101/


MySQL is a great Open Source project. How about open source NoSQL databases?

In a post titled Some myths on Open Source, the way I see it, Anders Karlsson writes about MySQL:

As far as code, adoption and reaching out to create an SQL-based RDBMS that anyone can afford, MySQL / MariaDB has been immensely successful. But as an Open Source project, something being developed together with the community where everyone work on their end with their skills to create a great combined piece of work, MySQL has failed. This is sad, but on the other hand I’m not so sure that it would have as much influence and as wide adoption if the project would have been a “clean” Open Source project.

The article offers a very black-and-white perspective on open source versus commercial code. But that’s not why I’m linking to it.

The above paragraph made me think about how many of the most popular open source NoSQL databases would die without the companies (or people) that created them.

Here’s my list: MongoDB, Riak, Neo4j, Redis, Couchbase, etc. And I could continue for quite a while considering how many there are out there: RavenDB, RethinkDB, Voldemort, Tokyo, Titan.

Actually if you reverse the question, the list would get extremely short: Cassandra, CouchDB (still struggling though), HBase. All these were at some point driven by community. Probably the only special case could be LevelDB.

✚ As a follow-up to Anders Karlsson’s post, Robert Hodges posted The Scale-Out Blog: Why I Love Open Source.

via: http://karlssonondatabases.blogspot.com/2014/01/some-myths-on-open-source-way-i-see-it.html


An intro to HBase’s Thrift interface

If you’ve never used Thrift (with or without HBase), the two articles authored by Jesse Anderson and posted on Cloudera’s blog will give you a quick intro:

  1. How-to: Use the HBase Thrift Interface, Part 1: setting up, getting the language bindings, and connecting;
  2. How-to: Use the HBase Thrift Interface, Part 2: Inserting/Getting Rows: using HBase’s Thrift API from Python
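
For a rough idea of what inserting and getting a row from Python looks like, here is a small sketch. It assumes the third-party happybase client rather than the Thrift-generated bindings the articles walk through, and the host, table, and column names are hypothetical. An in-memory stand-in replaces the real table so the sketch runs without a cluster.

```python
def store_and_fetch(table, row_key, cells):
    """Write one row, then read it back.

    `table` is anything exposing put(row, data) and row(row), which is the
    shape of a happybase Table (happybase is an assumption here; the linked
    articles use the Thrift-generated bindings directly).
    """
    table.put(row_key, cells)
    return table.row(row_key)


class InMemoryTable:
    """Stand-in for a real table so the sketch runs without a cluster."""

    def __init__(self):
        self._rows = {}

    def put(self, row, data):
        self._rows.setdefault(row, {}).update(data)

    def row(self, row):
        return dict(self._rows.get(row, {}))


result = store_and_fetch(InMemoryTable(), b"user-1", {b"info:name": b"Ada"})

# Against a real HBase Thrift server (hypothetical host and table names):
#   import happybase
#   connection = happybase.Connection("localhost", port=9090)
#   result = store_and_fetch(connection.table("users"), b"user-1",
#                            {b"info:name": b"Ada"})
```

Row keys, column names (in `family:qualifier` form), and values are all byte strings, which mirrors how HBase stores data.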


Approaches to Backup and Disaster Recovery in HBase

This must be part of your HBase operational manual:

Let’s start with the least disruptive, smallest data footprint, least performance-impactful mechanism and work our way up to the most disruptive, forklift-style tool:

  • Snapshots
  • Replication
  • Export
  • CopyTable
  • HTable API
  • Offline backup of HDFS data

[Image: HBase backup strategies]

When you return to the office after the winter holiday make sure you take a copy of this with you and pass it around.
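
Three of the mechanisms in the list above are driven from the `hbase` command line. As a sketch, the helpers below only build the argument lists for the Export, CopyTable, and ExportSnapshot tools; the helper names, table names, and paths are hypothetical, and the options shown are a small subset of what each tool accepts.

```python
def export_table_cmd(table, output_dir):
    """Export: MapReduce job that dumps a table to files under output_dir."""
    return ["hbase", "org.apache.hadoop.hbase.mapreduce.Export", table, output_dir]


def copy_table_cmd(table, peer_zk=None, new_name=None):
    """CopyTable: MapReduce job copying a table within or across clusters."""
    cmd = ["hbase", "org.apache.hadoop.hbase.mapreduce.CopyTable"]
    if peer_zk:
        cmd.append(f"--peer.adr={peer_zk}")   # ZooKeeper address of the target cluster
    if new_name:
        cmd.append(f"--new.name={new_name}")  # destination table name
    cmd.append(table)
    return cmd


def export_snapshot_cmd(snapshot, copy_to, mappers=1):
    """ExportSnapshot: copy an existing snapshot to another HDFS location."""
    return [
        "hbase", "org.apache.hadoop.hbase.snapshot.ExportSnapshot",
        "-snapshot", snapshot,
        "-copy-to", copy_to,
        "-mappers", str(mappers),
    ]


export_cmd = export_table_cmd("users", "/backups/users")
copy_cmd = copy_table_cmd("users", new_name="users_copy")
snapshot_cmd = export_snapshot_cmd("users-snap", "hdfs://backup:8020/hbase", mappers=4)
```

On a machine with the HBase client installed, each list can be handed to `subprocess.run(cmd, check=True)`. Keeping command construction separate from execution makes it easy to review what a backup script will run.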

via: http://blog.cloudera.com/blog/2013/11/approaches-to-backup-and-disaster-recovery-in-hbase/


Dropbox: Challenges in mirroring large MySQL systems to HBase

A presentation by Todd Eisenberger about the archival system used by Dropbox based on MySQL and HBase:

MySQL benefits:

  • fast queries for known keys over a (relatively) small dataset
  • high read throughput

HBase benefits:

  • high write throughput
  • large suite of pre-existing tools for distributed computation
  • easier to perform large processing tasks

✚ Both are consistent

✚ Most of the benefits listed for HBase are about data processing rather than data storage


Apache HBase 0.96.0 released after more than 2000 issues resolved

This is an important release for HBase. Both Hortonworks and Cloudera have posts covering it.

HBase 0.94 was released over a year and a half ago.


Results of collaboration on improving the Mean Time to Recovery in HBase

Hortonworks, eBay and Scaled Risk have been collaborating on improving the mean time to recovery in HBase, and after lengthy testing performed at eBay, some results are now available for 2 scenarios:

  • Node/RegionServer failures while writing
  • Node/RegionServer failures while reading


A prolific season for Hadoop and its ecosystem

In 4 years of writing this blog I haven’t seen such a prolific month:

  • Apache Hadoop 2.2.0
  • Apache HBase 0.96
  • Apache Hive 0.12
  • Apache Ambari 1.4.1
  • Apache Pig 0.12
  • Apache Oozie 4.0.0
  • Plus Presto.

Actually I don’t think I’ve ever seen an ecosystem like the one created around Hadoop.


Cloudera Announces Support for Apache Accumulo - what, how, why

Cloudera, the leader in enterprise analytic data management powered by Apache Hadoop™, today announced its formal support for, and integration with, Apache Accumulo, a highly distributed, massively parallel processing database that is capable of analyzing structured and unstructured data and delivers fine-grained user access control and authentication. Accumulo uniquely enables system administrators to assign data access at the cell-level, ensuring that only authorized users can view and manipulate individual data points. This increased control allows a database to be accessed by a maximum number of users, while remaining compliant with data privacy and security regulations.
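
To illustrate what cell-level access means in practice, here is a deliberately simplified sketch: every cell carries a set of required labels, and a reader sees a cell only when they hold all of them. This is not Accumulo's API. Accumulo's visibility expressions can also combine labels with `|` and parentheses, and the `Cell` type, label names, and sample data here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    value: str
    required_labels: frozenset  # every label here must be held by the reader


def visible_cells(cells, authorizations):
    """Return only the cells whose labels are all covered by the reader's set."""
    auths = frozenset(authorizations)
    return [cell for cell in cells if cell.required_labels <= auths]


cells = [
    Cell("patient-7", "info:name", "J. Doe", frozenset()),
    Cell("patient-7", "info:diagnosis", "flu", frozenset({"medical"})),
    Cell("patient-7", "billing:card", "on-file", frozenset({"billing", "audit"})),
]

# A reader holding only the "medical" label sees two of the three cells.
nurse_view = visible_cells(cells, {"medical"})
```

The point of filtering per cell rather than per table or per row is that users with different clearances can share one table and still see different subsets of the same row.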

What about HBase?

Mike Olson:

It offers a strong complement to HBase, which has been part of our CDH offering since 2010, and remains the dominant high-performance delivery engine for NoSQL workloads running on Hadoop. However, Accumulo was expressly built to augment sensitive data workloads with fine-grained user access and authentication controls that are of mission-critical importance for federal and highly regulated industries.

The way I read this is: if you don’t need security go with HBase. If you need advanced security features you go with Accumulo.

How?

While there aren’t any details about what formal support means, I assume Cloudera will start offering Accumulo as an alternative to HBase.

I might be wrong though about Accumulo being a replacement for HBase. I’d love to learn how and why the 2 would co-exist.

Why?

The obvious reason is that Cloudera wants to win contracts in government and highly regulated markets, where security is a top requirement.

Another reason might be that Cloudera is continuing to expand its portfolio to catch as many customers as possible. Something à la Oracle or IBM. The alternative would be to stay focused. Like Teradata.

Original title and link: Cloudera Announces Support for Apache Accumulo (NoSQL database©myNoSQL)

via: http://www.cloudera.com/content/cloudera/en/about/press-center/press-releases/release.html?ReleaseID=1859607


Hoya, HBase on YARN, Architecture

The architecture of HBase on top of YARN, a project named Hoya:

[Image: Hoya application architecture]

The main question I had about what YARN would bring to HBase is answered in the post. But I’m still not sure I get the whole picture of how YARN improves HBase’s availability (if it does):

YARN keeps an eye on the health of the containers, telling the AM when there is a problem. It also monitors the Hoya AM itself. When the AM fails, YARN allocates a new container for it, and restarts it. This provides an availability solution to Hoya without it having to code it in itself.

via: http://hortonworks.com/blog/hoya-hbase-on-yarn-application-architecture/


Big Data Debate: HBase or Cassandra

This debate about the pros and cons of HBase and Cassandra, set up by Doug Henschen for InformationWeek and featuring Jonathan Ellis (Cassandra, DataStax) and Michael Hausenblas (MapR), will stir some strong feelings:

Michael Hausenblas: An interesting proof point for the superiority of HBase is the fact that Facebook, the creator of Cassandra, replaced Cassandra with HBase for their internal use.

Jonathan Ellis: The technical shortcomings driving HBase’s lackluster adoption fall into two major categories: engineering problems that can be addressed given enough time and manpower, and architectural flaws that are inherent to the design and cannot be fixed.

✚ One question I couldn’t answer about this dialog is why the HBase side wasn’t represented by either an HBase community member or a user. MapR certainly has an interest in HBase, but their product is not HBase.

via: http://www.informationweek.com/software/enterprise-applications/big-data-debate-will-hbase-become-domina/240159475?nomobile=1