column store: All content tagged as column store in NoSQL databases and polyglot persistence
Since the GA announcement a couple of weeks ago, I’ve been noticing quite a few data-related posts on the Google Compute Engine blog:
- Mon., 9th: DataStax Enterprise feels right at home in Google Compute Engine
- Tue., 10th: DataTorrent offers massive-scale, real-time stream analytics on Google Compute Engine
- Thu., 12th: Qubole helps you run Hadoop on Google Compute Engine
If you look at these, you’ll notice a theme: covering data from every angle, with Cassandra/DSE from DataStax for OLTP, DataTorrent for stream processing, Qubole for Hadoop, and MapR for their Hadoop-like solution. I can see this continuing for a while and making Google Compute Engine a strong competitor to Amazon Web Services.
One question remains though: will they be able to come up with a good integration strategy for all these third-party tools?
Original title and link: Google Compute Engine and Data ( ©myNoSQL)
If you’ve never used Thrift (with or without HBase), the two articles authored by Jesse Anderson and posted on Cloudera’s blog will give you a quick intro:
- How-to: Use the HBase Thrift Interface, Part 1: setting up, getting the language bindings, and connecting;
- How-to: Use the HBase Thrift Interface, Part 2: Inserting/Getting Rows: using HBase’s Thrift API from Python
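For a rough feel of what inserting and getting a row over Thrift looks like from Python, here is a minimal sketch. It is not the code from the articles: it swaps in the third-party `happybase` package, a Thrift-based HBase client, and assumes an HBase Thrift server listening on localhost:9090. The table name `messages`, the column family `cf`, and the helper names are hypothetical.

```python
# Sketch only: assumes the third-party `happybase` package and an HBase
# Thrift server on localhost:9090. Table and column family are hypothetical.


def encode_row(family, values):
    """Turn {'qualifier': value} into HBase's b'family:qualifier' -> bytes form."""
    return {
        f"{family}:{qualifier}".encode("utf-8"): str(value).encode("utf-8")
        for qualifier, value in values.items()
    }


def put_and_get(table, row_key, family, values):
    """Write one row, then read it back.

    `table` only needs put() and row() methods, which is the shape of
    happybase.Table.
    """
    key = row_key.encode("utf-8")
    table.put(key, encode_row(family, values))
    return table.row(key)


def main():
    import happybase  # imported lazily so the helpers above work without it

    connection = happybase.Connection("localhost", port=9090)
    try:
        table = connection.table("messages")  # hypothetical; must already exist
        print(put_and_get(table, "row-1", "cf", {"greeting": "hello"}))
    finally:
        connection.close()
```

Calling `main()` requires a running Thrift server and an existing table. HBase stores row keys, column names, and values as raw bytes, which is why everything is encoded before it goes over the wire.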
Original title and link: An intro to HBase’s Thrift interface ( ©myNoSQL)
A presentation by Todd Eisenberger about the MySQL- and HBase-based archival system used by Dropbox. The benefits it lists for the two stores:
- fast queries for known keys over a (relatively) small dataset
- high read throughput
- high write throughput
- large suite of pre-existing tools for distributed computation
- easier to perform large processing tasks
✚ Both are consistent
✚ Most of the benefits in HBase’s section point in the direction of data processing benefits (and not data storage benefits)
This is an important release for HBase. Both Hortonworks and Cloudera have posts covering it:
- Hortonworks: Announcing Apache HBase 0.96.0, More than 2000 issues resolved!
- Cloudera: HBase 0.96.0 Released!
Original title and link: Apache HBase 0.96.0 released after more than 2000 issues resolved ( ©myNoSQL)
Hortonworks, eBay, and Scaled Risk have been collaborating on improving the mean time to recovery in HBase. After long testing performed at eBay, some results are now available for two scenarios:
- Node/RegionServer failures while writing
- Node/RegionServer failures while reading
Original title and link: Results of collaboration on improving the Mean Time to Recovery in HBase ( ©myNoSQL)
In 4 years of writing this blog I haven’t seen such a prolific month:
- Apache Hadoop 2.2.0
- Apache HBase 0.96
- Apache Hive 0.12
- Apache Ambari 1.4.1
- Apache Pig 0.12
- Apache Oozie 4.0.0
- Plus Presto.
Actually, I don’t think I’ve ever seen an ecosystem like the one created around Hadoop.
Original title and link: A prolific season for Hadoop and its ecosystem ( ©myNoSQL)