column store: All content tagged as column store in NoSQL databases and polyglot persistence
A presentation by Todd Eisenberger about the archival system used by Dropbox based on MySQL and HBase:
- fast queries for known keys over a (relatively) small dataset
- high read throughput
- high write throughput
- large suite of pre-existing tools for distributed computation
- easier to perform large processing tasks
✚ Both are consistent
✚ Most of the benefits in HBase’s section point in the direction of data processing benefits (and not data storage benefits)
This is an important release for HBase. Both Hortonworks and Cloudera have posts covering it:
- Hortonworks: Announcing Apache HBase 0.96.0, More than 2000 issues resolved!
- Cloudera: HBase 0.96.0 Released!
Original title and link: Apache HBase 0.96.0 released after more than 2000 issues resolved ( ©myNoSQL)
Hortonworks, eBay and Scaled Risk have been collaborating on improving the mean time to recovery (MTTR) in HBase. After extensive testing performed at eBay, some results are now available for 2 scenarios:
- Node/RegionServer failures while writing
- Node/RegionServer failures while reading
Original title and link: Results of collaboration on improving the Mean Time to Recovery in HBase ( ©myNoSQL)
In 4 years of writing this blog I haven’t seen such a prolific month:
- Apache Hadoop 2.2.0
- Apache HBase 0.96
- Apache Hive 0.12
- Apache Ambari 1.4.1
- Apache Pig 0.12
- Apache Oozie 4.0.0
- Plus Presto.
Actually, I don’t think I’ve ever seen an ecosystem like the one created around Hadoop.
Original title and link: A prolific season for Hadoop and its ecosystem ( ©myNoSQL)
Holy cow! That’s a 4 followed by a 5… with no dots in between.
Derrick Harris for GigaOm: NoSQL startup DataStax raises $45M to ride Cassandra’s wave:
Cassandra’s success with such large users has to do with its ability to handle large-scale online applications that demand steady levels of performance, DataStax CEO Billy Bosworth told me. Scalability and performance have never been among Cassandra’s shortcomings, and the database is capable of replicating data across data centers. Large companies used to choose Oracle for applications that needed these capabilities, but now that NoSQL options are around and relatively mature, companies are rethinking whether the relational database model was ever really correct for some applications in the first place.
DataStax will use the funding to build out globally and invest in Apache Cassandra, the NoSQL open-source project and foundation for the company’s database distributions. The funding also signals a potential IPO for DataStax but much will depend on the direction of the markets, said CEO Billy Bosworth in an interview yesterday. “We are building the company for that direction (IPO),” he said. “A lot depends on external factors. Internally, the company is already starting that process.”
According to my books:
- This is the largest round raised by a NoSQL company. It tops 10gen’s $42mil for MongoDB.
- This is the 3rd largest round raised in the new data market, after Cloudera’s $65mil. and Hortonworks’s $50mil. rounds.
Original title and link: $45millions more for DataStax ( ©myNoSQL)