BigTable: All content tagged as BigTable in NoSQL databases and polyglot persistence
Just found a slide deck describing the data workflow at Klout. Their architecture includes many interesting pieces, combining NoSQL and relational databases with Hadoop, Hive, Pig, and traditional BI. Even Excel gets a mention in the slides. Some of the pieces mentioned:
- Pig and Hive
- Elastic Search
Where Cassandra REALLY shines and is often overlooked is ease of maintenance. Cassandra’s ability to bootstrap new nodes, replicate, reshard and handle down nodes (w/ hinted handoff) is almost magical. I use it in production and it works very reliably.
Sure, it’s got some cool big data stuff, but try doing any of those “maintenance” operations on other databases without ripping your hair out. For example, even bringing up a new MySQL slave is a huge pain in the ass, let alone doing something non-trivial like promoting a new master.
This reinforces exactly what I emphasized as the merits of NoSQL systems in “Is SQL or NoSQL better for programmers?”.
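Hinted handoff, mentioned in the quote above, is easier to picture with a concrete model: when a replica is down, the node coordinating a write keeps a “hint” for it and replays the write once the replica comes back. The following is a toy, in-memory Python sketch of that idea only; the ToyCoordinator class and its methods are hypothetical, and real Cassandra persists hints on disk and stops storing new ones for a node that has been down longer than a configurable window.

```python
from collections import defaultdict


class ToyCoordinator:
    """Toy model of hinted handoff (hypothetical; not Cassandra's implementation)."""

    def __init__(self, nodes):
        self.data = {node: {} for node in nodes}  # per-node key/value store
        self.up = {node: True for node in nodes}  # liveness as seen by the coordinator
        self.hints = defaultdict(list)  # down node -> writes it has missed

    def mark_down(self, node):
        self.up[node] = False

    def write(self, key, value, replicas):
        for node in replicas:
            if self.up[node]:
                self.data[node][key] = value
            else:
                # Replica unreachable: keep the write as a hint instead of failing.
                self.hints[node].append((key, value))

    def mark_up(self, node):
        self.up[node] = True
        # Replay, in arrival order, every write the node missed, then drop its hints.
        for key, value in self.hints.pop(node, []):
            self.data[node][key] = value


if __name__ == "__main__":
    cluster = ToyCoordinator(["a", "b", "c"])
    cluster.mark_down("c")
    cluster.write("user:42", "v1", replicas=["a", "b", "c"])
    print("c" in cluster.hints, "user:42" in cluster.data["c"])  # True False
    cluster.mark_up("c")
    print(cluster.data["c"]["user:42"])  # v1
```

Replaying hints in arrival order means the last missed write wins in this toy; Cassandra itself reconciles replicas by write timestamp rather than by arrival order.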
Original title and link: Where Cassandra Really Shines (©myNoSQL)
Hortonworks has announced the 1.0 release of the Hortonworks Data Platform (HDP) ahead of Hadoop Summit 2012, together with supporting quotes from companies such as Attunity, Dataguise, Datameer, Karmasphere, Kognitio, MarkLogic, Microsoft, NetApp, StackIQ, Syncsort, Talend, 10gen, Teradata, and VMware.
Some key points:
- Hortonworks Data Platform is meant to simplify the installation, integration, management, and use of Apache Hadoop
- HDP 1.0 is based on Apache Hadoop 1.0
- Apache Ambari is used for installation and provisioning
- Apache Ambari also powers the Hortonworks Management Console
- For data integration, HDP offers WebHDFS, the HCatalog APIs, and Talend Open Studio
- Apache HCatalog provides metadata and table management
- Hortonworks Data Platform is 100% open source. I really appreciate Hortonworks’s dedication to the Apache Hadoop project and the open source community
- HDP comes with three levels of support subscription, with pricing starting at $12,500/year for a 10-node cluster
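WebHDFS, listed above among the data integration options, exposes HDFS over plain HTTP, with the filesystem path and the operation encoded in the URL (http://HOST:PORT/webhdfs/v1/PATH?op=...). A minimal Python sketch, assuming a NameNode reachable over HTTP on port 50070 (the Hadoop 1.x default) with simple authentication through the user.name parameter; the host name and both helper functions are hypothetical.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhdfs_url(host, path, op, port=50070, params=None):
    """Build a WebHDFS REST URL such as http://host:50070/webhdfs/v1/tmp?op=LISTSTATUS."""
    if not path.startswith("/"):
        raise ValueError("WebHDFS paths must be absolute")
    query = urlencode({"op": op, **(params or {})})
    # quote() escapes unsafe characters but leaves the "/" separators intact.
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def list_directory(host, path, user, port=50070):
    """Return the entry names of an HDFS directory (needs a running NameNode)."""
    url = webhdfs_url(host, path, "LISTSTATUS", port=port, params={"user.name": user})
    with urlopen(url) as response:
        body = json.load(response)
    # LISTSTATUS answers with {"FileStatuses": {"FileStatus": [{"pathSuffix": ...}, ...]}}
    return [status["pathSuffix"] for status in body["FileStatuses"]["FileStatus"]]


if __name__ == "__main__":
    # Only builds the URL; list_directory() would need a live cluster.
    print(webhdfs_url("namenode.example.com", "/user/alice/logs", "LISTSTATUS",
                      params={"user.name": "alice"}))
```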
One of the most interesting aspects of the Hortonworks Data Platform release is that its high-availability (HA) option relies on VMware-powered virtual machines for the NameNode and JobTracker. My first thought was that this approach was chosen to strengthen the partnership with VMware. On the other hand, Hadoop 2.0 already contains a new highly available NameNode (the solution used by Cloudera’s Hadoop distribution), and VMware has bigger plans for a virtualization-friendly Hadoop environment with project Serengeti.
Original title and link: Hortonworks Data Platform 1.0 (©myNoSQL)
Two posts by Oliver Meyn on measuring the performance of two HBase clusters (first the results on the original cluster, then the results on the upgraded cluster) using org.apache.hadoop.hbase.PerformanceEvaluation, with the resulting performance charts, Ganglia charts, and some thoughts and feedback from the HBase community.
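PerformanceEvaluation is driven from the command line through the hbase launcher script, taking a test name and a number of clients. A small Python helper that assembles such an invocation, as a sketch: it assumes the --nomapred (run clients as threads rather than as a MapReduce job) and --rows options of the 0.9x-era tool, covers only a few of its tests, and the helper names are hypothetical.

```python
import shlex
import subprocess

# A subset of the tests PerformanceEvaluation accepts (the tool has more).
TESTS = {"sequentialWrite", "sequentialRead", "randomWrite", "randomRead", "scan"}


def pe_argv(test, nclients, nomapred=True, rows=None):
    """Assemble the argv for an HBase PerformanceEvaluation run."""
    if test not in TESTS:
        raise ValueError(f"unsupported test: {test}")
    if nclients < 1:
        raise ValueError("need at least one client")
    argv = ["hbase", "org.apache.hadoop.hbase.PerformanceEvaluation"]
    if nomapred:
        argv.append("--nomapred")  # clients run as threads, not as a MapReduce job
    if rows is not None:
        argv.append(f"--rows={rows}")  # rows per client instead of the default
    argv += [test, str(nclients)]
    return argv


def run_pe(test, nclients, **options):
    """Run the benchmark; requires the hbase script on PATH and a live cluster."""
    subprocess.run(pe_argv(test, nclients, **options), check=True)


if __name__ == "__main__":
    # Print the command line instead of running it.
    print(shlex.join(pe_argv("randomWrite", 4, rows=100000)))
```

Keeping the test names and client counts identical across runs is what makes numbers from two clusters, or from one cluster before and after a hardware change, comparable.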
Original title and link: Performance Evaluation of HBase and How Hardware Changes Results (©myNoSQL)
Highlights of the HBase 0.94 release:
- Read caching improvements
- Seek optimizations
- WAL write optimizations
- New hbck functionality: fixing orphaned regions, region holes, and overlapping regions
- Simplified region sizing
- Atomic Put and Delete on a single row in one operation
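The hole and overlap checks in the hbck item boil down to walking a table’s regions in start-key order and verifying that each region begins exactly where the coverage so far ends. A minimal Python sketch of that check, assuming regions are given as (start_key, end_key) byte-string pairs with an empty end key meaning “end of table”; check_region_chain is a hypothetical helper, not hbck’s code, and it neither detects orphaned regions nor repairs anything.

```python
from typing import List, Optional, Tuple

# A region covers [start_key, end_key); an empty end_key means "to the end of
# the table", and a healthy table's first region starts at the empty key.
Region = Tuple[bytes, bytes]


def check_region_chain(
    regions: List[Region],
) -> Tuple[List[Tuple[bytes, bytes]], List[Tuple[Region, Region]]]:
    """Return (holes, overlaps) found in a table's region chain.

    A hole is a (from_key, to_key) gap that no region covers (to_key b"" means
    end of table); an overlap pairs a region with the earlier region that
    already covers part of its key range.
    """
    holes: List[Tuple[bytes, bytes]] = []
    overlaps: List[Tuple[Region, Region]] = []
    covered_to: Optional[bytes] = b""  # keys below this are covered; None = tail covered
    widest: Optional[Region] = None  # region that currently reaches furthest
    for region in sorted(regions):  # walk regions in start-key order
        start, end = region
        if covered_to is None or start < covered_to:
            overlaps.append((widest, region))
        elif start > covered_to:
            holes.append((covered_to, start))
        # Extend coverage if this region reaches further than anything before it.
        if covered_to is not None and (end == b"" or end > covered_to):
            covered_to = None if end == b"" else end
            widest = region
    if covered_to is not None:
        holes.append((covered_to, b""))  # nothing covers the end of the key space
    return holes, overlaps


if __name__ == "__main__":
    holes, overlaps = check_region_chain(
        [(b"", b"c"), (b"e", b"k"), (b"j", b"m"), (b"m", b"")]
    )
    print(holes)  # [(b'c', b'e')]
    print(overlaps)  # [((b'e', b'k'), (b'j', b'm'))]
```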
Original title and link: HBase 0.94 Released: What’s New (©myNoSQL)