Stinger: All content tagged as Stinger in NoSQL databases and polyglot persistence

Big Data benchmark: Redshift, Hive, Impala, Shark, Stinger/Tez

Hosted by AMPLab, the origin of Spark, this benchmark compares Redshift, Hive, Shark, Impala, and Stinger/Tez:

Several analytic frameworks have been announced in the last year. Among them are inexpensive data-warehousing solutions based on traditional Massively Parallel Processor (MPP) architectures (Redshift), systems which impose MPP-like execution engines on top of Hadoop (Impala, HAWQ) and systems which optimize MapReduce to improve performance on analytical workloads (Shark, Stinger/Tez). This benchmark provides quantitative and qualitative comparisons of five systems. It is entirely hosted on EC2 and can be reproduced directly from your computer.

More important than the results:

  1. the clear methodology
  2. and its reproducibility

Original title and link: Big Data benchmark: Redshift, Hive, Impala, Shark, Stinger/Tez (NoSQL database©myNoSQL)

via: https://amplab.cs.berkeley.edu/benchmark/


Everything is faster than Hive

Derrick Harris has brought together a series of benchmarks conducted by the different SQL-on-Hadoop implementors, each comparing their own solution (Impala, Stinger/Tez, HAWQ, Shark) with Hive:

For what it’s worth, everyone is faster than Hive — that’s the whole point of all of these SQL-on-Hadoop technologies. How they compare with each other is harder to gauge, and a determination probably best left to individual companies to test on their own workloads as they’re making their own buying decisions. But for what it’s worth, here is a collection of more benchmark tests showing the performance of various Hadoop query engines against Hive, relational databases and, sometimes, themselves.

As Derrick Harris remarks, the only direct comparisons are the one between HAWQ and Impala (which seems to be old, as it mentions Impala being in beta) and the benchmark run by AMPlab (the guys behind Shark) comparing Redshift, Hive, Shark, and Impala.

The good part is that both the Hive Testbench and AMPlab benchmark are available on GitHub.


via: http://gigaom.com/2014/01/13/cloudera-says-impala-is-faster-than-hive-which-isnt-saying-much/


Stinger and Tez: a primer

Matthieu Lieber summarizes what he has learned from a talk by Alan Gates. If any of the following questions interests you, head to his post:

  1. What is Stinger?
  2. Why build upon Hive rather than build a new system?
  3. Why is SQL compatibility important?
  4. What is Tez and how does it relate to the Stinger initiative?
  5. What does Tez mean for Pig and other tools?



Status update on Project Stinger, the interactive query for Apache Hive

Cloudera is investing in Impala. Pivotal in HAWQ. Facebook, who created Hive, has announced Presto.

Hortonworks continues to work on Hive with project Stinger and Apache Tez. In mid-October, they announced Hive 0.12:

[Image: Hive12deux]

And at the end of October, Hortonworks shared a new set of results:

Historically, even simple Hive queries could not run in less than 30 seconds, yet many of these queries are running in less than 10 seconds. How did that happen? The answer mainly boils down to Apache Tez and Apache Hadoop YARN, which proves that Hadoop is more than just batch. Tez features such as container pre-launch and re-use overcome Hadoop’s traditional latency barriers, and are available to any data processing framework running in Hadoop.

[Image: stinger1]

Pretty impressive.



Apache Hive 0.11: Stinger Phase 1 Delivered

Owen O’Malley on Hortonworks’ blog:

As representatives of this open, community led effort we are very proud to announce the first release of the new and improved Apache Hive, version 0.11. This substantial release embodies the work of a wide group of people from Microsoft, Facebook, Yahoo, SAP and others. Together we have addressed 386 JIRA tickets, of which there were 28 new features and 276 bug fixes. There were FIFTY-FIVE developers involved in this and I would like to thank every one of them.

This is indeed the power of open. But don’t forget that too much bragging might diminish it: keep repeating a word and its value will slowly vanish.


via: http://hortonworks.com/blog/apache-hive-0-11-stinger-phase-1-delivered/


Hortonworks: The Fastest Path to Innovation: Community Driven Open Source

Shaun Connolly for the Hortonworks blog:

we believe the fastest way to innovate is to do our work within the open source community, introduce enterprise feature requirements into that public domain, and to work diligently to progress existing open source projects and incubate new projects to meet those needs.

In support of our approach, this week we’ve announced the submission of two new incubation projects to the Apache Software foundation together with the launch of the “Stinger Initiative”, all aimed at enhancing the security and performance of Hadoop applications.

I’m forced, but extremely happy, to take back what I said.

  • Stinger: an initiative to speed up Apache Hive for interactive queries.
  • Knox Gateway: a solution for authentication and security in Hadoop.
  • Tez framework: a new Hadoop YARN-based runtime for improved latency and throughput.

Hortonworks believes in open source.


via: http://hortonworks.com/blog/hortonworks-community-leadership/