ALL COVERED TOPICS

NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter

NAVIGATE MAIN CATEGORIES

Close

Tez: All content tagged as Tez in NoSQL databases and polyglot persistence

Big Data benchmark: Redshift, Hive, Impala, Shark, Stinger/Tez

Hosted on amplab, the origin of Spark this benchmark compares Redshift, Hive, Shark, Impala, Stinger/Tez:

Several analytic frameworks have been announced in the last year. Among them are inexpensive data-warehousing solutions based on traditional Massively Parallel Processor (MPP) architectures (Redshift), systems which impose MPP- like execution engines on top of Hadoop (Impala, HAWQ) and systems which optimize MapReduce to improve performance on analytical workloads (Shark, Stinger/Tez). This benchmark provides quantitative and qualitative comparisons of five systems. It is entirely hosted on EC2 and can be reproduced directly from your computer.

More important than the results:

  1. the clear methodology
  2. and its reproducibility

Original title and link: Big Data benchmark: Redshift, Hive, Impala, Shark, Stinger/Tez (NoSQL database©myNoSQL)

via: https://amplab.cs.berkeley.edu/benchmark/


Everything is faster than Hive

Derrick Harris has brought together a series of benchmarks conducted by the different SQL-on-Hadoop implementors comparing their solution (Impala, Stinger/Tez, HAWQ, Shark) with

For what it’s worth, everyone is faster than Hive — that’s the whole point of all of these SQL-on-Hadoop technologies. How they compare with each other is harder to gauge, and a determination probably best left to individual companies to test on their own workloads as they’re making their own buying decisions. But for what it’s worth, here is a collection of more benchmark tests showing the performance of various Hadoop query engines against Hive, relational databases and, sometimes, themselves.

As Derrick Harris remarks, the only direct comparisons are between HAWQ and Impala (and this seems to be old as it mentions Impala being in beta) and the benchmark run by AMPlab (the guys behind Shark) comparing Redshift, Hive, Shark, and Impala.

The good part is that both the Hive Testbench and AMPlab benchmark are available on GitHub.

Original title and link: Everything is faster than Hive (NoSQL database©myNoSQL)

via: http://gigaom.com/2014/01/13/cloudera-says-impala-is-faster-than-hive-which-isnt-saying-much/


Stinger and Tez: a primer

Matthieu Lieber summarizes what he has learned from a talk by Alan Gates. If any of the following questions interests you, head to his post:

  1. What is Stinger?
  2. Why build upon Hive rather than build a new system?
  3. Why is SQL compatibility important?
  4. What is Tez and how does it related to the Stinger initiative?
  5. What Tez means for Pig and other tools?

Original title and link: Stinger and Tez: a primer (NoSQL database©myNoSQL)


How Safari Books Online uses Google BigQuery for BI

Looking for alternative solutions to built our dashboards and enable interactive ad-hoc querying, we played with several technologies, including Hadoop. In the end, we decided to use Google BigQuery.

Compare the original processing flow:

BigQuery processing flow

with these 2 possible alternatives and tell me if you notice any significant differences.

Alternatives to BigQuery

Original title and link: How Safari Books Online uses Google BigQuery for BI (NoSQL database©myNoSQL)

via: http://googlecloudplatform.blogspot.com/2013/07/how-safari-books-online-uses-google.html


Hortonworks: The Fastest Path to Innovation: Community Driven Open Source

Shaun Connolly for the Hortonworks blog:

we believe the fastest way to innovate is to do our work within the open source community, introduce enterprise feature requirements into that public domain, and to work diligently to progress existing open source projects and incubate new projects to meet those needs.

In support of our approach, this week we’ve announced the submission of two new incubation projects to the Apache Software foundation together with the launch of the “Stinger Initiative”, all aimed at enhancing the security and performance of Hadoop applications.

I’m forced, but extremely happy to take back what I said.

  • Stinger: an initiative to speed up Apache Hive for interactive queries. Read about it here
  • Know Gateway: a solution for authentication and security in Hadoop. More details here
  • Tez framework: a new Hadoop YARN-based runtime for improved latency and throughput. Details here

Hortonworks believes in open source.

Original title and link: Hortonworks: The Fastest Path to Innovation: Community Driven Open Source (NoSQL database©myNoSQL)

via: http://hortonworks.com/blog/hortonworks-community-leadership/