ALL COVERED TOPICS

NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter

NAVIGATE MAIN CATEGORIES

Close

TeraSort: All content tagged as TeraSort in NoSQL databases and polyglot persistence

Hadoop, HPCC, MapR and the TeraSort Benchmark

Just in, from LexisNexis:

HPCC Systems 4 nodes cluster sorts 100 gigabytes in 98 seconds and is 25% faster than a 20 nodes Hadoop cluster.

Results achieved in December 2011 show that an HPCC Systems four node Thor cluster took only 98 seconds to complete a Terasort with a job size of 100 gigabytes (GB) on a cluster five times smaller than Hadoop. The HPCC Systems four node cluster was comprised of one (1) Dell PowerEdge C6100 2U server with Intel® Xeon® processors E5675 series, 48GB of memory, and 6 x 146GB SAS HDD’s. The Dell C6100 houses four nodes inside the 2U enclosure. The previous leader ran the same Terasort benchmark in 130 seconds on a 20-node Hadoop cluster using equivalent node hardware. HPCC Systems is an Open Source, enterprise-proven Big Data analytics-processing platform.

Thus Armando Escalante (SVP and CTO of LexisNexis Risk Solutions and head of HPCC Systems) concludes:

These results demonstrate that HPCC Systems is a leader in Big Data processing

Now switching to a post on MapR’s blog:

Recently a world record was claimed for a Hadoop benchmark. […] We were surprised to see that this world record was for a TeraSort benchmark on a 100GB of data. TeraSort is a standard benchmark and the name is derived from “sorting a terabyte”.  Any record claims for sorting a 100GB dataset across a 20 node cluster with 10 times as much memory is comical. The test is named TeraSort not GigaSort.

Original title and link: Hadoop, HPCC, MapR and the TeraSort Benchmark (NoSQL database©myNoSQL)