Intel: All content tagged as Intel in NoSQL databases and polyglot persistence

How Many Hadoops?

The short answer is there is only one Apache Hadoop distribution.

The long answer is that there are many distributions that include Apache Hadoop or claim compatibility with it.

The oldest and probably most popular: Cloudera’s Distribution of Hadoop (CDH).

The 100% open source: Hortonworks Data Platform.

The proprietary: MapR.

The blue one: IBM InfoSphere BigInsights.

The latest: WANdisco Hadoop WDD, Intel Distribution of Hadoop and Pivotal HD from EMC Greenplum.

There’s also the version Facebook runs on its own clusters, which includes Facebook Corona, a different approach to job scheduling and resource management.

But this list is not complete as it doesn’t include appliances featuring Hadoop. In this category we have:

  1. Oracle’s Big Data appliance featuring Cloudera’s Distribution of Hadoop
  2. NetApp’s Hadooplers
  3. EMC Greenplum DCA
  4. Teradata Aster Discovery Platform featuring the Hortonworks Data Platform
  5. Data Direct Networks (DDN)

I hope I didn’t miss any important ones[1]. To conclude this list, my question is: who is actually benefiting from all these distributions?


  1. For now, I’ve left Hadoop-as-a-Service aside.

Original title and link: How Many Hadoops? (NoSQL database©myNoSQL)


Intel Distribution of H* in 21 Links

I don’t think anyone besides the PR department at Intel had the time to read through all the media coverage the Intel Distribution of H* got in the last couple of days. Here’s a collection of links for your reference. Pick wisely.

Intel Announcements

  1. Intel Aims to Enrich Lives by Unlocking the Power of Big Data

  2. Intel Jumps into HADOOP

Media Coverage

  1. NYTimes Bits: Intel’s Big Data Push

  2. Wired: Intel Leaps on Software Elephant for Trip to Hardware Heaven

  3. WSJ: Intel Releases Own Version of Hit Hadoop Software

  4. ZDNet: Intel baking Apache Hadoop into silicon for big data, security uses

  5. The Register: Intel takes on all Hadoop disties to rule big data munching

  6. Forbes: Can Intel Heal the Hadoop Open Source Ecosystem?

  7. Forbes: Intel Drops a Big Data Shocker

  8. Slashdot: Intel Launches Its Own Apache Hadoop Distribution

  9. GigaOm: Cloudera who? Intel announces its own Hadoop distribution

  10. SiliconANGLE: Intel Gets Inside Big Data Chips With Hadoop

  11. eweek: Intel Releases Hadoop Distribution for Big Data

  12. InformationWeek: Intel Unveils New Distribution For Apache Hadoop

  13. Computerworld: Intel releases Hadoop software primed for its own chips

  14. PCMag: Intel Tackles Big Data With Release of Apache Hadoop Platform (http://www.pcmag.com/article2/0,2817,2415931,00.asp)

  15. DataInformed: Intel Jumps into Big Data Pool with Hadoop Distribution

  16. Slashdot: Intel’s New Hadoop Distribution Could Benefit Its Hardware Bottom Line

  17. VentureBeat: Intel moves into ‘big data’ software with Apache Hadoop distribution

  18. DatacenterKnowledge: Intel Enters the Hadoop Software Market

  19. Datacenter Dynamics: Intel launches own Hadoop distribution

Intel Distribution Partners

If, like me, you’re interested in archiving these, I’ve put this list together in a format that is easier to read and archive.

Original title and link: Intel Distribution of H* in 21 Links (NoSQL database©myNoSQL)


Project Rhino: Enhanced Data Protection for the Apache Hadoop Ecosystem

Avik Dey (Intel) sent the announcement of the new open source project from Intel to the Hadoop mailing list:

As the Apache Hadoop ecosystem extends into new markets and sees new use cases with security and compliance challenges, the benefits of processing sensitive and legally protected data with Hadoop must be coupled with protection for private information that limits performance impact. Project Rhino is our open source effort to enhance the existing data protection capabilities of the Hadoop ecosystem to address these challenges, and contribute the code back to Apache.

Project Rhino targets security at all levels: from encryption and key management to cell-level ACLs and audit logging.

Original title and link: Project Rhino: Enhanced Data Protection for the Apache Hadoop Ecosystem (NoSQL database©myNoSQL)

via: http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201302.mbox/%3cCD5137E5.15610%25avik.dey@intel.com%3e


EMC Contributes 1000+ Nodes Cluster for Apache Hadoop Testing

The Greenplum Analytics Workbench incorporates technology from the world’s leading software and hardware manufacturers with the intention of providing the infrastructure needed to facilitate Apache Hadoop innovation. The test bed cluster, which consists of 1,000+ hardware nodes or 10,000 nodes with the addition of virtual machines, features 24 petabytes of physical storage. This is the equivalent of nearly half of the entire written works of mankind, from the beginning of recorded history.

Thanks!

Original title and link: EMC Contributes 1000+ Nodes Cluster for Apache Hadoop Testing (NoSQL database©myNoSQL)

via: http://www.greenplum.com/news/greenplum-analytics-workbench


Data Is the New Currency. But Who’s Leading the Way?

In 2005, Tim O’Reilly said: “data is the next Intel Inside”. Today IDC’s Mario Morales (VP of semiconductor research) says data is the new currency. All’s good until you read the continuation:

And the companies that understand this are the ones already developing the analytics and infrastructure to extract that value—companies like IBM, HP, Intel, Microsoft, TI, Freescale and Oracle.

The article (nb: may require registration) continues by looking at what each of these companies is doing in the Big Data space, but focuses largely on IBM Watson.

Going back to the question “who’s leading the Big Data way”, let’s take a quick look at the technology behind Watson. According to Jeopardy Goes to Hadoop and About Watson, Watson technology is based on Apache Hadoop, using an IBM language technology built on the Apache UIMA platform[1] and running Linux on IBM boxes.

To me it looks like open source is leading the advances in Big Data and these large organizations are just connecting the dots (as in packaging these technologies for enterprise environments and contributing missing pieces here and there)[2]. When did this happen before?


  1. Dmitriy Ryaboy taught me that UIMA came out of IBM in the first place and they’ve been critical in its development.  

  2. Or they are very secretive about their internal initiatives and research.  

Original title and link: Data Is the New Currency. But Who’s Leading the Way? (NoSQL database©myNoSQL)


Franz's AllegroGraph Sets New Triple Store Record

The 310 billion triple result that Franz is announcing today was achieved in only two weeks of access (actual loading time of just over 78 hours) to an 8-socket Intel Xeon E7-8870 processor-based server system configured with 2 terabytes of physical memory and 22 terabytes of physical disk.

“We’re confident that with additional time, another terabyte of memory, and a bit more storage capacity, the previously unreachable goal of 1 trillion triples can be achieved. Even double that is not out of the question,” stated Dr. Jans Aasman, CEO of Franz Inc.
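For a sense of scale, the figures quoted above imply an average load rate of roughly 1.1 million triples per second. This is only a back-of-the-envelope sketch: it assumes exactly 78 hours of loading (the release says "just over 78 hours", so the real average is slightly lower), and the function and constant names are mine, not Franz's.

```python
# Back-of-the-envelope load rate from the figures quoted in the announcement.
# Assumption: exactly 78 hours of loading; the release says "just over 78 hours",
# so the true average rate is slightly lower than what this prints.
TRIPLES_LOADED = 310_000_000_000  # 310 billion triples
LOAD_HOURS = 78


def average_load_rate(triples: int, hours: float) -> float:
    """Return the average number of triples loaded per second over the run."""
    return triples / (hours * 3600)


rate = average_load_rate(TRIPLES_LOADED, LOAD_HOURS)
print(f"{rate:,.0f} triples/second")
```

It's an average over the whole run and says nothing about how the rate changed as the store grew.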

I’m afraid to ask how much this would cost. But we already know that scaling graph databases is still an open question.

This next answer shows why different data and processing models are needed for different scenarios:

Dr. Aasman said, “Some people have asked, ‘Why not do this on a distributed cloud system with Hadoop?’ The quick answer: NoSQL databases like Hadoop and Cassandra fail on joins. Big Enterprise, big web companies and big government intelligence organizations are all looking into big data to work with massive amounts of semi-unstructured data. They are finding that NoSQL databases are wonderful if one needs access to a single object in an ocean of billions of objects, however, they also find that the current NoSQL databases fall short if you need to run graph database operations that require many complicated joins. A typical example would be performing a social network analysis query on a large telecom call detail record database.”

Original title and link: Franz’s AllegroGraph Sets New Triple Store Record (NoSQL databases © myNoSQL)

via: http://finance.yahoo.com/news/Franzs-AllegroGraphR-Sets-New-iw-3088956781.html