Intel: All content tagged as Intel in NoSQL databases and polyglot persistence
Monday, 4 March 2013
How Many Hadoops?
The short answer is there is only one Apache Hadoop distribution.
The long answer is that there are many distributions that include Apache Hadoop or claim compatibility with it.
The oldest and probably most popular: Cloudera’s Distribution of Hadoop (CDH)
The 100% open source: Hortonworks Data Platform.
The proprietary: MapR.
The blue one: IBM InfoSphere BigInsights.
The latest: WANdisco Hadoop WDD, Intel Distribution of Hadoop and Pivotal HD from EMC Greenplum.
There’s also the version Facebook’s running on their cluster which includes Facebook Corona: a different approach to job scheduling and resource management.
But this list is not complete as it doesn’t include appliances featuring Hadoop. In this category we have:
- Oracle’s Big Data appliance featuring Cloudera’s Distribution of Hadoop
- NetApp’s Hadooplers
- EMC Greenplum DCA
- Teradata Aster Discovery Platform featuring the Hortonworks Data Platform
- Data Direct Networks (DDN)
I hope I didn’t miss any important ones[1]. As a conclusion for this list, my question is: who is actually benefiting from all these distributions?
[1] I left aside for now Hadoop-as-a-Service.
Original title and link: How Many Hadoops? (©myNoSQL)
Thursday, 28 February 2013
Intel Distribution of H* in 21 Links
I don’t think anyone besides the PR department at Intel had the time to read through all the media coverage the Intel Distribution of H* got in the last couple of days. Here’s a collection of links for your reference. Pick wisely.
Intel Announcements
Media Coverage
- NYTimes Bits: Intel’s Big Data Push
- Wired: Intel Leaps on Software Elephant for Trip to Hardware Heaven
- ZDNet: Intel baking Apache Hadoop into silicon for big data, security uses
- The Register: Intel takes on all Hadoop disties to rule big data munching
- Forbes: Intel Drops a Big Data Shocker
- GigaOm: Cloudera who? Intel announces its own Hadoop distribution
- SiliconAngle: Intel Gets Inside Big Data Chips With Hadoop
- InformationWeek: Intel Unveils New Distribution For Apache Hadoop
- Computerworld: Intel releases Hadoop software primed for its own chips
- PCMag: Intel Tackles Big Data With Release of Apache Hadoop Platform (http://www.pcmag.com/article2/0,2817,2415931,00.asp)
- DataInformed: Intel Jumps into Big Data Pool with Hadoop Distribution
- Slashdot: Intel’s New Hadoop Distribution Could Benefit Its Hardware Bottom Line
- VentureBeat: Intel moves into ‘big data’ software with Apache Hadoop distribution
- DatacenterKnowledge: Intel Enters the Hadoop Software Market
- Datacenter Dynamics: Intel launches own Hadoop distribution
Intel Distribution Partners
If, like me, you’re interested in archiving these, I’ve put this list together in a format that is easier to read and archive.
Monday, 25 February 2013
Project Rhino: Enhanced Data Protection for the Apache Hadoop Ecosystem
Avik Dey (Intel) sent the announcement of the new open source project from Intel to the Hadoop mailing list:
As the Apache Hadoop ecosystem extends into new markets and sees new use cases with security and compliance challenges, the benefits of processing sensitive and legally protected data with Hadoop must be coupled with protection for private information that limits performance impact. Project Rhino is our open source effort to enhance the existing data protection capabilities of the Hadoop ecosystem to address these challenges, and contribute the code back to Apache.
Project Rhino targets security at all levels: from encryption and key management to cell-level ACLs and audit logging.
Monday, 2 April 2012
EMC Contributes 1000+ Nodes Cluster for Apache Hadoop Testing
The Greenplum Analytics Workbench incorporates technology from the world’s leading software and hardware manufacturers with the intention of providing the infrastructure needed to facilitate Apache Hadoop innovation. The test bed cluster, which consists of 1,000+ hardware nodes or 10,000 nodes with the addition of virtual machines, features 24 petabytes of physical storage. This is the equivalent of nearly half of the entire written works of mankind, from the beginning of recorded history.
Thanks!
via: http://www.greenplum.com/news/greenplum-analytics-workbench
Wednesday, 30 November 2011
Data Is the New Currency. But Who’s Leading the Way?
In 2005, Tim O’Reilly said: “data is the next Intel Inside”. Today, IDC’s Mario Morales (VP of semiconductor research) says data is the new currency. All’s good until you read the continuation:
And the companies that understand this are the ones already developing the analytics and infrastructure to extract that value—companies like IBM, HP, Intel, Microsoft, TI, Freescale and Oracle.
The article (nb: may require registration) continues by looking at what each of these companies is doing in the Big Data space, but devotes a large part to IBM Watson.
Going back to the question “who’s leading the Big Data way”, let’s take a quick look at the technology behind Watson. According to “Jeopardy Goes to Hadoop” and “About Watson”, Watson technology is based on Apache Hadoop, using an IBM language technology built on the Apache UIMA platform[1] and running Linux on IBM boxes.
To me it looks like open source is leading the advances in Big Data and these large organizations are just connecting the dots (as in packaging these technologies for enterprise environments and contributing missing pieces here and there)[2]. When did this happen before?
[1] Dmitriy Ryaboy taught me that UIMA came out of IBM in the first place and they’ve been critical in its development.
[2] Or they are very secretive about their internal initiatives and research.
Tuesday, 7 June 2011
Franz's AllegroGraph Sets New Triple Store Record
The 310 billion triple result that Franz is announcing today was achieved in only two weeks of access (actual loading time of just over 78 hours) to an 8-socket Intel Xeon E7-8870 processor-based server system configured with 2 terabytes of physical memory and 22 terabytes of physical disk.
“We’re confident that with additional time, another terabyte of memory, and a bit more storage capacity, the previously unreachable goal of 1 trillion triples can be achieved. Even double that is not out of the question,” stated Dr. Jans Aasman, CEO of Franz Inc.
I’m afraid to ask how much this would cost. But we already know that scaling graph databases is still an open question.
This next answer shows why different data and processing models are needed for different scenarios:
Dr. Aasman said, “Some people have asked, ‘Why not do this on a distributed cloud system with Hadoop?’ The quick answer: NoSQL databases like Hadoop and Cassandra fail on joins. Big Enterprise, big web companies and big government intelligence organizations are all looking into big data to work with massive amounts of semi-unstructured data. They are finding that NoSQL databases are wonderful if one needs access to a single object in an ocean of billions of objects, however, they also find that the current NoSQL databases fall short if you need to run graph database operations that require many complicated joins. A typical example would be performing a social network analysis query on a large telecom call detail record database.”
via: http://finance.yahoo.com/news/Franzs-AllegroGraphR-Sets-New-iw-3088956781.html
