eBay: All content tagged as eBay in NoSQL databases and polyglot persistence
Slide deck from eBay explaining how they implemented a graph-based recommendation system on top of (surprise!) not a graph database, but Cassandra.
Original title and link: Graph Based Recommendation Systems at eBay ( ©myNoSQL)
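If you wonder what a graph on top of a non-graph store can look like, here is a minimal sketch. It is not taken from the eBay slides: the adjacency-list layout, the co-occurrence scoring, and all names (`edges_by_user`, `edges_by_item`, `add_edge`, `recommend`) are my own assumptions, and plain Python dictionaries stand in for wide rows so the example runs without a Cassandra cluster.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for two wide-row tables (assumption, not eBay's schema):
# row key = user id, columns = ids of items the user interacted with.
edges_by_user = defaultdict(set)
# Reverse index: row key = item id, columns = ids of users who touched it.
edges_by_item = defaultdict(set)


def add_edge(user, item):
    """Store one user-item edge in both directions, like a double write."""
    edges_by_user[user].add(item)
    edges_by_item[item].add(user)


def recommend(user, limit=3):
    """Two-hop walk: user -> items -> other users -> their other items."""
    seen = edges_by_user[user]
    scores = Counter()
    for item in seen:
        for other in edges_by_item[item]:
            if other == user:
                continue
            for candidate in edges_by_user[other]:
                if candidate not in seen:
                    scores[candidate] += 1
    # Highest score first; ties broken by name so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked][:limit]


for user, item in [
    ("alice", "camera"), ("alice", "tripod"),
    ("bob", "camera"), ("bob", "lens"),
    ("carol", "tripod"), ("carol", "lens"), ("carol", "bag"),
]:
    add_edge(user, item)

print(recommend("alice"))
```

Each graph hop becomes a single row lookup by key, which is the kind of access a column-family store handles well; the price is that every edge is written twice.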
Shawn Rogers has a short but compelling list of Big Data deployments in his article Big Data is Scaling BI and Analytics. This list also shows that even if there are some common components like Hadoop, there are no blueprints yet for dealing with Big Data.
Facebook: Hadoop analytic data warehouse, using HDFS to store more than 30 petabytes of data. Their Big Data stack is based only on open source solutions.
Quantcast: 3,000-core, 3,500-terabyte Hadoop deployment that processes more than a petabyte of raw data each day
University of Nebraska-Lincoln: Hadoop cluster holding 1.6 petabytes of physics data
Yahoo!: 100,000 CPUs in 40,000 computers, all running Hadoop. Also running a 12-terabyte MOLAP cube based on Tableau Software
eBay: three separate analytics environments:
- 6PB data warehouse for structured data and SQL access
- 40PB deep analytics (Teradata)
- 20PB Hadoop system to support advanced analytic workload on unstructured data
Original title and link: Big Data Is Going Mainstream: Facebook, Yahoo!, eBay, Quantcast, and Many Others ( ©myNoSQL)
Anil Madan presenting on Hadoop at eBay:
The talk will illustrate how Hadoop has become a critical centerpiece of infrastructure for eBay, running on thousands of servers. I will also discuss how it fuels our derived data pipeline, which in turn affects just about all our services. Attendees will understand how we have integrated Hadoop into our existing data warehouse and how we are leveraging components of the ecosystem like HBase, Pig, and Hive for different research and production use cases.
There was one NoSQL conference that I missed and was really pissed off about: Hadoop World. Even though I followed and curated the Twitter feed, resulting in Hadoop World in tweets, the feeling of not being there made me really sad. But now, thanks to Cloudera, I'll be able to watch most of the presentations. Many of them have already been published and the complete list can be found ☞ here.
Based on the Twitter activity on that day, I've selected below the ones that seemed to generate the most buzz. The list contains names like Facebook, Twitter, eBay, Yahoo!, StumbleUpon, comScore, Mozilla, AOL. And there are quite a few more …
From ☞ DBMS2:
eBay sees Hadoop as an interesting tool for certain special purposes:
- eBay likes Hadoop for certain tasks such as image analysis.
- eBay doesn’t like Hadoop for anything that requires data movement, such as a join.
- Similarly, eBay doesn’t like HBase.
But based on reports from Hadoop World, it looks like eBay's usage of Hadoop is quite wide:
- eBay had a 4-node cluster in 2007, a 28-node and a 10-node cluster in 2009, and a 500+ node cluster in 2010
- 4,200 processors, 4.3 PB of data on CentOS 1U datanodes with 48 GB RAM
- the production cluster will be 8,500 processors, 16 PB