


ebay: All content tagged as ebay in NoSQL databases and polyglot persistence

Results of collaboration on improving the Mean Time to Recovery in HBase

Hortonworks, eBay, and Scaled Risk have been collaborating on improving the mean time to recovery (MTTR) in HBase. After extensive testing at eBay, results are now available for two scenarios:

  • Node/RegionServer failures while writing
  • Node/RegionServer failures while reading
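Mean time to recovery is the average time between a failure and the moment service is restored. A minimal sketch of the calculation, assuming each incident is recorded as a pair of timestamps; the `incidents` values below are hypothetical and are not eBay's measurements:

```python
from datetime import datetime, timedelta

def mean_time_to_recovery(incidents):
    """Return the average recovery duration for (failed_at, recovered_at) pairs."""
    if not incidents:
        raise ValueError("need at least one incident")
    # Sum the individual outage durations, starting from a zero timedelta.
    total = sum((recovered - failed for failed, recovered in incidents), timedelta())
    return total / len(incidents)

# Hypothetical RegionServer failures: (time of failure, time regions were served again)
incidents = [
    (datetime(2013, 1, 1, 10, 0, 0), datetime(2013, 1, 1, 10, 0, 40)),
    (datetime(2013, 1, 2, 14, 0, 0), datetime(2013, 1, 2, 14, 1, 20)),
]
print(mean_time_to_recovery(incidents))  # 0:01:00
```

Lowering MTTR means shrinking each of those individual outage windows, which is why the tests above measure recovery separately for the write and read paths.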

Original title and link: Results of collaboration on improving the Mean Time to Recovery in HBase (NoSQL database©myNoSQL)

Graph Based Recommendation Systems at eBay

A slide deck from eBay explaining how they implemented a graph-based recommendation system on top of Cassandra (surprise! not a graph database).
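One common way to keep a graph in a wide-column store such as Cassandra is to store each vertex's adjacency list as a row keyed by the vertex ID, then answer "people who bought this also bought" with a two-hop traversal. The sketch below uses plain Python dicts to stand in for those rows; it illustrates the general technique under that assumption and is not a description of eBay's implementation. The names `user_to_items`, `item_to_users`, and `also_bought` are hypothetical.

```python
from collections import Counter, defaultdict

# Each dict stands in for a table whose partition key is a vertex ID
# and whose columns are the neighboring vertex IDs (an adjacency list).
user_to_items = defaultdict(set)
item_to_users = defaultdict(set)

def add_purchase(user, item):
    # Write the edge in both directions so either lookup is a single-row read.
    user_to_items[user].add(item)
    item_to_users[item].add(user)

def also_bought(item, limit=3):
    """Two-hop traversal: item -> buyers -> other items, ranked by co-occurrence."""
    counts = Counter()
    for user in item_to_users[item]:
        for other in user_to_items[user]:
            if other != item:
                counts[other] += 1
    return [other for other, _ in counts.most_common(limit)]

for user, item in [("u1", "camera"), ("u1", "tripod"), ("u2", "camera"),
                   ("u2", "tripod"), ("u2", "lens"), ("u3", "lens")]:
    add_purchase(user, item)

print(also_bought("camera"))  # ['tripod', 'lens']
```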

Original title and link: Graph Based Recommendation Systems at eBay (NoSQL database©myNoSQL)

eBay's Cassandra Data Modeling Best Practices

Jay Patel (architect at eBay):

Our Cassandra deployment is not huge, but it’s growing at a healthy pace. In the past couple of months, we’ve deployed dozens of nodes across several small clusters spanning multiple data centers. You may ask, why multiple clusters? We isolate clusters by functional area and criticality. Use cases with similar criticality from the same functional area share the same cluster, but reside in different keyspaces.

This first post focuses on two old techniques that have long been applied with relational databases as well:

  1. Model data around query patterns.
  2. Denormalize and duplicate for read performance.
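The two techniques work together: start from the queries you must answer, give each query its own table, and accept writing the same fact into several places so that every read touches a single partition. A minimal in-memory sketch, assuming a hypothetical "user likes item" use case; the dicts stand in for Cassandra tables and are not eBay's actual schema:

```python
from collections import defaultdict

# Normalized source of truth (what a relational design would join against).
items = {"i1": "Vintage camera", "i2": "Tripod"}

# Query 1: "show a user's most recent likes, with item titles".
# Partition key = user_id; each entry duplicates the item title so no join is needed.
likes_by_user = defaultdict(list)
# Query 2: "who liked this item?" gets its own table keyed by item_id.
likes_by_item = defaultdict(list)

def like(user_id, item_id, liked_at):
    title = items[item_id]  # denormalize: copy the title at write time
    likes_by_user[user_id].append((liked_at, item_id, title))
    likes_by_user[user_id].sort(reverse=True)  # newest first, like a descending clustering order
    likes_by_item[item_id].append((liked_at, user_id))

def recent_likes(user_id, limit=10):
    # Single-partition read: no lookup in `items` is required.
    return [(item_id, title) for _, item_id, title in likes_by_user[user_id][:limit]]

like("alice", "i1", 100)
like("alice", "i2", 200)
like("bob", "i1", 150)
print(recent_likes("alice"))  # [('i2', 'Tripod'), ('i1', 'Vintage camera')]
print(sorted(u for _, u in likes_by_item["i1"]))  # ['alice', 'bob']
```

The cost of this design shows up on writes: if an item's title changes, every duplicated copy has to be updated as well.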

Original title and link: eBay’s Cassandra Data Modeling Best Practices (NoSQL database©myNoSQL)


eBay, Wal-Mart Search for Revved-Up Search Engines

Reuters reports on eBay's and Wal-Mart’s work to improve their search engines:

The search engine project takes time because eBay’s online marketplace has so much variable information from millions of listings that are described differently by each seller - something known as unstructured data in the tech world.

This is not really a NoSQL story, but I read something between the lines: when discussing better search solutions, neither company mentions making search work at scale, implying that this is considered a solved problem. The focus is on handling unstructured data and building better relevance algorithms.

I have no details about the architecture of the new version of eBay search, but I have found this diagram of eBay’s Voyager in a slide deck by Dan Pritchett from around 2007:

[Diagram: Scaling Search Voyager]

Original title and link: eBay, Wal-Mart Search for Revved-Up Search Engines (NoSQL database©myNoSQL)


eBay Exec Urges Hadoop Community...

Darren Bruntz, senior director of e-commerce at eBay:

I think we will stay on our setup of the three platforms for a few more years, but Hadoop could be a more compelling offering if the open source community and its contributors got some more focus and energy, as you would have a whole community of people working on new tools and features.

[Chart: Cumulative Lines of Code Contributed to Apache Hadoop Trunk Timeline through June 2011]

Where is eBay in this list of Hadoop contributors?

Original title and link: eBay Exec Urges Hadoop Community… (NoSQL database©myNoSQL)


Big Data Is Going Mainstream: Facebook, Yahoo!, eBay, Quantcast, and Many Others

Shawn Rogers has a short but compelling list of Big Data deployments in his article Big Data is Scaling BI and Analytics. The list also shows that even though there are some common components like Hadoop, there are no blueprints yet for dealing with Big Data.

  • Facebook: Hadoop analytic data warehouse, using HDFS to store more than 30 petabytes of data. Their Big Data stack is based only on open source solutions.

  • Quantcast: a 3,000-core, 3,500-terabyte Hadoop deployment that processes more than a petabyte of raw data each day.

  • University of Nebraska-Lincoln: a Hadoop cluster holding 1.6 petabytes of physics data.

  • Yahoo!: 100,000 CPUs in 40,000 computers, all running Hadoop. Yahoo! also runs a 12-terabyte MOLAP cube based on Tableau Software.

  • eBay: three separate analytics environments:

    • 6PB data warehouse for structured data and SQL access
    • 40PB deep analytics (Teradata)
    • 20PB Hadoop system to support advanced analytic workloads on unstructured data

Original title and link: Big Data Is Going Mainstream: Facebook, Yahoo!, eBay, Quantcast, and Many Others (NoSQL database©myNoSQL)

eBay Deploys 100TB of Flash Storage

eBay is a prime example of the benefits of flash. Nimbus Data CEO Thomas Isakovich told me that eBay had only 2.5TB of flash installed six months ago before recently upgrading to 100TB. Within the PayPal division, where Nimbus is deployed, Isakovich said eBay has cut power costs by 78 percent, cut its rack space by half and is able to better meet performance demand overall by spinning up virtual machines even faster.

This probably marks the start of a new trend in which flash is used for more than just storing hot data.

Original title and link: eBay Deploys 100TB of Flash Storage (NoSQL database©myNoSQL)


Hadoop at eBay

Anil Madan[1] presenting on Hadoop at eBay:

The talk will illustrate how Hadoop has become a critical centerpiece of infrastructure for eBay, running on thousands of servers. I will also discuss how it fuels our derived data pipeline which in turn affects just about all our services. Attendees will understand how we have integrated Hadoop into our existing data warehouse and how we are leveraging components of the ecosystem like HBase, Pig, and Hive for different research and production use cases.

Videos from Hadoop World

There was one NoSQL conference that I missed, and I was really pissed off about it: Hadoop World. Even though I followed and curated the Twitter feed, resulting in Hadoop World in tweets, not being there made me really sad. But now, thanks to Cloudera, I’ll be able to watch most of the presentations. Many of them have already been published, and the complete list can be found ☞ here.

Based on the Twitter activity that day, I’ve selected the ones that seemed to generate the most buzz. The list contains names like Facebook, Twitter, eBay, Yahoo!, StumbleUpon, comScore, Mozilla, and AOL, and there are quite a few more…

eBay, Hadoop, HBase

From ☞ DBMS2:

eBay sees Hadoop as an interesting tool for certain special purposes:

  • eBay likes Hadoop for certain tasks such as image analysis.
  • eBay doesn’t like Hadoop for anything that requires data movement, such as a join.
  • Similarly, eBay doesn’t like HBase.
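To see why a join implies data movement in Hadoop, consider one common way to implement it in MapReduce, the reduce-side join: both inputs are mapped to (join key, tagged record) pairs, and the shuffle must send every record with the same key to the same reducer over the network. The sketch below simulates the map, shuffle, and reduce phases in plain Python; it illustrates the general pattern rather than eBay's jobs, and the `users`/`orders` data is hypothetical.

```python
from collections import defaultdict

users = [("u1", "Alice"), ("u2", "Bob")]  # (user_id, name)
orders = [("o1", "u1", 30), ("o2", "u1", 15), ("o3", "u2", 99)]  # (order_id, user_id, amount)

def map_phase():
    # Emit (join key, tagged record); the tag tells the reducer which input it came from.
    for user_id, name in users:
        yield user_id, ("user", name)
    for order_id, user_id, amount in orders:
        yield user_id, ("order", (order_id, amount))

def shuffle(pairs, num_reducers=2):
    # In Hadoop this step copies records across the network to the reducer
    # that owns each key: the "data movement" a join cannot avoid.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[hash(key) % num_reducers][key].append(value)
    return partitions

def reduce_phase(partitions):
    # Each reducer sees all records for a key and pairs users with their orders.
    for partition in partitions:
        for values in partition.values():
            names = [v for tag, v in values if tag == "user"]
            for tag, v in values:
                if tag == "order":
                    for name in names:
                        yield name, v[0], v[1]

joined = sorted(reduce_phase(shuffle(map_phase())))
print(joined)  # [('Alice', 'o1', 30), ('Alice', 'o2', 15), ('Bob', 'o3', 99)]
```

A task like image analysis, by contrast, can usually process each record independently in the map phase, so little or nothing has to cross the network.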

But based on reports from Hadoop World, eBay’s use of Hadoop looks quite broad:

  • eBay had a 4-node cluster in 2007, a 28-node and a 10-node cluster in 2009, and a 500+ node cluster in 2010.
  • 4,200 processors and 4.3 PB of data on CentOS 1U datanodes with 48 GB of RAM.
  • The production cluster will have 8,500 processors and 16 PB.
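A quick back-of-the-envelope comparison of the figures quoted above. This only divides the reported numbers, and it assumes that the 4,200 processors and 4.3 PB describe the same cluster that is planned to grow to 8,500 processors and 16 PB:

```python
# Reported figures; the pairing of "current" and "planned" is an assumption.
current = {"processors": 4200, "storage_pb": 4.3}
planned = {"processors": 8500, "storage_pb": 16}

for metric in current:
    factor = planned[metric] / current[metric]
    print(f"{metric}: {factor:.1f}x")
# processors: 2.0x
# storage_pb: 3.7x
```

Under that assumption, storage is planned to grow almost twice as fast as processor count.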

Original title and link: eBay, Hadoop, HBase (NoSQL databases © myNoSQL)