An informative post from TellApart explaining how HBase, combined with the right data modeling and filtering, covered the following requirements that came out of their log analysis needs:
- Data must be ingested into the system incrementally, roughly one day's worth of data at a time.
- Data is processed at a variety of time scales. Daily reporting often cares only about one day's worth of data, while machine learning applications may need to digest several months' worth of data to build models.
- Some events are naturally associated with others. An ad click is logged separately from an ad impression, but the two need to be processed together. Some data extraction applications need to process these associated events together, but others only care about the individual events.
- Random-access lookups that track all of a user's actions across time are often helpful. They are a powerful debugging tool for understanding how the user interacts with the web at large and with the TellApart system in particular.
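The post does not spell out TellApart's row key design, so the following is only a minimal sketch under stated assumptions of how the last requirement (per-user lookups across time) can map onto a sorted key-value store like HBase. It assumes a hypothetical row key of the UTF-8 user id, a `0x00` separator, and an 8-byte big-endian millisecond timestamp, so byte order equals time order within a user. An in-memory sorted list stands in for an HBase table, and `scan(start, stop)` mimics a scan over the half-open row range `[start, stop)`. `make_row_key`, `SortedRowStore`, and `user_events` are illustrative names, not HBase APIs.

```python
import bisect
import struct


def make_row_key(user_id: str, ts_millis: int) -> bytes:
    # Hypothetical layout: user id, 0x00 separator, 8-byte big-endian timestamp.
    # Big-endian unsigned encoding makes lexicographic byte order match time order.
    # Assumes user ids never contain a 0x00 byte.
    return user_id.encode("utf-8") + b"\x00" + struct.pack(">Q", ts_millis)


class SortedRowStore:
    """In-memory stand-in for an HBase table: rows kept sorted by key bytes."""

    def __init__(self) -> None:
        self._keys: list[bytes] = []
        self._rows: dict[bytes, str] = {}

    def put(self, key: bytes, value: str) -> None:
        # Insert the key in sorted position only the first time it is seen.
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def scan(self, start: bytes, stop: bytes) -> list[tuple[bytes, str]]:
        # Half-open range [start, stop), similar to a scan with start/stop rows.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


def user_events(store: SortedRowStore, user_id: str, t_start: int, t_stop: int) -> list[str]:
    # All events for one user with t_start <= timestamp < t_stop, in time order.
    start = make_row_key(user_id, t_start)
    stop = make_row_key(user_id, t_stop)
    return [value for _, value in store.scan(start, stop)]


store = SortedRowStore()
store.put(make_row_key("u1", 2000), "click")
store.put(make_row_key("u1", 1000), "impression")
store.put(make_row_key("u2", 1500), "impression")

print(user_events(store, "u1", 0, 3000))     # ['impression', 'click']
print(user_events(store, "u1", 1500, 3000))  # ['click']
```

Because the user id leads the key, an impression and a later click for the same user sit next to each other and come back from one contiguous scan. The trade-off is that a single day's events are no longer a contiguous key range, so daily reporting over this layout would need a full scan with timestamp filtering or a differently keyed table.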
Original title and link: HBase at TellApart for Log Event Processing (NoSQL database©myNoSQL)