It used to be that data was generated at human scale. You’d buy or sell something and a transaction record would happen. You’d hire or fire someone and you’d hit the “employee” table in your database.
These days, data comes from machines talking to machines. The servers, switches, routers and disks on your LAN are all furiously conversing. The content of their messages is interesting, and also the patterns and timing of the messages that they send to one another. (In fact, if you can capture all that data and do some pattern detection and machine learning, you have a pretty good tool for finding bad guys breaking into your network.) Same is true for programmed trading on Wall Street, mobile telephony and many other pieces of technology infrastructure we rely on.
and how Hadoop can help:
Hadoop knows how to capture and store that data cheaply and reliably, even if you get to petabytes. More importantly, Hadoop knows how to process that data — it can run different algorithms and analytic tools, spread across its massively parallel infrastructure, to answer hard questions on enormous amounts of information very quickly.