Hadoop can be vital for solving the fraud detection problem because:
- Sampling does not work for rare events: the chance of missing positive fraud cases leads to significant deterioration in model quality.
- Hadoop can tackle much harder problems by leveraging multiple cores across thousands of machines, and can search through much larger problem domains.
- Hadoop can be combined with other tools to meet moderate-to-low response latency requirements.
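To make the MapReduce angle concrete, here is a minimal, hypothetical sketch of a Hadoop Streaming-style fraud check run locally: a mapper emits `(card_id, amount)` pairs from raw transaction logs, and a reducer flags cards whose summed spend crosses a threshold. The log format, field names, and `THRESHOLD` value are illustrative assumptions, not details from the post.

```python
# Hypothetical Hadoop Streaming-style job: map transaction log lines to
# (card_id, amount) pairs, then reduce per card and flag heavy spenders.
from itertools import groupby
from operator import itemgetter

THRESHOLD = 10_000.0  # illustrative per-card spend limit (an assumption)

def mapper(line):
    """Parse one log line 'card_id<TAB>amount' into a (key, value) pair."""
    card_id, amount = line.split("\t")
    return card_id, float(amount)

def reducer(card_id, amounts):
    """Sum a card's spend and flag it when the total crosses the threshold."""
    total = sum(amounts)
    return (card_id, total, total > THRESHOLD)

def run_job(log_lines):
    """Simulate the shuffle phase locally: map, sort by key, then reduce."""
    mapped = sorted((mapper(l) for l in log_lines), key=itemgetter(0))
    return [reducer(key, [v for _, v in group])
            for key, group in groupby(mapped, key=itemgetter(0))]

if __name__ == "__main__":
    logs = ["c1\t9500.0", "c1\t600.0", "c2\t120.0"]
    for card, total, flagged in run_job(logs):
        print(card, total, flagged)
```

In a real deployment the same mapper and reducer logic would run as a Hadoop Streaming job over all the logs in HDFS, with the framework handling the sort-and-shuffle step that `run_job` simulates here.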
As one commenter nicely summarized it:
So the main point is “Cloudera has developed a tool, Flume, that can load billions of events into HDFS within a few seconds and analyze them using MapReduce.”?
And the suggestion to use ALL logs?
Or is there anything deeper that I am missing?
Original title and link for this post: Using Hadoop for Fraud Detection and Prevention (published on the NoSQL blog: myNoSQL)