Sifting through the PR-ish announcements about Informatica HParser, here is what I’ve figured out so far:
- it is the T in ETL
- a visual tool for creating parsing definitions for formats like web logs, XML, JSON, FIX, SWIFT, HL7, CDR, WORD, PDF, XLS, etc.
- transformations can be accessed from Hadoop MapReduce, Hive, or Pig
- the main benefit of using HParser is that the same parsing definitions/transformations can be shared across jobs running in the Hadoop distributed environment
- HParser tries to provide an optimal transformation solution when streaming, splitting, and processing large files
- HParser is available in two editions: community and commercial
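
To make the "shared parsing definition" idea a bit more concrete, here is a minimal sketch in plain Python. It is not HParser and does not use any Informatica API: the regex, `parse_line`, and `map_lines` are hypothetical names I made up for illustration, and the format is assumed to be the Apache Common Log Format. The point is only that one parsing definition is written once and then reused by whatever mapper-style code consumes the records.

```python
import re

# Hypothetical stand-in for a shared parsing definition (NOT HParser):
# a regex for the Apache Common Log Format.
CLF = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return a dict of typed fields, or None if the line does not match."""
    m = CLF.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    # A "-" size means no body was sent; treat it as 0 bytes.
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec


def map_lines(lines):
    """Mapper-style generator: yield (status, size) for each parseable line."""
    for line in lines:
        rec = parse_line(line.rstrip("\n"))
        if rec is not None:
            yield rec["status"], rec["size"]
```

In a Hadoop Streaming job, a mapper script could feed `sys.stdin` to `map_lines` and print each pair as a tab-separated line, while a separate script imports the same `parse_line` unchanged.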
Original title and link: What Is Informatica HParser for Hadoop? (©myNoSQL)