Christian Prokopp explains the advantages of the RCFile storage format:
The state-of-the-art solution for Hive is the RCFile. The format has been
co-developed by Facebook, which is running the largest Hadoop and Hive
installation in the world. RCFile has been adopted by the Hive and Pig
projects as the core format for table-like data storage. The goal of the
format development was “(1) fast data loading, (2) fast query processing,
(3) highly efficient storage space utilization, and (4) strong adaptivity to
highly dynamic workload patterns,” as can be seen in this PDF from the
- Is there any connection between RCFile and Parquet, the new columnar storage format? At first glance, the goals of the two are pretty similar.
- It looks like there’s already a new format that will supersede RCFile: ORC Files. Are all three of these approaches independent of each other? If yes, then what are the pros and cons of each of them?
Original title and link: RCFile - OCFile - Parquet: Storing Big Data With Hive ( ©myNoSQL)