February 15, 2020

Notes on Parquet and ORC

ORC (Optimized Row Columnar)
  • Suited to flat (non-nested) data
  • Lightweight indexes plus Bloom filters
  • Better compression ratio
  • Works better with Hive
  • Much less garbage collection (GC) overhead
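
To make the index and Bloom filter point more concrete, here is a minimal sketch of the general idea: keep min/max statistics and a small Bloom filter per group of rows, and use both to skip groups that cannot contain the value being searched for. This is a toy illustration in plain Python, not ORC's actual on-disk format or reader. The names BloomFilter, build_index, and lookup, and the sizes used, are assumptions made up for this example.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, value):
        # Derive several bit positions from one value by salting the hash.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def build_index(values, group_size):
    """Split a column into row groups; record min, max, and a Bloom filter per group."""
    index = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        bloom = BloomFilter()
        for v in group:
            bloom.add(v)
        index.append({"start": start, "min": min(group), "max": max(group),
                      "bloom": bloom, "rows": group})
    return index


def lookup(index, target):
    """Return (matching row positions, number of groups actually scanned)."""
    hits, scanned = [], 0
    for entry in index:
        if target < entry["min"] or target > entry["max"]:
            continue  # skipped using min/max statistics
        if not entry["bloom"].might_contain(target):
            continue  # skipped using the Bloom filter
        scanned += 1
        hits.extend(entry["start"] + i
                    for i, v in enumerate(entry["rows"]) if v == target)
    return hits, scanned


index = build_index(list(range(100)), group_size=10)
hits, scanned = lookup(index, 42)
print(hits, scanned)  # [42] 1  (only one of the ten groups is read)
```

The min/max check helps most when the column is sorted or clustered; the Bloom filter helps with equality lookups on values that fall inside a group's range but are not actually present.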

Parquet
  • Suited to nested data
  • Works better with Spark
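
As a rough picture of what columnar storage of nested data means, the sketch below shreds nested records into one column per leaf field, keyed by a dotted path. It is a simplified illustration only: Parquet itself also stores repetition and definition levels so that repeated and optional fields can be reassembled exactly, and that part is not modeled here. The function names flatten and to_columns and the sample records are assumptions for this example.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into {dotted.path: value} pairs."""
    flat = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def to_columns(records):
    """Turn a list of nested records into one list of values per leaf path."""
    flat_rows = [flatten(r) for r in records]
    paths = sorted({p for row in flat_rows for p in row})
    # A missing field becomes None in that row's slot of the column.
    return {p: [row.get(p) for row in flat_rows] for p in paths}


records = [
    {"id": 1, "user": {"name": "ann", "address": {"city": "Oslo"}}},
    {"id": 2, "user": {"name": "bob"}},
]
columns = to_columns(records)
print(columns)
# {'id': [1, 2], 'user.address.city': ['Oslo', None], 'user.name': ['ann', 'bob']}
```

A query that only needs user.name can then read that single column and ignore the rest, which is the main benefit of a columnar layout for nested records.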

These notes are a work in progress.
