Best big data file format for our scenario (considering Parquet)


I am planning to move our data from a row-based format (currently HDF5) to a column-based one.

I am thinking of Parquet, but I don't know if it is the best solution. Here is some information about our scenario:

  • each file has 200 tables

    • each table has up to 10,000,000 records
    • each table has 50 columns
      • each row has a timestamp as its unique key
  • each file averages 3 GB in HDF5 format (size varies from 1 to 10 GB)

I want a fast way to read columns like this:

Get all values as (timestamp, value) pairs for a specific column. Our files are read-only and exist to give our cluster fast access to the data.


I would like to know whether Parquet is the best fit for my needs or whether I should go in another direction.