
Best Big Data file type for our scenario (thinking of Parquet)


I am planning to move our data from a row-based format (currently HDF5) to a column-based one.

I am thinking of Parquet, but I don't know if it is the best solution. Here is some info about our scenario:

  • each file has 200 tables
    • each table has up to 10,000,000 records
    • each table has 50 columns
    • each row has a timestamp as its unique key
  • files average 3 GB each (sizes vary from 1 to 10 GB) in HDF5 format

I want a fast way to read columns like this:

get all (timestamp, value) pairs for a specific column. Our files are read-only; the goal is to provide fast access for our cluster.


I would like to know if Parquet is the best fit for my needs, or if I should go in another direction.