Advantages of keeping data in both Parquet and HBase


When I introduced Cloudera to my client, they asked me:


Why not keep the data only in HBase (and not in Parquet as well)?


So far, I have failed to find any argument for also keeping the data in Parquet files. Are there advantages to storing the data in both HBase and Parquet?


Will ETL with Spark perform better when reading from and writing to HDFS/Parquet, or from and to HBase?


Thanks!