About marshall_felder

marshall_felder · 08-13-2018

Assume a data pipeline loads Hive tables as Spark DataFrames. Which storage format is best suited to training machine learning models and running iterative processes: row-based (text, Avro) or column-based (ORC, Parquet) files?

marshall_felder · 08-12-2018

Great post, Binu! What storage format would you suggest if you plan on loading the Hive table into a DataFrame and running an iterative process (machine learning algorithm x) against the data? I’m hard-pressed to find any discussion of this.

Last Visited	12-08-2018 06:44 PM

Member Since	12-07-2017 03:15 PM
Posts	3
Kudos received	2

Cloudera Community

Which storage format is optimum for training machi...

Re: Row vs Columnar Storage For Hive