Member since: 12-07-2017
Posts: 3
Kudos Received: 2
Solutions: 0
08-13-2018 02:15 PM
Assume a data pipeline loads Hive tables as Spark DataFrames. Which storage format is best for training machine learning models and running iterative processes: row-based (text, Avro) or column-based (ORC, Parquet) files?
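To make the row-based vs. column-based distinction concrete, here is a toy sketch in plain Python. It is not Spark or Hive code and does not model Avro, ORC, or Parquet themselves: the sample records, field names, and helper functions are all made up for illustration. It only counts how many stored values have to be read to pull two feature columns out of each layout.

```python
# Toy illustration only: real formats (Avro, ORC, Parquet) add encoding,
# compression, and metadata that this sketch ignores.

# Hypothetical sample records; the field names are assumptions.
rows = [
    {"id": 1, "age": 34, "income": 52000.0, "notes": "a"},
    {"id": 2, "age": 51, "income": 61000.0, "notes": "b"},
    {"id": 3, "age": 27, "income": 48000.0, "notes": "c"},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def select_from_rows(records, wanted):
    """Row layout: every field of every record is read, then filtered."""
    touched = 0
    selected = []
    for record in records:
        touched += len(record)  # the whole record is read
        selected.append([record[name] for name in wanted])
    return selected, touched


def select_from_columns(columns, wanted):
    """Column layout: only the requested columns are read."""
    touched = sum(len(columns[name]) for name in wanted)
    selected = [list(values) for values in zip(*(columns[name] for name in wanted))]
    return selected, touched


columns = to_columns(rows)
features = ["age", "income"]

row_result, row_touched = select_from_rows(rows, features)
col_result, col_touched = select_from_columns(columns, features)

print("row layout values read:   ", row_touched)
print("column layout values read:", col_touched)
```

Both layouts return the same feature rows; the difference is only in how much stored data is scanned to get them. Whether that matters for an iterative job depends on things the sketch leaves out, such as how many columns the model actually uses and whether the DataFrame is cached after the first read.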
Labels:
- Apache Hive
- Apache Spark
08-12-2018 01:50 PM
Great post, Binu! What storage format would you suggest if you plan to load a Hive table into a DataFrame and run an iterative process (machine learning algorithm x) against the data? I’m hard pressed to find any discussion of this use case.