Some queries currently running in production take about 4 hours to complete. We are using Hive 0.10 on CDH 4.7, and I am looking at changing the file format of the table to Parquet.
I went through the documentation for using the Parquet file format with Hive 0.10, but there seem to be some manual configuration changes that need to be made.
The documentation says "To use Parquet with Hive 0.10-0.12 you must download the Parquet Hive package from the Parquet project. You want the parquet-hive-bundle jar in Maven Central." (https://cwiki.apache.org/confluence/display/Hive/Parquet).
Does CDH 4.7 support this change?
If yes, is there any documentation that could help us make these configuration changes?