Created 03-31-2025 04:01 AM
In the context of Cloudera CDP, I've noticed that unlike CDH, YARN does not automatically distribute the hive-site.xml file to the nodes. This seems to be a change in configuration practices, but I'm not sure if this is the correct approach or if there's another recommended method to ensure the hive-site.xml file is properly distributed. Can someone confirm if it is necessary to configure it manually in CDP, or if there is another automated process that YARN handles for distributing the configuration file?
Created 04-01-2025 08:44 AM
Hello @mikecolux ,
If you are using CDP and have Hive on Tez set up on the cluster, the Hive on Tez service will take care of this for you, and you will not need to configure anything manually.
Created 04-01-2025 09:27 AM
@mikecolux Adding to what @MGreen explained: make sure the Hive on Tez Gateway role is deployed to the nodes where hive-site.xml is needed.
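A quick way to confirm that the Gateway role actually placed the file on a node is to check for it directly. This is only a minimal sketch, assuming the client configuration lives in /etc/hive/conf (the path used elsewhere in this thread); check_hive_site is a hypothetical helper name, not a Cloudera tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether hive-site.xml is readable in a config directory.
# The default /etc/hive/conf is an assumption taken from this thread; adjust if yours differs.
check_hive_site() {
  local conf_dir="${1:-/etc/hive/conf}"
  if [ -r "${conf_dir}/hive-site.xml" ]; then
    echo "found: ${conf_dir}/hive-site.xml"
    return 0
  fi
  echo "missing: ${conf_dir}/hive-site.xml"
  return 1
}

# Run this on each node where the Gateway role should be deployed.
check_hive_site /etc/hive/conf || true
```

If the file is reported missing on a node, the Gateway role (or a client configuration redeploy) probably has not reached that host yet.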
Created 04-02-2025 01:48 AM
Thank you @JoseManuel for your answer. Unfortunately, this solution didn't work. One more detail: the hive-site.xml file is not distributed to Spark jobs unless it is explicitly provided with the --files option in spark-submit (via spark-opts). If there are any additional configuration settings that need to be adjusted, I'd appreciate any guidance.
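For reference, the workaround looks roughly like the sketch below when run outside Oozie. It assumes the Spark 3 launcher is available as spark3-submit and that the job runs on YARN in cluster mode; submit_with_hive_site, SPARK_SUBMIT and HIVE_SITE are hypothetical names used only for illustration, and the class and jar in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: always ship hive-site.xml with the job via --files.
# SPARK_SUBMIT and HIVE_SITE are illustrative shell variables, not Spark settings.
submit_with_hive_site() {
  local submit="${SPARK_SUBMIT:-spark3-submit}"
  local hive_site="${HIVE_SITE:-/etc/hive/conf/hive-site.xml}"
  # --files uploads the listed file to the working directory of each YARN container.
  "$submit" --master yarn --deploy-mode cluster --files "$hive_site" "$@"
}

# Example (class and jar names are placeholders):
#   submit_with_hive_site --class com.example.App app.jar
```

Setting SPARK_SUBMIT=echo prints the arguments instead of launching a job, which is handy for reviewing the command first.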
Created 04-02-2025 01:57 AM
I have a similar issue, but specifically with Spark 3 jobs launched via Oozie. In my case, the Spark 3 job is unable to find the hive-site.xml file, whereas I don’t encounter this problem when running queries via Hive on Tez or directly using spark3-shell.
The only way I’ve found to resolve this is by explicitly adding the following configuration in the Oozie workflow:
--files /etc/hive/conf/hive-site.xml
Created 04-02-2025 08:37 AM
If your job requires hive-site.xml, it is not necessary to copy the file to /etc/spark/conf. Instead, you can try exporting the following variable, which should allow hive-site.xml to be picked up from /etc/hive/conf whenever needed:
export HADOOP_CONF_DIR=$HADOOP_CONF_DIR:/etc/hadoop/conf:/etc/spark/conf/yarn-conf/*:/etc/hive/conf
You can test this approach in a specific job or session, and once it works, you can update the Spark configuration accordingly.
CM > Spark > Configuration > Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh
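If the export ends up in spark-env.sh, the file may be sourced more than once, so a guarded append avoids duplicate entries. The sketch below assumes the colon-separated form suggested above is what you want to test; append_conf_dir is a hypothetical helper, not part of Spark or Cloudera Manager.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a directory to a colon-separated list only if it is absent.
append_conf_dir() {
  local current="$1" extra="$2"
  case ":${current}:" in
    *":${extra}:"*) echo "$current" ;;   # already present, leave unchanged
    "::") echo "$extra" ;;               # list was empty
    *) echo "${current}:${extra}" ;;     # append at the end
  esac
}

# Session-only test: affects this shell and the jobs launched from it.
export HADOOP_CONF_DIR="$(append_conf_dir "${HADOOP_CONF_DIR:-}" /etc/hive/conf)"
```

Running it in a single session first keeps the cluster-wide configuration untouched until the approach is confirmed to work.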
Created 04-02-2025 12:47 PM
@leopetr In the Oozie world, Spark3 actions are a different animal than HiveServer2 ones.
Spark actions need to specify the files explicitly, and as long as the Hive on Tez Gateway role is deployed to all NodeManager nodes, that should work for you.
Created 04-03-2025 01:55 AM
@JoseManuel Thank you for the suggestion. I have added the Hive on Tez roles to all NODEMANAGER nodes, but unfortunately, the issue persists. The Spark3 action is still failing.