Support Questions

mikecolux · 03-31-2025

In the context of Cloudera CDP, I've noticed that, unlike in CDH, YARN does not automatically distribute the hive-site.xml file to the nodes. This seems to be a change in configuration practices, but I'm not sure whether configuring it manually is the correct approach or whether there is another recommended method to make sure hive-site.xml is properly distributed. Can someone confirm whether manual configuration is necessary in CDP, or whether YARN or another automated process takes care of distributing the file?

MGreen · 04-01-2025

Hello @mikecolux,

If you are using CDP and have Hive on Tez set up on the cluster, the Hive on Tez service will take care of this for you, and you will not need to configure anything manually.

JoseManuel · 04-01-2025

@mikecolux Adding to what @MGreen explained: make sure the Hive on Tez Gateway role is deployed to the nodes where hive-site.xml is needed.

mikecolux · 04-02-2025

Thank you @JoseManuel for your answer; unfortunately, this solution didn't work. One more detail: the hive-site.xml file is not distributed to Spark jobs unless it is explicitly provided with the --files option of spark-submit (passed via spark-opts). If any additional configuration settings need to be adjusted, I'd appreciate any guidance as well.
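For jobs that already pass other options, the hive-site.xml path has to be merged into any existing --files value, since spark-submit takes a single comma-separated list there. The following is a minimal Python sketch under stated assumptions: the helper name add_file_to_spark_opts and the default path /etc/hive/conf/hive-site.xml are illustrative only, and options are re-quoted with POSIX shell rules via shlex, which may not match exactly how a given launcher splits spark-opts.

```python
import shlex

# Path used in this thread; a hypothetical default, adjust for your cluster.
HIVE_SITE = "/etc/hive/conf/hive-site.xml"


def add_file_to_spark_opts(spark_opts: str, path: str = HIVE_SITE) -> str:
    """Return spark_opts with `path` present in the --files list.

    Accepts both "--files a,b" and "--files=a,b" forms. Appends a new
    --files option if none exists and never adds the same path twice.
    """
    tokens = shlex.split(spark_opts)
    for i, tok in enumerate(tokens):
        if tok == "--files":
            if i + 1 == len(tokens):
                # Dangling "--files" with no value: use our path as the value.
                tokens.append(path)
            else:
                files = [f for f in tokens[i + 1].split(",") if f]
                if path not in files:
                    tokens[i + 1] = ",".join(files + [path])
            return shlex.join(tokens)
        if tok.startswith("--files="):
            files = [f for f in tok[len("--files="):].split(",") if f]
            if path not in files:
                tokens[i] = "--files=" + ",".join(files + [path])
            return shlex.join(tokens)
    # No --files option yet: add one at the end.
    return shlex.join(tokens + ["--files", path])


if __name__ == "__main__":
    print(add_file_to_spark_opts("--executor-memory 4g --files app.conf"))
```

For an Oozie Spark action, the returned string would be the content of the spark-opts element; for spark-submit, the same options go directly on the command line.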

leopetr · 04-02-2025

I have a similar issue, but specifically with Spark 3 jobs launched via Oozie. In my case, the Spark 3 job is unable to find the hive-site.xml file, whereas I don’t encounter this problem when running queries via Hive on Tez or directly using spark3-shell.

The only way I’ve found to resolve this is by explicitly adding the following configuration in the Oozie workflow:

--files /etc/hive/conf/hive-site.xml
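Because this workaround has to be repeated in every Spark action, a quick audit of workflow.xml can show which actions still lack it. This is a hypothetical Python sketch, assuming each Spark action is a spark element nested inside an action element with a name attribute, as in the standard Oozie workflow layout. XML namespaces are ignored so it works across schema versions, and the check is a plain substring test on spark-opts, so a hive-site.xml shipped by other means would still be reported as missing.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop the XML namespace: '{uri:oozie:spark-action:1.0}spark' -> 'spark'."""
    return tag.rsplit("}", 1)[-1]


def spark_actions_missing_hive_site(workflow_xml: str) -> list[str]:
    """Return names of actions whose <spark> body never mentions hive-site.xml.

    Heuristic only: it looks for the substring "hive-site.xml" in the text
    of the <spark-opts> element(s); it does not parse the options.
    """
    root = ET.fromstring(workflow_xml)
    missing = []
    for action in root.iter():
        if local_name(action.tag) != "action":
            continue
        for spark in action:
            if local_name(spark.tag) != "spark":
                continue  # shell, hive2, etc. actions are not checked
            opts = " ".join(
                child.text or ""
                for child in spark
                if local_name(child.tag) == "spark-opts"
            )
            if "hive-site.xml" not in opts:
                missing.append(action.get("name", "<unnamed>"))
    return missing


if __name__ == "__main__":
    import sys

    with open(sys.argv[1] if len(sys.argv) > 1 else "workflow.xml") as f:
        for name in spark_actions_missing_hive_site(f.read()):
            print(f"spark action without hive-site.xml in spark-opts: {name}")
```

The script takes the workflow path as its first argument; the file name workflow.xml is only a default.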

MGreen · 04-02-2025

@mikecolux

If your job requires hive-site.xml, it is not necessary to copy the file to /etc/spark/conf. Instead, you can try setting the following environment variable, which will allow hive-site.xml to be picked up from /etc/hive/conf whenever needed:

export HADOOP_CONF_DIR=$HADOOP_CONF_DIR:/etc/hadoop/conf:/etc/spark/conf/yarn-conf/*:/etc/hive/conf

You can test this approach in a specific job or session first. Once it works, you can make the change permanent in the Spark configuration:

CM > Spark > Configuration > Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh
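While testing the export in a session, it can help to confirm that the entries in HADOOP_CONF_DIR actually lead to a hive-site.xml on the host where the job is launched. A minimal Python sketch, assuming the colon-separated format of the export command above and expanding wildcard entries such as /etc/spark/conf/yarn-conf/* with glob; the function name find_hive_site is hypothetical, and finding the file only shows that it exists locally, not that Spark or YARN will ship it to the containers.

```python
import glob
import os


def find_hive_site(conf_dirs: str) -> list[str]:
    """List every hive-site.xml reachable from a colon-separated path list.

    Entries may be directories (/etc/hive/conf) or wildcards such as
    /etc/spark/conf/yarn-conf/*, which glob expands. Missing entries are
    skipped, and empty entries (e.g. a leading ':' left by an unset
    $HADOOP_CONF_DIR) are ignored.
    """
    found = []
    for entry in filter(None, conf_dirs.split(":")):
        for cand in glob.glob(entry):
            if os.path.isdir(cand):
                cand = os.path.join(cand, "hive-site.xml")
            if (
                os.path.basename(cand) == "hive-site.xml"
                and os.path.isfile(cand)
                and cand not in found
            ):
                found.append(cand)
    return found


if __name__ == "__main__":
    hits = find_hive_site(os.environ.get("HADOOP_CONF_DIR", ""))
    print("\n".join(hits) if hits else "no hive-site.xml on HADOOP_CONF_DIR")
```

Running it on each NODEMANAGER host is also a quick way to check whether hive-site.xml is present under /etc/hive/conf there.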

JoseManuel · 04-02-2025

@leopetr In the Oozie world, Spark3 actions are a different animal from HiveServer2 ones.

Spark actions need to specify the files explicitly, and as long as the Hive on Tez Gateway role is deployed to all NODEMANAGER nodes, that should work for you.

leopetr · 04-03-2025

@JoseManuel Thank you for the suggestion. I have added the Hive on Tez roles to all NODEMANAGER nodes, but unfortunately, the issue persists. The Spark3 action is still failing.

Hive-site.xml Distribution in Cloudera CDP: Manual Configuration vs. YARN Automation