I'm running RHEL and ran into similar problems due to the fun configuration of Python2 and 3 and SCL on RedHat.
The root cause is that the /usr/bin/hdp-select script was written for Python 2.
It uses Python 2-only syntax, so it fails when it is run under Python 3.
To resolve, we had to modify the hdp-select script to be compatible with both versions.
I would attach mine, but it might break your environment because it contains a lot of hardcoded values, such as your HDP component versions. So you'll need to do these steps manually.
1. Make a backup of the file.
sudo cp -p /usr/bin/hdp-select /usr/bin/hdp-select_original
2. As root, edit the file.
3. Add parentheses around all print statements. Example below. Change all occurrences from:
print "a", "b", var, 123
to:
print ("a","b", var, 123)
Be careful with multi-line print statements, such as those ending with \ or using multi-line strings. I recommend editing in a text editor that supports syntax highlighting to avoid any issues.
Also be aware that Python is sensitive to indentation, so don't change any spaces or tabs at the start of a line.
4. Change os.mkdir from:
5. Comment out the packages.sorted() call.
(There are online tools for converting code from Python2 to 3 but they miss some of the above steps.)
6. Save and close the file
7. Test that hdp-select still works from the shell. If so, you should be able to run spark-submit without issue.
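For step 7, one extra check that may help before running hdp-select itself is to confirm that the edited file at least parses under the Python version that will run it. The sketch below is only an illustration: check_syntax is a hypothetical helper name, not part of HDP, and it relies only on the built-in compile(), which parses the source without executing it.

```python
def check_syntax(path):
    """Return None if the file parses under this interpreter, else an error message."""
    with open(path) as handle:
        source = handle.read()
    try:
        # compile() only parses the source; it does not run the script.
        compile(source, path, "exec")
    except SyntaxError as err:
        # Report the file, line number, and the interpreter's own message.
        return "%s:%s: %s" % (path, err.lineno, err.msg)
    return None
```

Call check_syntax("/usr/bin/hdp-select") with the interpreter you expect hdp-select to run under and print the result. A leftover print statement shows up as a syntax error with its line number. This won't catch problems that only appear at runtime, so still run hdp-select afterwards.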
A word of caution:
While these changes should be backwards compatible with Python 2, I am not sure what their longer-term impacts are. They may cause problems with other HDP components (though that seems highly unlikely).
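One caveat on Python 2 compatibility is worth knowing: under Python 2, print ("a","b", var, 123) is still a print statement applied to a tuple, so it prints something like ('a', 'b', 'value', 123) rather than a b value 123. The script still runs, but the output text changes, which could matter if anything parses it. Two ways around that are adding from __future__ import print_function at the very top of the script, or passing print a single string. Below is a minimal sketch of the second option; format_line and var are made-up names for illustration and are not taken from hdp-select.

```python
def format_line(*parts):
    # Join the pieces the way a Python 2 print statement would: space-separated.
    return " ".join(str(part) for part in parts)

var = "example"  # hypothetical value, stands in for whatever the script prints

# A single parenthesised argument prints the same text on Python 2 and Python 3.
print(format_line("a", "b", var, 123))  # prints: a b example 123
```

Whether this matters depends on whether anything reads hdp-select's output; if the output is only for humans, the tuple form is ugly but harmless.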
Making changes to scripts outside of Ambari has other risks - Ambari or some other installation or upgrade process might replace the script with the one from your HDP software bundle, so your spark-submit could stop working if/when that happens.
I would file a bug report but we don't have Cloudera support at this time.