10-18-2022
11:37 PM
Hello @vaishaakb, sadly we have not reached a solution for the main issue yet. Yes, I checked this blog, and I also checked every piece of documentation provided by Cloudera and others to try to resolve this issue, but no luck. I also want to point out that the blog's first demo is not working properly: the Cloudera team posted output showing the error ImportError: No module named numpy, which indicates the Docker image didn't work with PySpark properly.
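A quick way to check an image before a full spark-submit run is a small import probe executed inside it. This is only a sketch; the module list below is an assumption based on the blog's demo, so adjust it for your image:

```python
import importlib.util

def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Modules the demo presumably needs; adjust for your image.
required = ["numpy", "pyspark"]
print("missing:", missing_modules(required))
```

Running something like `docker run MyImage python probe.py` would surface the ImportError before a full YARN submission rather than in container logs.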
10-17-2022
03:17 AM
Okay, so the mounting issue happened when I changed yarn.nodemanager.linux-container-executor.group in both yarn-site.xml and container-executor.cfg to "hadoop". I found out later that this is unnecessary; those configs only apply to older versions. So I reverted the configs to the defaults, but the mounting issue still persisted. The solution was to set the ownership of yarn/nm/usercache to yarn, then delete the specific user folder; in my case I had to delete the f.alenezi folder. Since YARN auto-generates folders/files for each job, we have to make sure the ownership is set properly.
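For reference, the fix described above amounts to roughly the following. This is a sketch: the /data01 path, the yarn:hadoop ownership, and the f.alenezi folder name come from my cluster, so substitute your own (on a real NodeManager, run as root with the NodeManager stopped; the snippet defaults to a scratch directory so it can be tried safely anywhere):

```shell
# USERCACHE_ROOT would be /data01/yarn/nm/usercache on the real node;
# a scratch directory is used by default so the commands can run anywhere.
USERCACHE_ROOT="${USERCACHE_ROOT:-$(mktemp -d)/usercache}"
mkdir -p "$USERCACHE_ROOT/f.alenezi/appcache"

# 1. Restore ownership of the usercache tree to the yarn user
#    (needs root on a real NodeManager; skipped if no yarn user exists).
id yarn >/dev/null 2>&1 && chown -R yarn:hadoop "$USERCACHE_ROOT"

# 2. Remove the stale per-user cache so YARN regenerates it with the
#    correct ownership on the next job submission.
rm -rf "$USERCACHE_ROOT/f.alenezi"
```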
10-16-2022
12:29 AM
I solved the mount issue, but that took me back to the same main issue mentioned in the original post. I am still trying to resolve it, so any help would be appreciated.
09-20-2022
03:15 AM
Thank you @vaishaakb for your answer, but sadly none of those sources helped. Going through the Apache documentation on Docker for YARN, I found that yarn.nodemanager.linux-container-executor.group must be the same in both yarn-site.xml and container-executor.cfg. In my cluster the value in yarn-site.xml was "hadoop" while the one in container-executor.cfg was "yarn", so I changed the latter to "hadoop" as well. That produced a different error on starting the job: YARN can no longer recognize its directory mounts: [2022-09-20 13:04:35.114]Container exited with a non-zero exit code 29.
[2022-09-20 13:04:35.114]Container exited with a non-zero exit code 29.
For more detailed output, check the application tracking page: https://SERVER/cluster/app/application_1663590757906_0056 Then click on links to logs of each attempt.
. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1663668271871
final status: FAILED
tracking URL: https://SERVER/cluster/app/application_1663590757906_0056
user: f.alenezi
22/09/20 13:04:36 INFO yarn.Client: Deleted staging directory hdfs://SERVER/user/f.alenezi/.sparkStaging/application_1663590757906_0056
22/09/20 13:04:36 ERROR yarn.Client: Application diagnostics message: Application application_1663590757906_0056 failed 2 times due to AM Container for appattempt_1663590757906_0056_000002 exited with exitCode: 29
Failing this attempt.Diagnostics: [2022-09-20 13:04:35.113]Exception from container-launch.
Container id: container_e55_1663590757906_0056_02_000001
Exit code: 29
Exception message: Launch container failed
Shell error output: Could not determine real path of mount '/data01/yarn/nm/usercache/f.alenezi/appcache/application_1663590757906_0056'
Could not determine real path of mount '/data01/yarn/nm/usercache/f.alenezi/appcache/application_1663590757906_0056'
Invalid docker mount '/data01/yarn/nm/usercache/f.alenezi/appcache/application_1663590757906_0056:/data01/yarn/nm/usercache/f.alenezi/appcache/application_1663590757906_0056:rw', realpath=/data01/yarn/nm/usercache/f.alenezi/appcache/application_1663590757906_0056
Error constructing docker command, docker error code=13, error message='Invalid docker mount'
Shell output: main : command provided 4
main : run as user is f.alenezi
main : requested yarn user is f.alenezi
Creating script paths...
Creating local dirs...

If we can fix this, or at least locate the root cause of the issue, it might help with the earlier issue. Any feedback is appreciated.
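For anyone hitting the same mismatch, the two settings that must agree look roughly like this (the "hadoop" value is from my cluster; yours may differ, and per the Apache docs the group must also be the primary group of the NodeManager process):

```
<!-- yarn-site.xml -->
<property>
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>hadoop</value>
</property>

# container-executor.cfg
yarn.nodemanager.linux-container-executor.group=hadoop
```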
09-19-2022
06:21 AM
Hello everyone, I am trying to run a Python script in a dockerized environment using Spark and YARN. The cluster is kerberized, and I provided the needed realm and keytab to the spark-submit command. I still face this issue where Java only sees a null user for some reason. What I tried: spark-submit \
--master yarn \
--deploy-mode cluster \
--conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
--conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=MyImage \
--conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS="/etc/passwd:/etc/passwd:ro,/etc/hadoop:/etc/hadoop:ro,/opt/cloudera/parcels/:/opt/cloudera/parcels/:ro,/etc/krb5.conf:/etc/krb5.conf:ro" \
--conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
--conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=MyImage \
--conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS="/etc/passwd:/etc/passwd:ro,/etc/hadoop:/etc/hadoop:ro,/opt/cloudera/parcels/:/opt/cloudera/parcels/:ro,/etc/krb5.conf:/etc/krb5.conf:ro" \
--principal MyPrincipal \
--keytab MyKeytab \
Script.py

The error resulted from running this: [2022-09-19 16:09:06.612]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
e Method)
at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
at org.apache.hadoop.security.UserGroupInformation$HadoopLoginContext.login(UserGroupInformation.java:2094)
at org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:2005)
at org.apache.hadoop.security.UserGroupInformation.createLoginUser(UserGroupInformation.java:743)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:693)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:604)
at org.apache.spark.deploy.SparkHadoopUtil.createSparkUser(SparkHadoopUtil.scala:74)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:810)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
at org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:2015)
at org.apache.hadoop.security.UserGroupInformation.createLoginUser(UserGroupInformation.java:743)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:693)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:604)
at org.apache.spark.deploy.SparkHadoopUtil.createSparkUser(SparkHadoopUtil.scala:74)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:810)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: javax.security.auth.login.LoginException: java.lang.NullPointerException: invalid null input: name
at com.sun.security.auth.UnixPrincipal.<init>(UnixPrincipal.java:71)
at com.sun.security.auth.module.UnixLoginModule.login(UnixLoginModule.java:133)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
at org.apache.hadoop.security.UserGroupInformation$HadoopLoginContext.login(UserGroupInformation.java:2094)
at org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:2005)
at org.apache.hadoop.security.UserGroupInformation.createLoginUser(UserGroupInformation.java:743)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:693)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:604)
at org.apache.spark.deploy.SparkHadoopUtil.createSparkUser(SparkHadoopUtil.scala:74)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:810)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
at javax.security.auth.login.LoginContext.invoke(LoginContext.java:856)
at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
at org.apache.hadoop.security.UserGroupInformation$HadoopLoginContext.login(UserGroupInformation.java:2094)
at org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:2005)
... 6 more

I tried everything: I made sure everything is configured properly and followed the Hadoop documentation to verify the setup. The only thing I don't know how to do is set a login user in the configuration before submitting, something similar to: UserGroupInformation.setLoginUser(UserGroupInformation.createRemoteUser("hduser")) which could solve my issue, but I can't add this to the script, nor do I know the conf option for it.
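For what it's worth, the NullPointerException "invalid null input: name" from UnixPrincipal typically means the process UID inside the container has no matching /etc/passwd entry, so the OS login name resolves to null. A small probe like the following (a sketch, meant to be run inside the container image) can confirm whether that is the case:

```python
import os
import pwd

def login_name_for_uid(uid):
    """Return the passwd login name for a UID, or None if the UID has
    no /etc/passwd entry -- the situation that makes Java's
    UnixPrincipal throw 'invalid null input: name'."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return None

name = login_name_for_uid(os.getuid())
if name is None:
    print("UID", os.getuid(), "has no passwd entry; check the /etc/passwd mount")
else:
    print("container user resolves to", name)
```

If the name comes back as None even with the /etc/passwd bind mount listed in YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS, the mount itself is likely not reaching the container, which would also be consistent with the earlier "Invalid docker mount" failure.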
Labels:
- Apache Spark
- Apache YARN
- Docker
- HDFS
- Kerberos