Member since 03-13-2018 · 4 Posts · 1 Kudos Received · 0 Solutions
03-23-2018
03:28 PM
1 Kudo
Hello Harsh,

The command 'unset HADOOP_HDFS_HOME' did the trick! I am now able to run spark-submit without including the hadoop-hdfs jar, and 'hadoop fs -ls' on the local terminal lists the HDFS directories.

The problem was in my /etc/environment file, which included the following line:

HADOOP_HDFS_HOME="/opt/cloudera/parcels/CDH/lib/hadoop"

I must have inserted that line while following some installation guide, and it was the cause of this issue. Removing it from /etc/environment fixes the issue permanently: I can open a new terminal and run spark-submit without running 'unset HADOOP_HDFS_HOME' first.

Thank you so much for helping me fix this!
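For anyone who lands here with the same symptom, the check and fix above can be sketched in a few commands. The snippet below demonstrates the deletion on a throwaway copy, since editing the real /etc/environment needs root (the in-place equivalent would be sudo sed -i '/HADOOP_HDFS_HOME/d' /etc/environment, followed by logging out and back in):

```shell
# Clear the override in the current shell so the Hadoop/Spark clients
# fall back to the parcel's own classpath resolution
unset HADOOP_HDFS_HOME

# Build a stand-in for /etc/environment to show the edit safely
env_file=$(mktemp)
printf '%s\n' \
  'PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"' \
  'HADOOP_HDFS_HOME="/opt/cloudera/parcels/CDH/lib/hadoop"' > "$env_file"

# Drop the offending line, exactly as sed -i would do in place
sed '/HADOOP_HDFS_HOME/d' "$env_file" > "${env_file}.fixed"
cat "${env_file}.fixed"

rm -f "$env_file" "${env_file}.fixed"
```

Note that /etc/environment is read at login, so the variable keeps coming back in every new terminal until the line is actually deleted, which is why 'unset' alone only fixes the current shell.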
03-21-2018
12:39 PM
Hey Harsh,

Here is the requested info.

env:

XDG_SESSION_ID=6
SHELL=/bin/bash
TERM=xterm-256color
SSH_CLIENT=
SSH_TTY=/dev/pts/13
USER=user
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36:
PATH=/home/user/.conda/envs/py27/bin:/opt/apache-maven-3.5.2/bin:/usr/anaconda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
MAIL=/var/mail/user
CONDA_PATH_BACKUP=/opt/apache-maven-3.5.2/bin:/usr/anaconda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
HADOOP_HDFS_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
CONDA_PREFIX=/home/user/.conda/envs/py27
PWD=/home/user/bitbucket/dl_staging
JAVA_HOME=/usr/lib/jvm/java-8-oracle/jre
LANG=en_US.UTF-8
PS1=(py27) \[\e]0;\u@\h: \w\a\]${debian_chroot:+($debian_chroot)}\u@\h:\w\$
HOME=/home/user
M2_HOME=/opt/apache-maven-3.5.2
SHLVL=1
CONDA_PS1_BACKUP=\[\e]0;\u@\h: \w\a\]${debian_chroot:+($debian_chroot)}\u@\h:\w\$
LOGNAME=user
SSH_CONNECTION=
CONDA_DEFAULT_ENV=py27
LESSOPEN=| /usr/bin/lesspipe %s
XDG_RUNTIME_DIR=/run/user/1000
LESSCLOSE=/usr/bin/lesspipe %s %s
_=/usr/bin/env

/etc/hadoop/conf/hadoop-env.sh:

# Prepend/Append plugin parcel classpaths
if [ "$HADOOP_USER_CLASSPATH_FIRST" = 'true' ]; then
  # HADOOP_CLASSPATH={{HADOOP_CLASSPATH_APPEND}}
  :
else
  # HADOOP_CLASSPATH={{HADOOP_CLASSPATH}}
  :
fi
# JAVA_LIBRARY_PATH={{JAVA_LIBRARY_PATH}}
export HADOOP_MAPRED_HOME=$( ([[ ! '/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce' =~ CDH_MR2_HOME ]] && echo /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce ) || echo ${CDH_MR2_HOME:-/usr/lib/hadoop-map$
export YARN_OPTS="-Xmx825955249 -Djava.net.preferIPv4Stack=true $YARN_OPTS"
export HADOOP_CLIENT_OPTS="-Djava.net.preferIPv4Stack=true $HADOOP_CLIENT_OPTS"
03-19-2018
10:33 AM
Hello Harsh,

Thanks for getting back to me. On the checks:

- The host is shown to be commissioned as a Spark Gateway in Cloudera Manager. Under /etc/spark/conf I see the following files: docker.properties.template, log4j.properties.template, slaves.template, spark-defaults.conf.template, spark-env.sh.template, fairscheduler.xml.template, metrics.properties.template, spark-defaults.conf, spark-env.sh. Is there an explicit classpath file that I should see, or are you referring to the SPARK_DIST_CLASSPATH variable that is set in spark-env.sh? Should I add hadoop-hdfs-2.6.0-cdh5.12.0.jar to this classpath?
- I don't bundle any project jars in the Spark app.
- 'env' showed no global environment variables that ended in or carried 'CLASSPATH' in their name.
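For reference, the third check (looking for stray classpath overrides in the environment) is just a one-liner; this is roughly what I ran, nothing CDH-specific about it:

```shell
# Print the name of any environment variable containing CLASSPATH,
# case-insensitively; empty output means no global override is set.
env | awk -F= 'toupper($1) ~ /CLASSPATH/ {print $1}'
```

A plain `env | grep -i classpath` also works, but matching on the variable name only avoids false hits when some unrelated variable's value happens to contain the word.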
03-13-2018
03:56 PM
Hello Harsh,

I ran into the same problem as the OP. I found no /usr/lib/hadoop directories on the machine.

The output of 'hadoop classpath' is:

/etc/hadoop/conf:/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/hadoop/libexec/../../hadoop/lib/*:/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/hadoop/libexec/../../hadoop/.//*:/opt/cloudera/parcels/CDH/lib/hadoop/lib/*:/opt/cloudera/parcels/CDH/lib/hadoop/.//*:/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/hadoop/libexec/../../hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/hadoop/libexec/../../hadoop-yarn/.//*:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/lib/*:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/.//*

The output of 'ls -ld /opt/cloudera/parcels/CDH' shows the symlink:

/opt/cloudera/parcels/CDH -> CDH-5.12.0-1.cdh5.12.0.p0.29

When running Spark jobs, I am able to work around this issue by passing /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/jars/hadoop-hdfs-2.6.0-cdh5.12.0.jar to the --jars flag of spark-submit. So it seems that, for some reason, the jar is not being loaded into the dependencies automatically by Cloudera Manager. Would you know of a fix for this?
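Spelled out, the workaround looks roughly like the snippet below. The parcel path and hdfs jar version are the ones from my host; the app jar and master are placeholders, and the command is only assembled and echoed here rather than executed. Going through the /opt/cloudera/parcels/CDH symlink instead of the versioned directory keeps the path valid across parcel upgrades:

```shell
# The CDH symlink tracks the active parcel version, so prefer it
# over the versioned CDH-5.12.0-... directory
parcel=/opt/cloudera/parcels/CDH
hdfs_jar="$parcel/jars/hadoop-hdfs-2.6.0-cdh5.12.0.jar"

# Assemble the spark-submit invocation (my-app.jar is a placeholder)
cmd="spark-submit --master yarn --jars $hdfs_jar my-app.jar"
echo "$cmd"
```

Passing the jar via --jars ships it to the executors and prepends it to the driver/executor classpaths, which is why it masks the missing-dependency problem even though the root cause lies elsewhere.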