
Problem with path to parcels

New Contributor

I'm using CDH 5.14.

 

In Cloudera Manager, in the host configuration, I have set Parcel Directory = /opt/cloudera/parcels.

On the system, /opt/cloudera is a symlink to /data/cloudera.

In the Spark config files, spark-env.sh and spark-defaults.conf, /opt/cloudera is everywhere replaced with /data/cloudera.

Is this normal behaviour? Can't the parcel directory be configured as a symlink path?

 

Thanks!


Re: Problem with path to parcels

Super Guru

@wsmolak,

 

Interesting. It appears the Spark2 CSD's common.sh does this on purpose:

 

# Make sure PARCELS_ROOT is in the format we expect, canonicalized and without a trailing slash.
export PARCELS_ROOT=$(readlink -m "$PARCELS_ROOT")

So this is expected given your description.

"readlink -m" will follow any number of links and return the actual dir/file path.
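To see the canonicalization in action, here is a minimal sketch using throwaway /tmp directories as stand-ins for the /opt/cloudera -> /data/cloudera symlink described in the question:

```shell
# Stand-in layout: "data/cloudera" is the real location, "opt_cloudera"
# the symlink, mirroring /opt/cloudera -> /data/cloudera from the question.
rm -rf /tmp/parcel_demo
mkdir -p /tmp/parcel_demo/data/cloudera/parcels
ln -s /tmp/parcel_demo/data/cloudera /tmp/parcel_demo/opt_cloudera

# What the CSD does: export PARCELS_ROOT=$(readlink -m "$PARCELS_ROOT")
PARCELS_ROOT=/tmp/parcel_demo/opt_cloudera/parcels
readlink -m "$PARCELS_ROOT"
# -> /tmp/parcel_demo/data/cloudera/parcels (symlink resolved away)
```

So any parcel path configured through the symlink comes out the other side pointing at the real directory, which matches what you are seeing in spark-env.sh.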

Re: Problem with path to parcels

New Contributor

Thanks @bgooley 

Should I create common.sh in /etc/spark2/conf?

I can't locate such a file in any spark2 conf folder.

 

Cheers!

 

Re: Problem with path to parcels

Super Guru

@wsmolak,

 

 

No, common.sh does not belong in /etc/spark2/conf.

common.sh is located inside your CSD jar:

 

jar tvf SPARK2_ON_YARN-2.3.0.cloudera5-SNAPSHOT.jar
     0 Tue Jul 23 23:16:06 PDT 2019 META-INF/
    69 Tue Jul 23 23:16:06 PDT 2019 META-INF/MANIFEST.MF
     0 Tue Jul 23 23:16:06 PDT 2019 descriptor/
 25781 Tue Jul 23 23:16:06 PDT 2019 descriptor/service.sdl
     0 Wed Jul 17 20:46:54 PDT 2019 aux/
     0 Wed Jul 17 20:46:54 PDT 2019 aux/client/
  2224 Wed Jul 17 20:46:54 PDT 2019 aux/client/spark-env.sh
     0 Wed Jul 17 20:46:54 PDT 2019 images/
  3312 Wed Jul 17 20:46:54 PDT 2019 images/icon.png
     0 Wed Jul 17 20:46:54 PDT 2019 scripts/
 19696 Wed Jul 17 20:46:54 PDT 2019 scripts/common.sh
  1884 Wed Jul 17 20:46:54 PDT 2019 scripts/control.sh
     0 Wed Jul 17 23:17:24 PDT 2019 meta/
    24 Wed Jul 17 23:17:24 PDT 2019 meta/version
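If you want to read common.sh without a JDK on the box, note that a jar is just a zip archive. Here is a hedged sketch that builds a stand-in jar in /tmp purely for demonstration; on a real CM server you would point the list/extract steps at the actual SPARK2_ON_YARN CSD jar instead:

```shell
# Build a stand-in "CSD jar" containing a one-line scripts/common.sh.
rm -rf /tmp/csd_demo && mkdir -p /tmp/csd_demo/scripts
cd /tmp/csd_demo
echo 'export PARCELS_ROOT=$(readlink -m "$PARCELS_ROOT")' > scripts/common.sh
python3 -m zipfile -c demo_csd.jar scripts/

# Equivalent of `jar tvf`: list the archive contents.
python3 -m zipfile -l demo_csd.jar

# Extract the archive and inspect common.sh.
python3 -m zipfile -e demo_csd.jar out/
grep PARCELS_ROOT out/scripts/common.sh
```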

I'm curious why you are asking this question.  Are you seeing a problem and trying to solve it?

Re: Problem with path to parcels

New Contributor

Hi @bgooley,

 

The problem is that on the one host where the spark2 gateway runs, parcels are installed under /data/cloudera/parcels. On the rest of the cluster, worker nodes included, parcels are in /opt/cloudera/parcels. So on that host we've symlinked /opt/cloudera -> /data/cloudera.

 

When we run Spark code in YARN mode, we get the following error:

 

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, lxspkop010.at.inside, executor 1): org.apache.spark.SparkException: 
Error from python worker:
  /bin/python2: No module named pyspark
PYTHONPATH was:
  /opt/cloudera/parcels/SPARK2-2.2.0.cloudera4-1.cdh5.13.3.p0.603055/lib/spark2/jars/spark-core_2.11-2.2.0.cloudera4.jar:/data/cloudera/parcels/SPARK2-2.2.0.cloudera4-1.cdh5.13.3.p0.603055/lib/spark2/python/lib/py4j-0.10.7-src.zip:/data/cloudera/parcels/SPARK2-2.2.0.cloudera4-1.cdh5.13.3.p0.603055/lib/spark2/python/::/data/cloudera/parcels/SPARK2-2.2.0.cloudera4-1.cdh5.13.3.p0.603055/lib/spark2/python/lib/py4j-0.10.7-src.zip:/data/cloudera/parcels/SPARK2-2.2.0.cloudera4-1.cdh5.13.3.p0.603055/lib/spark2/python/lib/pyspark.zip:/data/cloudera/parcels/SPARK2-2.2.0.cloudera4-1.cdh5.13.3.p0.603055/lib/spark2/python/lib/py4j-0.10.7-src.zip:/data/cloudera/parcels/SPARK2-2.2.0.cloudera4-1.cdh5.13.3.p0.603055/lib/spark2/python/lib/pyspark.zip
java.io.EOFException

On the other gateways all is fine. The problem appears only with the RDD API. The Parcel Directory is set to /opt/cloudera/parcels in that host's configuration.

 

Cheers!


Re: Problem with path to parcels

Super Guru

@wsmolak,

 

Not sure of the elegant solution here, but you could try adding the real (post-symlink) versions of any paths that traverse the link to the following setting in the Spark2 configuration in CM:

 

Extra Python Path

 

Given the error, it may also help to iterate over the PYTHONPATH entries and make sure they exist.
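That PYTHONPATH check could be sketched like this. The helper below is hypothetical, not part of Spark; on the affected host you would pass os.environ.get("PYTHONPATH", "") instead of the made-up demo string:

```python
import os

def check_pythonpath(pythonpath):
    """Return (entry, exists, realpath) for each PYTHONPATH entry, so
    missing entries and symlink redirections are easy to spot."""
    report = []
    for entry in pythonpath.split(os.pathsep):
        if not entry:
            continue  # skip empty entries, like the stray "::" in the error above
        report.append((entry, os.path.exists(entry), os.path.realpath(entry)))
    return report

# Demo with made-up entries; use os.environ.get("PYTHONPATH", "") for real.
for entry, exists, real in check_pythonpath("/tmp:/no/such/dir"):
    print(f"{entry} exists={exists} real={real}")
```

Comparing the `realpath` column against the literal entry would also show which entries silently cross the /opt/cloudera symlink.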

 

Apart from that, perhaps run "strace" on your client to find out what it is looking for and where it fails to find it.

I see in your error that the pyspark module cannot be found, but why is not clear from that error information alone (at least to me).