Member since 09-26-2015
Posts: 5
Kudos Received: 14
Solutions: 2
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 2679 | 03-27-2017 07:46 PM |
 | 26519 | 02-14-2016 07:35 AM |
03-27-2017
07:46 PM
2 Kudos
@dbalasundaran expanding a bit more on @bbihari's answer above, there are two possible ways to define metastores within an HDC cluster skeleton.

The first is a two-step flow:
1. Register the metastore first, either through the CLI or through the UI. This creates a named metastore entry.
2. Refer to the named metastore from your cluster creation skeleton.

The skeleton created by the "SHOW CLI JSON" option assumes this is what you are trying to do, and that is what it generates by default.

The alternative way is to define a metastore inline within your CLI definition (as @Dominika Bialek illustrates above).

I would recommend the first flow for automation, since the intent of what you are trying to do is clearer.

HTH
Ram
02-14-2016
07:35 AM
3 Kudos
Hi @Goutham Koneru

The issue here is that we need to pass PYTHONHASHSEED=0 to the executors as an environment variable. One way to do that is to export SPARK_YARN_USER_ENV=PYTHONHASHSEED=0 and then invoke spark-submit or pyspark.

With this change, my pyspark repro that used to hit this error runs successfully.

    export PYSPARK_PYTHON=/usr/local/bin/python3.3
    export PYTHONHASHSEED=0
    export SPARK_YARN_USER_ENV=PYTHONHASHSEED=0
    bin/pyspark --master yarn-client --executor-memory 512m

    n = sc.parallelize(range(1000)).map(str).countApproxDistinct()
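For anyone wondering why the seed matters: a minimal sketch, outside of Spark, of what is probably going on. Python 3.3+ randomizes string hashes per interpreter process unless PYTHONHASHSEED is fixed, so separate executor processes can disagree on hash("some string"). The helper name string_hash_in_fresh_interpreter is made up for this illustration; it just starts a new Python process and asks it for a hash.

```python
import os
import subprocess
import sys


def string_hash_in_fresh_interpreter(value, hash_seed=None):
    """Return hash(value) as computed by a newly started Python process.

    hash_seed=None leaves PYTHONHASHSEED unset (randomized hashing);
    any other value is exported to the child as PYTHONHASHSEED.
    """
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)  # start from a clean slate
    if hash_seed is not None:
        env["PYTHONHASHSEED"] = str(hash_seed)
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", value],
        env=env,
    )
    return int(out.decode().strip())


if __name__ == "__main__":
    # Two independent processes, no fixed seed: the values usually differ.
    print("unseeded:", string_hash_in_fresh_interpreter("spark"),
          string_hash_in_fresh_interpreter("spark"))
    # Two independent processes, PYTHONHASHSEED=0: the values agree.
    print("seeded:  ", string_hash_in_fresh_interpreter("spark", 0),
          string_hash_in_fresh_interpreter("spark", 0))
```

The seeded pair agreeing is the property the executors need, which is why exporting the variable to them makes the repro pass.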