About rvenkatesh

rvenkatesh · ‎03-27-2017

@dbalasundaran expanding a bit more on @bbihari's answer above, there are two possible ways to define metastores within an HDC cluster skeleton.

The first is a two-step flow:
1. Register the metastore first, either through the CLI or through the UI. This creates a named metastore entry.
2. Refer to the named metastore from your cluster creation skeleton.

The skeleton created by the "SHOW CLI JSON" option assumes this is what you are trying to do, and that is what it generates by default.

The alternative is to define a metastore inline within your CLI definition (as @Dominika Bialek illustrates above).

I would recommend the first flow for automation, since the intent of what you are trying to do is clearer.

HTH
Ram

rvenkatesh · ‎02-14-2016

Hi @Goutham Koneru

The issue here is that we need to pass PYTHONHASHSEED=0 to the executors as an environment variable. One way to do that is to export SPARK_YARN_USER_ENV=PYTHONHASHSEED=0 and then invoke spark-submit or pyspark. With this change, my pyspark repro that used to hit this error runs successfully.

Shell setup, followed by launching pyspark:

    export PYSPARK_PYTHON=/usr/local/bin/python3.3
    export PYTHONHASHSEED=0
    export SPARK_YARN_USER_ENV=PYTHONHASHSEED=0
    bin/pyspark --master yarn-client --executor-memory 512m

Then, at the pyspark prompt:

    n = sc.parallelize(range(1000)).map(str).countApproxDistinct()
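To see why the variable has to reach every executor, here is a minimal standalone sketch (no Spark needed). It assumes the root cause is Python 3.3+ randomizing string hashes per interpreter process, so separate executor processes can disagree on the hash of the same string unless they share a fixed PYTHONHASHSEED. The helper name `string_hash_in_fresh_interpreter` and the sample string are made up for illustration.

```python
import os
import subprocess
import sys


def string_hash_in_fresh_interpreter(seed=None):
    """Return hash('spark') as computed by a brand-new Python process.

    seed=None leaves PYTHONHASHSEED unset in the child (randomized string
    hashing on Python 3.3+). Any other value is exported to the child,
    which is the same effect the export in the post has on the executors.
    """
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)  # start from a clean slate
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash('spark'))"], env=env
    )
    return int(out.decode().strip())


if __name__ == "__main__":
    # Each child stands in for one executor process.
    unseeded = [string_hash_in_fresh_interpreter() for _ in range(2)]
    seeded = [string_hash_in_fresh_interpreter(0) for _ in range(2)]
    print("no seed, two processes:", unseeded)  # values usually differ
    print("seed 0, two processes: ", seeded)    # values match
```

With the seed fixed, every process computes the same hash for the same string, which is what hash-based operations spread across executors rely on.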


Member Since	‎09-26-2015 01:47 PM
Last Visited	‎12-13-2021 12:53 AM
Posts	5
Kudos received	5

Cloudera Community

Re: 'SHOW CLI JSON' doesn't have properties from n...

Re: PYSPARK with different python versions on yarn...
