Member since: 02-11-2016
Posts: 10
Kudos Received: 7
Solutions: 1
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 12373 | 02-25-2016 07:54 PM |
12-05-2016 08:11 PM
Thanks @Artem Ervits, I'll update this once I figure out what we end up doing.
12-05-2016 05:59 PM
@Artem Ervits, that doc makes no reference to HAWQ or gpadmin. When I install, I'm blocked because the Ambari HAWQ installer tries to create a gpadmin user and /home/gpadmin, which is not allowed in our environment.
12-05-2016 04:27 PM
Hi, my company's policy is to have all service accounts follow certain standards, and a user named "gpadmin" does not meet them. Is there a way to use a different system user? I looked at the code and it looks like it could be modified, but that would generally eliminate the option of support.
02-26-2016 03:08 PM
1 Kudo
@Piotr Kuźmiak What I had to do to resolve this was clone the latest Zeppelin from https://github.com/apache/incubator-zeppelin, build it with Maven, then update my zeppelin-env.sh and put the port number I wanted in zeppelin-site.xml. I didn't have to change anything in the Zeppelin GUI. Here is what is set in my zeppelin-env.sh:

export MASTER=yarn-client
export ZEPPELIN_PORT=8090
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.3.2.0-2950 -Dspark.yarn.queue=default"
export SPARK_HOME=/usr/hdp/current/spark-client/
export HADOOP_CONF_DIR=/etc/hadoop/conf
export PYSPARK_PYTHON=/usr/bin/python
export PYTHONPATH=${SPARK_HOME}/python:${SPARK_HOME}/python/build:$PYTHONPATH
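For reference, the port setting mentioned above goes into zeppelin-site.xml as a property; a minimal sketch, assuming the stock property name zeppelin.server.port from zeppelin-site.xml.template:

```xml
<!-- zeppelin-site.xml: port for the Zeppelin server built from source above -->
<property>
  <name>zeppelin.server.port</name>
  <value>8090</value>
</property>
```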
02-25-2016 07:54 PM
1 Kudo
There was a bug in Zeppelin; it was fixed by Mina Lee, and the fix was committed a day ago.
02-12-2016 03:43 PM
2 Kudos
@Neeraj Sabharwal I've also tried adding the PYTHONPATH directly in the interpreter configs from the Zeppelin GUI, by creating a variable zeppelin.pyspark.pythonpath. I even tried exporting the PYTHONPATH variable from the Linux CLI. None of these worked. What bothers me is that the PYTHONPATH is not changing, and I'm always getting the same error shown above.
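One way to confirm that the PYTHONPATH really isn't changing is to print it from inside the interpreter itself; a minimal sketch, run from a %pyspark paragraph, using only the standard library:

```python
import os
import sys

# Value the driver-side Python process inherited (None if zeppelin-env.sh was ignored)
print(os.environ.get("PYTHONPATH"))

# Directories the interpreter can actually import from
print(sys.path)
```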
02-12-2016 03:43 PM
1 Kudo
@Neeraj Sabharwal The JIRA issue and tutorial in your comments are completely unrelated to my issue; I had previously found the link to the Apache mail archives. It's about using pyspark on YARN, which I can already do via the CLI. The only problem is with Zeppelin: it ignores the PYTHONPATH in zeppelin-env.sh (which is the same as the one in spark-env.sh).
02-11-2016 07:31 PM
2 Kudos
Hi, I've been trying unsuccessfully to configure the pyspark interpreter in Zeppelin. I can use pyspark from the CLI and can use the Spark interpreter from Zeppelin without issue. Here are the lines which aren't commented out in my zeppelin-env.sh file:

export MASTER=yarn-client
export ZEPPELIN_PORT=8090
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.3.2.0-2950 -Dspark.yarn.queue=default"
export SPARK_HOME=/usr/hdp/current/spark-client/
export HADOOP_CONF_DIR=/etc/hadoop/conf
export PYSPARK_PYTHON=/usr/bin/python
export PYTHONPATH=${SPARK_HOME}/python:${SPARK_HOME}/python/build:$PYTHONPATH

Running a simple pyspark script in the interpreter gives this error:

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 5, some_yarn_node.networkname): org.apache.spark.SparkException:
Error from python worker:
/usr/bin/python: No module named pyspark
PYTHONPATH was:
/app/hadoop/yarn/local/usercache/my_username/filecache/4121/spark-assembly-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar

I've tried adding this line to zeppelin-env.sh, which gives the same error as above:

export PYTHONPATH=/usr/hdp/current/spark-client/python:/usr/hdp/current/spark-client/python/lib/pyspark.zip:/usr/hdp/current/spark-client/python/lib/py4j-0.8.2.1-src.zip

I've tried everything I could find on Google. Any advice for debugging or fixing this problem? Thanks, Ian

Also, in case it's useful for debugging, here are some commands and their outputs:

System.getenv().get("MASTER")
res49: String = yarn-client
System.getenv().get("SPARK_YARN_JAR")
res50: String = null
System.getenv().get("HADOOP_CONF_DIR")
res51: String = /etc/hadoop/conf
System.getenv().get("JAVA_HOME")
res52: String = /usr/jdk64/jdk1.7.0_45
System.getenv().get("SPARK_HOME")
res53: String = /usr/hdp/2.3.2.0-2950/spark
System.getenv().get("PYSPARK_PYTHON")
res54: String = /usr/bin/python
System.getenv().get("PYTHONPATH")
res55: String = /usr/hdp/2.3.2.0-2950/spark/python:/usr/hdp/2.3.2.0-2950/spark/python/build:/usr/hdp/current/spark-client//python/lib/py4j-0.8.2.1-src.zip:/usr/hdp/current/spark-client//python/:/usr/hdp/current/spark-client//python:/usr/hdp/current/spark-client//python/build:/usr/hdp/current/spark-client//python:/usr/hdp/current/spark-client//python/build:
System.getenv().get("ZEPPELIN_JAVA_OPTS")
res56: String = -Dhdp.version=2.3.2.0-2950
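For concreteness, a minimal sketch of the kind of "simple pyspark script" that triggers the failure above, run in a %pyspark paragraph; sc is provided by Zeppelin's Spark interpreter, and any RDD action that spawns Python workers on the YARN executors will surface the missing-module error:

```python
# sc is injected by Zeppelin's Spark interpreter; the action below forces Python
# workers to start on the YARN executors, which is where "No module named pyspark"
# is raised when the workers' PYTHONPATH lacks the pyspark libraries.
rdd = sc.parallelize(range(100))
print(rdd.map(lambda x: x * 2).sum())
```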
Labels:
- Apache Zeppelin