I am using PySpark 1.5 on CDH 5.5.2, accessing Spark through a Python Jupyter notebook that is installed only on my driver node. Since the move from CDH 5.5 to CDH 5.5.2, jobs that previously ran fine now fail with:

    Error from python worker: /usr/local/bin/python3: No module named 'zlib'
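A quick way to narrow this down is to run the check below with the exact interpreter the workers use (per the error message, /usr/local/bin/python3); if zlib was not compiled into that Python build, the import fails outside Spark too. This is a diagnostic sketch, not a fix:

```python
# Run with the same interpreter the Spark workers use, e.g.:
#   /usr/local/bin/python3 check_zlib.py
# If zlib is reported missing here, the Python build itself lacks the
# zlib extension module and Spark's error is just surfacing that.
import importlib

for name in ("zlib", "bz2"):
    try:
        importlib.import_module(name)
        print(name, "present")
    except ImportError as exc:
        print(name, "missing:", exc)
```

If zlib is missing, the interpreter was likely built from source without zlib development headers present, so the module never got compiled in.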
I am using CDH 5.5 (Spark 1.5) and the PySpark shell to load data over JDBC from both MySQL and SQL Server, and I am hitting issues with both. For MySQL, I have already run the following on the source database:

    GRANT ALL PRIVILEGES ON *.* TO '<user>'@'<HDFS edge NODE IP>' IDENTIFIED BY '<password>' WITH GRANT OPTION;
    FLUSH PRIVILEGES;

but I still get the same error each time:

    Py4JJavaError : An error occurred while calling o43.load.
    : java.sql.SQLException: Access denied for user <user>@ <hdfs edge node IP>

From the edge node where I am SSH'd in and running PySpark, I can successfully telnet to the MySQL host on port 3306, and I can also connect with the MySQL command-line client and run queries against the remote database. My .bashrc contains:

    export PYSPARK_SUBMIT_ARGS="--conf spark.executor.extraClassPath="/var/lib/sqoop2/mysql-connector-java.jar" --driver-class-path "/var/lib/sqoop2/mysql-connector-java.jar" --jars "/var/lib/sqoop2/mysql-connector-java.jar" --master yarn --deploy-mode client"
    export SPARK_CLASSPATH="/var/lib/sqoop2/mysql-connector-java.jar"

I am not sure what is left to try.
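For reference, this is a minimal sketch of the kind of JDBC load being attempted, using the Spark 1.5 DataFrameReader API. The host, database, table, and credential values are placeholders, and the snippet assumes the `sqlContext` that the pyspark shell creates automatically:

```python
# Minimal JDBC read sketch for Spark 1.5 (pyspark shell provides sqlContext).
# All <...> values are placeholders, not real names.
mysql_host = "<mysql-host>"
database = "<database>"
jdbc_url = "jdbc:mysql://{0}:3306/{1}".format(mysql_host, database)

try:
    df = sqlContext.read.format("jdbc").options(
        url=jdbc_url,
        driver="com.mysql.jdbc.Driver",
        dbtable="<table>",
        user="<user>",
        password="<password>",
    ).load()
    df.show(5)
except NameError:
    # Not running inside a pyspark shell; nothing to execute here.
    pass
```

One thing worth checking with an "Access denied" error despite a working mysql client: MySQL grants are per-host as the server resolves the client, so the host string MySQL sees from the JDBC connection must match the host in the GRANT exactly (IP vs. hostname vs. reverse-DNS name).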
I have a Cloudera 5 distribution with Spark 1.3. I installed IPython 1.2.1 to match Python 2.6.6 on CentOS 6, and followed this tutorial (https://blog.cloudera.com/blog/2014/08/how-to-use-ipython-notebook-with-apache-spark/comment-page-1/#comment-72862) by placing the two files ('00-pyspark-setup.py' and 'ipython_notebook_config.py', with the proper Spark directory listed) in my management node's home directory. I SSH'd into the management node, first set the SPARK_HOME environment variable, then launched "ipython notebook --profile=pyspark". However, after opening a Python 2 notebook in the browser, both "from pyspark import SparkConf, SparkContext" and "sc" were not recognized and gave import errors.

I then tried a different approach, first declaring:

    export SPARK_HOME='/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p898.573/lib/spark'
    PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH
    export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH
    export PYSPARK_DRIVER_PYTHON_OPTS="notebook --profile=pyspark"

and launching with "pyspark". This gave a more promising error when trying to run "from pyspark.context import SparkContext":

    ImportError: No module named 'SocketServer'

I think the obvious problem is that PySpark is not actually being started when I launch IPython, but I am not sure why. Here is what I have in my '00-pyspark-setup.py' file:

    import os
    import sys

    spark_home = os.environ.get('SPARK_HOME', None)
    if not spark_home:
        raise ValueError('SPARK_HOME environment variable is not set')
    sys.path.insert(0, os.path.join(spark_home, 'python'))
    sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.8.2.1-src.zip'))
    execfile(os.path.join(spark_home, 'python/pyspark/shell.py'))
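One clue in the traceback above: SocketServer is a Python 2 module name (it was renamed to socketserver in Python 3), so "No module named 'SocketServer'" usually means the interpreter running PySpark is Python 3, while the Spark 1.3-era PySpark code expects Python 2. A small sanity check that could be run in the same notebook kernel (a diagnostic sketch, not part of the setup files above):

```python
# Check which Python the notebook kernel is actually running.
# SocketServer (capital S) only exists on Python 2; Python 3 renamed
# it to socketserver, which is why old PySpark fails to import it there.
import sys

print(sys.executable)
print(sys.version_info)
if sys.version_info[0] >= 3:
    print("Python 3 kernel: Spark 1.3's PySpark expects Python 2.")
```

If this reports Python 3, the kernel/interpreter mismatch rather than the profile files would explain the import failure.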