Created 10-23-2017 01:22 PM
Dear community,
I'm getting an error in PyCharm (CDH 5.8.0 and Spark 1.6.2) with the following code, which was working on my Mac in standalone mode:
#!/usr/bin/env python

# Imports
import sys
import os
import logging

# Path for spark source folder
os.environ['SPARK_HOME'] = "/opt/cloudera/parcels/CDH/lib/spark"
os.environ['JAVA_HOME'] = "/opt/jdk1.8.0_101/bin/"
os.environ['PYSPARK_SUBMIT_ARGS'] = "--master yarn pyspark-shell"

# Append pyspark to Python Path
sys.path.append("/opt/cloudera/parcels/CDH/lib/spark/python/")
sys.path.append("/opt/jdk1.8.0_101/bin/")

try:
    from pyspark import SparkContext
    from pyspark import SparkConf
    from pyspark.sql import HiveContext
    logging.info("Successfully imported Spark Modules")
except ImportError as e:
    logging.error("Can not import Spark Modules", e)
    sys.exit(1)

# CONSTANTS
APP_NAME = "Spark Application Template"

# Main functionality
def main(sc):
    logging.info("string main program: ")
    rdd = sc.parallelize(range(10000), 10)
    print rdd.mean()

if __name__ == "__main__":
    # Configure OPTIONS
    conf = SparkConf().setAppName(APP_NAME)
    conf = conf.setMaster("yarn")
    sc = SparkContext(conf=conf)
    # set the log-level
    sc.setLogLevel("ERROR")
    # Execute Main functionality
    main(sc)
The Spark context cannot be created.
Using the pyspark shell, everything works fine!
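One thing I tried while debugging: sanity-checking the environment variables the script sets before Spark ever starts, since a suspicious value there (for example, JAVA_HOME pointing at the bin/ subdirectory rather than the JDK root) can make context creation fail. This is just a minimal sketch with a hypothetical helper, using the same paths as the script above:

```python
import os

def check_spark_env(env):
    """Return a list of human-readable problems found in the
    Spark-related environment variables, without touching Spark."""
    problems = []

    java_home = env.get('JAVA_HOME', '')
    # JAVA_HOME is conventionally the JDK root, not its bin/ directory
    if java_home.rstrip('/').endswith('bin'):
        problems.append("JAVA_HOME points at bin/, not the JDK root")
    elif not java_home:
        problems.append("JAVA_HOME is not set")

    if not env.get('SPARK_HOME', ''):
        problems.append("SPARK_HOME is not set")

    return problems

# Checking the values from the script above:
env = {
    'SPARK_HOME': "/opt/cloudera/parcels/CDH/lib/spark",
    'JAVA_HOME': "/opt/jdk1.8.0_101/bin/",
}
for problem in check_spark_env(env):
    print(problem)
```

Running this against the script's values flags the JAVA_HOME setting, though I'm not sure whether that alone explains the failure on the cluster.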
Thanks for the suggestions.
Cheers
Gerd