Welcome to the Cloudera Community

TY · 10-22-2018

Hello,

On top of my Ubuntu-based Docker image (Python 3.6, Spark 2.2.2), I installed Livy following https://github.com/cloudera/livy, and I am trying to create a Spark server in local mode (to start with).

During the build, I specified my Spark version as follows:

export SPARK_VERSION=2.2.2

mvn -DskipTests -Dspark.version=$SPARK_VERSION clean package

It works perfectly with the "Spark Example" in https://github.com/cloudera/livy.

However, the "PySpark Example" does not work and fails with the message "Interpreter died" (please see below).

I tried different settings in the files under the /usr/local/livy/conf folder, without success.

Can anyone tell me what the cause might be, and how to debug it? (Nothing shows up in the Spark log...)

I'd very much appreciate your help.

Here is a snippet of the client:

import json, pprint, requests, textwrap
host = 'http://localhost:8998'
data = {'kind': 'pyspark'}
headers = {'Content-Type': 'application/json'}
r = requests.post(host + '/sessions', data=json.dumps(data), headers=headers)
r.json()

{'id': 0, 'appId': None, 'owner': None, 'proxyUser': None, 'state': 'starting', 'kind': 'pyspark', 'appInfo': {'driverLogUrl': None, 'sparkUiUrl': None}, 'log': []}

session_url = host + r.headers['location']
r2 = requests.get(session_url, headers=headers)
r2.json()

{'id': 0,
'appId': None,
'owner': None,
'proxyUser': None,
'state': 'idle',
'kind': 'pyspark',
'appInfo': {'driverLogUrl': None, 'sparkUiUrl': None},
'log': []}
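
The session starts in the 'starting' state and only accepts statements once it reports 'idle', so the second request above has to be repeated until that happens. A minimal polling sketch follows; `wait_for_state` is a hypothetical helper (not part of Livy or requests), and `fetch` is assumed to be any callable that returns the session JSON as a dict:

```python
import time


def wait_for_state(fetch, wanted, failed=("error", "dead", "killed"),
                   interval=1.0, timeout=120.0, sleep=time.sleep):
    """Call fetch() repeatedly until its 'state' field is in `wanted`.

    Hypothetical helper: `fetch` is any callable returning a dict that
    has a 'state' key, such as the parsed JSON of a Livy session.
    Raises RuntimeError if a state listed in `failed` is seen, and
    TimeoutError after roughly `timeout` seconds of waiting.
    """
    waited = 0.0
    while True:
        state = fetch().get("state")
        if state in wanted:
            return state
        if state in failed:
            raise RuntimeError("session reached state %r" % state)
        if waited >= timeout:
            raise TimeoutError("still in state %r after %.0f s" % (state, waited))
        sleep(interval)
        waited += interval
```

With the variables from the snippet, this could be called as wait_for_state(lambda: requests.get(session_url, headers=headers).json(), {'idle'}).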

statements_url = session_url + '/statements'

data = {
  'code': textwrap.dedent("""
    import random
    NUM_SAMPLES = 100000
    def sample(p):
      x, y = random.random(), random.random()
      return 1 if x*x + y*y < 1 else 0

    count = sc.parallelize(xrange(0, NUM_SAMPLES)).map(sample).reduce(lambda a, b: a + b)
    print("Pi is roughly %f" % (4.0 * count / NUM_SAMPLES))
    """)
}
r = requests.post(statements_url, data=json.dumps(data), headers=headers)
pprint.pprint(r.json())

{'id': 0, 'output': None, 'progress': 0.0, 'state': 'waiting'}

r = requests.get(statements_url, headers=headers)
pprint.pprint(r.json())

{'statements': [{'id': 0, 'output': {'ename': 'Error', 'evalue': 'Interpreter died:\n', 'execution_count': 0, 'status': 'error', 'traceback': []}, 'progress': 1.0, 'state': 'available'}], 'total_statements': 1}
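
To make the failure easier to read, the 'output' field of a statement can be reduced to a short summary. This is only a sketch: `summarize_error` is a hypothetical helper name, and it assumes the output has the 'status', 'ename', 'evalue' and 'traceback' keys shown in the response above:

```python
def summarize_error(statement):
    """Return a short error summary for one statement, or None if it did not fail.

    Hypothetical helper: assumes the statement dict has the shape shown in
    the response above ('output' with 'status', 'ename', 'evalue', 'traceback').
    """
    output = statement.get("output") or {}
    if output.get("status") != "error":
        return None
    # First line: error name and message, with trailing newlines stripped.
    lines = ["%s: %s" % (output.get("ename"), (output.get("evalue") or "").strip())]
    # Remaining lines: the traceback, which is empty in the case above.
    lines.extend(line.rstrip("\n") for line in output.get("traceback") or [])
    return "\n".join(lines)
```

Here the traceback is empty, so the summary is just the "Interpreter died" message. The Livy REST API also documents a GET /sessions/{sessionId}/log endpoint, which could be queried with requests.get(session_url + '/log', headers=headers).json(); I have not verified whether it contains the interpreter's error output in this setup.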

Livy "Interpreter died" with PySpark