Livy "Interpreter died" with PySpark


Hello,

 

 

On top of my Ubuntu-based Docker image (Python 3.6, Spark 2.2.2), I installed Livy following https://github.com/cloudera/livy, and I am trying to create a Spark server in local mode (to start with).

During the build, I specified my Spark version as follows:

export SPARK_VERSION=2.2.2

mvn -DskipTests -Dspark.version=$SPARK_VERSION clean package

 

This works perfectly with the "Spark Example" from https://github.com/cloudera/livy.

However, the "PySpark Example" does not work and fails with the message "Interpreter died" (please see below).

I tried different settings in the files under /usr/local/livy/conf, without success.

Can anyone tell me what the cause might be, and how to debug it? (Nothing shows up in the Spark log...)

I'd very much appreciate your help.

Here is a snippet of the client:

 

import json, pprint, requests, textwrap
host = 'http://localhost:8998'
data = {'kind': 'pyspark'}
headers = {'Content-Type': 'application/json'}
r = requests.post(host + '/sessions', data=json.dumps(data), headers=headers)
r.json()
{'id': 0, 'appId': None, 'owner': None, 'proxyUser': None, 'state': 'starting', 'kind': 'pyspark', 'appInfo': {'driverLogUrl': None, 'sparkUiUrl': None}, 'log': []}
 
 
 
session_url = host + r.headers['location']
r2 = requests.get(session_url, headers=headers)
r2.json()
{'id': 0,
'appId': None,
'owner': None,
'proxyUser': None,
'state': 'idle',
'kind': 'pyspark',
'appInfo': {'driverLogUrl': None, 'sparkUiUrl': None},
'log': []}
 
 
statements_url = session_url + '/statements'

data = {
  'code': textwrap.dedent("""
    import random
    NUM_SAMPLES = 100000
    def sample(p):
      x, y = random.random(), random.random()
      return 1 if x*x + y*y < 1 else 0

    count = sc.parallelize(range(0, NUM_SAMPLES)).map(sample).reduce(lambda a, b: a + b)
    print("Pi is roughly %f" % (4.0 * count / NUM_SAMPLES))
    """)
}
r = requests.post(statements_url, data=json.dumps(data), headers=headers)
pprint.pprint(r.json())
 
{'id': 0, 'output': None, 'progress': 0.0, 'state': 'waiting'}
 
 
 
r = requests.get(statements_url, headers=headers)
pprint.pprint(r.json())
 
{'statements': [{'id': 0, 'output': {'ename': 'Error', 'evalue': 'Interpreter died:\n', 'execution_count': 0, 'status': 'error', 'traceback': []}, 'progress': 1.0, 'state': 'available'}], 'total_statements': 1}
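
Since the 'log' field in the session responses above is empty, one possible way to get more detail is to wait for the session to settle and then read its log through Livy's GET /sessions/{sessionId}/log endpoint. The following is only a minimal sketch under stated assumptions: the helper names `wait_for_state` and `fetch_session_log` are hypothetical (they are not part of Livy or requests), the set of stop states is taken from the Livy REST documentation, and whether the log actually contains the interpreter's error depends on the Livy build in use.

```python
import time

# States at which polling stops. The names are assumed from the Livy REST
# documentation; adjust the set if your Livy build reports others.
DONE_STATES = {'idle', 'error', 'dead', 'killed', 'shutting_down', 'success'}


def wait_for_state(fetch, done_states=DONE_STATES, timeout=120.0,
                   interval=1.0, sleep=time.sleep):
    """Call fetch() until the returned dict's 'state' is in done_states.

    fetch is any zero-argument callable returning the decoded JSON body,
    so this helper performs no HTTP itself (hypothetical helper).
    """
    deadline = time.monotonic() + timeout
    while True:
        body = fetch()
        if body.get('state') in done_states:
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError('still in state %r after %.0f s'
                               % (body.get('state'), timeout))
        sleep(interval)


def fetch_session_log(http_get, session_url, start=0, size=100):
    """Return the log lines from GET <session_url>/log (hypothetical helper).

    http_get is assumed to behave like requests.get; 'from' and 'size'
    are the paging parameters described in the Livy REST documentation.
    """
    response = http_get(session_url + '/log',
                        params={'from': start, 'size': size})
    return response.json().get('log', [])
```

With the variables from the snippet above, this could be used as `wait_for_state(lambda: requests.get(session_url, headers=headers).json())` followed by `print('\n'.join(fetch_session_log(requests.get, session_url)))`. Passing the HTTP call in as an argument keeps both helpers usable without a running Livy server.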
 
 
 