s_l
New Contributor
Posts: 3
Registered: ‎04-18-2018

PriviledgedActionException from pyspark


I am trying to run sample code that uses a Python file for helper functions. I am able to add the file in the notebook, but when it runs the transform it gives PriviledgedActionException.

 

My Notebook

 

from pyspark.sql import SparkSession, SQLContext, HiveContext
import os
os.environ['SPARK_HOME'] = "/cloudera/parcels/SPARK2/lib/spark2/"
os.environ['PYSPARK_PYTHON'] = "/cloudera/parcels/Anaconda/bin/python2.7"
os.environ['PYTHONPATH'] = "/cloudera/parcels/SPARK2/lib/spark2/python/lib/py4j-0.10.4-src.zip:/cloudera/parcels/SPARK2/lib/spark2/python/"
def create_sparksession():
    spark = None
    spark = SparkSession.builder.master('yarn-client')
    spark.appName('Notebook')
    spark.config('spark.yarn.principal', 'xxxx')
    spark.config('spark.yarn.keytab', 'xxxx.keytab')
    spark.enableHiveSupport()
    return spark.getOrCreate()

try:
    spark.stop()
except:
    spark = None

spark = create_sparksession()
sc = spark.sparkContext
sqlContext = SQLContext(sc)
hiveContext = HiveContext(sc)

 

spark.sparkContext.addPyFile("/home/python/transform.py") #This is local file
from transform import test
rdd = sc.parallelize(range(10))

 

rdd.map(lambda x : test(x)).take(5)

 

transform.py

# text manipulation

from pyspark.sql.functions import udf, log
from pyspark.sql.types import *

def test(string):
  return string.lower()

udf_test = udf(test, StringType())

 

Log Output

18/04/18 15:09:01 INFO client.TransportClientFactory: Successfully created connection to c370bdc.intqa.bigdata.int.thomsonreuters.com/10.204.138.62:7337 after 0 ms (0 ms spent in bootstraps)
18/04/18 15:09:01 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(12, c370bdc.intqa.bigdata.int.thomsonreuters.com, 44993, None)
18/04/18 15:09:01 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 25
18/04/18 15:09:01 INFO executor.Executor: Running task 0.0 in stage 3.0 (TID 25)
18/04/18 15:09:01 INFO executor.Executor: Fetching spark://10.204.75.147:40863/files/transform.py with timestamp 1524081925434
18/04/18 15:09:01 INFO client.TransportClientFactory: Successfully created connection to /10.204.75.147:40863 after 0 ms (0 ms spent in bootstraps)
18/04/18 15:09:01 INFO util.Utils: Fetching spark://10.204.75.147:40863/files/transform.py to /data/1/yarn/nm/usercache/m0162109/appcache/application_1521224960293_80110/spark-26e68340-58c3-400a-8acb-de49d6323b63/fetchFileTemp1202050016760207947.tmp
18/04/18 15:09:01 INFO util.Utils: Copying /data/1/yarn/nm/usercache/m0162109/appcache/application_1521224960293_80110/spark-26e68340-58c3-400a-8acb-de49d6323b63/-16735339791524081925434_cache to /data/12/yarn/nm/usercache/m0162109/appcache/application_1521224960293_80110/container_e166_1521224960293_80110_01_000021/./transform.py
18/04/18 15:09:01 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 3
18/04/18 15:09:01 INFO client.TransportClientFactory: Successfully created connection to /10.204.75.147:37267 after 0 ms (0 ms spent in bootstraps)
18/04/18 15:09:01 INFO memory.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 2.8 KB, free 912.3 MB)
18/04/18 15:09:01 INFO broadcast.TorrentBroadcast: Reading broadcast variable 3 took 80 ms
18/04/18 15:09:01 INFO memory.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 4.0 KB, free 912.3 MB)
WARNING: User-defined SPARK_HOME (/hadoop/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2) overrides detected (/hadoop/cloudera/parcels/SPARK2/lib/spark2).
WARNING: Running spark-class from user-defined location.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/04/18 15:09:05 WARN security.UserGroupInformation: PriviledgedActionException as:m0162109 (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
18/04/18 15:09:05 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
18/04/18 15:09:05 WARN security.UserGroupInformation: PriviledgedActionException as:m0162109 (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
18/04/18 15:09:05 WARN security.UserGroupInformation

 

 

If I define the function within the notebook, it works fine. The issue only appears when I separate it into another file.

Posts: 454
Topics: 13
Kudos: 75
Solutions: 41
Registered: ‎09-02-2016

Re: PriviledgedActionException from pyspark

@s_l

 

This is a Kerberos issue; kinit once and try again.
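
For example (only a sketch; the keytab and principal below are the placeholders from your snippet, not real values), from the edge-node session:

import subprocess

# re-acquire a TGT from the keytab, then list the ticket cache;
# klist exits non-zero if no valid ticket is found
subprocess.call(["kinit", "-kt", "xxxx.keytab", "xxxx"])
subprocess.call(["klist"])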

s_l
New Contributor
Posts: 3
Registered: ‎04-18-2018

Re: PriviledgedActionException from pyspark

Did that, didn't help!

Expert Contributor
Posts: 93
Registered: ‎01-08-2018

Re: PriviledgedActionException from pyspark

I second that this is an issue with Kerberos.

If kinit didn't help, then try using the following submit-time options (a sketch of passing them from a notebook follows the option descriptions):

  --principal PRINCIPAL       Principal to be used to login to KDC, while running on
                              secure HDFS.
  --keytab KEYTAB             The full path to the file that contains the keytab for the
                              principal specified above. This keytab will be copied to
                              the node running the Application Master via the Secure
                              Distributed Cache, for renewing the login tickets and the
                              delegation tokens periodically.
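
For instance (only a sketch, with placeholder values), when Spark is started from a plain Python notebook rather than spark-submit, these flags can usually be passed through PYSPARK_SUBMIT_ARGS before the SparkSession is created:

import os

# hand the submit-time flags to the JVM launcher; the trailing
# "pyspark-shell" token is needed when Spark is launched from Python
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    "--principal xxxx --keytab xxxx.keytab pyspark-shell"
)
# ...then build the SparkSession as in your snippet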

Moreover, some more information regarding how you use pyspark, etc. would be helpful.

s_l
New Contributor
Posts: 3
Registered: ‎04-18-2018

Re: PriviledgedActionException from pyspark

It is definitely not a Kerberos issue.

 

I am running a notebook from the edge node.

 

The code below works fine using an RDD; if it were a Kerberos problem, this should fail as well. However, using a DataFrame causes issues.

 

#This is in transform.py

import re

def test(string):
    string = re.sub(r"[^A-Za-z0-9(),!\.?\'\`]", " ", string)
    return string.strip().lower()

def apply_spacy(partition):
    import spacy
    #spacy_lib = 'en_core_web_lg' #Large English library
    #nlp = spacy.load(spacy_lib)
    nlp = spacy.blank('en')
    for row in partition:
        out = test(row[0])
        yield nlp(out)

In the notebook shell

%reload_ext autoreload
%autoreload 2
from pyspark.sql import SparkSession, SQLContext, HiveContext
import os


os.environ['SPARK_HOME'] = "/cloudera/parcels/SPARK2/lib/spark2/"
os.environ['PYSPARK_PYTHON'] = "./xxx/bin/python" 
os.environ['PYSPARK_PYTHON_DRIVER'] = "/home/xxx/python/xxx/bin/python"
os.environ['PYTHONPATH'] = "/cloudera/parcels/SPARK2/lib/spark2/python/lib/py4j-0.10.4-src.zip:/hadoop/cloudera/parcels/SPARK2/lib/spark2/python/"

def create_sparksession():
    spark = None
    spark = SparkSession.builder.master('yarn-client')
    spark.appName('Notebook')
    spark.config('spark.yarn.principal', 'xxx@DOMAIN')
    spark.config('spark.yarn.keytab', '/home/xxx/xxx.keytab')
    spark.config("spark.yarn.dist.archives","hdfs://xxx.zip#xxx")
    spark.config("spark.yarn.appMasterEnv.PYSPARK_PYTHON","./xxx/bin/python")
    spark.config("spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON","/home/xxx/python/xxx/bin/python")
    spark.enableHiveSupport()
    return spark.getOrCreate()    
    
    
try: 
    spark.stop()
except:
    spark = None

spark = create_sparksession()
sc = spark.sparkContext
sqlContext = SQLContext(sc)
hiveContext = HiveContext(sc)

input = spark.read.parquet("/myfile/")

spark.sparkContext.addPyFile("/home/xxx/python/transform.py")
import transform 

data = input.rdd.mapPartitions(transform.apply_spacy)
data.take(10)

Expert Contributor
Posts: 93
Registered: ‎01-08-2018

Re: PriviledgedActionException from pyspark

According to your logs

 

18/04/18 15:09:05 WARN security.UserGroupInformation: PriviledgedActionException as:m0162109 (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
18/04/18 15:09:05 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
18/04/18 15:09:05 WARN security.UserGroupInformation: PriviledgedActionException as:m0162109 (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
18/04/18 15:09:05 WARN security.UserGroupInformation

 

 

This is a Kerberos issue.

If you are using a HUE notebook, Kerberos authentication is done by the hue user, which is also allowed to impersonate your account "m0162109". If you are using another notebook, then you probably do the authentication yourself when you start the notebook.
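
As a quick check (only a sketch), you can confirm from the notebook which OS user the session runs as and whether it can see a valid ticket:

import os, subprocess

# which OS user is the notebook process running as, and which Kerberos
# ticket cache (if any) does it point to? klist exits non-zero when no
# valid ticket is present
subprocess.call(["whoami"])
print(os.environ.get("KRB5CCNAME", "KRB5CCNAME not set"))
subprocess.call(["klist"])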

Posts: 454
Topics: 13
Kudos: 75
Solutions: 41
Registered: ‎09-02-2016

Re: PriviledgedActionException from pyspark

@s_l

 

There are two possibilities for this issue:

1. Kerberos - but you are sure that it is not related to Kerberos.

2. An environment variable set to a wrong path (or) an old version - I can see from your code that you have used a few environment variables. Check each path listed below and make sure the (binary) file you are referring to is actually available there; if you have upgraded any of your software it may keep multiple versions, so specify the correct one. I've included JAVA_HOME as well, and a quick path check follows the list.

 

 

'SPARK_HOME' = "/cloudera/parcels/SPARK2/lib/spark2/"
'PYSPARK_PYTHON' = "./xxx/bin/python" 
'PYSPARK_PYTHON_DRIVER' = "/home/xxx/python/xxx/bin/python"
'PYTHONPATH' = "/cloudera/parcels/SPARK2/lib/spark2/python/lib/py4j-0.10.4-src.zip:/hadoop/cloudera/parcels/SPARK2/lib/spark2/python/"


JAVA_HOME=/usr/java
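
For example, a quick check (only a sketch; the paths are copied from the variables above and may differ on your nodes):

import os

# confirm that each path referenced by the environment variables actually
# exists on the node where the notebook runs
for p in ["/cloudera/parcels/SPARK2/lib/spark2/",
          "/cloudera/parcels/SPARK2/lib/spark2/python/lib/py4j-0.10.4-src.zip",
          "/usr/java"]:
    print(p + " -> " + str(os.path.exists(p)))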

 
