
Failed to start pyspark session

Rising Star

Hi all, I am exploring the features in my CDP cluster.

 

I added the Spark service to the cluster, but when I try to study Spark by running pyspark in a terminal, I get the following error:

 

Type "help", "copyright", "credits" or "license" for more information.
Warning: Ignoring non-Spark config property: hdfs
Warning: Ignoring non-Spark config property: ExitCodeException
Warning: Ignoring non-Spark config property: at
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/03/29 02:47:40 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
23/03/29 02:47:43 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
23/03/29 02:47:49 ERROR spark.SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: File file:/home/asl/2023-03-28 23:17:30,775 WARN [TGT Renewer for asl@MY.CLOUDERA.LAB] security.UserGroupInformation (UserGroupInformation.java:run(1026)) - Exception encountered while running the renewal command for asl@MY.CLOUDERA.LAB. (TGT end time:1680069424000, renewalFailures: 0, renewalFailuresTotal: 1) does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:755)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1044)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:745)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:456)
at org.apache.spark.deploy.history.EventLogFileWriter.requireLogBaseDirAsDirectory(EventLogFileWriters.scala:76)
at org.apache.spark.deploy.history.SingleEventLogFileWriter.start(EventLogFileWriters.scala:220)
at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:84)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:536)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
23/03/29 02:47:49 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
23/03/29 02:47:49 WARN spark.SparkContext: Another SparkContext is being constructed (or threw an exception in its constructor). This may indicate an error, since only one SparkContext may be running in this JVM (see SPARK-2243). The other SparkContext was created at:
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
java.lang.reflect.Constructor.newInstance(Constructor.java:423)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
py4j.Gateway.invoke(Gateway.java:238)
py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
py4j.GatewayConnection.run(GatewayConnection.java:238)
java.lang.Thread.run(Thread.java:748)
23/03/29 02:47:49 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
23/03/29 02:47:54 ERROR spark.SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: File file:/home/asl/2023-03-28 23:17:30,775 WARN [TGT Renewer for asl@MY.CLOUDERA.LAB] security.UserGroupInformation (UserGroupInformation.java:run(1026)) - Exception encountered while running the renewal command for asl@MY.CLOUDERA.LAB. (TGT end time:1680069424000, renewalFailures: 0, renewalFailuresTotal: 1) does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:755)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1044)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:745)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:456)
at org.apache.spark.deploy.history.EventLogFileWriter.requireLogBaseDirAsDirectory(EventLogFileWriters.scala:76)
at org.apache.spark.deploy.history.SingleEventLogFileWriter.start(EventLogFileWriters.scala:220)
at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:84)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:536)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
23/03/29 02:47:54 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.30990532/lib/spark/python/pyspark/shell.py:45: UserWarning: Failed to initialize Spark session.
warnings.warn("Failed to initialize Spark session.")
Traceback (most recent call last):
File "/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.30990532/lib/spark/python/pyspark/shell.py", line 41, in <module>
spark = SparkSession._create_shell_session()
File "/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.30990532/lib/spark/python/pyspark/sql/session.py", line 583, in _create_shell_session
return SparkSession.builder.getOrCreate()
File "/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.30990532/lib/spark/python/pyspark/sql/session.py", line 173, in getOrCreate
sc = SparkContext.getOrCreate(sparkConf)
File "/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.30990532/lib/spark/python/pyspark/context.py", line 369, in getOrCreate
SparkContext(conf=conf or SparkConf())
File "/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.30990532/lib/spark/python/pyspark/context.py", line 136, in __init__
conf, jsc, profiler_cls)
File "/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.30990532/lib/spark/python/pyspark/context.py", line 198, in _do_init
self._jsc = jsc or self._initialize_context(self._conf._jconf)
File "/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.30990532/lib/spark/python/pyspark/context.py", line 308, in _initialize_context
return self._jvm.JavaSparkContext(jconf)
File "/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.30990532/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1525, in __call__
answer, self._gateway_client, None, self._fqn)
File "/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.30990532/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.io.FileNotFoundException: File file:/home/asl/2023-03-28 23:17:30,775 WARN [TGT Renewer for asl@MY.CLOUDERA.LAB] security.UserGroupInformation (UserGroupInformation.java:run(1026)) - Exception encountered while running the renewal command for asl@MY.CLOUDERA.LAB. (TGT end time:1680069424000, renewalFailures: 0, renewalFailuresTotal: 1) does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:755)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1044)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:745)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:456)
at org.apache.spark.deploy.history.EventLogFileWriter.requireLogBaseDirAsDirectory(EventLogFileWriters.scala:76)
at org.apache.spark.deploy.history.SingleEventLogFileWriter.start(EventLogFileWriters.scala:220)
at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:84)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:536)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)

 

I can't figure out the cause of this issue. Please kindly help me out. Thank you.


5 REPLIES

Contributor

Hi @BrianChan, this is a known CM issue: the CM Agent generates a bad spark-defaults.conf after a user (or application) logs in to a host as root, kinits as some user (for example, hdfs), and leaves that ticket cache around. You can see the symptom in your output: the FileNotFoundException shows spark.eventLog.dir pointing at a path made of TGT-renewal log text rather than a real directory.

 

To avoid the issue, do the following:

  1. Navigate to Cloudera Manager > Spark > Configuration.
  2. Add the following to the Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf, making sure the value is correct for your cluster:
    spark.eventLog.dir=hdfs://nameserviceXYZ/user/spark/applicationHistory
  3. Deploy the client configuration (a quick verification sketch follows below).
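
After redeploying, a quick sanity check is to read the setting back from a fresh pyspark shell. This is a minimal sketch, assuming the shell now starts cleanly and exposes the usual sc SparkContext:

    # In a new pyspark shell, after the client configuration is redeployed.
    # "sc" is the SparkContext that the shell creates on startup.
    sc.getConf().get("spark.eventLog.dir")
    # Should return the HDFS path you configured, e.g.
    # 'hdfs://nameserviceXYZ/user/spark/applicationHistory'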

 

If this helps resolve the issue, please accept this as a solution. Thanks.

Rising Star

@nikhilm Thank you for your reply.

 

May I know what I should input for nameserviceXYZ? Please give me an example if possible.

Rising Star

Thank you @nikhilm, your advice works.

Master Collaborator

Hi @BrianChan 

 

If HDFS HA is enabled on your cluster, you will find the nameservice in the hdfs-site.xml file (the dfs.nameservices property).

If HDFS HA is not enabled, you can simply specify the path without a nameservice, like below:

spark.eventLog.dir=/user/spark/applicationHistory
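
If you'd rather not hunt through hdfs-site.xml by hand, the value can also be read from a pyspark shell. A minimal sketch, noting that sc._jsc is an internal handle to the Java SparkContext rather than a public API:

    # Read the HDFS nameservice from the Hadoop configuration Spark loaded.
    # Returns None unless HDFS HA (dfs.nameservices) is configured.
    sc._jsc.hadoopConfiguration().get("dfs.nameservices")
    # e.g. 'nameserviceXYZ', which you would plug into
    # spark.eventLog.dir=hdfs://nameserviceXYZ/user/spark/applicationHistory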

Rising Star

Thank you @RangaReddy, I managed to solve the problem using your advice. Thank you very much.