SPARK Throwing error while using PySpark on SQL context

Contributor

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.0.2.6.5.0-292
      /_/

Using Python version 2.7.14 (default, Dec 7 2017 17:05:42)
SparkSession available as 'spark'.
>>>
>>> df=spark.sql('select * from sws_dev.vw_dlx_rpr_ordr_dtl_base limit 1').show()
[Stage 0:=====================> (18 + 28) / 46]20/03/03 07:01:08 ERROR DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /tmp/blockmgr-c5bcbbe3-8da0-44a0-8025-1b183c81d532/03/temp_shuffle_280c5065-f954-4ec8-b3d0-7c1f5c18b581
java.io.FileNotFoundException: /tmp/blockmgr-c5bcbbe3-8da0-44a0-8025-1b183c81d532/03/temp_shuffle_280c5065-f954-4ec8-b3d0-7c1f5c18b581 (Too many open files)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at org.apache.spark.storage.DiskBlockObjectWriter$$anonfun$revertPartialWritesAndClose$2.apply$mcV$sp(DiskBlockObjectWriter.scala:217)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1386)
at org.apache.spark.storage.DiskBlockObjectWriter.revertPartialWritesAndClose(DiskBlockObjectWriter.scala:214)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.stop(BypassMergeSortShuffleWriter.java:237)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
20/03/03 07:01:08 ERROR Executor: Exception in task 3.0 in stage 0.0 (TID 3)
java.io.FileNotFoundException: /tmp/blockmgr-c5bcbbe3-8da0-44a0-8025-1b183c81d532/3c/temp_shuffle_8450fcd1-d97c-4c34-ac52-196e03030bf9 (Too many open files)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:103)
at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:116)
at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:237)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
20/03/03 07:01:08 ERROR Executor: Exception in task 9.0 in stage 0.0 (TID 9)
java.io.FileNotFoundException: /tmp/blockmgr-c5bcbbe3-8da0-44a0-8025-1b183c81d532/21/temp_shuffle_19e93f90-4de2-43c9-a715-c8668e96d793 (Too many open files)
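Every one of these java.io.FileNotFoundException traces carries the same root cause, "Too many open files": the shuffle writers are hitting the per-process file-descriptor limit on the driver host, not a genuinely missing file. As a quick check, here is a minimal sketch (assuming a Linux host; the resource module is in the standard library) to print the limit the error is reporting against:

# Sketch: print the open-file-descriptor limit of the current PySpark
# driver process -- the limit the "Too many open files" error refers to.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("file descriptor limit: soft=%d, hard=%d" % (soft, hard))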

1 ACCEPTED SOLUTION

Contributor

Fixed:

This is what I inferred: when running Spark here, the shell comes up with the local master (client mode), as you can see below:
Parsed arguments:
master local[*]
deployMode null
executorMemory null
executorCores null
totalExecutorCores null
propertiesFile /usr/hdp/current/spark2-client/conf/spark-defaults.conf
driverMemory 4g
driverCores null
driverExtraClassPath null
driverExtraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
driverExtraJavaOptions null
supervise false
queue default
numExecutors null
files null
pyFiles null
archives null
mainClass null
primaryResource pyspark-shell
name PySparkShell
childArgs []
jars null
packages null
packagesExclusions null
repositories null
verbose true

When we use --master yarn, this succeeds.
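For reference, here is a minimal sketch of the same query run against YARN instead of the default local[*] master. The app name is made up, and it assumes the session is started from a node that has the HDP Spark/Hadoop client configuration in place:

from pyspark.sql import SparkSession

# Build the session on YARN instead of local[*], so shuffle files are
# spread across executors rather than all opened by one local process.
spark = (SparkSession.builder
         .appName("dlx-rpr-ordr-check")   # hypothetical app name
         .master("yarn")
         .getOrCreate())

spark.sql("select * from sws_dev.vw_dlx_rpr_ordr_dtl_base limit 1").show()

The same switch can be made when launching the shell, e.g. pyspark --master yarn, which is what worked here.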


2 REPLIES

Contributor

Tried verbose mode and am still seeing this issue!
