SPARK Throwing error while using PySpark on SQL context

Contributor

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.0.2.6.5.0-292
      /_/

Using Python version 2.7.14 (default, Dec 7 2017 17:05:42)
SparkSession available as 'spark'.
>>>
>>> df=spark.sql('select * from sws_dev.vw_dlx_rpr_ordr_dtl_base limit 1').show()
[Stage 0:=====================> (18 + 28) / 46]20/03/03 07:01:08 ERROR DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /tmp/blockmgr-c5bcbbe3-8da0-44a0-8025-1b183c81d532/03/temp_shuffle_280c5065-f954-4ec8-b3d0-7c1f5c18b581
java.io.FileNotFoundException: /tmp/blockmgr-c5bcbbe3-8da0-44a0-8025-1b183c81d532/03/temp_shuffle_280c5065-f954-4ec8-b3d0-7c1f5c18b581 (Too many open files)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at org.apache.spark.storage.DiskBlockObjectWriter$$anonfun$revertPartialWritesAndClose$2.apply$mcV$sp(DiskBlockObjectWriter.scala:217)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1386)
at org.apache.spark.storage.DiskBlockObjectWriter.revertPartialWritesAndClose(DiskBlockObjectWriter.scala:214)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.stop(BypassMergeSortShuffleWriter.java:237)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
20/03/03 07:01:08 ERROR Executor: Exception in task 3.0 in stage 0.0 (TID 3)
java.io.FileNotFoundException: /tmp/blockmgr-c5bcbbe3-8da0-44a0-8025-1b183c81d532/3c/temp_shuffle_8450fcd1-d97c-4c34-ac52-196e03030bf9 (Too many open files)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:103)
at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:116)
at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:237)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
20/03/03 07:01:08 ERROR Executor: Exception in task 9.0 in stage 0.0 (TID 9)
java.io.FileNotFoundException: /tmp/blockmgr-c5bcbbe3-8da0-44a0-8025-1b183c81d532/21/temp_shuffle_19e93f90-4de2-43c9-a715-c8668e96d793 (Too many open files)
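Every one of these java.io.FileNotFoundException traces carries the same root cause, "Too many open files": the shuffle writers are hitting the per-process file-descriptor limit on the driver host, not a genuinely missing file. As a quick check, here is a minimal sketch (assuming a Linux host; the resource module is in the standard library) to print the limit the error is reporting against:

# Sketch: print the open-file-descriptor limit of the current PySpark
# driver process -- the limit the "Too many open files" error refers to.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("file descriptor limit: soft=%d, hard=%d" % (soft, hard))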

1 ACCEPTED SOLUTION

Contributor

Fixed:

This is what I inferred: when running Spark here, the shell comes up with the local master (client mode), as you can see below:
Parsed arguments:
master local[*]
deployMode null
executorMemory null
executorCores null
totalExecutorCores null
propertiesFile /usr/hdp/current/spark2-client/conf/spark-defaults.conf
driverMemory 4g
driverCores null
driverExtraClassPath null
driverExtraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
driverExtraJavaOptions null
supervise false
queue default
numExecutors null
files null
pyFiles null
archives null
mainClass null
primaryResource pyspark-shell
name PySparkShell
childArgs []
jars null
packages null
packagesExclusions null
repositories null
verbose true

When we use --master yarn, this succeeds.
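For reference, here is a minimal sketch of the same query run against YARN instead of the default local[*] master. The app name is made up, and it assumes the session is started from a node that has the HDP Spark/Hadoop client configuration in place:

from pyspark.sql import SparkSession

# Build the session on YARN instead of local[*], so shuffle files are
# spread across executors rather than all opened by one local process.
spark = (SparkSession.builder
         .appName("dlx-rpr-ordr-check")   # hypothetical app name
         .master("yarn")
         .getOrCreate())

spark.sql("select * from sws_dev.vw_dlx_rpr_ordr_dtl_base limit 1").show()

The same switch can be made when launching the shell, e.g. pyspark --master yarn, which is what worked here.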


2 REPLIES

Contributor

Tried verbose mode and am still seeing this issue!
