Spark throwing error while using PySpark on SQL context

Explorer

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.0.2.6.5.0-292
      /_/

Using Python version 2.7.14 (default, Dec 7 2017 17:05:42)
SparkSession available as 'spark'.
>>>
>>> df=spark.sql('select * from sws_dev.vw_dlx_rpr_ordr_dtl_base limit 1').show()
[Stage 0:=====================> (18 + 28) / 46]20/03/03 07:01:08 ERROR DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /tmp/blockmgr-c5bcbbe3-8da0-44a0-8025-1b183c81d532/03/temp_shuffle_280c5065-f954-4ec8-b3d0-7c1f5c18b581
java.io.FileNotFoundException: /tmp/blockmgr-c5bcbbe3-8da0-44a0-8025-1b183c81d532/03/temp_shuffle_280c5065-f954-4ec8-b3d0-7c1f5c18b581 (Too many open files)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at org.apache.spark.storage.DiskBlockObjectWriter$$anonfun$revertPartialWritesAndClose$2.apply$mcV$sp(DiskBlockObjectWriter.scala:217)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1386)
at org.apache.spark.storage.DiskBlockObjectWriter.revertPartialWritesAndClose(DiskBlockObjectWriter.scala:214)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.stop(BypassMergeSortShuffleWriter.java:237)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
20/03/03 07:01:08 ERROR Executor: Exception in task 3.0 in stage 0.0 (TID 3)
java.io.FileNotFoundException: /tmp/blockmgr-c5bcbbe3-8da0-44a0-8025-1b183c81d532/3c/temp_shuffle_8450fcd1-d97c-4c34-ac52-196e03030bf9 (Too many open files)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:103)
at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:116)
at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:237)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
20/03/03 07:01:08 ERROR Executor: Exception in task 9.0 in stage 0.0 (TID 9)
java.io.FileNotFoundException: /tmp/blockmgr-c5bcbbe3-8da0-44a0-8025-1b183c81d532/21/temp_shuffle_19e93f90-4de2-43c9-a715-c8668e96d793 (Too many open files)
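
For context, the "Too many open files" part of the exception is the OS per-process file-descriptor limit being exhausted while the shuffle writes its temp_shuffle_* files. A minimal diagnostic sketch, assuming the same pyspark shell (where 'spark' is already defined) on a Linux driver host:

import resource

# Check the per-process open-file limit that the shuffle writers are hitting.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("per-process open-file limit (soft/hard): %s / %s" % (soft, hard))

# In local[*] mode every shuffle task runs inside this single driver process,
# so all of the tasks share the limit printed above.
print("master: %s" % spark.sparkContext.master)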

1 ACCEPTED SOLUTION

Explorer

Fixed:

This is what I inferred: while running Spark, the master was set to local[*] (client mode), as you can see from the parsed arguments below:

Parsed arguments:
master local[*]
deployMode null
executorMemory null
executorCores null
totalExecutorCores null
propertiesFile /usr/hdp/current/spark2-client/conf/spark-defaults.conf
driverMemory 4g
driverCores null
driverExtraClassPath null
driverExtraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
driverExtraJavaOptions null
supervise false
queue default
numExecutors null
files null
pyFiles null
archives null
mainClass null
primaryResource pyspark-shell
name PySparkShell
childArgs []
jars null
packages null
packagesExclusions null
repositories null
verbose true

When we use --master yarn, it succeeds.
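
A minimal sketch of the same fix done from code rather than the shell flag, assuming the Hive table from the question is reachable and that YARN is the intended resource manager (the appName here is made up):

from pyspark.sql import SparkSession

# Submit to YARN instead of running everything in one local[*] process,
# so shuffle files are written by executor containers, not a single JVM.
spark = SparkSession.builder \
    .appName("pyspark-yarn-test") \
    .master("yarn") \
    .enableHiveSupport() \
    .getOrCreate()

spark.sql("select * from sws_dev.vw_dlx_rpr_ordr_dtl_base limit 1").show()

Starting the shell with pyspark --master yarn gives the same behaviour.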


2 REPLIES

Explorer

Tried verbose mode and am still seeing this issue!
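
For anyone reproducing the check, the same master and deploy-mode information that --verbose prints at submit time can also be read from inside the running shell; a small sketch, assuming the default 'spark' session from pyspark:

# Inspect how the shell was actually launched and dump the effective config.
sc = spark.sparkContext
print("master: %s" % sc.master)  # local[*] in the failing case
for key, value in sorted(sc.getConf().getAll()):
    print("%s=%s" % (key, value))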
