Spark throwing error while using PySpark on SQL context

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.0.2.6.5.0-292
      /_/

Using Python version 2.7.14 (default, Dec 7 2017 17:05:42)
SparkSession available as 'spark'.
>>>
>>> df=spark.sql('select * from sws_dev.vw_dlx_rpr_ordr_dtl_base limit 1').show()
[Stage 0:=====================> (18 + 28) / 46]
20/03/03 07:01:08 ERROR DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /tmp/blockmgr-c5bcbbe3-8da0-44a0-8025-1b183c81d532/03/temp_shuffle_280c5065-f954-4ec8-b3d0-7c1f5c18b581
java.io.FileNotFoundException: /tmp/blockmgr-c5bcbbe3-8da0-44a0-8025-1b183c81d532/03/temp_shuffle_280c5065-f954-4ec8-b3d0-7c1f5c18b581 (Too many open files)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at org.apache.spark.storage.DiskBlockObjectWriter$$anonfun$revertPartialWritesAndClose$2.apply$mcV$sp(DiskBlockObjectWriter.scala:217)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1386)
at org.apache.spark.storage.DiskBlockObjectWriter.revertPartialWritesAndClose(DiskBlockObjectWriter.scala:214)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.stop(BypassMergeSortShuffleWriter.java:237)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
20/03/03 07:01:08 ERROR Executor: Exception in task 3.0 in stage 0.0 (TID 3)
java.io.FileNotFoundException: /tmp/blockmgr-c5bcbbe3-8da0-44a0-8025-1b183c81d532/3c/temp_shuffle_8450fcd1-d97c-4c34-ac52-196e03030bf9 (Too many open files)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:103)
at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:116)
at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:237)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
20/03/03 07:01:08 ERROR Executor: Exception in task 9.0 in stage 0.0 (TID 9)
java.io.FileNotFoundException: /tmp/blockmgr-c5bcbbe3-8da0-44a0-8025-1b183c81d532/21/temp_shuffle_19e93f90-4de2-43c9-a715-c8668e96d793 (Too many open files)
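
"Too many open files" means the process ran out of file descriptors. The stack trace goes through BypassMergeSortShuffleWriter, which writes one temporary file per reduce partition for each running map task, and the progress bar `(18 + 28) / 46` indicates 28 tasks active at once. Below is a rough sketch, not a diagnosis, for comparing that demand with the descriptor limit of the current process. The partition count of 200 is an assumption (Spark's documented default for `spark.sql.shuffle.partitions`; the real value for this job is not shown in the post), and the helper names are hypothetical.

```python
import resource  # Unix-only standard library module


def estimated_shuffle_files(concurrent_tasks, reduce_partitions):
    """Rough upper bound on temp shuffle files open at the same time."""
    return concurrent_tasks * reduce_partitions


def exceeds_soft_limit(needed, soft_limit):
    """True when the estimate is above the soft file descriptor limit."""
    # RLIM_INFINITY means the OS enforces no limit.
    if soft_limit == resource.RLIM_INFINITY:
        return False
    return needed > soft_limit


# Soft and hard limits on open file descriptors for this process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# 28 active tasks comes from the progress bar; 200 partitions is assumed.
needed = estimated_shuffle_files(concurrent_tasks=28, reduce_partitions=200)

print("soft limit:", soft, "hard limit:", hard)
print("estimated shuffle files:", needed)
print("likely to hit the limit:", exceeds_soft_limit(needed, soft))
```

Run this as the same user and on the same host that starts the pyspark shell, since the limit is per process and inherited from the launching shell.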

Accepted Solutions

Fixed:

This is what I inferred: Spark was started with master local[*] (local mode, with the default client deploy mode), as the parsed arguments below show:

Parsed arguments:
master local[*]
deployMode null
executorMemory null
executorCores null
totalExecutorCores null
propertiesFile /usr/hdp/current/spark2-client/conf/spark-defaults.conf
driverMemory 4g
driverCores null
driverExtraClassPath null
driverExtraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
driverExtraJavaOptions null
supervise false
queue default
numExecutors null
files null
pyFiles null
archives null
mainClass null
primaryResource pyspark-shell
name PySparkShell
childArgs []
jars null
packages null
packagesExclusions null
repositories null
verbose true

When we launch with --master yarn, the query succeeds.
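
To confirm which master a session resolved to, the `--verbose` dump can be checked programmatically. This is a minimal sketch that assumes the "Parsed arguments" layout shown in this post (a key followed by whitespace and a value on each line); the function names are hypothetical and not part of Spark.

```python
def resolved_master(verbose_output):
    """Return the value on the 'master' line of the parsed arguments, or None."""
    for line in verbose_output.splitlines():
        # Split into key and value on the first run of whitespace.
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == "master":
            return parts[1].strip()
    return None


def runs_locally(master):
    """True for local-mode master URLs such as local, local[4], local[*]."""
    return master is not None and master.startswith("local")


# Shortened copy of the output quoted in the post.
sample = """Parsed arguments:
master local[*]
deployMode null
driverMemory 4g
"""

master = resolved_master(sample)
print(master, runs_locally(master))
```

Inside a running shell, `spark.sparkContext.master` reports the same value without any parsing. If it starts with `local`, restarting the shell with `pyspark --master yarn` is the change this post reports as the fix.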

Replies

Tried verbose mode, and the issue still occurs.
