Member since: 07-29-2013
Posts: 366
Kudos Received: 69
Solutions: 71
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 4976 | 03-09-2016 01:21 AM |
|  | 4245 | 03-07-2016 01:52 AM |
|  | 13342 | 02-29-2016 04:40 AM |
|  | 3966 | 02-22-2016 03:08 PM |
|  | 4948 | 01-19-2016 02:13 PM |
01-18-2022
01:20 AM
In my case, the Spark job was working fine on some hosts but hitting the above exception on a couple of worker hosts. The issue turned out to be the spark-submit version on those hosts: `spark-submit --version` reported 2.4.7.7.1.7.0-551 on the working hosts and 3.1.2 on the non-working ones. Creating a symbolic link to the correct spark-submit resolved the issue:
```
# Point /usr/local/bin/spark-submit at the alternatives-managed spark-submit
[root@host bin]# cd /usr/local/bin
[root@host bin]# ln -s /etc/alternatives/spark-submit spark-submit
```
12-10-2019
08:55 PM
What were the memory limits you changed in the YARN configuration? Please post them; it would help me solve a similar issue in my application.
07-26-2019
09:59 AM
Hi Pal, can you grep for the particular application ID in the folder /user/spark/applicationHistory to check whether the job completed successfully or is still in the .inprogress state? Thanks, AKR
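For illustration, a minimal sketch of that check from spark-shell using the Hadoop FileSystem API (the application ID below is only a placeholder):
```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// List event-log files for one application; a file name ending in ".inprogress"
// means the job has not finished (or did not shut down cleanly).
val fs = FileSystem.get(sc.hadoopConfiguration)
val appId = "application_1564000000000_0001"   // placeholder application ID
fs.listStatus(new Path("/user/spark/applicationHistory"))
  .map(_.getPath.getName)
  .filter(_.startsWith(appId))
  .foreach(println)
```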
09-05-2018
03:33 AM
Thanks for this clarification. I had the same query regarding memory issues while loading data, and you've cleared up my doubt about loading files from HDFS. I have a similar question, but where the source is a local server or cloud storage and the data is larger than the driver memory (say the data is 1 GB and the driver memory is 250 MB). If I run val file_rdd = sc.textFile("/path on local or S3"), should Spark load the data, or will it throw an exception as you mentioned above? Also, is there a way to print the available driver memory in the terminal? Many thanks, Siddharth Saraf
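For reference, a minimal sketch of one way to inspect the driver's memory from spark-shell (this reports the driver JVM heap, which is governed by spark.driver.memory; not from the original reply):
```scala
// spark-shell runs on the driver JVM, so Runtime reflects driver memory.
val rt = Runtime.getRuntime
println(s"Driver max heap:   ${rt.maxMemory / (1024 * 1024)} MB")
println(s"Driver total heap: ${rt.totalMemory / (1024 * 1024)} MB")
println(s"Driver free heap:  ${rt.freeMemory / (1024 * 1024)} MB")
println(s"spark.driver.memory = ${sc.getConf.get("spark.driver.memory", "not set")}")
```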
04-24-2018
11:53 AM
Can you expand on this? I'm pretty new to Spark and this is marked as the solution. Also, since dynamic allocation can handle this, why would a user not want to enable that instead?
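For context, a minimal sketch of enabling dynamic allocation (assuming the cluster's external shuffle service is available; the min/max values are placeholders, not a recommendation):
```scala
import org.apache.spark.SparkConf

// Dynamic allocation lets Spark grow and shrink the executor count at runtime.
// On YARN it requires the external shuffle service on the node managers.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "1")   // placeholder
  .set("spark.dynamicAllocation.maxExecutors", "20")  // placeholder
```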
05-01-2017
11:11 PM
Hi, I am facing the issue below. Please help me solve it.
17/05/02 11:07:13 ERROR ShutdownHookManager: Exception while deleting Spark temp dir: C:\Users\arpitbh\AppData\Local\Temp\spark-07d9637a-2eb8-4a32-8490-01e106a80d6b
java.io.IOException: Failed to delete: C:\Users\arpitbh\AppData\Local\Temp\spark-07d9637a-2eb8-4a32-8490-01e106a80d6b
    at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1010)
    at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:65)
    at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:62)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:62)
    at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
    at scala.util.Try$.apply(Try.scala:192)
    at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
    at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
11-29-2016
03:42 AM
1 Kudo
We had a similar problem running Accumulo 1.7.2 (parcel-based) on CDH 5. Unfortunately, CDH 5 bundles the Accumulo 1.6.0 jars by default.
Our workaround was to modify SPARK_DIST_CLASSPATH via the Spark Service Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh, applied Spark service-wide:
SPARK_DIST_CLASSPATH=/opt/cloudera/parcels/ACCUMULO/lib/accumulo/lib/accumulo-core.jar:$SPARK_DIST_CLASSPATH
SPARK_DIST_CLASSPATH=/opt/cloudera/parcels/ACCUMULO/lib/accumulo/lib/accumulo-fate.jar:$SPARK_DIST_CLASSPATH
SPARK_DIST_CLASSPATH=/opt/cloudera/parcels/ACCUMULO/lib/accumulo/lib/accumulo-start.jar:$SPARK_DIST_CLASSPATH
SPARK_DIST_CLASSPATH=/opt/cloudera/parcels/ACCUMULO/lib/accumulo/lib/accumulo-trace.jar:$SPARK_DIST_CLASSPATH
export SPARK_DIST_CLASSPATH
This way you can add to or redefine SPARK_DIST_CLASSPATH.
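As a sanity check, a small sketch (not from the original post) run from spark-shell to confirm which jars the Accumulo client classes are loaded from; the Connector class is just one representative Accumulo client class:
```scala
// Check which jar provides the Accumulo client classes on the driver classpath.
val cls = Class.forName("org.apache.accumulo.core.client.Connector")
println(cls.getProtectionDomain.getCodeSource.getLocation)
// A path under /opt/cloudera/parcels/ACCUMULO/... indicates the 1.7.2 parcel jars
// are taking precedence over the CDH-bundled 1.6.0 jars.
```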
08-26-2016
09:18 AM
It has always been documented in "Known Issues": https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_spark_ki.html Generally speaking, there aren't differences. Not supported != different. However, there are some pieces that aren't shipped, like the Thrift server and SparkR. Usually differences crop up when upstream introduces a breaking change and it can't be followed in a minor release. For example, the default in CDH is for the "legacy" memory config parameters to be active, so that the default memory configuration doesn't change in 1.6. Sometimes it relates to other components in the platform that can't change; for instance, I think the Akka version is (or was) different because other parts of Hadoop needed a different version. The biggest example of this, IMHO, is Spark Streaming + Kafka. Spark 1.x doesn't support Kafka 0.9+, but CDH 5.7+ had to move to it to get security features. So CDH Spark 1.6 will actually only work with Kafka 0.9+, because the Kafka differences are mutually incompatible. Good in that you can use a recent Kafka, but a difference nonetheless! Most of it, though, consists of warnings about incompatibilities between what Spark happens to support and what CDH ships in other components.
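For reference, the "legacy" memory parameters mentioned above correspond to Spark 1.6's spark.memory.useLegacyMode switch; a minimal sketch, with the older fraction settings shown at their upstream defaults purely for illustration:
```scala
import org.apache.spark.SparkConf

// Spark 1.6 introduced unified memory management; useLegacyMode=true keeps the
// pre-1.6 behaviour driven by the older fraction settings below.
val conf = new SparkConf()
  .set("spark.memory.useLegacyMode", "true")     // active by default in CDH, per the note above
  .set("spark.storage.memoryFraction", "0.6")    // legacy knob (upstream default)
  .set("spark.shuffle.memoryFraction", "0.2")    // legacy knob (upstream default)
```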
03-07-2016
01:52 AM
It includes an implementation of classification using random decision forests. Decision forests actually support both categorical and numeric features. However, for text classification, you're correct that you typically transform your text into numeric vectors via TF-IDF first; that is something you'd have to do separately. Yes, the dimensionality is high. Decision forests can cope with this, but they're not the most natural choice for text classification. You may see what I mean that Oryx is not a tool for classification but a tool for productionizing, which happens to include an implementation of a classifier. In 2.x you also have an implementation of decision forests, and likewise no magic TF-IDF built in. However, the architecture is much more supportive of plugging your own Spark-based pipeline and model build into the framework; 1.x did not support this.
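For illustration, a minimal sketch of that separate TF-IDF step using Spark MLlib's HashingTF and IDF (the input path, tokenization, and parameter values are placeholders, not from the original reply):
```scala
import org.apache.spark.mllib.feature.{HashingTF, IDF}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// Placeholder corpus: one document per line, naively tokenized on whitespace.
val docs: RDD[Seq[String]] =
  sc.textFile("hdfs:///path/to/corpus.txt").map(_.toLowerCase.split("\\s+").toSeq)

// Hash term frequencies into a fixed-size feature space, then weight by IDF.
val hashingTF = new HashingTF(1 << 18)      // 262,144 hashed features (placeholder size)
val tf: RDD[Vector] = hashingTF.transform(docs)
tf.cache()
val idfModel = new IDF(2).fit(tf)           // ignore terms in fewer than 2 docs (placeholder)
val tfidf: RDD[Vector] = idfModel.transform(tf)
// tfidf can now be paired with labels to train a decision-forest model.
```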