About srowen

Pramatha · ‎06-12-2017

@srowen Thank you so much for the quick response.

rotciv · ‎05-16-2017

You're right, the reason is that I didn't initialize a SparkContext until receiving a message from kafka.

srowen · ‎05-13-2017

Yes, it has been available for a while, but it's a separate, parallel install so that it does not replace Spark 1.x: https://www.cloudera.com/downloads/spark2/2-1.html

kenji · ‎05-08-2017

Since Spark 1.6.1, spark-submit takes a no-wait option. I bet many people faced the same problem 🙂 --conf spark.yarn.submit.waitAppCompletion=false
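For anyone scripting this, the setting above can be sketched as a small helper that assembles the spark-submit command line. This is only an illustration: `build_submit_command` and `my_job.py` are hypothetical names, and it assumes YARN cluster mode, which is where `spark.yarn.submit.waitAppCompletion` applies.

```python
import shlex


def build_submit_command(app_path, wait_for_completion=False):
    """Build a spark-submit argv list for YARN cluster mode.

    With wait_for_completion=False the client is asked to exit right
    after submission instead of polling until the application finishes.
    """
    flag = "true" if wait_for_completion else "false"
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--conf", f"spark.yarn.submit.waitAppCompletion={flag}",
        app_path,  # hypothetical application file
    ]


if __name__ == "__main__":
    # Print the command as a single shell-safe string.
    print(shlex.join(build_submit_command("my_job.py")))
```

The list form can be passed directly to `subprocess.run` if the command should be launched from Python rather than printed.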

jpayne1 · ‎04-19-2017

In case anyone else has this issue, the documentation for CDH 5.10 is incorrect: https://www.cloudera.com/documentation/enterprise/latest/topics/spark_python.html#spark_python__section_ark_lkn_25 It says to set PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON in the Spark Service Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh. I imagine this would be correct if you run Spark in stand-alone mode. However, if you run in yarn-client or yarn-cluster mode, the PYSPARK_PYTHON variable has to be set in YARN. The driver variable isn't relevant; it appears to matter only if you want to run through a notebook. I didn't have to do any of the steps the docs list for yarn-cluster either. The setting goes in the YARN (MR2 Included) Service Environment Advanced Configuration Snippet (Safety Valve): PYSPARK_PYTHON="/usr/bin/python" See also: http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Change-Python-path/m-p/38333/highlight/true#M1488
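As a possible per-job alternative to the cluster-wide YARN setting described above (not what the post itself did), Spark also documents the `spark.yarn.appMasterEnv.*` and `spark.executorEnv.*` properties for passing environment variables to YARN containers. A minimal sketch, assuming the interpreter lives at `/usr/bin/python` as in the post; `pyspark_python_conf` is a hypothetical helper name:

```python
def pyspark_python_conf(python_path):
    """Return spark-submit --conf arguments that set PYSPARK_PYTHON
    for the YARN application master and for the executors."""
    settings = {
        "spark.yarn.appMasterEnv.PYSPARK_PYTHON": python_path,
        "spark.executorEnv.PYSPARK_PYTHON": python_path,
    }
    args = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Path taken from the post above; adjust for your cluster.
    print(" ".join(pyspark_python_conf("/usr/bin/python")))
```

Whether this behaves the same as the safety-valve setting on a given CDH release is an assumption that should be checked on the cluster in question.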

tarekabouzeid91 · ‎04-05-2017

Yes, you have to.

wkupersa · ‎04-03-2017

I tried this and while my job is still running, it looks like it has gotten farther than it has in the past. Thanks!

mtrepanier · ‎03-31-2017

Odd, hdfs dfs -ls etc. all seem to be working. Also, if I run through the "Getting Started" tutorial I don't seem to encounter any issues. Regarding the IOException when attempting to write to disk (see below), is this just because the user behind Spark2 does not have write privileges to that location? Error summary: IOException: Mkdirs failed to create file:/home/cloudera/Documents/hail-workspace/source/out.vds/rdd.parquet/_temporary/0/_temporary/attempt_201703311444_0001_m_000000_3

lorenz984b · ‎03-13-2017

Ok, Thanks, Lorenzo

srirocky · ‎03-03-2017

@srowen I don't think the upstream transformations or getting the streams are causing any delays, as highlighted in the picture below. The only thing that takes minutes is foreachRDD (even though there is no code in it). (Image: Stage Execution times)

Last Visited	‎02-13-2018 12:34 PM

Member Since	‎08-11-2014 09:17 AM
Posts	481
Kudos received	87

Cloudera Community

Re: Own code editor in CDSW?

Re: error using Pandas within PySpark transformati...

Re: Does CDSW need to be part of the cluster?

Re: Local Data combined with HDFS

Re: Where can I find Oryx 1.x releases (or GitHub)

Re: Oryx 2 recommendation || increase Throughput

Re: Spark2.1 in 5.11 can't start a yarn cluster jo...

Re: Does Spark 2 supported CDH 5.11.0 ?

Re: What is the correct way to start/stop spark st...

Re: ImportError: No module named numpy (after re-d...

Re: [CDH 5.3] Spark -hive integration issue

Re: Can't specify 2G Heap

Re: Spark2 Unable to write to HDFS (Or Local)

Re: Build failed for spark thriftserver - CDH5.10....

Re: Spark : How to speedup foreachRDD?