Member since 08-11-2014
481 Posts
92 Kudos Received
72 Solutions
My Accepted Solutions
Views | Posted
---|---
2985 | 01-26-2018 04:02 AM
6293 | 12-22-2017 09:18 AM
3021 | 12-05-2017 06:13 AM
3284 | 10-16-2017 07:55 AM
9316 | 10-04-2017 08:08 PM
06-12-2017
06:07 AM
@srowen Thank you so much for the quick response.
05-16-2017
03:29 AM
You're right: the problem was that I didn't initialize a SparkContext until I received a message from Kafka.
05-13-2017
01:54 AM
Yes, it's been available for a while, but as a separate, parallel install so that it doesn't replace Spark 1.x: https://www.cloudera.com/downloads/spark2/2-1.html
05-08-2017
10:53 AM
1 Kudo
Since Spark 1.6.1, spark-submit takes a "no wait" option. I bet many people have faced the same problem 🙂
--conf spark.yarn.submit.waitAppCompletion=false
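A minimal sketch of assembling that command from Python, e.g. when a scheduler script launches the job. It assumes YARN cluster mode, since spark.yarn.submit.waitAppCompletion only takes effect there; build_spark_submit_cmd and my_job.py are hypothetical names, not part of Spark.

```python
import shlex
from typing import List


def build_spark_submit_cmd(app: str, wait: bool = False) -> List[str]:
    # spark.yarn.submit.waitAppCompletion only takes effect in YARN cluster
    # mode: when false, the client exits right after submission instead of
    # polling the application until it finishes.
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--conf", f"spark.yarn.submit.waitAppCompletion={str(wait).lower()}",
        app,
    ]


if __name__ == "__main__":
    # "my_job.py" is a hypothetical application file.
    cmd = build_spark_submit_cmd("my_job.py")
    print(shlex.join(cmd))  # copy-pasteable shell form
    # To actually launch it: subprocess.run(cmd, check=True)
```

Printed, this gives: spark-submit --master yarn --deploy-mode cluster --conf spark.yarn.submit.waitAppCompletion=false my_job.py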
04-19-2017
10:41 AM
1 Kudo
In case anyone else has this issue: the documentation for CDH 5.10 is incorrect.
https://www.cloudera.com/documentation/enterprise/latest/topics/spark_python.html#spark_python__section_ark_lkn_25
It says to set PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON in the Spark Service Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh. I imagine this would be correct if you run Spark in standalone mode. However, if you run in yarn-client or yarn-cluster mode, the PYSPARK_PYTHON variable has to be set in YARN. The driver variable isn't relevant; it appears to matter only if you want to run through a notebook. I also didn't have to do any of the steps the docs list for yarn-cluster.
The setting goes in YARN (MR2 Included) Service Environment Advanced Configuration Snippet (Safety Valve):
PYSPARK_PYTHON="/usr/bin/python"
See also: http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Change-Python-path/m-p/38333/highlight/true#M1488
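If changing the cluster-wide YARN safety valve isn't an option, a per-job alternative (a different technique from the one in the post, and not verified on CDH 5.10) is to export the variable through Spark's spark.yarn.appMasterEnv.* and spark.executorEnv.* properties. The sketch below only builds the --conf arguments; pyspark_python_confs is a hypothetical helper and /usr/bin/python is the path from the post.

```python
from typing import List


def pyspark_python_confs(python_path: str) -> List[str]:
    """Return spark-submit --conf arguments that export PYSPARK_PYTHON to
    the YARN application master (which hosts the driver in cluster mode)
    and to every executor container."""
    env = {
        "spark.yarn.appMasterEnv.PYSPARK_PYTHON": python_path,
        "spark.executorEnv.PYSPARK_PYTHON": python_path,
    }
    args: List[str] = []
    for key, value in env.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(pyspark_python_confs("/usr/bin/python")))
```

Setting it per job keeps other jobs on the cluster default interpreter, at the cost of repeating the flags on every submission.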
04-05-2017
10:45 AM
Yes, you have to.
04-03-2017
02:23 PM
I tried this, and while my job is still running, it looks like it has gotten further than it has in the past. Thanks!
03-31-2017
02:49 PM
1 Kudo
Odd: hdfs dfs -ls etc. all seem to be working. Also, if I run through the "Getting Started" tutorial, I don't seem to encounter any issues. Regarding the IOException when attempting to write to disk (see below), is this just because the user behind Spark2 doesn't have write privileges to that location?
Error summary: IOException: Mkdirs failed to create file:/home/cloudera/Documents/hail-workspace/source/out.vds/rdd.parquet/_temporary/0/_temporary/attempt_201703311444_0001_m_000000_3
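The file: prefix in the error suggests the output path resolved to the local filesystem rather than HDFS (an inference from the message, not confirmed in the thread). If so, one way to test the privilege question is to run a check as the user that executes the Spark tasks: find the deepest ancestor of the target that already exists and see whether that user can create entries in it, since Mkdirs has to create everything below it. can_create_dirs is a hypothetical helper, a sketch rather than a definitive diagnosis.

```python
import os
from urllib.parse import urlparse


def can_create_dirs(target: str) -> bool:
    """Return True if the current user could create `target` (and any
    missing parents) on the local filesystem.

    Accepts plain paths or file: URIs such as file:/home/user/out.vds.
    """
    path = urlparse(target).path if target.startswith("file:") else target
    path = os.path.abspath(path)
    # Walk up to the deepest ancestor that already exists.
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    # Creating entries needs write and execute (search) permission on that
    # existing directory; an existing regular file can't hold subdirectories.
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)


if __name__ == "__main__":
    print(can_create_dirs("file:/home/cloudera/Documents/hail-workspace/source/out.vds"))
```

Note that os.access reflects the user running the script, so it has to be run as the Spark user (on every node that executes tasks) to answer the question.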
03-13-2017
05:43 AM
OK, thanks, Lorenzo
03-03-2017
11:27 AM
@srowen I don't think the upstream transformations and getting the streams are causing any delays, as highlighted in the picture below. The only thing that takes minutes is foreachRDD (even though there is no code in it).
[Screenshot: Stage Execution times]
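One possible explanation, offered as an assumption rather than something this thread confirms: Spark transformations are lazy and foreachRDD is an output operation, so time the UI reports under a foreachRDD stage may include upstream work that only runs once the output step forces the batch. The plain-Python generator analogy below is not Spark code; slow_transform, empty_output, and timed are hypothetical stand-ins showing an empty final step absorbing the cost of a lazy upstream stage.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def slow_transform(records: Iterable[int]) -> Iterator[int]:
    """Stand-in for an upstream transformation: lazy, like a DStream map."""
    for r in records:
        time.sleep(0.01)  # simulated per-record work
        yield r * 2


def empty_output(stream: Iterable[int]) -> None:
    """Stand-in for an output step whose body does nothing per record."""
    for _ in stream:
        pass


def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    """Run fn and return its result with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Creating the generator runs none of slow_transform's body yet.
    pipeline, define_s = timed(lambda: slow_transform(range(50)))
    # Consuming it pays for all of the upstream work.
    _, output_s = timed(lambda: empty_output(pipeline))
    print(f"define transformation: {define_s:.3f}s")
    print(f"empty output step:     {output_s:.3f}s")
```

In this analogy the "empty" step reports roughly 0.5 s while defining the transformation is near-instant.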