Member since: 06-23-2016

136 Posts · 8 Kudos Received · 8 Solutions
        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 3275 | 11-24-2017 08:17 PM |
|  | 4041 | 07-28-2017 06:40 AM |
|  | 1699 | 07-05-2017 04:32 PM |
|  | 1959 | 05-11-2017 03:07 PM |
|  | 6253 | 02-08-2017 02:49 PM |
			
    
	
		
		
01-26-2020 11:18 AM

This worked! I had already made these changes prior to running the last command.

Checked the active HDP version:

```
hdp-select status hadoop-client
```

Set a couple of parameters:

```
export HADOOP_OPTS="-Dhdp.version=2.6.1.0-129"
export HADOOP_CONF_DIR=/etc/hadoop/conf
```

Sourced in the environment:

```
source ~/get_env.sh
```

Added the last two lines to $SPARK_HOME/conf/spark-defaults.conf:

```
spark.driver.extraJavaOptions   -Dhdp.version=2.6.1.0-129
spark.yarn.am.extraJavaOptions  -Dhdp.version=2.6.1.0-129
```

Added the Hadoop version under Ambari / YARN / Advanced / Custom:

```
hdp.version=2.6.1.0-129
```

Ensured this ran okay:

```
yarn jar hadoop-mapreduce-examples.jar pi 5 5
```

Ran the Spark Pi example under YARN:

```
cd /home/spark/spark-2.4.4-bin-hadoop2.7
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 2G \
  --num-executors 5 \
  --executor-cores 2 \
  --conf spark.authenticate.enableSaslEncryption=true \
  --conf spark.network.sasl.serverAlwaysEncrypt=true \
  --conf spark.authenticate=true \
  examples/jars/spark-examples_2.11-2.4.4.jar \
  100
```
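The post sources `~/get_env.sh` without showing its contents. A minimal sketch consistent with the exports above (the file name comes from the post; the contents, and the SPARK_HOME value in particular, are assumptions inferred from the other steps) might look like:

```shell
#!/bin/sh
# get_env.sh -- sketch of the environment file sourced above.
# Contents are assumed from the exports shown in this post;
# adjust the hdp.version build number and paths to your cluster.
export HADOOP_OPTS="-Dhdp.version=2.6.1.0-129"
export HADOOP_CONF_DIR=/etc/hadoop/conf
# SPARK_HOME is a guess based on the cd path used later in the post.
export SPARK_HOME=/home/spark/spark-2.4.4-bin-hadoop2.7
```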
						
					
05-07-2018 09:19 PM

@ed day If this answer helped address your question, please take a moment to log in and click the "accept" link on the answer.
						
					
12-06-2017 10:29 AM

@Jay Kumar SenSharma Thanks! Sorry, I forgot to say I am trying to run Spark 2.2 as an independent service that uses HDP 2.6. I assume this won't work for it.
						
					
12-18-2017 07:08 PM

@ed day: You need to copy the Spark jars to HDFS and configure the property spark.yarn.jars or spark.yarn.archive appropriately. Please refer to the official documentation: https://spark.apache.org/docs/latest/running-on-yarn.html#preparations
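As a sketch of that preparation step (the HDFS path and NameNode address below are illustrative, not from this thread), the upload and the matching spark-defaults.conf entry might look like:

```
# Upload the Spark jars to HDFS once (illustrative paths):
#   hdfs dfs -mkdir -p /apps/spark/jars
#   hdfs dfs -put $SPARK_HOME/jars/* /apps/spark/jars/
#
# Then reference them in spark-defaults.conf:
spark.yarn.jars  hdfs://<namenode>:8020/apps/spark/jars/*
```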
						
					
11-30-2017 09:12 AM

1. Check whether SPARK_HOME in the interpreter settings points to the correct pyspark. Is it set to the value below?

   SPARK_HOME=/usr/hdp/current/spark2-client/

2. Where are you setting the Spark properties, in spark-env.sh or via Zeppelin? Check this thread: https://issues.apache.org/jira/browse/ZEPPELIN-295

3. Set spark.driver.memory=4G and spark.driver.cores=2.

4. Check spark.memory.fraction (if it's set to 0.75, reduce it to 0.6): https://issues.apache.org/jira/browse/SPARK-15796

5. Check the logs: run tail -f /var/log/zeppelin/zeppelin-interpreter-spark2-spark-zeppelin-{HOSTNAME}.log on the Zeppelin host.
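Taken together, the interpreter settings suggested above would look like this in the Zeppelin spark2 interpreter properties (a sketch using only the values proposed in this post):

```
SPARK_HOME             /usr/hdp/current/spark2-client/
spark.driver.memory    4G
spark.driver.cores     2
spark.memory.fraction  0.6
```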
						
					
11-29-2017 10:25 AM

Wow, thanks. I'll try these tomorrow when my latest slow job finishes.
						
					
11-24-2017 08:17 PM

The answer is because I am an idiot: only S3 had the DataNode and NodeManager installed. Hopefully this might help someone.
						
					
07-28-2017 06:40 AM

It was a setting in tez.lib.uris. Changed it to:

```
/hdp/apps/${hdp.version}/tez/tez.tar.gz,hdfs://master.royble.co.uk:8020/jars/json-serde-1.3.7-jar-with-dependencies.jar
```

(Note: no space after the comma, and a full hdfs:// path.)
						
					
07-05-2017 04:32 PM

Here is how you do it: I got its 'name' from here. Spark 2.1 needs the Scala 2.11 version, so the name is databricks:spark-corenlp:0.2.0-s_2.11. Edit the spark2 interpreter and add the name. Save it and allow it to restart. Then, in Zeppelin:

```
%spark.dep
z.reset()
z.load("databricks:spark-corenlp:0.2.0-s_2.11")
```
						
					