Member since: 03-01-2016
Posts: 609
Kudos Received: 12
Solutions: 7
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1490 | 02-20-2024 10:42 PM |
| | 1928 | 10-26-2023 05:40 PM |
| | 1258 | 06-13-2023 07:55 PM |
| | 2049 | 04-28-2019 12:21 AM |
| | 1372 | 04-28-2019 12:12 AM |
08-14-2018
11:11 PM
Okay, since the process owner is cloudera-scm, one way to fix the issue is to add the cloudera-scm user to the 'spark' group on all nodes.
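As a sketch, the group change could look like this on each node (this assumes the 'spark' group already exists; run as root):

```shell
# Add cloudera-scm to the supplementary group 'spark' on every node.
usermod -aG spark cloudera-scm

# Verify the membership took effect.
id cloudera-scm
```

After the change, restart the Spark2 history server so the running process picks up the new group membership.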
08-14-2018
01:51 AM
You may need to make sure the process owner of the Spark2 history server (by default it is the spark user as well) belongs to the group "spark", so that the Spark2 history server process is able to read all the Spark2 event log files. You can check the process owner with "ps -ef | grep java | grep SPARK2" on the node where the Spark2 history server runs.
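For example, the two checks could be combined like this (the owner name below is only an assumption; use whatever the ps output actually reports):

```shell
# Find the Spark2 history server process; its owner is the first column.
ps -ef | grep java | grep SPARK2

# Confirm that owner is a member of the 'spark' group, e.g.:
id cloudera-scm | grep spark
```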
08-14-2018
01:47 AM
As the error message states, the object, either a DataFrame or a List, does not have a saveAsTextFile() method. result.write.save() or result.toJavaRDD.saveAsTextFile() should do the work, or you can refer to the DataFrameWriter or RDD API: https://spark.apache.org/docs/2.1.0/api/scala/index.html#org.apache.spark.sql.DataFrameWriter https://spark.apache.org/docs/2.1.0/api/scala/index.html#org.apache.spark.rdd.RDD
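A minimal sketch of both options in spark2-shell, assuming result is a DataFrame and the output paths are placeholders:

```scala
// Option 1: DataFrameWriter. save() writes Parquet by default;
// use csv() (or text() for single-column frames) for plain text output.
result.write.save("/tmp/result_parquet")
result.write.csv("/tmp/result_csv")

// Option 2: drop down to the RDD API, where saveAsTextFile() is available.
result.toJavaRDD.saveAsTextFile("/tmp/result_text")
```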
07-12-2018
11:29 PM
This is usually caused by not having the proper HADOOP or SPARK conf on the node. You need to assign the Spark2 gateway role to this node, deploy the Spark2 client configuration, and then re-launch spark2-shell.
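A quick sanity check after deploying from CM (the paths below assume CM/CDH defaults and are an assumption):

```shell
# These should exist once the Spark2 gateway role is assigned and
# the client configuration is deployed from CM:
ls /etc/spark2/conf/spark-defaults.conf
ls /etc/spark2/conf/yarn-conf/

# Then re-launch the shell:
spark2-shell
```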
07-12-2018
11:20 PM
There is a complete sample in the Cloudera engineering blog [1]; note the requirements mentioned there. You will need to provide the JAAS file with Java options, see [2]; notice the options used in spark2-submit: --driver-java-options "-Djava.security.auth.login.config=./spark_jaas.conf" ... --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./spark_jaas.conf", along with the files distributed via "--files". You will also need to set the SSL parameters when initializing the Kafka client, see [3]. [1] https://blog.cloudera.com/blog/2017/05/reading-data-securely-from-apache-kafka-to-apache-spark/ [2] https://github.com/markgrover/spark-secure-kafka-app [3] https://github.com/markgrover/spark-secure-kafka-app/blob/master/src/main/java/com/cloudera/spark/examples/DirectKafkaWordCount.scala#L60
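Putting the options together, a hedged sketch of the submit command (the jar name, class, and application arguments below are placeholders, not taken from the sample app):

```shell
spark2-submit \
  --master yarn \
  --deploy-mode client \
  --files ./spark_jaas.conf \
  --driver-java-options "-Djava.security.auth.login.config=./spark_jaas.conf" \
  --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./spark_jaas.conf" \
  --class com.example.MyKafkaApp \
  my-kafka-app.jar
```

Note that in cluster mode the driver option would instead be passed as --conf spark.driver.extraJavaOptions=... since the driver runs on a YARN node.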
12-14-2017
11:39 AM
It is usually caused by a wrong Solr URL configured somewhere in HUE; you can try to correct it: 1) Check whether anyone has a safety valve configured in HUE -> Configuration -> "Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini". 2) Make sure Solr is selected as a dependency service in HUE -> Configuration -> Solr. 3) If the above still does not resolve your issue, try to override the Solr URL with the correct (protocol, host, port) as follows: a) Verify the Solr URL: Solr -> Instance -> any one of the Solr instances -> Solr Server -> Solr Server Web UI. Note the URL; we will use it in the next step. b) Use a safety valve for the Solr URL: in HUE -> Configuration, search for "Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini" and add: [search]
## URL of the Solr Server
solr_url=http://hostname:port/solr
Then restart your HUE server.
09-16-2017
10:43 PM
Regarding how to make Spark work with Kerberos-enabled Kafka, please refer to the Cloudera engineering blog: https://blog.cloudera.com/blog/2017/05/reading-data-securely-from-apache-kafka-to-apache-spark/ It explains the prerequisites, the solution, and sample code.
09-16-2017
10:27 PM
2 Kudos
It's a Spark-side configuration, so you can always specify it via the "--conf" option with spark-submit, or you can set the property globally in CM via "Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf", so that CM will include the setting for you in the Spark gateway client configuration.
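For illustration, with an arbitrary Spark property (spark.serializer is just an example; substitute the property you actually need):

```shell
# Per job, on the command line:
spark-submit --conf spark.serializer=org.apache.spark.serializer.KryoSerializer ...

# Or globally: put the same key=value line into the safety valve, and CM will
# render it into spark-defaults.conf on every gateway host:
#   spark.serializer=org.apache.spark.serializer.KryoSerializer
```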
08-28-2017
12:03 AM
#1) Do you have the Spark gateway installed and the client configuration deployed on this host? #2) Do you have Spark selected as one of the dependent services on the CM Oozie configuration page? Usually the client should be able to source the log4j file from /etc/spark/conf/log4j.properties, instead of the NodeManager process dir.
08-13-2017
03:54 AM
1 Kudo
Two points: 1) in cluster mode, you should use "--conf spark.driver.extraJavaOptions=" instead of "--driver-java-options"; 2) you only provide application.conf in the --files list; there is no log4.properties. So either you have this log4.properties distributed on each YARN node, or you should add the log4.properties file to the --files list and reference it with "-Dlog4j.configuration=./log4.properties". For cluster mode, the full command should look like the following: spark-submit \
--master yarn \
--deploy-mode cluster \
--class myCLASS \
--properties-file /home/abhig/spark.conf \
--files /home/abhig/application.conf,/home/abhig/log4.properties \
--conf "spark.executor.extraJavaOptions=-Dconfig.resource=application.conf -Dlog4j.configuration=./log4.properties" \
--conf spark.driver.extraJavaOptions="-Dconfig.file=./application.conf -Dlog4j.configuration=./log4.properties" \
/loca/project/gateway/mypgm.jar