Member since: 11-16-2015
Posts: 195
Kudos Received: 36
Solutions: 16
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1997 | 10-23-2019 08:44 PM |
| | 2101 | 09-18-2019 09:48 AM |
| | 7893 | 09-18-2019 09:37 AM |
| | 1846 | 07-16-2019 10:58 AM |
| | 2643 | 04-05-2019 12:06 AM |
07-16-2019
10:58 AM
@rssanders3 Thanks for your interest in the upcoming CDSW release.

> Has a more specific date been announced yet?

Not yet publicly (but it should be out very soon).

> Specifically, will it run on 7.6?

Yes.
06-15-2019
10:53 AM
Hello @Data_Dog Welcome! What you are trying to achieve is not possible yet in the existing versions (the latest is 1.5 as of writing). The good news is that it will be possible in the upcoming CDSW 1.6, which provides support for local editors (e.g. PyCharm, which supports SSH), allowing remote execution on CDSW as well as file sync from local editors to Cloudera Data Science Workbench over SSH. CDSW 1.6 also provides a lot of other enhancements, including support for third-party editors. If you'd like to know more about the upcoming release, please see https://www.cloudera.com/about/events/webinars/virtual-event-ml-services-cdsw.html Thank you, Amit
04-05-2019
12:06 AM
2 Kudos
Hello @Baris There is no such limitation in CDSW itself. If a node has spare resources, Kubernetes can use that node to launch the pod. May I ask how many nodes are in your CDSW cluster? What is the CPU and memory footprint of each node, and what version of CDSW are you running? And what error do you get when launching a session with more than 50% memory?

You can find out how much spare capacity is available cluster-wide on the CDSW homepage (Dashboard). To find out exactly how much spare capacity each node has, run `kubectl describe node` on the CDSW master server.

Example: in the snippet below you can see that out of 4 CPUs (4000m), 3330m are requested, and similarly out of ~8 GB RAM, around 6.5 GB is requested. This means that trying to launch a session with 1 CPU or 2 GB RAM on this node will not work.

$ kubectl describe nodes
Name: host-aaaa
Capacity:
  cpu:    4
  memory: 8009452Ki
Allocatable:
  cpu:    4
  memory: 8009452Ki
Allocated resources:
  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ------------  ----------  ---------------  -------------
  3330m (83%)   0 (0%)      6482Mi (82%)     22774Mi (291%)

Do note that a session can only spin up its engine pod on one node. For example, if you have three nodes with 2 GB RAM free on each of them, it might look like you have 6 GB of free RAM, but because a session cannot share resources across nodes, launching a session with 6 GB of memory would eventually fail with an error like: "Unschedulable: No nodes are available that match all of the predicates: Insufficient memory (3)"
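The single-node constraint can be illustrated with a small Python sketch. The node names and resource figures below are made-up examples, not taken from any real cluster:

```python
# Sketch: a CDSW session engine pod must fit entirely on ONE node,
# so free resources on different nodes do not add up.
# Node names and figures are hypothetical examples.

def schedulable(nodes, cpu_m, mem_mi):
    """Return the name of the first node whose free CPU (millicores)
    and free memory (MiB) can both hold the request, else None."""
    for name, free_cpu_m, free_mem_mi in nodes:
        if free_cpu_m >= cpu_m and free_mem_mi >= mem_mi:
            return name
    return None

# Three nodes, each with 1 CPU (1000m) and 2 GB (2048 MiB) free:
nodes = [("node1", 1000, 2048), ("node2", 1000, 2048), ("node3", 1000, 2048)]

print(schedulable(nodes, 1000, 2048))  # fits on a single node -> node1
print(schedulable(nodes, 1000, 6144))  # 6 GB session -> None (unschedulable)
```

Even though the cluster has 6 GB free in total, the 6 GB request returns None because no single node can hold it.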
07-05-2018
08:34 PM
1 Kudo
@Rod No, it is unsupported (as of writing) in both CDH5 and CDH6:
https://www.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_600_unsupported_features.html#spark

> Spark SQL CLI is not supported
06-11-2018
09:23 PM
1 Kudo
Just wanted to complete the thread here. This is now documented in the Known Issues section of the Spark 2.3 documentation, along with workarounds to mitigate the error. Thanks. https://www.cloudera.com/documentation/spark2/latest/topics/spark2_known_issues.html#concept_kgn_j3g_5db

In CDS 2.3 release 2, Spark jobs fail when lineage is enabled because Cloudera Manager does not automatically create the associated lineage log directory (/var/log/spark2/lineage) on all required cluster hosts. Note that this feature is enabled by default in CDS 2.3 release 2.
Implement one of the following workarounds to continue running Spark jobs.
Workaround 1 - Deploy the Spark gateway role on all hosts that are running the YARN NodeManager role
Cloudera Manager only creates the lineage log directory on hosts with Spark 2 roles deployed on them. However, this is not sufficient because the Spark driver can run on any host that is running a YARN NodeManager. To ensure Cloudera Manager creates the log directory, add the Spark 2 gateway role to every cluster host that is running the YARN NodeManager role.
For instructions on how to add a role to a host, see the Cloudera Manager documentation: Adding a Role Instance
Workaround 2 - Disable Spark Lineage Collection
To disable the feature, log in to Cloudera Manager and go to the Spark 2 service. Click Configuration. Search for the Enable Lineage Collection property and uncheck the checkbox to disable lineage collection. Click Save Changes.
06-07-2018
07:37 AM
2 Kudos
Hi @JSenzier Right, this won't work in client mode. It's not about the compatibility of Spark 1.6 with the CDH version, but about how deploy mode 'client' works. spark-shell on Cloudera installations runs in yarn-client mode by default. Given the use of file:/// (which generally refers to local disks), we recommend running the app in local mode for such local testing:

$ spark-shell --master local[*]

Alternatively, you can package your script into a jar file (using Maven or sbt) and execute it with spark-submit in cluster mode.
05-11-2018
02:50 AM
1 Kudo
Hi @sim6

Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Timed out waiting for client connection.

It looks like the RPC times out while waiting for resources to become available on the Spark side. The fact that it is random indicates that the error likely occurs when the cluster does not have enough resources, and that nothing is permanently wrong with the cluster as such. For testing, you can explore the following timeout values and see if that helps:

hive.spark.client.connect.timeout=30000ms (default 1000ms)
hive.spark.client.server.connect.timeout=300000ms (default 90000ms)

You'd need to set them in the Hive safety valve using the steps below, so that they take effect for all Spark queries:

1. Go to the Cloudera Manager home page
2. Click through to the "Hive" service
3. Click "Configuration"
4. Search for "Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml"
5. Enter the following in the XML text field:

<property>
  <name>hive.spark.client.connect.timeout</name>
  <value>30000ms</value>
</property>
<property>
  <name>hive.spark.client.server.connect.timeout</name>
  <value>300000ms</value>
</property>

6. Restart Hive services to allow the changes to take effect, then run the query again to test.

Let us know how it goes.
05-11-2018
01:23 AM
Hi @Nick Yes, you should get a count of the words. Something like this:

-------------------------------------------
Time: 2018-05-11 01:05:20
-------------------------------------------
(u'', 160)
...

To start with, please let us know if you are using Kerberos on either of the clusters. Next, can you confirm that you can read the Kafka topic data using a kafka-console-consumer command from the Kafka cluster? Next, can you verify (from the host where you are running the Spark job) that you can reach the ZooKeeper on the Kafka cluster (using ping and nc on port 2181)? Lastly, please double-check that you have the topic name and the ZK quorum listed correctly in the spark(2)-submit command line.

For comparison, I am sharing the same exercise from my clusters, one running Spark and the other Kafka (note, however, that both are using SIMPLE authentication, i.e. non-kerberized).

Kafka-Cluster
=========
[systest@nightly511 tmp]$ kafka-topics --create --zookeeper localhost:2181 --topic wordcounttopic --partitions 1 --replication-factor 3
....
Created topic "wordcounttopic".
[systest@nightly511-unsecure-1 tmp]$ vmstat 1 | kafka-console-producer --broker-list `hostname`:9092 --topic wordcounttopic
Spark-Cluster
===========
[user1@host-10-17-101-208 ~]$ vi kafka_wordcount.py
from __future__ import print_function
import sys
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: kafka_wordcount.py <zk> <topic>", file=sys.stderr)
        exit(-1)
    sc = SparkContext(appName="PythonStreamingKafkaWordCount")
    ssc = StreamingContext(sc, 10)
    zkQuorum, topic = sys.argv[1:]
    kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic: 1})
    lines = kvs.map(lambda x: x[1])
    counts = lines.flatMap(lambda line: line.split(" ")).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)
    counts.pprint()
    ssc.start()
    ssc.awaitTermination()
[user1@host-10-17-101-208 ~]$ spark2-submit --master yarn --deploy-mode client --conf "spark.dynamicAllocation.enabled=false" --jars /opt/cloudera/parcels/SPARK2/lib/spark2/examples/jars/spark-examples_*.jar kafka_wordcount.py nightly511:2181 wordcounttopic
Notice that the last 2 arguments are the ZK (hostname/URL) of the Kafka cluster and the Kafka topic name in the Kafka cluster.
18/05/11 01:04:55 INFO cluster.YarnClientSchedulerBackend: Application application_1525758910545_0024 has started running.
18/05/11 01:05:21 INFO scheduler.DAGScheduler: ResultStage 4 (runJob at PythonRDD.scala:446) finished in 0.125 s
18/05/11 01:05:21 INFO scheduler.DAGScheduler: Job 2 finished: runJob at PythonRDD.scala:446, took 1.059940 s
-------------------------------------------
Time: 2018-05-11 01:05:20
-------------------------------------------
(u'', 160)
(u'216', 1)
(u'13', 1)
(u'15665', 1)
(u'28', 1)
(u'17861', 1)
(u'872', 6)
(u'3', 5)
(u'8712', 1)
(u'5', 1)
...
18/05/11 01:05:21 INFO scheduler.JobScheduler: Finished job streaming job 1526025920000 ms.0 from job set of time 1526025920000 ms
18/05/11 01:05:21 INFO scheduler.JobScheduler: Total delay: 1.625 s for time 1526025920000 ms (execution: 1.128 s)
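For what it's worth, the per-batch transformation in kafka_wordcount.py boils down to this plain-Python equivalent (the sample input lines below are made up, not from the run above):

```python
# Plain-Python sketch of what each 10-second micro-batch does in the
# streaming job: split lines into words, map each word to (word, 1),
# and reduce by key to get per-word counts.
from collections import Counter

def word_count(lines):
    """Equivalent of flatMap(split) + map((word, 1)) + reduceByKey(add)."""
    counts = Counter()
    for line in lines:
        for word in line.split(" "):
            counts[word] += 1
    return dict(counts)

batch = ["3 5 3", "3 872 872", ""]  # hypothetical vmstat-like lines
print(word_count(batch))  # e.g. {'3': 3, '5': 1, '872': 2, '': 1}
```

The empty-string key shows up in the real output above too, because splitting a blank line on " " yields an empty token.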
Let us know if you find any differences and manage to get it working. If it's still not working, let us know that too. Good Luck!
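As a footnote, the ZooKeeper reachability check suggested above (ping/nc on port 2181) can also be scripted. This is a generic TCP-connect sketch; the hostname in the comment is just an example:

```python
# Generic TCP reachability check, a scriptable equivalent of
# "nc <host> 2181". Host and port are whatever you need to test.
import socket

def is_reachable(host, port, timeout=5):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (hypothetical ZooKeeper host on the Kafka cluster):
# is_reachable("nightly511", 2181)
```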
05-02-2018
08:18 PM
1 Kudo
Cool. I will feed this back into the internal Jira where we are discussing this issue. Thanks for sharing.
05-02-2018
07:32 AM
1 Kudo
Thanks, Lucas. That's great to hear! Can you please check whether toggling it back to /var/log/spark2/lineage and then redeploying the client configuration also helps? As promised, I will update this thread once the fix is identified.