Member since: 10-09-2015
Posts: 76
Kudos Received: 33
Solutions: 11
My Accepted Solutions

Title | Views | Posted
---|---|---
 | 4924 | 03-09-2017 09:08 PM
 | 5261 | 02-23-2017 08:01 AM
 | 1696 | 02-21-2017 03:04 AM
 | 2049 | 02-16-2017 08:00 AM
 | 1080 | 01-26-2017 06:32 PM
12-12-2016
10:10 PM
1 Kudo
Your standby RM (rm1) must be the first RM in the configured list of RMs, so it is tried first and that results in those exceptions; the client then fails over to the active RM.
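If you want to verify the ordering, here is a minimal sketch, assuming yarn-site.xml lives at /etc/hadoop/conf/yarn-site.xml (adjust the path for your install), that prints the configured RM list:

```python
# Minimal sketch: print the configured RM ordering from yarn-site.xml.
import xml.etree.ElementTree as ET

YARN_SITE = "/etc/hadoop/conf/yarn-site.xml"  # path is an assumption

def get_property(name, path=YARN_SITE):
    """Return the value of a Hadoop configuration property, or None if absent."""
    root = ET.parse(path).getroot()
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None

# The RM listed first here is the one clients try first; a standby in the
# first slot produces the (recoverable) connection exceptions before failover.
print(get_property("yarn.resourcemanager.ha.rm-ids"))
```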
12-10-2016
11:21 PM
1 Kudo
There are three possibilities: 1) The Tez application did not start, in which case you will not find any YARN application for it. 2) The Tez application started but did not receive any DAG to run and timed out. 3) The Tez application started and crashed unexpectedly. In the last two cases you will find the exact reason in the YARN application master log for this job.
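If it helps, here is a rough sketch for pulling that application master log, assuming the yarn CLI is on the PATH (the application id shown is hypothetical; take the real one from the RM UI or `yarn application -list -appStates ALL`):

```python
# Rough sketch: fetch aggregated YARN logs for an application and scan them.
import subprocess

def fetch_am_log(app_id):
    """Return the aggregated YARN logs for an application as text."""
    result = subprocess.run(
        ["yarn", "logs", "-applicationId", app_id],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

log_text = fetch_am_log("application_1481234567890_0042")  # hypothetical id

# Scan for the usual suspects around DAG submission and AM shutdown.
for line in log_text.splitlines():
    if "DAG" in line or "Exception" in line or "shutdown" in line.lower():
        print(line)
```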
12-10-2016
01:45 AM
Yes. If the max capacity for a queue is 50%, then it will not be allocated more than 50% of resources even if the cluster is idle. Obviously that can waste free capacity, which is why max capacity is often set to 100%, and preemption then becomes important for handing resources back to other queues in a timely manner. Your configs look OK at first glance, but you should check here and here about the configs. You may have to play around with them before you get the desired response times for your preemption. If you do not see preemption happening even though it is properly configured, then you should open a support case (in case you have a support relationship).
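For reference, a minimal sketch of the properties that usually control how quickly capacity-scheduler preemption reacts; they normally go into yarn-site.xml (or are set through Ambari), and the values below are only illustrative starting points:

```python
# Sketch of the knobs that typically control capacity-scheduler preemption.
preemption_settings = {
    # Turn the preemption monitor on.
    "yarn.resourcemanager.scheduler.monitor.enable": "true",
    "yarn.resourcemanager.scheduler.monitor.policies":
        "org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity."
        "ProportionalCapacityPreemptionPolicy",
    # How often the monitor looks for imbalance (ms).
    "yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval": "3000",
    # How long a container is asked to give back resources before it is killed (ms).
    "yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill": "15000",
    # Fraction of the cluster that may be preempted in one round.
    "yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round": "0.1",
}

for name, value in preemption_settings.items():
    print(f"{name} = {value}")
```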
12-09-2016
10:53 PM
There are multiple things that may be at play here, given your HDP version. To be clear: you have set up a YARN queue for Spark with 50% capacity, but Spark jobs can take up more than that (up to 100%), and since these are long-running executors the cluster is locked up until the job finishes. Is that correct? If yes, then let's see if the following helps. This might be verbose, to help other users (in case you already know about these things :)).

1) YARN schedulers, fair and capacity, will allow jobs to go up to max capacity if resources are available. Given that your Spark queue is configured with max=100%, this is allowed, which explains why Spark jobs can take over your cluster. The difference between fair and capacity is that for concurrent jobs asking for resources at the same time, capacity will prefer the first job while fair will share across all jobs. However, if a job has already taken over the cluster, neither scheduler can give other queues resources until the job itself returns them, assuming preemption is not enabled.

2) YARN schedulers, fair and capacity, support cross-queue preemption. So if queue 2 is over its capacity and queue 1 needs resources, then resources between max-capacity and capacity will be preempted from queue 2 and given to queue 1. Have you enabled preemption in the scheduler for your queues? That should trigger preemption of excess capacity from the Spark queue to other queues when needed. IIRC, this is how it should behave regardless of fair vs capacity scheduling when new small jobs come in after an existing job has taken over the cluster. Perhaps you could compare your previous fair settings with your new capacity settings to check whether preemption was enabled in the former but not in the latter.
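To make the capacity vs max-capacity point concrete, here is a sketch of the capacity-scheduler.xml properties being discussed, assuming a hypothetical queue layout with a "spark" queue next to "default"; the values are illustrative only:

```python
# Sketch of capacity-scheduler.xml queue settings (hypothetical queue names).
queue_settings = {
    # Guaranteed share of the cluster for each queue.
    "yarn.scheduler.capacity.root.spark.capacity": "50",
    "yarn.scheduler.capacity.root.default.capacity": "50",
    # How far each queue may grow when the cluster is otherwise idle.
    # max=100 lets Spark take the whole cluster; preemption (or a lower
    # maximum-capacity) is what hands resources back to other queues later.
    "yarn.scheduler.capacity.root.spark.maximum-capacity": "100",
    "yarn.scheduler.capacity.root.default.maximum-capacity": "100",
}

for name, value in queue_settings.items():
    print(f"{name} = {value}")
```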
12-09-2016
10:31 PM
No, I don't think Spark will uncache a different data set when a new one is cached. How are you going to load balance or fail over from one STS to another?
12-09-2016
02:50 AM
Don't forget to uncache the old data 🙂 Also, each STS has its own SparkContext, which will be lost if that STS is lost. So there is currently no way to keep the cache inside an STS available if that STS goes down. Having 2 identical STS instances with identical caches is possibly the only solution, assuming your cache creation code is consistent.
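A rough sketch of keeping the two caches in sync, using PyHive as one possible HiveServer2/Thrift client (host names, port, username and table name are hypothetical; a kerberized cluster would need extra connection arguments):

```python
# Rough sketch: drop the old cached copy and cache the new data on every STS.
from pyhive import hive

STS_HOSTS = ["sts-1.example.com", "sts-2.example.com"]  # hypothetical hosts
STS_PORT = 10015  # check the port your STS instances actually listen on

def recache(table):
    """Refresh the cached table on every STS instance so both stay identical."""
    for host in STS_HOSTS:
        conn = hive.connect(host=host, port=STS_PORT, username="spark")
        cursor = conn.cursor()
        cursor.execute("UNCACHE TABLE {t}".format(t=table))
        cursor.execute("CACHE TABLE {t}".format(t=table))
        conn.close()

recache("sales_snapshot")  # hypothetical table name
```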
12-09-2016
02:47 AM
To confirm: the issue was that the HBase conf was not available to Spark. You can also check the Spark HBase Connector we support at https://github.com/hortonworks-spark/shc. It has many features and also documents the configuration for Spark-HBase access, including the security aspects.
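For orientation, a rough PySpark read via SHC, adapted from the shc README; the exact catalog schema, data source name and package coordinates should be double-checked against the repo, and the table/column names here are hypothetical:

```python
# Rough sketch of reading an HBase table through SHC from PySpark.
import json
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("shc-read-sketch").getOrCreate()

# Catalog mapping HBase column families/qualifiers to DataFrame columns.
catalog = json.dumps({
    "table": {"namespace": "default", "name": "contacts"},  # hypothetical table
    "rowkey": "key",
    "columns": {
        "id":   {"cf": "rowkey", "col": "key",  "type": "string"},
        "name": {"cf": "info",   "col": "name", "type": "string"},
    },
})

# hbase-site.xml still needs to reach the Spark classpath (e.g. via --files),
# which was the root cause in this thread.
df = (spark.read
      .options(catalog=catalog)
      .format("org.apache.spark.sql.execution.datasources.hbase")
      .load())
df.show()
```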
12-08-2016
08:19 PM
1 Kudo
You need to set the new Python location via the env variable PYSPARK_PYTHON.
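The simplest route is exporting PYSPARK_PYTHON in spark-env.sh, but as a minimal in-code sketch (the interpreter path is hypothetical and must exist on every node):

```python
# Minimal sketch: point PySpark at an alternative Python interpreter.
import os
from pyspark.sql import SparkSession

PYTHON = "/opt/anaconda/bin/python"  # hypothetical path, present on all nodes

# Driver side: must be set before the SparkContext is created.
os.environ["PYSPARK_PYTHON"] = PYTHON

spark = (SparkSession.builder
         .appName("custom-python-sketch")
         # Propagate the same interpreter to executors and, on YARN, the AM.
         .config("spark.executorEnv.PYSPARK_PYTHON", PYTHON)
         .config("spark.yarn.appMasterEnv.PYSPARK_PYTHON", PYTHON)
         .getOrCreate())

print(spark.sparkContext.pythonVer)
```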
12-07-2016
10:31 PM
1 Kudo
Spark Thrift Server (STS) runs its SparkContext within the STS JVM daemon. That SparkContext is not available to external clients (with or without spark-submit). The only way to access that SparkContext is via a JDBC connection to STS. After your external processing has completed, you could submit a cache refresh operation to your STS.
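A small sketch of such a refresh submitted over the STS HiveServer2/Thrift interface, here via PyHive as one possible client (host, port, username and table name are hypothetical):

```python
# Sketch: refresh the STS cache after an external job has updated the data.
from pyhive import hive

conn = hive.connect(host="sts.example.com", port=10015, username="spark")
cursor = conn.cursor()

# Pick up the data written by the external job, then re-populate the cache.
cursor.execute("REFRESH TABLE events")
cursor.execute("UNCACHE TABLE events")
cursor.execute("CACHE TABLE events")
conn.close()
```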
10-14-2016
10:31 AM
Yes. It means encrypting all network transfers within the Spark job. There are no other avenues for wire encryption within Spark. Starting with Spark 2.0, enabling wire encryption also enables HTTPS on the history server UI for browsing historical job data.
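For reference, a sketch of the spark-defaults.conf style properties usually involved in wire encryption around the Spark 1.6/2.0 timeframe; confirm the exact set against the Spark security docs for your version, and note that the keystore path and password below are placeholders:

```python
# Sketch of typical Spark wire-encryption settings (values are placeholders).
wire_encryption = {
    # Shuffle/RPC traffic: SASL-based encryption requires authentication.
    "spark.authenticate": "true",
    "spark.authenticate.enableSaslEncryption": "true",
    "spark.network.sasl.serverAlwaysEncrypt": "true",
    # SSL for the web UIs, including the history server.
    "spark.ssl.enabled": "true",
    "spark.ssl.keyStore": "/etc/security/keystore.jks",  # placeholder path
    "spark.ssl.keyStorePassword": "changeit",            # placeholder secret
}

for name, value in wire_encryption.items():
    print(f"{name} = {value}")
```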