Member since: 05-15-2015
Posts: 27
Kudos Received: 0
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 4125 | 08-11-2015 09:38 AM
 | 2588 | 07-24-2015 07:18 PM
04-03-2017 03:29 PM
So this is no longer supported? Why -- technical or business reasons? Anyway, maybe you could update it to make that clear? https://hub.docker.com/r/cloudera/quickstart/

How about clusterdock? Cloudera blogged about it last year: http://blog.cloudera.com/blog/2016/08/multi-node-clusters-with-cloudera-quickstart-for-docker/ Will this image be supported as an easy production-like environment? https://hub.docker.com/r/cloudera/clusterdock/

Thanks, Jamshid
09-14-2016 02:05 PM
The new error message related to RejectedExecutionException is most likely due to tasks being submitted to an executor that has already been terminated (its threads were lost/killed), hence you see these types of messages (a good read on RejectedExecutionException can be found here [1]):
===
java.util.concurrent.RejectedExecutionException: Task scala.concurrent.impl.CallbackRunnable@6143b0a3 rejected from java.util.concurrent.ThreadPoolExecutor@4eef216[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1170]
===
[1] https://examples.javacodegeeks.com/core-java/util/concurrent/rejectedexecutionexception/java-util-concurrent-rejectedexecutionexception-how-to-solve-rejectedexecutionexception
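For intuition, the same failure mode can be reproduced in miniature with Python's concurrent.futures (an analogy only: Python raises RuntimeError where the JVM ThreadPoolExecutor throws RejectedExecutionException):

```python
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=1)
executor.submit(print, "this task is accepted").result()
executor.shutdown()  # executor is now terminated, like the pool in the log above

try:
    executor.submit(print, "too late")  # rejected: pool already shut down
except RuntimeError as e:
    print("rejected:", e)  # Java's equivalent is RejectedExecutionException
```

The fix is the same in both worlds: either stop submitting work after shutdown, or find out why the executor was torn down early.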
08-18-2016 05:41 PM
1) Increase the executor and executor memory to 5GB and 3GB respectively to fix the OutOfMemory issue
2) Change two properties so that the retry will not be on the same node:
a. spark.scheduler.executorTaskBlacklistTime= 3600000
b. spark.task.maxFailures=10
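These two properties can be passed at submit time, for example (a sketch only: `your_app.py` is a placeholder, and `spark.scheduler.executorTaskBlacklistTime` was an undocumented Spark 1.x-era setting, so verify it against your Spark version):

```shell
spark-submit \
  --conf spark.scheduler.executorTaskBlacklistTime=3600000 \
  --conf spark.task.maxFailures=10 \
  your_app.py
```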
05-23-2016 06:15 PM
One way to work around the 2GB limitation is to increase the number of partitions.
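As a back-of-the-envelope check (a sketch: the 2GB figure is Spark's per-partition shuffle-block ceiling, and the 128MB target size per partition is an assumption you should tune for your workload):

```python
import math

TWO_GB = 2 * 1024**3  # Spark's per-partition shuffle-block ceiling

def min_partitions(total_bytes, target_partition_bytes=128 * 1024**2):
    # Keep each partition well under 2 GB; 128 MB is a common target size.
    assert target_partition_bytes < TWO_GB
    return max(1, math.ceil(total_bytes / target_partition_bytes))

# e.g. a 1 TB shuffle needs at least 8192 partitions at 128 MB each
print(min_partitions(1024**4))  # -> 8192
```

You would then pass a number at least this large to `rdd.repartition(n)` (or, for Spark SQL, raise `spark.sql.shuffle.partitions`).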
03-14-2016 06:07 PM
1 Kudo
1) Currently, Spark does not migrate a failed task to a different node (it may, but that would just be by chance). There is work in progress to add node blacklisting, but that hasn't been committed yet (https://issues.apache.org/jira/browse/SPARK-8426).
2) Task failure - some exception encountered while running a task, e.g. user code throws an exception, or something external to the task fails, such as Spark being unable to read from HDFS, etc.
Job failure - if a particular task fails 4 times, then Spark gives up and cancels the whole job.
Stage failure (this is the trickiest) - this happens when a task attempts to read the *shuffle* data from another node. If it fails to read that shuffle data, then it assumes that the remote node is dead (the failure may be due to a bad disk, a network error, a bad node, a node overloaded with other tasks and not responding fast enough, etc.). This is when Spark thinks it needs to regenerate the input data, so Spark marks the stage as failed and re-runs the previous stage that generated the input data. If the stage retry fails 4 times, Spark gives up, assuming the cluster has an issue.
3) No great answer to this one. The best answer is really just using "yarn logs -applicationId <appId>" to get all the logs in one file, so it's a bit easier to search through to find errors (rather than having to click through the logs one by one).
4) No, you don't need any setting for that. Spark should be resilient to single-node failures. With that said, there could be bugs in this area. If you encounter a case where that is not true, please provide the applicationId and cluster information so that I can collect logs and pass them on to our Spark team to analyze.
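The retry semantics described above can be sketched with a toy retry loop (illustrative only, not Spark code; the attempt limit mirrors Spark's default spark.task.maxFailures=4):

```python
# Toy illustration of Spark-style task retry: a task is retried up to
# max_failures times; if it still fails, the whole job is aborted,
# mirroring spark.task.maxFailures (default 4).

def run_job(task, max_failures=4):
    for attempt in range(1, max_failures + 1):
        try:
            return task(attempt)
        except Exception as e:
            print("attempt %d failed: %s" % (attempt, e))
    raise RuntimeError("task failed %d times; aborting job" % max_failures)

# A flaky task that succeeds on its third attempt.
def flaky(attempt):
    if attempt < 3:
        raise ValueError("transient failure")
    return "ok"

print(run_job(flaky))  # attempts 1 and 2 fail, attempt 3 returns "ok"
```

Note that in this sketch every retry runs in the same place, which is exactly the point of 1): without node blacklisting, a retry may land back on the same bad node.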
12-04-2015 10:45 AM
NA since customer opted to use Scalding to implement the solution instead of Spark.
08-12-2015 08:50 AM
1 Kudo
This support is currently planned for C6 timeframe which is early 2016.
08-04-2015 04:39 PM
The following steps can be done to get/set configurations:
==== Oozie ActionService Executor Extension Classes ====
>>> from cm_api.api_client import ApiResource
>>> print ApiResource('nightly54-1.vpc.cloudera.com').get_all_clusters()[0].get_all_services()[4].get_all_roles()[0].get_config(view='full')['oozie_executor_extension_classes']
: oozie_executor_extension_classes = none
>>> print ApiResource('nightly54-1.vpc.cloudera.com').get_all_clusters()[0].get_all_services()[4].get_all_roles()[0].update_config({'oozie_executor_extension_classes':'oozie_test.class'})
>>> print ApiResource('nightly54-1.vpc.cloudera.com').get_all_clusters()[0].get_all_services()[4].get_all_roles()[0].get_config(view='full')['oozie_executor_extension_classes']
: oozie_executor_extension_classes = oozie_test.class
====================
==== Oozie SchemaService Workflow Extension Schemas ====
>>> from cm_api.api_client import ApiResource
>>> print ApiResource('nightly54-1.vpc.cloudera.com').get_all_clusters()[0].get_all_services()[4].get_all_roles()[0].get_config(view='full')['oozie_workflow_extension_schemas']
: oozie_workflow_extension_schemas = ssh-action-0.1.xsd,hive-action-0.3.xsd,sqoop-action-0.3.xsd,shell-action-0.2.xsd,shell-action-0.1.xsd
>>> ApiResource('nightly54-1.vpc.cloudera.com').get_all_clusters()[0].get_all_services()[4].get_all_roles()[0].update_config({'oozie_workflow_extension_schemas':'ssh-action-0.1.xsd,hive-action-0.3.xsd,sqoop-action-0.3.xsd,shell-action-0.2.xsd,shell-action-0.1.xsd,oozie-test-action.xsd'})
>>> print ApiResource('nightly54-1.vpc.cloudera.com').get_all_clusters()[0].get_all_services()[4].get_all_roles()[0].get_config(view='full')['oozie_workflow_extension_schemas']
: oozie_workflow_extension_schemas = ssh-action-0.1.xsd,hive-action-0.3.xsd,sqoop-action-0.3.xsd,shell-action-0.2.xsd,shell-action-0.1.xsd,oozie-test-action.xsd
===================
Hardcoded values such as "get_all_clusters()[0]" are used for brevity. A for-loop would be needed to search for the specific value and
return the object for the next call, etc... [1]. For future reference, all the modules can be found at ".../cm_api/endpoints."
[1] http://cloudera.github.io/cm_api/docs/python-client
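The for-loop lookup mentioned above might look like the following sketch. The `Service` class here is a hypothetical stand-in for the objects cm_api returns, so the pattern is runnable without a live CM host; with cm_api you would iterate `cluster.get_all_services()` and check each object's `type` attribute the same way instead of hardcoding `[4]`:

```python
# Sketch: look up a service by type instead of hardcoding an index like
# get_all_services()[4]. Service is a stand-in for cm_api's service objects.

class Service(object):
    def __init__(self, name, type):
        self.name = name
        self.type = type

def find_service(services, service_type):
    for service in services:
        if service.type == service_type:
            return service
    raise LookupError("no service of type %r" % service_type)

services = [Service("hdfs", "HDFS"), Service("oozie", "OOZIE"), Service("hue", "HUE")]
print(find_service(services, "OOZIE").name)  # -> oozie
```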
07-24-2015 07:18 PM
I'm not sure when I need to access the configuration from a role (or a group) as opposed to using the service directly, but based on a separate communication I used the role. The snippet of code below is based on a working version (on a QuickStart VM):

import sys
from cm_api.api_client import ApiResource

# performs the post-install oozie service management for the red orbit module
def main():
    cmhost = 'quickstart.cloudera'
    theUserName = "admin"
    thePassword = "admin"
    apiResource = ApiResource(cmhost, username=theUserName, password=thePassword)
    # A kludgey way of specifying the cluster, but there is only one here
    clusters = apiResource.get_all_clusters()
    # TODO: This is a kludge to resolve the cluster
    if len(clusters) != 1:
        print "There should be one cluster, but there are " + repr(len(clusters)) + " clusters"
        sys.exit(1)
    cluster = clusters[0]
    # TODO: These parameters are appropriate for the QuickStart VM; are they right for our deployment?
    hueServiceName = "hue"
    oozieServiceName = "oozie"
    oozieServiceRoleName = 'oozie-OOZIE_SERVER'
    oozieService = cluster.get_service(oozieServiceName)
    hueService = cluster.get_service(hueServiceName)
    oozieServiceRole = oozieService.get_role(oozieServiceRoleName)
    # TODO: Is this always going to be the oozie service role's name? We may need to use the role type to be safe
    originalOozieConfig = oozieServiceRole.get_config(view='full')
    oozieConfigUpdates = {"oozie_executor_extension_classes": "org.apache.oozie.action.hadoop.MyCustomActionExecutor",
                          "oozie_workflow_extension_schemas": "my-custom-action-0.1.xsd"}
    for configUpdateKey in oozieConfigUpdates:
        if configUpdateKey in originalOozieConfig:
            print repr(configUpdateKey) + " before update has oozie configured value = " + str(originalOozieConfig[configUpdateKey])
        else:
            print repr(configUpdateKey) + " not previously configured in oozie"
    # stop the oozie service after stopping any services depending on oozie (i.e. hue)
    for service in [hueService, oozieService]:
        print "stopping service " + repr(service.name)
        service.stop().wait()  # synchronous stop
        print "service " + repr(service.name) + " stopped"
    # update the configuration while the servers are quiescent
    updatedOozieConfig = oozieServiceRole.update_config(oozieConfigUpdates)
    print "updatedOozieConfig = " + repr(updatedOozieConfig)
    for configUpdateKey in oozieConfigUpdates:
        print 'Config after update for key = ' + repr(configUpdateKey) + " has value = " + repr(updatedOozieConfig[configUpdateKey])
    # restart the oozie service before restarting any services depending on oozie (i.e. hue)
    for service in [oozieService, hueService]:
        print "restarting service " + repr(service.name)
        service.restart().wait()  # synchronous restart
        print "service " + repr(service.name) + " restarted"
    # Done!
    return
07-07-2015 07:31 AM
1. All classes related to the custom action need to be in /var/lib/oozie.
2. The action's main class and its dependencies need to be in the sharelib [1] directory.
[1] http://blog.cloudera.com/blog/2014/05/how-to-use-the-sharelib-in-apache-oozie-cdh-5/
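In practice the deployment might look like the following sketch (all jar names and paths are assumptions; the sharelib layout varies by CDH version, see [1]):

```shell
# 1) Classes for the custom action executor go on the Oozie server's local
#    classpath (jar name is illustrative; restart the Oozie service afterwards)
sudo cp my-custom-action.jar /var/lib/oozie/

# 2) The action's main class and its dependencies go into the sharelib on HDFS
#    (directory shown is illustrative; list the active sharelib first)
oozie admin -shareliblist
hdfs dfs -put my-custom-action-runtime.jar /user/oozie/share/lib/lib_<timestamp>/mycustomaction/
oozie admin -sharelibupdate
```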