Member since: 07-31-2013
Posts: 1924
Kudos Received: 459
Solutions: 311
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 841 | 07-09-2019 12:53 AM |
 | 3111 | 06-23-2019 08:37 PM |
 | 4472 | 06-18-2019 11:28 PM |
 | 4220 | 05-23-2019 08:46 PM |
 | 1688 | 05-20-2019 01:14 AM |
03-12-2019
07:36 PM
As far as the Thrift Server role goes, this will likely resolve itself when you enable Kerberos, as that introduces an auth negotiation protocol layer which rejects badly formed requests automatically. That is assuming bad requests are what is causing the frequent OOMEs despite the heap increases, and not actual usage of the Thrift service. For the Failover Controllers, NameNodes and other roles, this theory does not apply directly. Those may be the result of some other ongoing or one-off issue - worth investigating separately (in a different thread here, if needed).
03-11-2019
07:21 PM
1 Kudo
The concerning bit is this part of the message:

> The reported blocks 1 needs additional 1393 blocks to reach the threshold 0.9990 of total blocks 1396.

This indicates that while your DataNodes have come up and begun reporting in, they are not finding any of their locally stored block files to send in as part of the reports. The NameNode waits for enough (99.9%) of the data to be available before it opens itself for full access, but it is stuck in a never-ending loop because no DNs are reporting availability of those blocks. The overall number of blocks seems low - is this a test/demo setup? If yes, was the block data on the DNs ever wiped or removed as part of the upgrade/install attempts? Or perhaps were all DNs replaced with new ones at some point in the test? If the data is not of concern at this stage (and ONLY if so), you can force your NameNode out of safemode manually via the 'hdfs dfsadmin -safemode leave' command (as the 'hdfs' user or any granted HDFS superuser). If you'd like to investigate the blocks' disappearance further, check the DataNode logs on the hosts where these blocks resided in the past.
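For reference, a minimal sketch of the commands involved (run as the 'hdfs' user or another HDFS superuser; the fsck step is optional):

```bash
# Check whether the NameNode is still in safemode and how the DNs are reporting
hdfs dfsadmin -safemode get
hdfs dfsadmin -report | head -n 40

# ONLY if the missing block data is confirmed to be expendable:
hdfs dfsadmin -safemode leave

# Afterwards, list any files that still reference missing/corrupt blocks
hdfs fsck / -list-corruptfileblocks
```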
03-11-2019
07:12 PM
If your HBase Thrift Server is not running under a secured cluster, there's a good chance it is crashing with spurious OutOfMemoryError aborts. Part of the problem is that the Thrift RPC layer does not check incoming request packets for validity, which ends up allowing things such as HTTP requests or random protocol scans from security scanner software (Qualys, etc.) through to the RPC layer. These are then at times misinterpreted as a very large allocation request, causing an OutOfMemoryError in Java due to the size the server thinks the RPC request is attempting to send, based on its first few bytes. You can confirm whether this is the case by checking the stdout of your failed former Thrift Server processes. If you cannot spot that in the UI, visit the host that runs it; there should be lower-numbered directories for the THRIFTSERVER role type under /var/run/cloudera-scm-agent/process/ which should still have the past logs/stdout.log files within them. Within the log file you should see a message such as the below, which can help confirm this theory:

# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="/usr/lib64/cmf/service/common/killparent.sh"
# Executing /bin/sh -c "/usr/lib64/cmf/service/common/killparent.sh"...

One way to prevent this from recurring is to switch on the framed transport mode. This may break some clients if you do have active users of the HBase Thrift Server. To enable it, turn on the flag under HBase - Configuration - "Enable HBase Thrift Server Framed Transport"
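As a rough sketch, something like the below can be run on the Thrift Server host to locate those older process directories and the abort message (paths assume a typical CM agent layout):

```bash
# Past HBase Thrift Server process directories kept by the CM agent
ls -d /var/run/cloudera-scm-agent/process/*THRIFTSERVER* 2>/dev/null

# Which of their stdout logs carry the OutOfMemoryError / killparent.sh marker
grep -l "java.lang.OutOfMemoryError" \
    /var/run/cloudera-scm-agent/process/*THRIFTSERVER*/logs/stdout.log 2>/dev/null
```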
03-07-2019
09:08 PM
It appears that you're trying to use Sqoop's internal handling of the DATE/TIMESTAMP data types, instead of the Strings that the Oracle connector converts them to by default. Have you tried the option specified at https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_java_sql_timestamp? -Doraoop.timestamp.string=false You shouldn't need to map the column types manually with this approach.
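A hedged sketch of where that generic argument sits on the Sqoop command line (connection string, credentials and table names are placeholders, and it assumes the Oracle direct connector is in use):

```bash
sqoop import \
  -Doraoop.timestamp.string=false \
  --direct \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username SCOTT -P \
  --table MY_SCHEMA.MY_TABLE \
  --target-dir /user/example/my_table
```

Note that -D arguments must appear immediately after the 'import' tool name, before the other options.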
03-07-2019
08:42 PM
Yes, the individual components (such as Apache NiFi) are free to use and provided under an open-source license (APLv2). Are you asking specifically about its deployment integration with Cloudera Manager (Express)?
03-07-2019
08:36 PM
Sqoop's import into Hive is an extension of its import into HDFS (i.e. the Hive part is done after its regular HDFS import work), so if your formats are already acceptable and do not need further transformation, you can do it as part of the Sqoop step directly. Sqoop also supports (via its HCatalog options) inserting into partitioned tables via dynamic partitioning, if you require that.
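As an illustration only (all names are placeholders; exact options depend on your Sqoop version), a direct import into a partitioned Hive table via the HCatalog options could look like:

```bash
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl -P \
  --table orders \
  --hcatalog-database default \
  --hcatalog-table orders_partitioned \
  -m 4
```

With the partition key columns present in the source data, HCatalog handles the dynamic partitioning on insert.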
03-07-2019
06:31 PM
Two things here: First, yes, Sqoop will copy only the data that comes through the connection and the query, and will not duplicate data as part of its import process. The divided tasks are each fully re-done, with no partial results kept around, if there is a failure/retry/speculative execution during the run. However, keep in mind that Hive has no such constraint in its own architecture (no concept of a primary key). So after your import, it is up to your use of the table and the updates you make to it to maintain that 'effect'. You can consider using Kudu + Impala instead of Hive if the notion of primary key(s) is important to your use case, although Sqoop doesn't offer a way to import data directly into it (you'll need to insert out of the Hive table into the Kudu one via Impala, after the Sqoop import to Hive is done).
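A rough sketch of that last hop through impala-shell (table names, columns and partitioning below are placeholders, not a prescribed layout):

```bash
# Create a Kudu-backed table with an explicit primary key
impala-shell -q "CREATE TABLE orders_kudu (
    order_id BIGINT,
    customer STRING,
    amount DOUBLE,
    PRIMARY KEY (order_id))
  PARTITION BY HASH (order_id) PARTITIONS 4
  STORED AS KUDU"

# Copy the Sqoop-imported Hive table's rows into it
impala-shell -q "INSERT INTO orders_kudu SELECT order_id, customer, amount FROM orders_hive"
```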
03-07-2019
06:23 PM
You'll need to use lsof with a pid specifier (lsof -p PID). The PID must be that of your target RegionServer's Java process (find it via 'ps aux | grep REGIONSERVER' or similar). In the output, you should be able to classify the items as network (sockets) / filesystem (files) / etc., and the interest would be in whatever holds the highest share. For example, if you see a lot of sockets hanging around, check their state (CLOSE_WAIT, etc.). Or if it is mostly local filesystem files, investigate whether those files appear relevant. If you can pastebin your lsof result somewhere, I can take a look.
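A quick way to get that breakdown, as a sketch (the pgrep pattern assumes the standard RegionServer main class):

```bash
# Find the RegionServer's Java process id
RS_PID=$(pgrep -f 'org.apache.hadoop.hbase.regionserver.HRegionServer' | head -n 1)

# Count open descriptors by type (REG files, IPv4/IPv6 sockets, FIFOs, ...)
lsof -p "$RS_PID" | awk 'NR > 1 {print $5}' | sort | uniq -c | sort -rn

# If sockets dominate, check their TCP states (CLOSE_WAIT, ESTABLISHED, ...)
lsof -p "$RS_PID" -a -i | awk 'NR > 1 {print $NF}' | sort | uniq -c | sort -rn
```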
03-07-2019
01:07 AM
1 Kudo
The heartbeat messages just signify that the action is waiting for something within it to complete. In your log's case, Sqoop is awaiting completion of the job it was able to launch: job_1551703829290_0013. Please check the status and errors of job job_1551703829290_0013 to see why it may have taken so long. If this is a small cluster, there's also a good chance that your configured NodeManager resources (memory/CPU) are inadequate to run two or more parallel jobs (an Oozie action is one job, but it submits another and waits for the submitted one to complete, so each action is roughly 2 concurrent job executions). This can be fixed by adding more NM hosts, raising the resources on existing NM hosts, or configuring job resource demands to be lower than their current values.
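To dig into that launched job, a couple of standard commands (the application id is simply the job id with its prefix swapped):

```bash
# Status/progress/counters of the job the Oozie action launched
mapred job -status job_1551703829290_0013

# Aggregated container logs, once the application has finished
yarn logs -applicationId application_1551703829290_0013
```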
03-07-2019
12:47 AM
Dynamic Allocation [1] controls the number of executor containers running in parallel on YARN, not the number of CPU vcores allocated to a single executor container. [1] - https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation
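To illustrate the distinction, a hedged spark-submit sketch - dynamic allocation varies how many executors run, while the vcores per executor stay fixed by --executor-cores:

```bash
spark-submit \
  --master yarn \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  --executor-cores 2 \
  --executor-memory 4g \
  my_job.py
```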
03-07-2019
12:00 AM
Is there a chance the data was on an ephemeral device that has since been wiped, or that something external ran a deletion command over its contents? From the error messages and your description it appears as if all the metadata (and perhaps data) content has been wiped clean, but nothing within the HDFS software does this unless explicitly asked to (such as via a NameNode format request). Perhaps begin with the logs and command histories to see if anything was accidentally invoked by an external user?
03-06-2019
11:56 PM
YARN in secure mode requires locally available user accounts to fully isolate the task containers: https://www.cloudera.com/documentation/enterprise/latest/topics/cdh_sg_other_hadoop_security.html#topic_18_3 You'll need to make these accounts visible to your Linux hosts via SSSD or similar software.
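As a quick sanity check on each NodeManager host (the user name 'alice' is a placeholder):

```bash
# The job-submitting user must resolve locally, via SSSD/LDAP/NIS or /etc/passwd
id alice
getent passwd alice
```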
03-06-2019
11:54 PM
The Python and cURL requests should both be hitting the same end-points. Is the same schedule ID used in all three of your tests? It's also possible that the failure is unrelated to the mode of invocation. Could you share more logs of the failed step here?
03-06-2019
11:52 PM
We do not appear to have a document on this, but speaking from experience, almost all of the ports follow a request-response style between clients and servers, with the former making the connections and never the opposite way.
03-06-2019
11:42 PM
1 Kudo
MapReduce jobs can be submitted with ease, as all they mostly require is the correct config on the classpath (such as under src/main/resources for Maven projects). Spark/PySpark relies heavily on its script tooling to submit to a remote cluster, so it is a little more involved to achieve this. IntelliJ IDEA has a remote execution option in its run targets that can be configured to copy over the built jar and invoke an arbitrary command on an edge host. This can perhaps be combined with remote debugging to get an experience comparable to MR. Another option is to use a web-based editor such as CDSW.
03-06-2019
11:34 PM
1 Kudo
The central problem is this:

> 14:51:17.697 [main] WARN org.apache.hadoop.hive.common.LogUtils - hive-site.xml not found on CLASSPATH

For Sqoop to discover your Hive MetaStore service or DB, it needs to be supplied the appropriate configuration. Please try adding your client hive-site.xml to the Oozie workflow lib to allow Sqoop's Hive invocation to discover your existing metastore correctly.
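A minimal sketch of doing that (the workflow application path is a placeholder; your client config location may differ):

```bash
# Copy the client Hive configuration into the workflow's lib/ directory on HDFS
hdfs dfs -put -f /etc/hive/conf/hive-site.xml \
    /user/myuser/workflows/sqoop-hive-wf/lib/hive-site.xml
```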
03-06-2019
09:14 PM
Looks like you identified an environment issue over on the duplicate thread at https://community.cloudera.com/t5/Cloudera-Manager-Installation/SSL-incorrect-Message-Authentication-Code-Error/td-p/86564
03-06-2019
09:10 PM
If the 'bad node' in question has no running agent and has no roles assigned to it, then this API call will help: https://archive.cloudera.com/cm6/6.1.0/generic/jar/cm_api/swagger-html-sdk-docs/python/docs/HostsResourceApi.html#delete_host

Otherwise the process, via APIs, is this (a rough curl sketch follows below):
- Decommission the host and wait for decommission to complete (alternatively, when applicable, just stop all roles on the host directly)
- Delete each of the stopped roles that exist on the host from the CM API, by listing all service roles and filtering by the host reference data in each role
- Use direct/indirect SSH scripting to stop the CM agent process on the host (this is outside of CM API control)
- Delete the host from the CM API
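A very rough curl sketch of those steps (the API version, cluster/service names, role name and host id below are placeholders/assumptions - check the API docs for your CM release):

```bash
CM="https://cm-host.example.com:7183/api/v19"   # adjust version to your CM release
AUTH="admin:admin"                              # placeholder credentials

# 1. Decommission the host (poll the returned command for completion)
curl -u "$AUTH" -X POST -H "Content-Type: application/json" \
     -d '{"items": ["badnode.example.com"]}' "$CM/cm/commands/hostsDecommission"

# 2. List a service's roles; note those whose hostRef matches the bad host
curl -u "$AUTH" "$CM/clusters/Cluster1/services/hdfs/roles"

# 3. Delete each such role
curl -u "$AUTH" -X DELETE "$CM/clusters/Cluster1/services/hdfs/roles/hdfs-DATANODE-abc123"

# 4. Finally delete the host itself (the hostId comes from GET /hosts)
curl -u "$AUTH" -X DELETE "$CM/hosts/badnode-host-id"
```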
03-06-2019
07:52 PM
Were stats available for your source table before you performed the transforming insert? It may be a good idea to run a "COMPUTE STATS default.csv_table;" so that the memory estimates are precise. P.S. Within CM, you can reconfigure the memory limit via the Impala - Configuration - "Impala Daemon Memory Limit" field (search for mem_limit).
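For reference, the stats run through impala-shell (the table name is taken from your question):

```bash
# Recompute table/column stats so the planner's memory estimates are accurate
impala-shell -q "COMPUTE STATS default.csv_table"
```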
03-06-2019
07:30 PM
1 Kudo
> can we deploy the HttpFS role on more than one node? is it best practice?

Yes. The HttpFS service is an end-point for REST API access to HDFS, so you can deploy multiple instances and also consider load balancing them (this might need sticky sessions for data-read paging).

> we can see that new logs are created on opt/hadoop/dfs/nn/current on the actine namenode on node01 but no new files . on the standby namenode no node02 - is it OK ??

Yes, this is normal. The new edit logs are redundantly written into that local directory only by the active NameNode. At all times, the edits are primarily written into the JournalNode directories.
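A sketch of hitting an HttpFS end-point directly (hostname, the default port 14000 and the user are assumptions):

```bash
# List the HDFS root through HttpFS' WebHDFS-compatible REST API
curl "http://httpfs-host.example.com:14000/webhdfs/v1/?op=LISTSTATUS&user.name=hdfs"
```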
03-06-2019
06:33 PM
It is not normal to see the file descriptor limit run out, or come close to the limit, unless you have an overload problem of some form. I'd recommend checking via 'lsof' what the major contributor towards the FD count of your RegionServer process is - chances are it is avoidable (a bug, a flawed client, etc.). The number should be proportional to your total region store file count and the number of connecting clients. While the article at https://blog.cloudera.com/blog/2012/03/hbase-hadoop-xceivers/ focuses on DN data transceiver threads in particular, the formulae at the end can be applied similarly to file descriptors in general.
03-06-2019
05:20 PM
The issue appears to crop up when distributing certain configuration files to prepare for installing packages. Could you check or share what the failure is via the log files present under /tmp/scm_prepare_node.*/*?
03-06-2019
04:56 PM
We have a consolidated view of product compatibility/support matrix at https://www.cloudera.com/documentation/enterprise/release-notes/topics/rn_consolidated_pcm.html (the component named sections are what you are after)
03-06-2019
04:51 PM
Adding a new/replacement ZooKeeper Quorum Peer will require you to (roughly, and with downtime involved for simplicity):
- Stop the ZooKeeper service across all hosts
- Install the ZooKeeper packages of the same/similar version as the rest of the peers on the new host
- Reconfigure all ZooKeeper hosts' /etc/zookeeper/conf/zoo.cfg to replace and point at the new member hostname (if there is a change)
- Create an appropriate myid file under the ZK storage directory (typically /var/lib/zookeeper/myid) that matches the ID specified in the config file from the previous step (see the sketch below)
- Restart the ZooKeeper service on all hosts
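A sketch of the zoo.cfg and myid steps (IDs, hostnames and paths are placeholders; the ports shown are the common defaults):

```bash
# On every ZooKeeper host: example ensemble entries for zoo.cfg
# (remove/replace any stale member lines by hand before appending)
cat >> /etc/zookeeper/conf/zoo.cfg <<'EOF'
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3-new.example.com:2888:3888
EOF

# On the new host only: myid must match its server.N entry above
echo 3 > /var/lib/zookeeper/myid
chown zookeeper:zookeeper /var/lib/zookeeper/myid
```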
03-06-2019
04:47 PM
You can build out multiple clusters sharing the same KDC and Realm, as long as their machine hostnames are distinct. A service principal takes the form of USER/HOST@REALM, so this will avoid conflicts. This is also practiced in many environments. In this approach however, users on one cluster will immediately have authentication access to the other cluster, because the KDC Realm is common between the two. If that is not desirable, you'll need to run separate KDCs with distinct Realm names. In the former case (same Realm, multiple clusters), DNS discovery of the Realm would not be a problem as only a single one exists. In the latter case (one Realm per cluster), you'll likely need to make use of explicit [domain_realm] section specifiers in krb5.conf to direct clients to the right KDC for each cluster's service hostnames.
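For the separate-Realm case, a sketch of that krb5.conf mapping (domains and realm names are placeholders; merge by hand if a [domain_realm] section already exists):

```bash
cat >> /etc/krb5.conf <<'EOF'

[domain_realm]
  .cluster-a.example.com = CLUSTERA.EXAMPLE.COM
  .cluster-b.example.com = CLUSTERB.EXAMPLE.COM
EOF
```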
03-06-2019
04:42 PM
Double-check that the variant of OpenJDK you are trying to install is of the x86_64 arch and not i686. Sometimes it is as easy as explicitly specifying it in the install request, like so: "yum install java-1.8.0-openjdk-devel.x86_64", but it depends on your repo vendor.
03-06-2019
04:39 PM
Are you using Cloudera Manager? If you are, do not manually customize contents under the /etc/hive/conf path, as this is a symlink to a command-generated directory that can be redeployed at any point (from UI or API actions on the cluster). Try storing your keytab at a different path. P.S. If you are using CM, it should be managing your keytabs for you, so you can avoid such steps. Is this configuration used to customize the entries of the keytab in a way that CM cannot?
03-06-2019
04:36 PM
The Java-provided 'keytool' utility helps you generate a key/certificate pair and also store it into a Java KeyStore (JKS) container, which is the format the JVMs expect it to be in. This is one reason why the documentation suggests using it, as it reduces steps. You can certainly generate your certificate pair without Java's 'keytool' utility (such as via openssl commands, etc.), and just use the utility to copy the existing certificates into a JKS container file for the JVMs to use. This is equally acceptable. The 'root access' part of your question is a little confusing, so perhaps I've not understood your problem correctly. You do not normally require root-level privileges to create a certificate (although you may need them to alter existing, OS-supplied stores).
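A sketch of that openssl-then-keytool flow (aliases, file names and passwords are placeholders):

```bash
# Generate a self-signed key/certificate pair with openssl (no root needed)
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -keyout server.key -out server.crt \
    -subj "/CN=myhost.example.com"

# Bundle key + cert into a PKCS#12 file, then convert it to a JKS for the JVM
openssl pkcs12 -export -in server.crt -inkey server.key \
    -name myhost -out server.p12 -passout pass:changeit

keytool -importkeystore \
    -srckeystore server.p12 -srcstoretype PKCS12 -srcstorepass changeit \
    -destkeystore server.jks -deststorepass changeit
```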
03-06-2019
04:30 PM
Currently Hive's connections to LDAP do not support the StartTLS extension [1]. This does make sense as a feature request, however - could you log your request over at https://issues.apache.org/jira/projects/HIVE please? [1] - https://github.com/apache/hive/blob/master/service/src/java/org/apache/hive/service/auth/ldap/LdapSearchFactory.java#L52-L62
03-05-2019
09:46 PM
1 Kudo
> Clear Cache
> This is the one I am not too sure what happens

It appears to clear the cached entries within the Hue frontend, so the metadata for the assist and views is loaded again from its source (Impala, etc.). I don't see it calling a refresh on the tables, but it is possible I missed some implicit action.

> Perform Incremental metadata Update
> I assume this issues a refresh command for all tables within the current database which is been viewed? If no database is veiwed does it do it for everything?

This will compare the HMS listing against Impala's for the DB in context and run a specific "INVALIDATE METADATA [[db][.table]];" for the ones missing in Impala. Yes, if no DB is in the context, it will equate to running a global "INVALIDATE METADATA;"

> Invalidate All metadata and rebuild index

This runs a plain "INVALIDATE METADATA;"