Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311
My Accepted Solutions
Views | Posted
---|---
1547 | 07-09-2019 12:53 AM
9345 | 06-23-2019 08:37 PM
8077 | 06-18-2019 11:28 PM
8705 | 05-23-2019 08:46 PM
3493 | 05-20-2019 01:14 AM
07-29-2018
07:14 PM
There was a known issue in Cloudera Manager, fixed in 5.14.4 and 5.15.1, that can cause the necessary container executor configuration files to not be deployed on new or recommissioned NodeManager hosts. The fix is noted at https://www.cloudera.com/documentation/enterprise/release-notes/topics/cm_rn_fixed_issues.html#OPSAPS-24398

A better workaround than changing ownership (which is a red herring: the missing symlinks cause the executor binary to look at the wrong path) is to simply add a YARN Gateway role to all NodeManager hosts and perform a 'Deploy Client Configuration' under YARN. Upgrading to 5.14.4/5.15.1 or higher, once available, should resolve this permanently.
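To confirm whether a NodeManager host actually picked up the deployed client configuration, a quick sketch (the alternatives name and directory below are typical CDH defaults, not from this thread; adjust for your deployment):

# Show which hadoop configuration directory is currently active
alternatives --display hadoop-conf
# List the deployed YARN client configuration files
ls -l /etc/hadoop/conf.cloudera.yarn/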
07-29-2018
06:55 PM
The error quotes a missing function that has been present in Oozie since CDH 5.5.0. It therefore appears that your environment is somehow keeping or passing around an older jar of the 'oozie-sharelib-oozie' artifact that lacks this function.

If it's your sharelib that's carrying a bad file, you can inspect it via:

# hadoop fs -ls -R /user/oozie/ | grep sharelib-oozie

The above should return only a single jar file, and the version in the filename should match what you are running. If you get 3 or more files in the output, consider redeploying your ShareLib via https://www.cloudera.com/documentation/enterprise/latest/topics/admin_oozie_sharelib.html#concept_i2f_r5t_2r

If you get just one version of the jar instead, then perhaps some application jar of your project(s) is assembled as a fat jar that includes Oozie ShareLib dependencies, albeit from a non-CDH version or a very old CDH version (< 5.5.0). You can inspect suspect jars by running:

# jar tf filename.jar | grep LauncherMain

Repack all such Oozie-including jars to exclude the Oozie dependencies in them, as the system classpath will already provide the dependencies, and at the right version.
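If you have several candidate jars to check at once, a small sketch (the lib/ directory is an assumption; point it at wherever your application jars live):

# Flag every jar in the directory that bundles the Oozie launcher classes
for j in lib/*.jar; do
  jar tf "$j" | grep -q LauncherMain && echo "$j bundles Oozie launcher classes"
done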
07-29-2018
06:34 PM
1 Kudo
The error merely indicates that the DataNode the client contacted for the replica wasn't able to perform the requested read operation. The actual I/O error behind the OP_READ_BLOCK error response will be logged on the DataNode host identified by the remote=x.x.x.x field in the printed log message.

On a related note, given the intermittency, what is your 'mapreduce.client.submit.file.replication' configuration set to? If it is higher than the HDFS DataNode count, set it lower. Cloudera Manager's auto-configuration rule for this property is detailed at https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_autoconfig.html#concept_v4y_vb3_rn__section_pdy_d3w_d4:

"""
let x be Number of DataNodes, and y be the configured HDFS replication factor, then:
mapreduce.client.submit.file.replication = max(min(x, y), sqrt(x))
"""
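As a worked example of that rule: with 8 DataNodes (x = 8) and a replication factor of 3 (y = 3), you get max(min(8, 3), sqrt(8)) = max(3, ~2.83) = 3. To verify the value your gateway hosts actually use (the path below is the typical CDH client configuration location; adjust if yours differs):

# Print the effective submit-file replication from the deployed client config
grep -A1 'mapreduce.client.submit.file.replication' /etc/hadoop/conf/mapred-site.xml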
07-23-2018
05:46 PM
1 Kudo
Please see this prior post comment on AM ranges: http://community.cloudera.com/t5/Batch-Processing-and-Workflow/Where-is-the-setting-for-the-port-range-used-by-org-apache/m-p/38131/highlight/true#M2081

As to firewalls, the general practice I've observed is to set up rules at the points of external access into the cluster (such as from user or other cluster networks) but leave the intra-cluster network open for the services within. Our port reference classifies each port as internal or external, if that would help you build your rules: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_ports.html
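As a purely illustrative sketch of that pattern (the network ranges and the example HiveServer2 port are assumptions, not taken from the linked docs):

# Allow an external user network to reach an externally-classified port...
iptables -A INPUT -s 10.20.0.0/16 -p tcp --dport 10000 -j ACCEPT
# ...while hosts on the intra-cluster network stay unrestricted
iptables -A INPUT -s 192.168.1.0/24 -j ACCEPT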
07-22-2018
05:02 PM
> How many vCores allocated for Tasks within the Executors?

Tasks run inside pre-allocated Executors and do not cause further allocations to occur. Read on below to understand the relationship between tasks and executors from a resource and concurrency viewpoint:

"""
Every Spark executor in an application has the same fixed number of cores and same fixed heap size. The number of cores can be specified with the --executor-cores flag when invoking spark-submit, spark-shell, and pyspark from the command line, or by setting the spark.executor.cores property in the spark-defaults.conf file or on a SparkConf object. Similarly, the heap size can be controlled with the --executor-memory flag or the spark.executor.memory property. The cores property controls the number of concurrent tasks an executor can run. --executor-cores 5 means that each executor can run a maximum of five tasks at the same time.
"""

Read more at http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
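To make that concrete, a hedged submission sketch (the jar name and all values are placeholders, not from the original question):

# 10 executors x 5 cores each = up to 50 tasks running concurrently
spark-submit \
  --num-executors 10 \
  --executor-cores 5 \
  --executor-memory 8g \
  your-app.jar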
07-16-2018
08:00 PM
> Do we need to delete older snapshot (which was created when rep is 3) and create a new snapshot at this time when rep is 2. Yes, that is correct.
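Assuming these are HDFS snapshots on a snapshottable directory, a minimal sketch of the replacement (the path and snapshot names are illustrative):

# Remove the snapshot taken while replication was 3
hdfs dfs -deleteSnapshot /data snap-rep3
# Take a fresh snapshot now that replication is 2
hdfs dfs -createSnapshot /data snap-rep2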
07-15-2018
07:22 PM
Thank you for confirming the verification over NameNode host(s). The PartialGroupNameException will particularly trigger when 'id -gn username && id -Gn username' returns some output but does not exit with a return code of 0. This is usually observed when the id command is unable to fully resolve all presented groups, which is likely what's happening.

- Do any of the outputs in the groups command you run return pure numeric results instead of actual string names?
- What's the exit code after you execute 'id -gn username' for the affected user? You may run 'echo $?' to grab the exit code after the command (see the example below).
- Please paste the full stack trace, which should include a trace of an IOException after the log message as an underlying 'Caused by'. This would explain why the partial group resolution further fails.
- Is there any particular difference between this username and others? For example, does it start with a special character instead of alpha-num, etc.?
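For reference, a minimal check to run on the NameNode host (the username is a placeholder):

# Resolve primary and secondary groups the same way the NameNode does
id -gn someuser && id -Gn someuser
# A non-zero exit code here, despite partial output above, is the
# condition that raises PartialGroupNameException
echo "exit code: $?"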
07-11-2018
08:26 PM
More specifically, what does 'groups username' report on all your NameNode hosts? Per the earlier post, the other hosts won't matter for an 'hdfs groups' command check; only (all) your NameNode hosts' outputs would matter.

P.S. This is assuming you're using the shell-based plugin in the NameNode configuration.
07-10-2018
08:00 PM
Where are you executing this in your cluster?

The way 'hdfs groups' works is by sending an RPC request with the username to one of the NameNodes. When using the default ShellBasedUnixGroups plugin, the NameNode that received the request will run an 'id -gn username' command as a forked process on its own host and collect the output. The key point here is that the groups check is not done on your host of invocation, as that would be insecure; it is done on the host of the service that is required to authorize a given request.

It is therefore critical that all hosts in the cluster consistently report the same group results for any given username. You can typically use a centralized identity management system with SSSD on Linux to achieve this (there are other ways too), instead of using local Linux /etc/passwd and /etc/group files to manage it (which can get hairy to keep synced as the cluster grows).

For more on the basics of auth(z), read http://blog.cloudera.com/blog/2012/03/authorization-and-authentication-in-hadoop/
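A quick consistency check, assuming the default shell-based mapping (the hostname and username are placeholders):

# What the NameNode resolves for the user over RPC
hdfs groups someuser
# What the NameNode host itself resolves locally; the two should agree
ssh nn-host.example.com 'id -Gn someuser'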
07-03-2018
08:15 PM
1 Kudo
Immediately after the config change and restart, the existing HFiles will stay as they are (on V2); only newly flushed HFiles will be V3. When the table/region undergoes a major compaction, however, all HFiles will be rewritten to V3. You can force a rewrite with the HBase Shell 'major_compact' command to have it immediately rewrite all files.
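For example, from the command line (the table name is illustrative):

# Trigger an immediate major compaction, rewriting all HFiles to V3
echo "major_compact 'my_table'" | hbase shell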