About Harsh J

Harsh J · 11-14-2015

Which API endpoint have you tried, specifically? It appears you are looking for this one: http://archive.cloudera.com/cdh5/cdh/5/oozie/WebServicesAPI.html#Job_Information
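As a rough illustration of calling the Job Information endpoint linked above, here is a minimal Python sketch. The host name, port and job ID in the usage note are hypothetical placeholders. The `/v1/job/<job-id>?show=info` path follows the linked Web Services API page. The assumption that the returned JSON carries an `actions` list whose entries have `name` and `status` fields should be checked against your Oozie version.

```python
from urllib.parse import quote, urlencode


def job_info_url(oozie_base_url, job_id):
    """Build the Job Information URL for one Oozie job.

    oozie_base_url is the server's base URL (hypothetical example:
    "http://oozie-host:11000/oozie").
    """
    query = urlencode({"show": "info"})
    return "{}/v1/job/{}?{}".format(
        oozie_base_url.rstrip("/"), quote(job_id, safe=""), query
    )


def running_action_names(job_info):
    """Return names of actions whose status is RUNNING.

    Assumes job_info is the parsed JSON response and that it has an
    "actions" list with "name" and "status" keys (an assumption here).
    """
    return [
        action["name"]
        for action in job_info.get("actions", [])
        if action.get("status") == "RUNNING"
    ]
```

To use it, fetch `job_info_url("http://oozie-host:11000/oozie", "<your-job-id>")` with any HTTP client, parse the body with `json.loads`, and pass the result to `running_action_names`.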

Harsh J · 11-14-2015

This is an expected side-effect of loading data from a DN host. While there's no 'even distribution' tool today, you can perhaps try to get a more random effect going by raising the replication factor (to 4 or 5) and then lowering it back again.

Harsh J · 11-11-2015

The only way I can think of to do that is to spawn a whole new application, impersonating the user, from within your AM. That may work, although I've never tried it.

Harsh J · 11-10-2015

To evaluate a responseTooSlow, look at the reported parameters:

"processingtimems":11063, "call":"Multi(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MultiRequest)", "client":"10.125.122.203:41488", "starttimems":1447206125517, "queuetimems":1, "class":"HRegionServer", "responsesize":199, "method":"Multi"

To summarise: The "Multi" type request from a client on 10.125.122.203 to the region server in question (where you picked this log from) waited 1ms (queue-time-ms) in the queue to get picked up for processing (which isn't bad at all), but took 11063ms (processing-time-ms), or ~11s, to be processed and transmitted back. The transmitted response was 199 bytes (response-size). Your RS found such an overall time to be too high (more than a couple of seconds, for example), so it decided to log a warning.

Evaluating why it took so long depends on knowing what multi(…) (or other type of) request the client on 10.125.122.203 was attempting to send, and whether it carried a lot of rows, thereby requiring a lot more time to process (a simple question).

If the client is not the suspect, then the RS needs to be investigated to find why it took so long to process this request, i.e. Did it suffer GC pauses? Did some operations within multi(…) have to wait on region locks? Was there ongoing blockage due to flushing that may have held some form of locks?
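The reading of those fields can be sketched in a few lines of Python. This is only an illustration: it assumes the parameters have been copied out of the log line and wrapped in braces to form valid JSON, and the function name and the returned keys are made up for this example.

```python
import json


def summarise_slow_response(params_json):
    """Summarise the timing fields of one responseTooSlow record."""
    p = json.loads(params_json)
    queue_ms = p["queuetimems"]
    processing_ms = p["processingtimems"]
    return {
        "client_host": p["client"].rsplit(":", 1)[0],  # drop the port
        "method": p["method"],
        "queue_ms": queue_ms,
        "processing_s": processing_ms / 1000.0,
        "response_bytes": p["responsesize"],
        # Which phase dominated: waiting in the queue or being processed?
        "dominant_phase": "queue" if queue_ms > processing_ms else "processing",
    }


record = (
    '{"processingtimems":11063,'
    '"call":"Multi(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MultiRequest)",'
    '"client":"10.125.122.203:41488","starttimems":1447206125517,'
    '"queuetimems":1,"class":"HRegionServer","responsesize":199,"method":"Multi"}'
)
summary = summarise_slow_response(record)
```

For the record above, the dominant phase is processing rather than queueing, which is why the next things to check are the size of the client's request and the RS itself (GC pauses, lock waits, flushes).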

Harsh J · 11-09-2015

A YARN NM does not persist any vital data. While it does persist some running container states (for restart recovery purposes), most of these containers can be retried by their applications (such as MR2 and Spark), so their transient data directories are not important to keep across restarts.

> Can I just remove the first partition from the config and restart the service to force it to use the other without breaking stuff?

Yes, this would be safe to perform.

Harsh J · 11-08-2015

If by 'hard to analyse' you mean to parse/process it, you can consider using the Java API to fetch block location info too: http://archive.cloudera.com/cdh5/cdh/5/hadoop/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations(org.apache.hadoop.fs.Path,%20long,%20long)

Harsh J · 11-08-2015

You can check the file on NN Web UI's File Browser (shows all block IDs and their locations), or run 'hdfs fsck /path/to/file -files -blocks -locations'

Harsh J · 11-08-2015

Could you attach the 'sudo -u hdfs hdfs dfsadmin -report' command output? Per the reported log, the Balancer seems to think the 'DFS Used%' values of all DNs are well within 10% of one another, and it considers the DNs balanced. (P.S. Yes, it works on % used, not raw byte balancing.)
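To make the "% used, not raw bytes" point concrete, here is a small Python sketch of a threshold check. It is not the Balancer's code. It assumes the commonly documented rule that each DN's utilisation is compared against the cluster-wide utilisation, and the node names and numbers are hypothetical.

```python
def nodes_outside_threshold(nodes, threshold_pct=10.0):
    """Return the DNs whose 'DFS Used%' differs from the cluster-wide
    'DFS Used%' by more than threshold_pct percentage points.

    nodes maps a DN name to a (dfs_used_bytes, capacity_bytes) pair.
    An empty result means the cluster counts as balanced under this rule.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    cluster_pct = 100.0 * total_used / total_capacity
    return sorted(
        name
        for name, (used, capacity) in nodes.items()
        if abs(100.0 * used / capacity - cluster_pct) > threshold_pct
    )


# Hypothetical cluster: small nodes hold fewer raw bytes than the big
# node, yet every node is 40-50% used, so none is outside the threshold.
cluster = {
    "dn1": (40, 100),
    "dn2": (45, 100),
    "dn3": (500, 1000),
}
unbalanced = nodes_outside_threshold(cluster)
```

Note how dn3 stores more than ten times the raw bytes of dn1 in this example and the cluster still counts as balanced, because only the percentages are compared.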

Harsh J · 11-08-2015

Your understanding seems right, but note that none of the 'splitting' is automatic. In its simplest form, federation is a way to have multiple distinct NameNodes powered by a common set of DataNodes. Effectively, it's running and managing 2 or more *separate* namespaces on top of the same storage space. If you deploy two federated NameNodes, say hdfs://host-nn1/ and hdfs://host-nn2/, then they will have nothing in common except the Live DN hostnames they share. A 'hadoop fs -ls' done on each will return absolutely independent results.

Harsh J · 11-08-2015

> But it doesn't work.

Could you please always clarify what doesn't work, specifically, along with exact error messages? This statement is vague when you seek help troubleshooting from others.

Looking over your code snippet, it seems like you're trying to run a container as a different user than the AM runs as, by trying to influence the token sent along with the ContainerLaunchContext. This isn't going to work, because the token the NM looks for (for its true-user determination) is in the allocated Container object (obtained from the AMRMClient, which you pass on to the NMClient), not in the ContainerLaunchContext (these tokens are for use inside the Container if needed, but not for its launch checks). Since the RM will grant tokens only to the app-ID requesting them (and the true owner thereof), you cannot run an App with the AM as one user and Containers as another. Is this what you are trying to attempt?

If your intention is simpler, i.e. running the containers just as the app (and AM) user, then you only need to configure LinuxContainerExecutor and need no code changes (the framework handles container tokens for NMs).

Remember also, if you are using LinuxContainerExecutor without Kerberos auth enabled, then it falls back into a secured state of impersonating only one user, 'nobody'. See config 'yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users' under http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-yarn/hadoop-yarn-common/yarn-default.xml#yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users to switch off this protection.

Member Since	‎07-31-2013 07:21 AM
Posts	1,924
Kudos received	461

Cloudera Community

Re: S3Guard Suggested to help fix Consistency

Re: Failed to start namenode. java.io.FileNotFound...

Re: sqoop import issue

Re: Efficient ways to store many images files

Re: S3 loading into HDFS

Re: Oozie Workflow: Get running action name

Re: Force block redistribution for some particular...

Re: How to set user in LinuxContainerExecutor from...

Re: hbase warning response too slow

Re: Changing YARN local directory safely

Re: File distribution on HDFS

Re: File distribution on HDFS

Re: Balancer: number of nodes to be included = 0

Re: HDFS Federation understanding

Re: How to set user in LinuxContainerExecutor from...