Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1543 | 07-09-2019 12:53 AM |
| | 9306 | 06-23-2019 08:37 PM |
| | 8056 | 06-18-2019 11:28 PM |
| | 8681 | 05-23-2019 08:46 PM |
| | 3477 | 05-20-2019 01:14 AM |
09-06-2018
08:00 PM
Thank you for following up here! [1] Glad to hear you were able to chase down the cause.

[1] - https://xkcd.com/979/
09-06-2018
07:57 PM
1 Kudo
There are a few cons to raising your block size:

- Increased cost of recovery during write failures: When a client is writing a new block into the DataNode pipeline and one of the DataNodes fails, an enabled-by-default recovery feature attempts to re-fill the gap in the replicated pipeline by transferring the partially written block from one of the remaining good DataNodes to a new DataNode. While this happens, the client is blocked (the outstream.write(…) caller is blocked in the API code). With an increased block size, the time spent waiting also increases greatly, depending on how much of the partial block was written before the failure occurred. A worst-case wait would involve network-copying 1.99 GiB for a 2 GiB block size, because an involved DataNode may have failed at exactly that point.

- Cost of re-replication caused by DataNode loss or decommission: When a DataNode is lost or is being decommissioned, the system has to react by re-filling the gaps in replica counts this creates. With smaller block sizes this activity is easy to spread randomly across the cluster, as many different nodes can take part in the re-replication. With larger blocks only a few DataNodes can participate, and another consequence can be more lopsided space usage across DataNodes.

That said, use of 1-2 GiB is not unheard of, and I've seen a few large clusters apply that as their default block size. It's just worth being aware of the cons, looking out for such impact and tuning accordingly as you go. HDFS certainly functions at its best with large files, and your usage seems in accordance with that.
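If you do go ahead with a larger block size, note that it can also be applied per-file at write time rather than cluster-wide, so only the large files you target are affected. A rough sketch, where the file paths and the 1 GiB value are just placeholders for your own:

# Override dfs.blocksize for this single write only (value is in bytes; 1073741824 = 1 GiB)
~> hadoop fs -D dfs.blocksize=1073741824 -put bigfile.dat /data/bigfile.dat
# Verify the block size and block count the file actually ended up with
~> hdfs fsck /data/bigfile.dat -files -blocks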
09-06-2018
01:53 AM
This is related to the JobHistoryServer log reported earlier. Please ensure/perform the following for the JHS and job completions to work properly.

First, ensure that 'mapred' and 'yarn' are both part of the common 'hadoop' group:

~> hdfs groups mapred
~> hdfs groups yarn

Both commands must include 'hadoop' in their outputs. If not, ensure the users are added to that group (see the sketch at the end of this post).

Second, ensure all files and directories under the HDFS /tmp/logs aggregation dir (or whatever you've reconfigured it to use) and /user/history/* have their group set to 'hadoop' and nothing else:

~> hadoop fs -chgrp -R hadoop /user/history /tmp/logs
~> hadoop fs -chmod -R g+rwx /user/history /tmp/logs

Note: the ACLs suggested earlier are not required to resolve this problem. The group used on these dirs is what matters in the default state, and the group setup described above is how the YARN and JHS daemon users share information and responsibilities with each other. You may remove any ACLs you have set, or leave them be, as they are still permissive.
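For that first item, assuming 'mapred' and 'yarn' are local OS accounts and your cluster uses the default shell-based group mapping, a sketch of adding them to the 'hadoop' group on the NameNode host (where HDFS resolves groups) would be:

~> sudo usermod -a -G hadoop mapred
~> sudo usermod -a -G hadoop yarn
# Clear the NameNode's cached group mapping (run as the HDFS superuser)
~> sudo -u hdfs hdfs dfsadmin -refreshUserToGroupsMappings
# Re-check; both outputs should now include 'hadoop'
~> hdfs groups mapred
~> hdfs groups yarn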
09-06-2018
01:07 AM
1 Kudo
Before anything else, I'd strongly recommend upgrading all your CDH packages to the same latest version. Your HBase service may be running on 4.7 while some client hosts are on 4.0. To get a definitive view of what version the service uses, please visit the HBase Master Web UI and check the version value shown on its homepage. That's the version you can assume the cluster uses.

Here's what I'd do; please go over it to see if it'd be safe to follow in your environment:

- Check the number of rows on some critical, large table with a RowCounter job (a rough command sketch is at the end of this post). Keep this info for a data check after the operations below.
- Log in to a host with the most recent HBase version, run 'hbase shell' and then 'list_snapshots'. If any snapshots show up and you do not need them, delete them with 'delete_snapshot' commands. Once done, wait a few minutes and see if the used space begins to reduce as HFiles referenced by the snapshots are cleaned away. If it does, no further actions are needed, and the rest of the points no longer apply to you.
- If there are no snapshots, or there is no such command, then stop HBase and MOVE (NOT DELETE, not yet) the .archive directory to /tmp.
- Restart HBase and, if it comes up, run RowCounter again on the same table to check that the counts are still the same as (or very close to) the prior count taken above.
- If HBase comes up and the counts on your critical tables are the same as before, then proceed with deleting the .archive directory you've moved.
- If HBase does not come up, or the counts vary greatly, then place the .archive directory back in its previous path. In that case the directory is in use by HBase and cannot be deleted, and you'll need to think of an alternative strategy for increasing space (deleting rows in HBase, dropping tables, expanding the cluster, etc.).

Does this help?
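For reference, a rough sketch of the commands behind the steps above; the table name and the backup location are placeholders, and the move assumes /hbase is your HBase root directory and that the 'hbase' user owns it:

# Count rows on a critical table before (and again after) the change
~> hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'my_critical_table'
# With HBase stopped, move (do not delete) the archive directory aside
~> sudo -u hbase hadoop fs -mv /hbase/.archive /tmp/hbase-archive-backup
# Only after a successful restart and matching row counts, delete the backup
~> sudo -u hbase hadoop fs -rm -r /tmp/hbase-archive-backup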
09-06-2018
12:56 AM
1 Kudo
> I am a little bit confused, so the WebHDFS REST API is listening on the same port as the NameNode's UI?

Yes, that is correct. The HTTP(S) serving port of the NameNode does multiple things: it serves the UI for browsers on / and a few other paths, serves the REST API on /webhdfs/*, etc.

WebHDFS on the HDFS service is used by contacting the currently configured web port of the NameNode and DataNode(s) (the latter by following redirects, not directly). In your case the cluster is set to use HTTPS (TLS security), so you need to use the 50470 port, swebhdfs:// (notice the s-prefix for security) in place of webhdfs://, and https:// in place of http:// when following any WebHDFS tutorial.
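As a quick illustration, listing a directory over the secure port with curl would look roughly like the following; the hostname and file path are placeholders, -k skips certificate verification only for the sake of the example, and a kerberized cluster would additionally need SPNEGO options (e.g. --negotiate -u :):

~> curl -k "https://namenode.example.com:50470/webhdfs/v1/tmp?op=LISTSTATUS"
# Reads redirect you to a DataNode's HTTPS port, so let curl follow the redirect
~> curl -k -L "https://namenode.example.com:50470/webhdfs/v1/tmp/somefile.txt?op=OPEN"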
09-05-2018
09:13 PM
An HA HDFS installation requires you to run a Failover Controller alongside each of the NameNodes, along with a ZooKeeper service. These controllers take care of transitioning the NameNodes such that only one is active and the other becomes standby. It appears that you're using a CDH package-based (non-CM) installation here, so please follow the guide starting at https://www.cloudera.com/documentation/enterprise/5-14-x/topics/cdh_hag_hdfs_ha_intro.html#topic_2_1_3__section_jnx_jzp_15, following the instructions under the 'Command-Line' parts instead of the Cloudera Manager ones.

@phaothu wrote:

But the problem is how to start the namenode which I had stopped again? I do the following:

sudo -u hdfs hdfs namenode -bootstrapStandby -force
/etc/init.d/hadoop-hdfs-namenode start

With the above process sometimes the namenode starts ok in standby mode, but sometimes it starts in active mode and then I have 2 active nodes (split brain!!). So what have I done wrong, what is the right process to start a namenode that had been stopped?

Just simply start it up. The bootstrap command must only be run when setting up a fresh new NameNode, not on every restart of a previously running NameNode. It's worth noting that Standby and Active are just states of the very same NameNode. The Standby NameNode is not a special daemon; it's just a state of the NameNode.
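So for a routine restart of an already bootstrapped NameNode, a rough sketch would be just the following; nn1/nn2 are placeholders for whatever service IDs dfs.ha.namenodes.<nameservice> defines in your hdfs-site.xml, and the init script names follow the CDH packaging:

~> /etc/init.d/hadoop-hdfs-namenode start
# The ZKFC must also be running on both NameNode hosts so it can arbitrate who goes active
~> /etc/init.d/hadoop-hdfs-zkfc start
# Verify the resulting HA states
~> sudo -u hdfs hdfs haadmin -getServiceState nn1
~> sudo -u hdfs hdfs haadmin -getServiceState nn2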
09-05-2018
05:14 AM
1 Kudo
This is shell behaviour, not Sqoop's.

"When referencing a variable, it is generally advisable to enclose its name in double quotes." "Use double quotes to prevent word splitting. An argument enclosed in double quotes presents itself as a single word, even if it contains whitespace separators." - http://tldp.org/LDP/abs/html/quotingvar.html
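A tiny illustration of the difference; the variable name and value here are made up for the example:

~> COND="day >= '2018-09-01' AND region = 'US'"
# Unquoted: the shell splits the value into several separate arguments
~> printf '[%s]\n' $COND
# Quoted: the value is passed through as one single argument
~> printf '[%s]\n' "$COND"

The same applies to something like a --where clause handed to Sqoop: quote the variable expansion so the whole condition reaches Sqoop as one argument.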
08-28-2018
06:37 PM
The .archive directory support was certainly not present in our original CDH 4.0.x sources: https://github.com/cloudera/hbase/tree/cdh4.0.0-release I'm therefore unsure who/what created it in your environment.
08-27-2018
08:04 PM
1 Kudo
CDH 4.2 onwards brought snapshots, which introduced the concept of the /hbase/.archive directory for active snapshot-referenced store-file data [1], but this feature did not exist in the HBase shipped with CDH 4.0.1.

Are you certain your version is CDH 4.0.1? Or perhaps was there a rollback from a higher CDH4 version down to CDH 4.0.1 in the past? Or otherwise, is there another CDH cluster remotely copying its snapshots into your cluster via ExportSnapshot, where this other cluster is on a higher CDH4 version?

If you are absolutely sure that nothing in your CDH4 version accesses the unused /hbase/.archive directory (you can check via NameNode audit logs over a period of time when HBase is actively in use), and no snapshots appear to exist ('list_snapshots' command in the HBase shell, if it is available), then you can try removing the /hbase/.archive directory by first moving the .archive path elsewhere (to /tmp/ maybe) and then deleting it after ensuring HBase is not affected.

Note: HBase will not retain data unnecessarily. The archive directory retains data still referenced by tables and/or snapshots, and it is cleaned up automatically otherwise. No part of that data is 'unused', so do not delete it without checking first.

[1] - https://blog.cloudera.com/blog/2013/03/introduction-to-apache-hbase-snapshots/
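For the audit-log check, assuming the usual CDH packaging location of the audit log on the NameNode host (adjust the path to your own setup), something along these lines run against a few busy days of logs should show whether anything still touches the directory:

# Any access to the path shows up as a src= field in the NameNode audit log
~> grep 'src=/hbase/.archive' /var/log/hadoop-hdfs/hdfs-audit.log*
# No matches over a representative period suggests nothing is reading or writing it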
08-23-2018
07:40 PM
If you are using Cloudera Manager and have re-added the host/role after deleting it, Cloudera Manager should've marked your HBase service with a 'Stale Configuration' icon, indicating a restart is required for your HBase service (and client configuration deployment for its gateways) to see the changes in the ZK client configuration. There's no way to live-refresh the configuration at runtime, but you can consider performing a rolling restart to eliminate availability issues.