Member since: 06-16-2016
Posts: 43
Kudos Received: 22
Solutions: 0
01-15-2019
08:20 PM
Do you mean what the use/need is? If so, a platform admin who wants to know how many Hive tables exist across all the Hive databases put together can use this query to get the answer. This was asked by a customer on one of our calls, and I thought it would help others with a similar request. There is also a similar thread to this one; feel free to refer to it.
01-12-2019
12:40 AM
The number can be found in the database that backs the Hive metastore. If MySQL is the backend database, run: select count(*) from TBLS;
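For example, a minimal sketch assuming a MySQL metastore (the database name and user below are placeholders, not from the original post):
# Connect to the Hive metastore database and count the registered tables
mysql -u hiveuser -p -D hive -e "SELECT COUNT(*) FROM TBLS;"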
11-09-2018
03:20 AM
2 Kudos
Hadoop Archives (HAR) is one of the methodologies used to reduce the load on the NameNode: small files are archived together, and the archive is referenced as a single file via the har reader. Testing:
To understand the behavior of HAR, we try the following example.
1. Create test folders
harSourceFolder2: where the initial set of small files is stored. Ex. (in HDFS) /tmp/harSourceFolder2
harDestinationFolder2: where the final archived files are stored. Ex. (in HDFS) /tmp/harDestinationFolder2
2. Ingest small files into the source folder.
sudo -u hdfs hadoop fs -copyFromLocal /tmp/SampleTest1.txt /tmp/harSourceFolder2
NOTE: This command shows one file (SampleTest1.txt); in our example we used five files, with the index extending to 5 (SampleTest5.txt).
3. Capture the fsck report across "/" and the NameNode report after the small files are ingested.
sudo -u hdfs hdfs fsck / -files > ./fsckWhenFilesCreated.txt
143 files and directories, 48 blocks = 191 total filesystem object(s).
4. Execute the hadoop archive command.
sudo -u hdfs hadoop archive -archiveName hartest2.har -p /tmp harSourceFolder2 /tmp/harDestinationFolder2
5. Capture the fsck report across "/" and the NameNode report after the Hadoop archive is created.
sudo -u hdfs hdfs fsck / -files > ./fsckAfterHARCreated.txt
156 files and directories, 55 blocks = 211 total filesystem object(s).
6. Compare the NameNode report and fsck report.
143 files and directories, 48 blocks = 191 total filesystem object(s).
156 files and directories, 55 blocks = 211 total filesystem object(s).
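One simple way to see exactly what changed between the two captures (a hedged example; any comparison tool works):
# Show the new entries introduced by the archive job
diff ./fsckWhenFilesCreated.txt ./fsckAfterHARCreated.txt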
Analysis: Upon analyzing the captured fsck reports (fsckWhenFilesCreated and fsckAfterHARCreated), we see that multiple new files and blocks were created: in this case, 13 files and folders and 7 blocks. This can be explained with the following output.
/app-logs/hdfs/logs-ifile/application_1541612686625_0001 <dir>
/app-logs/hdfs/logs-ifile/application_1541612686625_0001/c3187-node3.squadron-labs.com_45454 17656 bytes, 1 block(s): OK
/app-logs/hdfs/logs-ifile/application_1541612686625_0001/c3187-node4.squadron-labs.com_45454 6895 bytes, 1 block(s): OK
/mr-history/done/2018/11 <dir>
/mr-history/done/2018/11/07 <dir>
/mr-history/done/2018/11/07/000000 <dir>
/mr-history/done/2018/11/07/000000/job_1541612686625_0001-1541618133969-hdfs-hadoop%2Darchives%2D2.7.3.2.6.5.0%2D292.jar-1541618159397-1-1-SUCCEEDED-default-1541618141722.jhist 33597 bytes, 1 block(s): OK
/mr-history/done/2018/11/07/000000/job_1541612686625_0001_conf.xml 149808 bytes, 1 block(s): OK
/tmp/harDestinationFolder2/hartest2.har <dir>
/tmp/harDestinationFolder2/hartest2.har/_SUCCESS 0 bytes, 0 block(s): OK
/tmp/harDestinationFolder2/hartest2.har/_index 619 bytes, 1 block(s): OK
/tmp/harDestinationFolder2/hartest2.har/_masterindex 23 bytes, 1 block(s): OK
/tmp/harDestinationFolder2/hartest2.har/part-0 120 bytes, 1 block(s): OK
The above list comprises the 13 new files/folders that were added. Except for "harDestinationFolder2/hartest2.har" and its contents, the rest of the data is temporary, created by the MapReduce job that the hadoop archive command shown above triggers. We also see seven occurrences of "1 block(s):" in the above output, which accounts for the total block increase; of these, three are permanent and the rest are temporary. At this point, the source small files can be deleted, since there is now an archive of these files. Because a constant number of blocks (_index, _masterindex, part-0) is created for each archive, it is worth archiving a large number of small files at a time; archiving small datasets can have a negative effect. Note also that in the fsck report taken after creating the archive, we do not see the source files (SampleTest[1-5].txt) inside the "hartest2.har" directory, although they can be seen when listing it via a regular "hadoop fs -lsr har:" command. This shows that HDFS does not track the original source files once they are archived via HAR, which answers the concern: even though the source text files are still visible through the har reader, they do not add to the load on the NameNode. hadoop fs -lsr har:///tmp/harDestinationFolder2/hartest2.har
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxr-xr-x - hdfs hdfs 0 2018-11-07 18:49 har:///tmp/harDestinationFolder2/hartest2.har/harSourceFolder2
-rw-r--r-- 3 hdfs hdfs 24 2018-11-07 18:48 har:///tmp/harDestinationFolder2/hartest2.har/harSourceFolder2/SampleTest1.txt
-rw-r--r-- 3 hdfs hdfs 24 2018-11-07 18:48 har:///tmp/harDestinationFolder2/hartest2.har/harSourceFolder2/SampleTest2.txt
-rw-r--r-- 3 hdfs hdfs 24 2018-11-07 18:48 har:///tmp/harDestinationFolder2/hartest2.har/harSourceFolder2/SampleTest3.txt
-rw-r--r-- 3 hdfs hdfs 24 2018-11-07 18:48 har:///tmp/harDestinationFolder2/hartest2.har/harSourceFolder2/SampleTest4.txt
-rw-r--r-- 3 hdfs hdfs 24 2018-11-07 18:49 har:///tmp/harDestinationFolder2/hartest2.har/harSourceFolder2/SampleTest5.txt
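As a quick check, the archived content can still be read through the har:// scheme. The commands below are a hedged example based on the paths used in this walkthrough; the cleanup step is optional and only makes sense once the archive has been verified:
# Read one of the archived files via the har reader
hadoop fs -cat har:///tmp/harDestinationFolder2/hartest2.har/harSourceFolder2/SampleTest1.txt
# Optional cleanup of the original small files once the archive is verified
sudo -u hdfs hadoop fs -rm -r /tmp/harSourceFolder2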
10-29-2018
06:13 PM
2 Kudos
The idea of this article is to help admins detect artifacts (files/folders) in the cluster that are older than a certain number of days. In addition, there may be empty, unused directories lying around in the cluster that contribute to the small-file issue. Hence, we have the attached script, which:
1. Identifies files older than X days.
2. Identifies folders older than X days.
3. Deletes empty folders.
Script Execution
The script name is "findAll.sh", and it expects 2 parameters:
1. Age of the artifact (file/folder) in days.
2. Actual location of the artifact (file/folder) in HDFS.
Based on the type of artifact and the kind of operation, you would choose one of the three options. NOTE:
1. Please make sure the user running the script has permission to execute the commands on the artifacts passed as parameters to the script.
2. Running this script may take some time depending on the size/hierarchy of the folders, but once the list is produced you can act upon it as needed. Hence, I would recommend testing the script in a lower environment and running it in PROD when the load on HDFS is low.
3. Please exercise caution with the folders on which you run the script.
Example executions:
Execution 1: To list the old folders:
[hive@c2187-node2 tmp]$ ./findAll.sh 9 /tmp/hive/hive
Please select your option
1. Identify folders/directories that are older than 9 days
2. Identify files that are older than 9 days
3. Delete empty folders
1
Please check the output in ./OldFolders-202054.txt ;
Execution 2: To list the old files:
[hive@c2187-node2 tmp]$ ./findAll.sh 9 /tmp/hive/hive
Please select your option
1. Identify folders/directories that are older than 9 days
2. Identify files that are older than 9 days
3. Delete empty folders
2
Please check the output in file ./Oldfiles.txt-202148
Execution 3 : To delete empty folders
[hive@c2187-node2 tmp]$ ./findAll.sh 9 /tmp/hive/hive
Please select your option
1. Identify folders/directories that are older than 9 days
2. Identify files that are older than 9 days
3. Delete empty folders
3
rmdir: `/tmp/hive/hive/_tez_session_dir': Directory is not empty
Please feel free to tweak and extend the functionality of the script. Attachment: findall.tar.gz
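For reference, here is a rough, hypothetical sketch of what such a script could look like. The attached findall.tar.gz is the authoritative version; the option handling, output file names, and use of hdfs dfs -ls -R / -rmdir below are assumptions:
#!/bin/bash
# Hypothetical sketch of findAll.sh: ./findAll.sh <age-in-days> <hdfs-path>
DAYS="$1"
TARGET="$2"
CUTOFF=$(date -d "-${DAYS} days" +%Y-%m-%d)   # ISO dates compare correctly as strings
STAMP=$(date +%H%M%S)
echo "Please select your option"
echo "1. Identify folders/directories that are older than ${DAYS} days"
echo "2. Identify files that are older than ${DAYS} days"
echo "3. Delete empty folders"
read -r CHOICE
case "$CHOICE" in
  1) # Directories start with 'd' in the permission column; column 6 is the modification date
     hdfs dfs -ls -R "$TARGET" | awk -v c="$CUTOFF" '/^d/ && $6 < c {print $8}' > "./OldFolders-${STAMP}.txt"
     echo "Please check the output in ./OldFolders-${STAMP}.txt" ;;
  2) # Regular files start with '-' in the permission column
     hdfs dfs -ls -R "$TARGET" | awk -v c="$CUTOFF" '/^-/ && $6 < c {print $8}' > "./Oldfiles.txt-${STAMP}"
     echo "Please check the output in file ./Oldfiles.txt-${STAMP}" ;;
  3) # -rmdir removes only empty directories, so non-empty ones are reported and left alone
     hdfs dfs -ls -R "$TARGET" | awk '/^d/ {print $8}' | while read -r dir; do
       hdfs dfs -rmdir "$dir"
     done ;;
  *) echo "Invalid option"; exit 1 ;;
esac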
09-24-2018
06:07 PM
Great article, @Dinesh Chitlangia. Just to add to the details mentioned in the blog here: changing the property ("upgrade.parameter.nn-restart.timeout") during the upgrade may not take effect even if the Ambari server is restarted, so have the property in place before the upgrade starts.
08-30-2018
12:27 AM
2 Kudos
JSTACK and JMAP Collection
Jstack Collection
Step 1: Switch to the service user that started the process.
#su - <service-user-who-started-the-process>
Step 2: Capture the process ID.
#ps -ef | grep <process-name>
#ps -ef | grep hive
hive 21887 1 0 Aug01 ? 00:58:04 /usr/jdk64/jdk1.8.0_112/bin/java -Xmx1024m -Dhdp.version=2.6.5.0-292 -Djava.net.preferIPv4Stack=true -Dhdp.version=2.6.5.0-292 -Dhadoop.log.dir=/var/log/hadoop/hive -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.6.5.0-292/hadoop -Dhadoop.id.str=hive -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.6.5.0-292/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.6.5.0-292/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.6.5.0-292/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx1024m -Xmx2048m -Djava.util.logging.config.file=/usr/hdp/current/hive-server2/conf/conf.server/parquet-logging.properties -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /usr/hdp/2.6.5.0-292/hive/lib/hive-service-1.2.1000.2.6.5.0-292.jar org.apache.hive.service.server.HiveServer2 --hiveconf hive.aux.jars.path=file:///usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar -hiveconf hive.metastore.uris= -hiveconf hive.log.file=hiveserver2.log -hiveconf hive.log.dir=/var/log/hive
From the above output:
The parent service account is hive.
The process ID is 21887.
The java used is /usr/jdk64/jdk1.8.0_112/bin/java.
Step 3: Capture the java used by the process to start the service. From the above output, it is /usr/jdk64/jdk1.8.0_112/bin/java.
Step 4: (In order of priority)
NOTE: Consider running the command multiple times (at least 5 times), separated by 20-30 seconds.
4.1: Simple jstack for a responding process
#<jstack-used-by-process>/jstack -l <pid> > <location-to-redirect-the-output>/jstack.out
4.2: Use kill for a hung process
#kill -3 <pid>
The corresponding output is captured in the .out file of the process.
4.3: Use -F for a hung process
#<jstack-used-by-process>/jstack -F <pid> > <location-to-redirect-the-output>/jstack.out
JMap Collection
Step 1: #su - <service-user-who-started-the-process>
Step 2: Capture the process ID.
Step 3: Capture the java used by the process to start the service.
Step 4: Determine the appropriate flag to use. We use the "-heap" option to decide whether the "-dump" option is needed.
#<jmap-used-by-process>/jmap -heap <pid> > jmapHEAP.out
Upon multiple executions of the above command, if the percentage used is above 90%, then we use the -dump flag as shown below. Sample output of the above command:
Attaching to process ID 21887, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.112-b15
using thread-local object allocation.
Parallel GC with 8 thread(s)
Heap Configuration:
MinHeapFreeRatio = 0
MaxHeapFreeRatio = 100
MaxHeapSize = 2147483648 (2048.0MB)
NewSize = 87031808 (83.0MB)
MaxNewSize = 715653120 (682.5MB)
OldSize = 175112192 (167.0MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 0 (0.0MB)
Heap Usage:
PS Young Generation
Eden Space:
capacity = 141557760 (135.0MB)
used = 36859416 (35.151878356933594MB)
free = 104698344 (99.8481216430664MB)
26.038428412543404% used
From Space:
capacity = 5767168 (5.5MB)
used = 4211840 (4.0167236328125MB)
free = 1555328 (1.4832763671875MB)
73.0313387784091% used
To Space:
capacity = 5767168 (5.5MB)
used = 0 (0.0MB)
free = 5767168 (5.5MB)
0.0% used
PS Old Generation
capacity = 277872640 (265.0MB)
used = 161075720 (153.61377716064453MB)
free = 116796920 (111.38622283935547MB)
57.9674630794885% used
From the above output, 57% of the heap is being used. The two general flags used while collecting heap dumps are "-dump" and "-histo". The former gives the heap dump as a binary file containing the collection of objects at a particular time, while the latter provides the details of live objects in text format.
#<jmap-used-by-process>/jmap -dump:file=<location-to-redirect-the-output>/heapdump.hprof,format=b <PID>
If the histo label needs to be used:
#<jmap-used-by-process>/jmap -histo <pid> > jmap.out
NOTE:
1. Jmap/Jstack collection is CPU intensive, so please use it with caution.
2. Please avoid using -F as much as possible, as critical data is missed with this option. If the -F option does need to be used with any of the commands, for example:
#/usr/jdk64/jdk1.8.0_112/bin/jmap -dump:file=/tmp/jmap21887.hprof,format=b -F 21887
#/usr/jdk64/jdk1.8.0_112/bin/jmap -histo -F 21887 > /tmp/jmaphistoF.out
Thanks @Rajkumar Singh, @Vinod Bonthu and @Kevin Wong for reviewing.
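Since Step 4 above suggests capturing jstack at least 5 times, 20-30 seconds apart, a small helper loop like the following could be used (a minimal sketch; the JDK path and PID are the examples from this walkthrough, run as the service user):
# Capture 5 jstacks, roughly 25 seconds apart
PID=21887
JSTACK=/usr/jdk64/jdk1.8.0_112/bin/jstack
for i in 1 2 3 4 5; do
  "$JSTACK" -l "$PID" > /tmp/jstack.${PID}.${i}.out
  sleep 25
done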
05-08-2018
02:16 AM
5 Kudos
As an extension to the article mentioned here, we use a custom Ambari alert to monitor the current health of the JournalNode edits.
With the default monitoring present in Ambari, we would not be alerted about edit failures that may happen on one of the JournalNodes in the quorum. In a typical HDFS HA environment, three JournalNode daemons are deployed. If any one of them fails to maintain the edits, we are at risk of failovers and an eventual cluster outage if another JournalNode hits a similar issue (because if a quorum of edits is not maintained, the NameNode fails to stay up). Hence, we need the necessary alerting mechanism in place for such failures. JournalNodes may not get updated for various reasons, such as:
1. Disk getting full.
2. Corrupt Permissions.
3. Exhausted HDFS handlers on the JN host, etc.
Attached are the artifacts, which contain:
1. alerts-test.json
2. jn_edits_tracker.py
jn_edits_tracker.py has the following preconfigured values:
OK_CEIL = 9
WARN_FLOOR = 10
WARN_CEIL = 19
CRITICAL_FLOOR = 20
These define the corresponding time ranges, in seconds, for the alerts to be triggered. An alert is raised in Ambari if the "edits_inprogress" file is not updated within the configured time interval.
Steps to configure the alert
1. Copy jn_edits_tracker.py to /var/lib/ambari-server/resources/host_scripts.
2. Restart the Ambari server.
3. Run the following command to list all the existing alerts:
curl -u admin:admin -i -H 'X-Requested-By:ambari' -X GET http://node1.example.com:8080/api/v1/clusters/ClusterDemo/alert_definitions
4. Install the custom alert using a curl command as follows:
curl -u admin:admin -i -H 'X-Requested-By:ambari' -X POST -d @alerts-test.json http://node1.example.com:8080/api/v1/clusters/ClusterDemo/alert_definitions
Attachments: jneditsarchive.zip
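To check manually what the alert script watches, you can inspect the modification time of the edits_inprogress file on each JournalNode host. The journal directory below is an assumption; use the value of dfs.journalnode.edits.dir in your cluster:
# Run on a JournalNode host; the path is a placeholder for <dfs.journalnode.edits.dir>/<nameservice>/current
ls -l /hadoop/hdfs/journal/*/current/ | grep edits_inprogress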
04-12-2018
06:54 AM
@Neil Tu By disabling it, znode creation and cleanup are not performed, and because the registry is disabled, the load of parsing the ZK hierarchy is relieved.
04-03-2018
07:45 PM
@Saikiran Parepally Please accept the answer if it has helped you.