Member since: 06-16-2016
Posts: 43
Kudos Received: 22
Solutions: 0
08-03-2023
01:29 PM
Hi @kkanchu, can you please accept the answer if the above steps helped you change the log level of the HBase daemon on the fly, without any restart?
01-15-2019
08:20 PM
Do you mean what the use/need is? If so: for a platform admin who wants to know how many Hive tables exist across all Hive databases combined, this query can provide the answer. It was asked by a customer on one of our calls, and I thought it would help others with a similar request. There is also a similar thread to this one; feel free to refer to it.
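For reference, one common way to get such a count (an illustrative sketch, not necessarily the query from the thread; it assumes a MySQL-backed Hive metastore whose database is named hive, and credentials to read it) is to query the metastore tables directly:
mysql -u hive -p -e "SELECT COUNT(*) AS total_tables FROM hive.TBLS;"
For a per-database breakdown, the TBLS table can be joined to DBS:
mysql -u hive -p -e "SELECT d.NAME, COUNT(*) AS tables FROM hive.TBLS t JOIN hive.DBS d ON t.DB_ID = d.DB_ID GROUP BY d.NAME;"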
11-09-2018
03:20 AM
2 Kudos
Hadoop Archives (HAR) is one methodology used to reduce the load on the NameNode: small files are archived, and the whole archive is then referenced as a single file via the har reader.
Testing:
To understand the behavior of HAR, we try the following example.
1. Create test folders
harSourceFolder2 : Where the initial set of small files are stored. Ex. (In HDFS ) /tmp/harSourceFolder2
harDestinationFolder2 : Where the final archived files are stored. Ex. (In HDFS) /tmp/harDestinationFolder2
2. Ingest small files into the source folder.
sudo -u hdfs hadoop fs -copyFromLocal /tmp/SampleTest1.txt /tmp/harSourceFolder2
NOTE: This command shows one file (SampleTest1.txt); in our example we used five files, with the index extending to 5 (SampleTest5.txt).
3. Capture the fsck report across "/" and the NameNode report after the small files are ingested.
sudo -u hdfs hdfs fsck / -files > ./fsckWhenFilesCreated.txt
143 files and directories, 48 blocks = 191 total filesystem object(s).
4. Execute the hadoop archive command.
sudo -u hdfs hadoop archive -archiveName hartest2.har -p /tmp harSourceFolder2 /tmp/harDestinationFolder2
5. Capture the fsck report across "/" and the NameNode report after the hadoop archive is created.
sudo -u hdfs hdfs fsck / -files > ./fsckAfterHARCreated.txt
156 files and directories, 55 blocks = 211 total filesystem object(s).
6. Compare the NameNode and fsck reports.
Before: 143 files and directories, 48 blocks = 191 total filesystem object(s).
After: 156 files and directories, 55 blocks = 211 total filesystem object(s).
Analysis: Upon analyzing the captured fsck reports (fsckWhenFilesCreated and fsckAfterHARCreated), we see that multiple files and blocks were created; in this case, 13 files and folders and 7 blocks. This can be explained with the following output.
/app-logs/hdfs/logs-ifile/application_1541612686625_0001 <dir>
/app-logs/hdfs/logs-ifile/application_1541612686625_0001/c3187-node3.squadron-labs.com_45454 17656 bytes, 1 block(s): OK
/app-logs/hdfs/logs-ifile/application_1541612686625_0001/c3187-node4.squadron-labs.com_45454 6895 bytes, 1 block(s): OK
/mr-history/done/2018/11 <dir>
/mr-history/done/2018/11/07 <dir>
/mr-history/done/2018/11/07/000000 <dir>
/mr-history/done/2018/11/07/000000/job_1541612686625_0001-1541618133969-hdfs-hadoop%2Darchives%2D2.7.3.2.6.5.0%2D292.jar-1541618159397-1-1-SUCCEEDED-default-1541618141722.jhist 33597 bytes, 1 block(s): OK
/mr-history/done/2018/11/07/000000/job_1541612686625_0001_conf.xml 149808 bytes, 1 block(s): OK
/tmp/harDestinationFolder2/hartest2.har <dir>
/tmp/harDestinationFolder2/hartest2.har/_SUCCESS 0 bytes, 0 block(s): OK
/tmp/harDestinationFolder2/hartest2.har/_index 619 bytes, 1 block(s): OK
/tmp/harDestinationFolder2/hartest2.har/_masterindex 23 bytes, 1 block(s): OK
/tmp/harDestinationFolder2/hartest2.har/part-0 120 bytes, 1 block(s): OK
The above list comprises the 13 new files/folders that were added. Except for "harDestinationFolder2/hartest2.har" and its contents, the rest of the data is temporary, produced by the MapReduce job that the hadoop archive command shown above triggers. We also see seven occurrences of "1 block(s):" in the output, which accounts for the total block increase; of these, three are permanent and the rest are temporary. At this point the source small files can be deleted, as there is now an archive holding them (the follow-up commands are sketched after the listing below). Since a constant number of extra blocks (_index, _masterindex, part-0) is created for each archive, it is worth archiving a large number of small files at a time; archiving small datasets can have a negative effect. Note also that in the fsck report taken after the archive was created, we do not see the source files (SampleTest[1-5].txt) inside the directory "hartest2.har", even though they can be seen when we list the archive via a regular "hadoop fs -lsr har:" command. This shows that HDFS does not track the original source files once they are archived via HAR: even though the source text files are still visible through the har: scheme, they do not add load on the NameNode. hadoop fs -lsr har:///tmp/harDestinationFolder2/hartest2.har
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxr-xr-x - hdfs hdfs 0 2018-11-07 18:49 har:///tmp/harDestinationFolder2/hartest2.har/harSourceFolder2
-rw-r--r-- 3 hdfs hdfs 24 2018-11-07 18:48 har:///tmp/harDestinationFolder2/hartest2.har/harSourceFolder2/SampleTest1.txt
-rw-r--r-- 3 hdfs hdfs 24 2018-11-07 18:48 har:///tmp/harDestinationFolder2/hartest2.har/harSourceFolder2/SampleTest2.txt
-rw-r--r-- 3 hdfs hdfs 24 2018-11-07 18:48 har:///tmp/harDestinationFolder2/hartest2.har/harSourceFolder2/SampleTest3.txt
-rw-r--r-- 3 hdfs hdfs 24 2018-11-07 18:48 har:///tmp/harDestinationFolder2/hartest2.har/harSourceFolder2/SampleTest4.txt
-rw-r--r-- 3 hdfs hdfs 24 2018-11-07 18:49 har:///tmp/harDestinationFolder2/hartest2.har/harSourceFolder2/SampleTest5.txt
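For completeness, the follow-up steps described in the analysis can be performed as below (a sketch: reading an archived file back through the har: scheme, then removing the original source folder; -skipTrash is an assumption, drop it if you prefer to keep the files in the trash as a safety net):
sudo -u hdfs hadoop fs -cat har:///tmp/harDestinationFolder2/hartest2.har/harSourceFolder2/SampleTest1.txt
sudo -u hdfs hadoop fs -rm -r -skipTrash /tmp/harSourceFolder2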
10-29-2018
06:13 PM
2 Kudos
The idea of this article is to help admins detect artifacts (files/folders) in the cluster that are older than a certain number of days. In addition, there may be empty directories lying around in the cluster that are no longer used and therefore contribute to the small-file issue. Hence, the attached script performs the following:
1. Identifies files older than X days.
2. Identifies folders older than X days.
3. Deletes empty folders.
Script Execution
The script name is "findAll.sh" and it expects two parameters:
1. Age of the artifact (file/folder) in days.
2. Actual location of the artifact (file/folder) in HDFS.
Based upon the type of artifact and the kind of operation, you then choose one of the three options.
NOTE:
1. Please make sure the user running the script has permission to execute the command on the artifacts passed as parameters to the script.
2. Running this script may take some time, depending on the size/hierarchy of the folders; but once the list is produced, you can act on it as needed. Hence, I recommend testing the script in a lower environment and running it in PROD when the load on HDFS is low.
3. Please exercise caution regarding the folders against which you run the script.
Example executions:
Execution 1: To list the old folders:
[hive@c2187-node2 tmp]$ ./findAll.sh 9 /tmp/hive/hive
Please select your option
1. Identify folders/directories that are older than 9 days
2. Identify files that are older than 9 days
3. Delete empty folders
1
Please check the output in ./OldFolders-202054.txt ;
Execution 2: To list the old files:
[hive@c2187-node2 tmp]$ ./findAll.sh 9 /tmp/hive/hive
Please select your option
1. Identify folders/directories that are older than 9 days
2. Identify files that are older than 9 days
3. Delete empty folders
2
Please check the output in file ./Oldfiles.txt-202148
Execution 3 : To delete empty folders
[hive@c2187-node2 tmp]$ ./findAll.sh 9 /tmp/hive/hive
Please select your option
1. Identify folders/directories that are older than 9 days
2. Identify files that are older than 9 days
3. Delete empty folders
3
rmdir: `/tmp/hive/hive/_tez_session_dir': Directory is not empty
Please feel free to tweak and extend the functionality of the script. Attachment: findall.tar.gz
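For readers who cannot open the attachment, here is a minimal sketch of the same idea (this is an assumption of how such a script could look, not the attached script itself; it relies on GNU date conventions and the column layout of hdfs dfs -ls, and the output file names will differ):
#!/bin/bash
# findAll.sh <age-in-days> <hdfs-path>
AGE_DAYS="$1"
TARGET="$2"
CUTOFF=$(date -d "-${AGE_DAYS} days" +%Y-%m-%d)   # modification dates before this are "old"
echo "Please select your option"
echo "1. Identify folders/directories that are older than ${AGE_DAYS} days"
echo "2. Identify files that are older than ${AGE_DAYS} days"
echo "3. Delete empty folders"
read -r CHOICE
case "$CHOICE" in
  1) # directory entries start with 'd'; column 6 of hdfs dfs -ls is the modification date
     hdfs dfs -ls -R "$TARGET" | awk -v c="$CUTOFF" '/^d/ && $6 < c {print $8}' > ./OldFolders.txt
     echo "Please check the output in ./OldFolders.txt" ;;
  2) # plain file entries start with '-'
     hdfs dfs -ls -R "$TARGET" | awk -v c="$CUTOFF" '/^-/ && $6 < c {print $8}' > ./OldFiles.txt
     echo "Please check the output in ./OldFiles.txt" ;;
  3) # rmdir only removes directories that are already empty; non-empty ones are reported and skipped
     hdfs dfs -ls -R "$TARGET" | awk '/^d/ {print $8}' | xargs -r -n1 hdfs dfs -rmdir ;;
esac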
08-30-2018
12:27 AM
2 Kudos
JSTACK and JMAP Collection
Jstack Collection
Step 1: Switch to the service user that started the process.
#su - <service-user-who-started-the-process>
Step 2: Capture the process ID.
#ps -ef | grep <process-name>
#ps -ef | grep hive
hive 21887 1 0 Aug01 ? 00:58:04 /usr/jdk64/jdk1.8.0_112/bin/java -Xmx1024m -Dhdp.version=2.6.5.0-292 -Djava.net.preferIPv4Stack=true -Dhdp.version=2.6.5.0-292 -Dhadoop.log.dir=/var/log/hadoop/hive -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.6.5.0-292/hadoop -Dhadoop.id.str=hive -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.6.5.0-292/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.6.5.0-292/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.6.5.0-292/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx1024m -Xmx2048m -Djava.util.logging.config.file=/usr/hdp/current/hive-server2/conf/conf.server/parquet-logging.properties -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /usr/hdp/2.6.5.0-292/hive/lib/hive-service-1.2.1000.2.6.5.0-292.jar org.apache.hive.service.server.HiveServer2 --hiveconf hive.aux.jars.path=file:///usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar -hiveconf hive.metastore.uris= -hiveconf hive.log.file=hiveserver2.log -hiveconf hive.log.dir=/var/log/hive
From the above output, the parent service account is hive, the process ID is 21887, and the Java used is /usr/jdk64/jdk1.8.0_112/bin/java.
Step 3: Capture the java used by the process to start the service. From the above output it is /usr/jdk64/jdk1.8.0_112/bin/java.
Step 4: (In order of priority.) NOTE: Consider running the command multiple times (at least 5), separated by 20-30 seconds.
4.1: Simple jstack for a responding process
#<jstack-used-by-process>/jstack -l <pid> > <location-to-redirect-the-output>/jstack.out
4.2: Use kill for a hung process
#kill -3 <pid>
The corresponding output is captured in the .out file of the process.
4.3: Use -F for a hung process
#<jstack-used-by-process>/jstack -F <pid> > <location-to-redirect-the-output>/jstack.out
JMap Collection
Step 1: #su - <service-user-who-started-the-process>
Step 2: Capture the process ID.
Step 3: Capture the java used by the process to start the service.
Step 4: Determine the appropriate flag to use. We use the "-heap" option to decide whether the "-dump" option is needed.
#<jmap-used-by-process>/jmap -heap <pid> > jmapHEAP.out
If, upon multiple executions of the above command, the percentage used stays above 90%, then we use the -dump flag as shown further below. Sample output of the above command:
Attaching to process ID 21887, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.112-b15
using thread-local object allocation.
Parallel GC with 8 thread(s)
Heap Configuration:
MinHeapFreeRatio = 0
MaxHeapFreeRatio = 100
MaxHeapSize = 2147483648 (2048.0MB)
NewSize = 87031808 (83.0MB)
MaxNewSize = 715653120 (682.5MB)
OldSize = 175112192 (167.0MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 0 (0.0MB)
Heap Usage:
PS Young Generation
Eden Space:
capacity = 141557760 (135.0MB)
used = 36859416 (35.151878356933594MB)
free = 104698344 (99.8481216430664MB)
26.038428412543404% used
From Space:
capacity = 5767168 (5.5MB)
used = 4211840 (4.0167236328125MB)
free = 1555328 (1.4832763671875MB)
73.0313387784091% used
To Space:
capacity = 5767168 (5.5MB)
used = 0 (0.0MB)
free = 5767168 (5.5MB)
0.0% used
PS Old Generation
capacity = 277872640 (265.0MB)
used = 161075720 (153.61377716064453MB)
free = 116796920 (111.38622283935547MB)
57.9674630794885% used
From the above output, 57% of the heap is being used. The two flags generally used while collecting heap dumps are "-dump" and "-histo": the former produces the heap dump as a binary file containing the collection of objects at a particular time, while the latter provides the details of live objects in text format.
#<jmap-used-by-process>/jmap -dump:file=<location-to-redirect-the-output>/heapdump.hprof,format=b <PID>
If the histo label needs to be used:
#<jmap-used-by-process>/jmap -histo <pid> > jmap.out
NOTE:
1. Jmap/Jstack are CPU-intensive operations, so please use them with caution.
2. Please avoid -F as much as possible, as critical data is missed with this option. If the -F option does need to be used with any of the commands, for example:
#/usr/jdk64/jdk1.8.0_112/bin/jmap -dump:file=/tmp/jmap21887.hprof,format=b -F 21887
#/usr/jdk64/jdk1.8.0_112/bin/jmap -histo -F 21887 > /tmp/jmaphistoF.out
Thanks @Rajkumar Singh @Vinod Bonthu and @Kevin Wong for reviewing.
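As an addendum to Step 4 of the jstack section above, the repeated captures can be wrapped in a small loop (a sketch only; the PID, jstack path, sample count, interval, and output directory are assumptions taken from the example and should be adapted):
#!/bin/bash
# Collect 5 jstack samples, 30 seconds apart, for the HiveServer2 PID from the example above.
PID=21887
JSTACK=/usr/jdk64/jdk1.8.0_112/bin/jstack
OUTDIR=/tmp/jstacks
mkdir -p "$OUTDIR"
for i in 1 2 3 4 5; do
  "$JSTACK" -l "$PID" > "$OUTDIR/jstack.${i}.$(date +%H%M%S).out"
  sleep 30
done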
05-08-2018
02:16 AM
5 Kudos
As an extension to the article mentioned here, we use custom Ambari alerts to monitor the current health of the JournalNode edits.
With the default monitoring present in Ambari, we would not be alerted about a failure of edits on one of the JournalNodes in the quorum. In a typical HDFS HA environment, three JournalNode daemons are deployed. If any one of them fails to maintain its edits, we are at risk of failovers and an eventual cluster outage should another JournalNode hit a similar issue (if a quorum of edits is not maintained, the NameNode fails to stay up). Hence, we need an alerting mechanism in place for such failures. JournalNodes may fail to get updated for various reasons, such as:
1. The disk getting full.
2. Corrupt permissions.
3. Exhausted HDFS handlers on the JournalNode host, etc.
Attached are the artifacts, which contain:
1. alerts-test.json
2. jn_edits_tracker.py
jn_edits_tracker.py has preconfigured values:
OK_CEIL = 9
WARN_FLOOR = 10
WARN_CEIL = 19
CRITICAL_FLOOR = 20
These define the corresponding time ranges, in seconds, for alerts to be triggered: Ambari will alert if the "edits_inprogress" file has not been updated within the configured interval.
Steps to configure the alert
1. Copy jn_edits_tracker.py to /var/lib/ambari-server/resources/host_scripts.
2. Restart the Ambari server.
3. Run the following command to list all the existing alerts:
curl -u admin:admin -i -H 'X-Requested-By:ambari' -X GET http://node1.example.com:8080/api/v1/clusters/ClusterDemo/alert_definitions
4. Install the custom alert using curl as follows:
curl -u admin:admin -i -H 'X-Requested-By:ambari' -X POST -d @alerts-test.json http://node1.example.com:8080/api/v1/clusters/ClusterDemo/alert_definitions
Attachments: jneditsarchive.zip
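For illustration, the core of the check can be reproduced with a few shell commands (a sketch of the idea only, not the attached Python script; the JournalNode edits directory is an assumption and should be taken from dfs.journalnode.edits.dir on your cluster):
#!/bin/bash
# Age, in seconds, of the newest edits_inprogress file under the JournalNode edits directory.
EDITS_DIR=/hadoop/hdfs/journal
LATEST=$(find "$EDITS_DIR" -name 'edits_inprogress_*' -printf '%T@ %p\n' | sort -n | tail -1)
AGE=$(( $(date +%s) - ${LATEST%%.*} ))
if   [ "$AGE" -le 9 ];  then echo "OK: edits_inprogress updated ${AGE}s ago"
elif [ "$AGE" -le 19 ]; then echo "WARNING: edits_inprogress not updated for ${AGE}s"
else                         echo "CRITICAL: edits_inprogress not updated for ${AGE}s"
fi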
03-08-2018
09:05 PM
1 Kudo
A Hive database can contain both transactional and non-transactional tables. Hence, if you are doing a quick check to determine whether a table is ACID-enabled, run the following command.
# hive -e "describe extended <Database>.<tablename>;" | grep "transactional=true"
If the string you grep for appears in the output, the table is transactional. Example:
#hive -e "describe extended default.hello_acid;" | grep "transactional=true"
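To extend the same check to every table in a database, a small loop along these lines can be used (a sketch; the database name "default" and the use of the hive CLI rather than beeline are assumptions):
#!/bin/bash
DB=default
for t in $(hive -S -e "show tables in ${DB};"); do
  # a table is ACID-enabled if its extended description carries transactional=true
  if hive -S -e "describe extended ${DB}.${t};" | grep -q "transactional=true"; then
    echo "${DB}.${t} is transactional"
  fi
done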
12-26-2017
05:00 PM
1 Kudo
Yes, one Ambari Views server can manage multiple remote clusters. Please refer to the article: https://hortonworks.com/blog/introduction-ambari-views-2-4-new-feature-remote-cluster-configuration/
09-14-2017
09:28 PM
It is expected that in large clusters, where the node count runs to a few hundred, the master services tend to be busy. One such master service is the NameNode. Some of the critical activities the NN performs include:
1. Addressing client requests, which includes verifying proper permissions and authorization checks for HDFS resources.
2. Constantly processing block reports from all the DataNodes.
3. Updating the service and audit logs.
to name a few. In situations where a rogue application tries to access many resources in HDFS, or a data ingestion job is loading high data volumes, the NN becomes very busy. In clusters like these the NN fsimage tends to be several GB in size, so operations such as checkpointing consume considerable bandwidth between the two NameNodes. The resulting high volume of edits syncing, together with logging, can cause high disk utilization, which can lead to NameNode instability. Hence, it is recommended to have dedicated disks for service logs and edit logs. Disk I/O can be monitored using `iostat` output.
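For example, extended per-device statistics can be sampled as shown below (assumes the sysstat package is installed; watch the %util and await columns for the disks hosting the service logs and the NameNode edits directories):
#iostat -x 5 3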