Member since 06-17-2016
6 Posts
17 Kudos Received
0 Solutions
07-19-2017
08:14 AM
1 Kudo
1. The extras features are not shipped with the default log4j implementation, so download "Apache Extras™ for Apache log4j" (apache-log4j-extras) from https://logging.apache.org/log4j/extras/download.html. For example, download the jar "apache-log4j-extras-1.2.17.jar" and place it in /etc/hadoop/conf/secure/ (a consolidated shell sketch of steps 1-3 appears at the end of this article).
2. Add the property -Dhadoop.root.logger=INFO,ZIPRFA to the export AMBARI_JVM_ARGS parameter in /var/lib/ambari-server/ambari-env.sh:
export AMBARI_JVM_ARGS=$AMBARI_JVM_ARGS' -Xms512m -Xmx2048m -Djava.security.auth.login.config=/etc/ambari-server/conf/krb5JAASLogin.conf -Djava.security.krb5.conf=/etc/krb5.conf -Djavax.security.auth.useSubjectCredsOnly=false -Dhadoop.root.logger=INFO,ZIPRFA'
3. Uncomment/add the server classpath in /var/lib/ambari-server/ambari-env.sh:
export SERVER_CLASSPATH=/etc/hadoop/conf/secure
4. Update the root logger in the Ambari log4j properties:
#log4j.rootLogger=INFO,file
log4j.rootLogger=INFO,ZIPRFA
Comment out the following values:
# Direct log messages to a log file
#log4j.appender.file=org.apache.log4j.RollingFileAppender
#log4j.appender.file.File=${ambari.log.dir}/${ambari.log.file}
#log4j.appender.file.MaxFileSize=80MB
#log4j.appender.file.MaxBackupIndex=60
#log4j.appender.file.layout=org.apache.log4j.PatternLayout
#log4j.appender.file.layout.ConversionPattern=%d{DATE} %5p [%t] %c{1}:%L - %m%n
Add the following values:
#### New Appender to Zip the Log Files Based on Daily Rotation
log4j.appender.ZIPRFA=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.ZIPRFA.File=${ambari.log.dir}/${ambari.log.file}
log4j.appender.ZIPRFA.layout=org.apache.log4j.PatternLayout
log4j.appender.ZIPRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
log4j.appender.ZIPRFA.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.ZIPRFA.rollingPolicy.ActiveFileName=${ambari.log.dir}/${ambari.log.file}
log4j.appender.ZIPRFA.rollingPolicy.FileNamePattern=${ambari.log.dir}/${ambari.log.file}-.%d{yyyyMMdd}.log.gz
5. Run ps -ef | grep ambari and note the current value of -Dhadoop.root.logger (for example, -Dhadoop.root.logger=INFO,EWMA,RFA).
6. Restart the Ambari server and check -Dhadoop.root.logger again; it should now read -Dhadoop.root.logger=INFO,ZIPRFA.
Now, when the date changes, the Ambari server logs are rolled over and compressed:
-rw-r--r--. 1 root root 301 Sep 8 12:33 ambari-server.log-.20170907.log.gz
-rw-r--r--. 1 root root 921 Sep 9 20:23 ambari-server.log-.20170908.log.gz
-rw-r--r--. 1 root root 249 Sep 10 03:05 ambari-server.log-.20170909.log.gz
-rw-r--r--. 1 root root 304 Sep 11 05:43 ambari-server.log-.20170910.log.gz
-rw-r--r--. 1 root root 247 Sep 12 17:55 ambari-server.log-.20170911.log.gz
-rw-r--r--. 1 root root 28867 Sep 13 01:42 ambari-server.log-.20170912.log.gz
-rw-r--r--. 1 root root 1608 Sep 14 00:35 ambari-server.log-.20170913.log.gz
-rw-r--r--. 1 root root 1873 Sep 15 00:08 ambari-server.log-.20170914.log.gz
Reference Article: https://community.hortonworks.com/articles/50058/using-log4j-extras-how-to-rotate-as-well-as-zip-th.html
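For convenience, here is a minimal shell sketch of steps 1-3. It is only an outline under assumptions: the source path of the downloaded jar is a placeholder, and ambari-env.sh still needs to be edited by hand as shown above.
# Step 1: place the extras jar (downloaded from the page linked above) on the server classpath directory
mkdir -p /etc/hadoop/conf/secure
cp /path/to/apache-log4j-extras-1.2.17.jar /etc/hadoop/conf/secure/
# Steps 2-3: back up ambari-env.sh, then edit AMBARI_JVM_ARGS and SERVER_CLASSPATH as shown above
cp /var/lib/ambari-server/ambari-env.sh /var/lib/ambari-server/ambari-env.sh.bak
# Restart so the new logger settings take effect
ambari-server restart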
04-27-2017
12:11 PM
13 Kudos
hdfs dfsadmin -report outputs a brief report on the overall HDFS filesystem. It's a useful command to quickly view how much disk is available, how many DataNodes are running, corrupted blocks, etc.
Note: This article explains the disk space calculations as seen by HDFS.
Command: Run the command with sudo -u hdfs prefixed to ensure you don't get a permission-denied error.
sudo -u hdfs hdfs dfsadmin -report
You will see an output similar to:
Configured Capacity: 270082531328 (251.53 GB)
Present Capacity: 190246318080 (177.18 GB)
DFS Remaining: 143504465920 (133.65 GB)
DFS Used: 46741852160 (43.53 GB)
DFS Used%: 24.57%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (4):
Name: 123.45.678.910:50010 (kharearpit4.local)
Hostname: kharearpit4.local
Rack: /rack4
Decommission Status : Normal
Configured Capacity: 20063055872 (18.69 GB)
DFS Used: 40960 (40 KB)
Non DFS Used: 5971144704 (5.56 GB)
DFS Remaining: 14091870208 (13.12 GB)
DFS Used%: 0.00%
DFS Remaining%: 70.24%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Sun Apr 23 19:57:56 UTC 2017
Name: 123.45.678.909:50010 (kharearpit3.local)
Hostname: kharearpit3.local
Rack: /rack3
Decommission Status : Normal
Configured Capacity: 83339825152 (77.62 GB)
DFS Used: 15580618752 (14.51 GB)
Non DFS Used: 22774845440 (21.21 GB)
DFS Remaining: 44984360960 (41.89 GB)
DFS Used%: 18.70%
DFS Remaining%: 53.98%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Sun Apr 23 19:57:58 UTC 2017
Name: 123.45.678.908:50010 (kharearpit1.local)
Hostname: kharearpit1.local
Rack: /rack1
Decommission Status : Normal
Configured Capacity: 83339825152 (77.62 GB)
DFS Used: 15580672000 (14.51 GB)
Non DFS Used: 31497687040 (29.33 GB)
DFS Remaining: 36261466112 (33.77 GB)
DFS Used%: 18.70%
DFS Remaining%: 43.51%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Sun Apr 23 19:57:58 UTC 2017
Name: 123.45.678.907:50010 (kharearpit2.local)
Hostname: kharearpit2.local
Rack: /rack2
Decommission Status : Normal
Configured Capacity: 83339825152 (77.62 GB)
DFS Used: 15580520448 (14.51 GB)
Non DFS Used: 19592536064 (18.25 GB)
DFS Remaining: 48166768640 (44.86 GB)
DFS Used%: 18.70%
DFS Remaining%: 57.80%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Sun Apr 23 19:57:58 UTC 2017
This article aims at explaining the concepts of Configured Capacity, Present Capacity, DFS Used, DFS Remaining, and Non DFS Used in HDFS. The diagram below explains these output space parameters, treating HDFS as a single disk. A detailed explanation of these parameters follows:
1. Configured Capacity
This is the total capacity available to HDFS for storage. It is calculated as follows:
Configured Capacity = Total Disk Space - Reserved Space
Reserved space is the space allocated for OS-level operations. It can be configured using the dfs.datanode.du.reserved parameter, which can be added or updated in hdfs-site.xml. The replication factor is irrelevant in the case of Configured Capacity.
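To confirm what these parameters are set to on your cluster, they can be read from the effective configuration; a minimal sketch, run on any node that has the HDFS client configuration:
# Reserved non-DFS space per volume, in bytes
hdfs getconf -confKey dfs.datanode.du.reserved
# Default block replication factor (used again in the DFS Used section below)
hdfs getconf -confKey dfs.replication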
2. Present Capacity
This is the total amount of storage space actually available for storing files, after allocating some space for metadata and open blocks (Non DFS Used space). So, the difference between Configured Capacity and Present Capacity is used for storing filesystem metadata and other information. When a DataNode sends its report to the NameNode, it includes its Present Capacity; the NameNode tracks and aggregates this value from all DataNodes, and the aggregate is what gets displayed when the hdfs dfsadmin -report command is run. Thus, Present Capacity may vary over time depending on the usage of non-HDFS directories, whereas Configured Capacity remains the same until you add or remove volumes/disks from HDFS.
3. DFS Used
This is the storage space that has been used up by HDFS. To get the actual size of the files stored in HDFS, divide DFS Used by the replication factor. The replication factor is configured under the dfs.replication parameter in hdfs-site.xml. So if DFS Used is 90 GB and your replication factor is 3, the actual size of your files in HDFS is 90/3 = 30 GB.
4. DFS Remaining
This is the amount of storage space still available to HDFS for storing more files. If you have 90 GB of remaining storage space and a replication factor of 3, you can still store up to 90/3 = 30 GB of files without exceeding your Configured Capacity.
So, after understanding DFS Used and DFS Remaining, we can say that:
Present Capacity = DFS Used + DFS Remaining
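The replication-factor division described in sections 3 and 4 can be checked for a given path with hdfs dfs -du. On recent Hadoop releases it prints both the logical size and the space consumed across all replicas (older releases print only the logical size), and dividing the second column by the replication factor should roughly give the first for fully replicated data. The output below is only illustrative:
sudo -u hdfs hdfs dfs -du -s -h /
# Illustrative output: 14.5 G  43.5 G  /
# 43.5 GB of raw DFS usage / replication factor 3 ≈ 14.5 GB of actual file data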
5. Non DFS Used
Non DFS Used is any data on a DataNode's filesystem that is not inside dfs.datanode.data.dir. In other words, 'Non DFS Used' tells you how much of the Configured Capacity is being occupied for non-DFS use:
Non DFS Used = Configured Capacity - DFS Remaining - DFS Used
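To see where the non-DFS usage on a particular DataNode is coming from, compare the local filesystem usage of the volume hosting the data directory with that node's values in the report. The data directory path below is only an example; check dfs.datanode.data.dir on your cluster:
# Find the configured DataNode data directories
hdfs getconf -confKey dfs.datanode.data.dir
# Inspect the hosting volume; its "Used" minus the node's DFS Used is roughly the Non DFS Used
df -h /hadoop/hdfs/data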
VALIDATING THE OUTPUT
Present Capacity = Sum of [ DFS Used + DFS Remaining ] for all the DataNodes
In the output shared above, we have 4 DataNodes:
Present Capacity = [ 40 KB + 13.12 GB ] + [ 14.51 GB + 41.89 GB ] + [ 14.51 GB + 33.77 GB ] + [ 14.51 GB + 44.86 GB ] = 177.18 GB
This matches what we got when we ran the command.
Configured Capacity = Sum of Configured Capacity for all the DataNodes
= 18.69 GB + 77.62 GB + 77.62 GB + 77.62 GB = 251.55 GB (the small difference from the reported 251.53 GB is due to rounding)
Another way of checking the Configured Capacity is:
Configured Capacity = Present Capacity + Non DFS Used on all the DataNodes
= 177.18 GB + [ 5.56 GB + 21.21 GB + 29.33 GB + 18.25 GB ]
= 251.53 GB
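Similarly, the Non DFS Used formula from section 5 can be cross-checked against any single DataNode; for kharearpit1.local: Non DFS Used = 77.62 GB - 33.77 GB - 14.51 GB = 29.34 GB, which matches the reported 29.33 GB up to rounding.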
03-17-2017
06:33 PM
1 Kudo
@Guy Riems Use the following command to get more details about live, dead, and decommissioning DataNodes, along with their respective configured capacity, DFS/non-DFS usage, etc.
hdfs dfsadmin -report -live -dead -decommissioning
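Each of these flags can also be passed on its own, and the output can be filtered with standard tools. For example, an illustrative way to list just the hostnames of the live DataNodes:
sudo -u hdfs hdfs dfsadmin -report -live | grep '^Hostname:'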
03-14-2017
05:37 PM
3 Kudos
It is recommended to place data transfer utilities like Sqoop on anything but an edge node, as their high data transfer volumes could impair the ability of Hadoop services on the same node to communicate.
It is also recommended to minimize the deployment of administrative tools on master and slave nodes to ensure that critical Hadoop services like the NameNode have as little competition for resources as possible.