Member since
03-22-2019
46
Posts
8
Kudos Received
3
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1822 | 07-20-2016 07:28 PM
 | 460 | 07-16-2016 07:19 PM
 | 469 | 06-30-2016 04:54 AM
12-06-2018
09:39 PM
Introduction
The performance of a Hadoop cluster can be impacted by the OS partitioning. This document describes best practices for sizing the "/var" folder/partition. Let's approach the problem by asking some important questions:
1. What is "/var" used for?
2. How can the "/var" folder run out of disk space?
3. What are the common issues to expect on a Hadoop cluster if "/var" is out of disk space?
4. How is "/var" currently set up in my cluster?
Question 1 - What is /var used for?
From an OS perspective, "/var" is commonly used for constantly changing, i.e. "variable", files, which is where the short name "var" comes from. Examples of such files are log files, mail, transient files, printer spools, temporary files and cached data. For example, "/var/tmp" holds temporary files that are preserved between system reboots. On any node (Hadoop or non-Hadoop), the /var directory holds content for a number of applications. It is also used to store downloaded update packages temporarily: the PackageKit update software downloads updated packages to /var/cache/yum/ by default, so the /var partition should be large enough to hold package updates. Another example of an application that uses /var is MySQL, which by default uses "/var/lib/mysql" as its data directory location.
Question 2 - How can the /var folder run out of disk space?
/var is much more susceptible to filling up, by accident or by attack, than other locations. The directories most commonly affected are "/var/log", "/var/tmp" and "/var/crash". If there is a serious OS issue, logging can increase tremendously, and if the partition is sized too low, for example 10 GB, this excessive logging can exhaust the disk space for /var.
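A quick way to check how full /var is and what is consuming it (a minimal sketch; some of these paths, such as /var/crash, may not exist on every node):
# df -h /var
# du -sh /var/* 2>/dev/null | sort -rh | head -10
# du -sh /var/log /var/crash /var/tmp 2>/dev/null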
Question 3 - Common issues to expect on a Hadoop cluster if "/var" is out of disk space
/var can easily be filled by a (possibly misbehaving) application, and if it is not separate from /, filling / can even cause a kernel panic. The "/var" folder contains some very important file/folder locations that are used by default by the kernel and many OS applications. For example:
"/var/run" is used by running processes to keep their PIDs and system information. If "/var" is full because it was configured with too little disk space, applications will fail to run.
"/var/lock" contains the locks that running applications hold on the files/devices they have locked. If the disk space runs out, taking a lock is not possible and existing/new applications will fail.
"/var/lib" holds the dynamic data, libraries and files of applications. If there is no space left on the device, applications will fail to work.
From a Hadoop perspective, "/var" is very important for keeping all the services running. Running out of disk space on "/var" can cause Hadoop and dependent services to fail on that node.
Question 4 - How is "/var" set up in my cluster?
Check the following:
Are the Hadoop directories and logs separated from the "/var" location?
Are large logs, or a large number of OS logs, still located under "/var", for example "/var/log/messages" and "/var/crash"?
If kdump is configured to capture crash-dump logs, the risk increases, since these logs are usually huge, sometimes 100 GB or more. The default kdump configuration uses the directory location "/var/crash". These days physical memory can easily be 500 GB or 1 TB, which would produce correspondingly huge kdump logs (note: kdump logs can be compressed).
The size of "/var" therefore plays an important role, because /var/crash can be too small to hold the crash-dump logs. If there is an OS crash (kernel panic etc.), the crash dump will never be captured completely when "/var" is only 10 GB or 50 GB, and without the complete crash-dump logs there can never be a complete analysis of the cause of the kernel crash.
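A minimal sketch of commands to review the current layout on a node (assuming a RHEL/CentOS style setup where the kdump configuration lives in /etc/kdump.conf; adjust for your distribution):
# df -h / /var /var/log
# lsblk
# grep -E '^(path|core_collector)' /etc/kdump.conf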
Answer - Recommendations on the optimum setup of "/var"
Increase the size of "/var" to at least 50 GB on all nodes and keep the size uniform across the cluster.
Change the log location for kdump. The existing log location is "/var/crash"; kdump can be configured to put the logs on another local disk with a size of around 300-500 GB or, as a better measure, to dump them over the network to a remote disk.
/var should by default be separated from the root partition. Depending on the requirement, "/var/log" and "/var/log/audit" can also be created as separate partitions.
/var should be mounted on an LVM volume so that its size can be increased with ease if required.
All Hadoop service logs should be separated from /var. Ideally the Hadoop logs should be placed on a separate disk, and this disk should be used only for logs (from Hadoop and dependent applications like MySQL) and not for anything else. This log location should never be shared with the directory locations of the core Hadoop services such as HDFS, YARN and ZooKeeper. One way to achieve this is to create a symlink from "/var/<hadoop_logs>" to a separate LVM disk, as sketched below.
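A minimal sketch of that last recommendation, assuming a hypothetical volume group vg_logs, a hypothetical logical volume lv_hadooplogs mounted at /hadoop/logs, and /var/log/hadoop as the log directory being relocated:
# lvcreate -n lv_hadooplogs -L 200G vg_logs
# mkfs.xfs /dev/vg_logs/lv_hadooplogs
# mkdir -p /hadoop/logs
# mount /dev/vg_logs/lv_hadooplogs /hadoop/logs
# echo '/dev/vg_logs/lv_hadooplogs /hadoop/logs xfs defaults 0 0' >> /etc/fstab
# mv /var/log/hadoop /hadoop/logs/hadoop
# ln -s /hadoop/logs/hadoop /var/log/hadoop
Stop the Hadoop services on the node before moving the existing log directory and start them again once the symlink is in place.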
08-08-2018
08:36 PM
@Victor L
This can be done via the Ambari UI. For each component (for example HBase) there are configuration options for log file size, number of backups, rotation etc. This is done via Log4j. For HBase, the defaults are:
hbase.log.maxfilesize=256MB
hbase.log.maxbackupindex=20
HBase uses the DRFA appender, "org.apache.log4j.DailyRollingFileAppender". If you want the log rotation to happen based on file size, you should consider using "org.apache.log4j.RollingFileAppender" instead; a sketch follows below. You can tune the Log4j settings as per your exact requirement. Hope this helps. Regards Ravi
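A rough sketch of what the relevant part of the Advanced hbase-log4j template might look like after switching to size-based rotation (the appender name DRFA and the surrounding template vary by HDP version, so treat this as an illustration rather than the exact stock configuration):
hbase.log.maxfilesize=256MB
hbase.log.maxbackupindex=20
log4j.appender.DRFA=org.apache.log4j.RollingFileAppender
log4j.appender.DRFA.File=${hbase.log.dir}/${hbase.log.file}
log4j.appender.DRFA.MaxFileSize=${hbase.log.maxfilesize}
log4j.appender.DRFA.MaxBackupIndex=${hbase.log.maxbackupindex}
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{2}: %m%n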
08-06-2018
01:54 PM
@Tusar Mohanty Clearly you have some major issue which has caused all services to fail to start, but the screenshot is not enough to start any troubleshooting here. A few questions: 1. I understand that this cluster is built on AWS instances. Can you make sure all the firewall/network settings are appropriate for AWS? This is the first step. 2. Can you try to start the ZooKeeper Server and HDFS services manually via the command line from their respective nodes (see the sketch below)? While doing this, capture their logs. 3. Provide the ambari-server and ambari-agent logs. 4. Are the nodes in this cluster able to communicate with each other? 5. What are the Ambari and HDP versions? The above will help to get a better idea of what is happening on this cluster.
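A minimal sketch of the manual start and connectivity checks (paths follow the usual HDP 2.x layout under /usr/hdp/current and assume the standard service accounts; adjust to your installation):
# su - zookeeper -c "/usr/hdp/current/zookeeper-server/bin/zkServer.sh start"
# su - hdfs -c "/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode"
# su - hdfs -c "/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start datanode"
# ping <other_node_fqdn>
# telnet <zookeeper_host> 2181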
08-03-2018
06:34 AM
What do you see in the HBase Master logs around the time these RegionServer messages were logged?
08-03-2018
06:33 AM
@Prashant Verma It seems the split task is failing:
2018-08-03 10:49:47,695 WARN [RS_LOG_REPLAY_OPS-CHMCISPRBDDN01:16020-0] regionserver.SplitLogWorker: log splitting of WALs/chmcisprbddn08.chm.intra,16020,1503542010122-splitting/chmcisprbddn08.chm.intra%2C16020%2C1503542010122.default.1503542017092 failed, returning error java.io.IOException: Cannot get log reader
There also seems to be an issue with a class:
Caused by: java.lang.UnsupportedOperationException: Unable to find org.apache.hadoop.hbase.regionserver.wal.WALCellCodec,org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.regionserver.wal.WALCellCodec,org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec
The RegionServer seems to be unable to find these classes. What are the HDP, Ambari and HBase versions? Share the rpm -qa output if that is okay. Also, can you check the content of the WAL:
# hdfs dfs -cat /apps/hbase/data/WALs/chmcisprbddn08.chm.intra,16020,1503542010122-splitting/chmcisprbddn08.chm.intra%2C16020%2C1503542010122.default.1503542017092
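To check whether the missing codec classes are actually visible to the RegionServer, one option is to search the jars on the HBase classpath on the affected node (a rough sketch, not a definitive diagnostic):
# hbase classpath | tr ':' '\n' | grep '\.jar$' | while read jar; do unzip -l "$jar" 2>/dev/null | grep -q 'IndexedWALEditCodec' && echo "$jar"; done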
08-03-2018
06:01 AM
@Guozhen Li
What are the HDP and Ambari versions? Kindly provide the rpm -qa output for Ambari, HDFS, YARN and Hive. Would it be possible for you to upload the blueprint configuration?
08-03-2018
04:01 AM
Try the following command and share the output:
/usr/hdp/current/hive/bin/schematool -info -dbType mysql -userName hive -passWord <password> -verbose
08-03-2018
03:34 AM
@Harry Li Try the following: create user 'hive'@'msl-dpe-perf74.msl.lab' identified by 'hive';
grant all privileges on *.* to 'hive'@'msl-dpe-perf74.msl.lab';
SHOW GRANTS FOR 'hive'@'msl-dpe-perf74.msl.lab';
flush privileges;
commit;
quit;
Set the password of 'hive'@'msl-dpe-perf74.msl.lab' to the same value you have given via Ambari. Another thing I could see from the error you mentioned at the beginning of this post was:
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version
Can you check if the following works for you:
/usr/hdp/current/hive/bin/schematool -initSchema -dbType mysql
08-03-2018
02:59 AM
@Guozhen Li
Also, let me know the following: 1. Have you enabled ResourceManager HA on this cluster? 2. If not, please check whether the property yarn.resourcemanager.ha.enabled is set to true.
08-03-2018
02:51 AM
@Guozhen Li
The issue looks to be due to YARN (on node 10.100.1.161) not being reachable at port 8141. There are a few possibilities here: 1. The ResourceManager process is not running (seems unlikely from the details you provided). 2. Port 8141 has some issue. 3. Node 10.100.1.161 is not reachable from the other nodes in the cluster. Can you verify the following on node 10.100.1.161:
# netstat -plan | grep 8141
# ps -ef | grep resourcemanager
# telnet 10.100.1.161 8141
Check the firewall as well.
08-01-2018
08:06 PM
@Shrikant BM Can you provide the output of explain and explain formatted for this query: explain CREATE TEMPORARY TABLE SACC_MASTER.CC_ASSETS_TS AS SELECT * FROM (SELECT
ASSET_ID,SHIPD_DTS_GMT,CNTRCT_END_DTS_GMT,ACCNT_ID,ACCNT_NAME,PROD_LOB,PROD_TYPE,PROD_LINE,WARRANTY_CODE,WARRANTY_DESC,ACTIVE_FLG,ENRICHED_FLG,VALIDATE_FLG,SUPPORT_ASSIST_FLG,GROUP_ID,GROUP_NAME,SITE_ID,SITE_NAME,RGN_ABBR,RGN_DESC,SACC_LAST_UPDATED,
RANK() OVER (PARTITION BY ASSET_ID ORDER BY SACC_LAST_UPDATED DESC) AS
RANK FROM SACC_HISTORY.ASSETS_HISTORY ) RANKED WHERE RANKED.RANK=1 AND
RANKED.ACCNT_ID IN (6083); Can you try changing the execution engine to MR and running the same query? Provide the output of the following: describe SACC_MASTER; describe CC_ASSETS_TS; Also, hive-logs.txt appears to be the client logs. Can you upload the hiveserver2.log file?
08-01-2018
07:15 PM
@Harry Li Can you try to log in to mysql as the "hive" user (it will prompt for the password): mysql -u hive -p -h <FQDN>
08-01-2018
07:11 PM
@Daniel Müller Are these inserts into Hive single inserts or batch inserts? If they are single inserts, it will take time for the 1000 inserts to complete. If they are batch inserts, we need to look into the HS2 logs to identify where the query is spending most of its time.
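For illustration only (hypothetical table t, not from your workload), the difference between issuing single-row statements and batching rows into one statement:
-- 1000 separate statements: each one is compiled and executed on its own
INSERT INTO t VALUES (1, 'a');
INSERT INTO t VALUES (2, 'b');
-- one batched statement: many rows handled in a single query
INSERT INTO t VALUES (1, 'a'), (2, 'b'), (3, 'c');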
07-31-2018
09:31 PM
# cat /etc/default/useradd
GROUP=100
HOME=/home
INACTIVE=30
EXPIRE=
SHELL=/bin/bash
SKEL=/etc/skel
CREATE_MAIL_SPOOL=yes
# vi /etc/default/useradd
GROUP=100
HOME=/hadoop
INACTIVE=30
EXPIRE=
SHELL=/bin/bash
SKEL=/etc/skel
CREATE_MAIL_SPOOL=yes
Make any other default changes which you like.
07-31-2018
09:27 PM
@Harry Li This is not something where Ambari needs to be instructed. All Ambari does is execute the OS command "useradd". The useradd command on Linux creates the home directory at the default location, which is /home. I assume you are using Linux, most likely RHEL or CentOS. If so, then in order to change the default directory location from /home to /hadoop, you need to edit the OS file "/etc/default/useradd".
# cat /etc/default/useradd
GROUP=100
HOME=/home
INACTIVE=30
EXPIRE=
SHELL=/bin/bash
SKEL=/etc/skel
CREATE_MAIL_SPOOL=yes
The file "/etc/default/useradd" as seen from above has the default HOME directory location set as /home. You can edit it to make it as # vi /etc/default/useradd
GROUP=100
HOME=/hadoop
INACTIVE=30
EXPIRE=
SHELL=/bin/bash
SKEL=/etc/skel
CREATE_MAIL_SPOOL=yes
Hope this helps.
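Alternatively, the same default can be changed without editing the file by hand, using useradd's -D option (a small sketch; verify the behaviour on your OS version):
# useradd -D -b /hadoop
# useradd -D
The second command prints the current defaults; HOME should now show /hadoop.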
01-15-2018
12:49 PM
@n c I see that you have mentioned: "We have a hive database in one cluster. I want to have a copy of that database in a different cluster." One way to do this is to take a backup of the Hive metastore database in MySQL. 1. Stop the Hive services. This is done to make sure that there are no new metadata updates to the metastore. 2. On the node running MySQL, do the following: mysqldump hive > /backup_folder/hive_backup.sql 3. Start the Hive services again. On the other node, where you want the MySQL backup to be restored, you need to install and configure MySQL. Refer to the following document (this is for HDP 2.5.6): https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.6/bk_command-line-installation/content/meet-min-system-requirements.html#metastore-db-requirements-getting-started Once MySQL is set up, create a database in MySQL called "hive": mysql> create database hive; Now restore the DB which was backed up earlier: mysql hive < /backup_folder/hive_backup.sql You can use this node to run another instance of the Hive services, which can effectively act as HA, or simply use the MySQL on this node as a backup location. Take a regular backup with mysqldump and restore it on the other node. The other way of achieving your requirement could be to set up HA for the MySQL database.
09-21-2017
03:36 PM
HiveServer2 and the Hive Metastore can be configured to capture GC logs with a timestamp in the log file name. This is useful in a production cluster, where having a timestamp on the log file adds clarity and also avoids overwriting. Navigate as below in Ambari: Ambari UI > Hive > Configs > Advanced hive-env > hive-env template, and add the following:
if [ "$SERVICE" = "metastore" ]; then
  export HADOOP_HEAPSIZE={{hive_metastore_heapsize}} # Setting for HiveMetastore
else
  export HADOOP_HEAPSIZE={{hive_heapsize}} # Setting for HiveServer2 and Client
fi
export HADOOP_CLIENT_OPTS="-Xmx${HADOOP_HEAPSIZE}m -Xloggc:/var/log/hive/gc.log-$SERVICE-`date +'%Y%m%d%H%M'` -XX:ErrorFile=/var/log/hive/hive-metastore-error.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps $HADOOP_CLIENT_OPTS"
if [ "$SERVICE" = "hiveserver2" ]; then
  export HADOOP_CLIENT_OPTS="-Xmx${HADOOP_HEAPSIZE}m -Xloggc:/var/log/hive/gc.log-$SERVICE-`date +'%Y%m%d%H%M'` -XX:ErrorFile=/var/log/hive/hive-server2-error.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps $HADOOP_CLIENT_OPTS"
fi
09-21-2017
03:31 PM
By default, HiveServer2 and the Hive Metastore are not configured to produce a heap dump on OutOfMemoryError. Production clusters do hit OOM errors, and without a heap dump on OOM the root cause analysis of the issue is obstructed. Navigate as below in Ambari: Ambari UI > Hive > Configs > Advanced hive-env > hive-env template, and add the following:
if [ "$SERVICE" = "metastore" ]; then
  export HADOOP_HEAPSIZE={{hive_metastore_heapsize}} # Setting for HiveMetastore
else
  export HADOOP_HEAPSIZE={{hive_heapsize}} # Setting for HiveServer2 and Client
fi
export HADOOP_CLIENT_OPTS="-Xmx${HADOOP_HEAPSIZE}m -Xloggc:/var/log/hive/gc.log-$SERVICE-`date +'%Y%m%d%H%M'` -XX:ErrorFile=/var/log/hive/hive-metastore-error.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/hive/ $HADOOP_CLIENT_OPTS"
if [ "$SERVICE" = "hiveserver2" ]; then
  export HADOOP_CLIENT_OPTS="-Xmx${HADOOP_HEAPSIZE}m -Xloggc:/var/log/hive/gc.log-$SERVICE-`date +'%Y%m%d%H%M'` -XX:ErrorFile=/var/log/hive/hive-server2-error.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/hive/ -XX:+PrintGCDateStamps $HADOOP_CLIENT_OPTS"
fi
08-12-2017
02:00 AM
@abilgi I tried the above on two different Ambari versions (2.4.x and 2.5.x) with both Kerberised and non-Kerberised environments. It does not work; the step to register the Remote Cluster fails. I see the following in the logs: 12 Aug 2017 01:59:09,098 ERROR [ambari-client-thread-33] BaseManagementHandler:67 - Bad request received: Failed to create new Remote Cluster HDP02. User must be Ambari or Cluster Adminstrator. 2017-08-12T01:52:39.377Z, User(admin), RemoteIp(10.42.80.140), RequestType(POST), url(http://172.26.114.132:8080/api/v1/remoteclusters/HDP02), ResultStatus(400 Bad Request), Reason(Failed to create new Remote Cluster HDP02. User must be Ambari or Cluster Adminstrator.) **Note: The user I have used is "admin" and it is a cluster administrator. Am I missing something? Also, is there an API way of registering a Remote Cluster?
07-04-2017
06:31 PM
There are a lot of articles on NameNode heap calculation, but none on the DataNode. 1. How do I calculate the DataNode heap size? 2. How do I calculate the size of each object in the DataNode heap? 3. What does the metadata in the DataNode heap contain? It cannot be the same as the NameNode's (it does not hold replication details etc.), and it should hold metadata such as stored checksums, so what does the DataNode metadata look like, and how is it different from the NameNode metadata?
Labels: Apache Ambari, Apache Hadoop
11-03-2016
04:48 AM
@Saurabh Try doing : set hive.exec.scratchdir=/new_dir
07-21-2016
08:13 AM
@Saurabh Kumar You are welcome. The Java heap space issue is due to the Java heap of the Solr process. By default the Solr process is started with only 512 MB. We can increase this by editing the Solr config files or via the solr command-line options, as: /opt/lucidworks-hdpsearch/solr/bin/solr -m 2g create -c test -d /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs_hdfs/conf -n test -s 2 -rf 2 This will resolve the Java heap space issue.
07-20-2016
07:28 PM
@Saurabh Kumar The error which you are getting is: "Unable to create core [test_shard1_replica1] Caused by: Direct buffer memory". It looks to me that you have set direct memory allocation (used to enable the block cache) to true in the "solrconfig.xml" file, i.e. <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool> From your "solrconfig.xml", I see the config as: <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
<str name="solr.hdfs.home">hdfs://m1.hdp22:8020/user/solr</str>
<str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
<bool name="solr.hdfs.blockcache.enabled">true</bool>
<int name="solr.hdfs.blockcache.slab.count">1</int>
<bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
<int name="solr.hdfs.blockcache.blocksperbank">16384</int>
<bool name="solr.hdfs.blockcache.read.enabled">true</bool>
<bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
<int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
<int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>
</directoryFactory> I will suggest to turn off the Direct Memory if you do not plan to use it for now and then try the creation of collection. To disable it, edit the "solrconfig.xml" and looks for property - "solr.hdfs.blockcache.direct.memory.allocation". Make the value of this property to "false" i.e. <bool name="solr.hdfs.blockcache.direct.memory.allocation">false</bool> The final "solrconfig.xml" will therefore look like : <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory"> <str name="solr.hdfs.home">hdfs://m1.hdp22:8020/user/solr</str>
<bool name="solr.hdfs.blockcache.enabled">true</bool>
<int name="solr.hdfs.blockcache.slab.count">1</int>
<bool name="solr.hdfs.blockcache.direct.memory.allocation">false</bool>
<int name="solr.hdfs.blockcache.blocksperbank">16384</int>
<bool name="solr.hdfs.blockcache.read.enabled">true</bool>
<bool name="solr.hdfs.blockcache.write.enabled">false</bool>
<bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
<int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
<int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>
</directoryFactory>
07-17-2016
08:01 PM
@Saurabh Kumar It looks from the error that the configuration file "solrconfig.xml" is not properly configured for the "data_driven_schema_configs" schema. Check whether the "solrconfig.xml" is configured correctly. If you need help, please upload the "solrconfig.xml" which you are presently using.
07-17-2016
07:46 PM
@Saurabh Kumar 1. Solr does not follow a Master-Slave model, but rather a Leader-Follower model, so each Solr node will be used for indexing/querying in SolrCloud. Considering that you have 5 nodes, the Solr collection can be created with 2 shards and a replication factor (RF) of 2, which will use 4 nodes for Solr (see the sketch below). 2. Each node which is supposed to be used for Solr needs to have "lucidworks-hdpsearch" installed. 3. Resource usage depends on the size of the index (present size and estimated growth). Refer to the following for further understanding of resource usage: https://wiki.apache.org/solr/SolrPerformanceProblems
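A minimal sketch of such a collection creation, using the HDP Search install path mentioned elsewhere in this thread (the collection name and configset directory are placeholders):
/opt/lucidworks-hdpsearch/solr/bin/solr create -c <collection_name> -d <configset_conf_dir> -s 2 -rf 2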
07-16-2016
07:19 PM
1 Kudo
@Ted Yu
Ambari does not automatically adjust memory for any component. You should use the companion scripts to calculate and tune the heap memory for each component. Also, you should try using SmartSense, which can identify signs of potential issues and provide recommendations for better tuning of HDP components. Neither Ambari nor SmartSense will make any automatic adjustment to the configuration of HDP components. There are default values for the configuration, which must be manually tuned as per cluster usage.
06-30-2016
04:54 AM
1 Kudo
@milind pandit There is no direct utility to find this. Files with different names but the same content will have the same checksum, so using the checksum option of hdfs we can verify this. For example: # hdfs dfs -ls /tmp/tst
Found 6 items
-rw-r--r-- 3 hdfs hdfs 2044 2016-06-29 21:46 /tmp/tst/okay
-rw-r--r-- 3 hdfs hdfs 2044 2016-06-29 21:46 /tmp/tst/pass
-rw-r--r-- 3 hdfs hdfs 2044 2016-06-29 21:46 /tmp/tst/pass3
-rw-r--r-- 3 hdfs hdfs 1064 2016-06-29 21:46 /tmp/tst/pre
-rw-r--r-- 3 hdfs hdfs 1064 2016-06-29 21:46 /tmp/tst/pro
-rw-r--r-- 3 hdfs hdfs 2044 2016-06-29 21:46 /tmp/tst/word
# hdfs dfs -checksum /tmp/tst/okay
/tmp/tst/okay MD5-of-0MD5-of-512CRC32C 000002000000000000000000b1be3e03929521974dc321f9e7f27cc7
# hdfs dfs -checksum /tmp/tst/pass
/tmp/tst/pass MD5-of-0MD5-of-512CRC32C 000002000000000000000000b1be3e03929521974dc321f9e7f27cc7
# hdfs dfs -checksum /tmp/tst/pre
/tmp/tst/pre MD5-of-0MD5-of-512CRC32C 000002000000000000000000690e462cbf52c9c399fb7c0bcacef01d
# hdfs dfs -checksum /tmp/tst/pro
/tmp/tst/pro MD5-of-0MD5-of-512CRC32C 000002000000000000000000690e462cbf52c9c399fb7c0bcacef01d
From the above, the files "/tmp/tst/okay" and "/tmp/tst/pass" hold the same content although the filenames are different, and you can see that both files have the same checksum. Similarly for "/tmp/tst/pro" and "/tmp/tst/pre". To check the checksums of the files in a folder (in this case "/tmp/tst"), the following can be done: # hdfs dfs -checksum /tmp/tst/*
/tmp/tst/okay MD5-of-0MD5-of-512CRC32C 000002000000000000000000b1be3e03929521974dc321f9e7f27cc7
/tmp/tst/pass MD5-of-0MD5-of-512CRC32C 000002000000000000000000b1be3e03929521974dc321f9e7f27cc7
/tmp/tst/pass3 MD5-of-0MD5-of-512CRC32C 000002000000000000000000b1be3e03929521974dc321f9e7f27cc7
/tmp/tst/pre MD5-of-0MD5-of-512CRC32C 000002000000000000000000690e462cbf52c9c399fb7c0bcacef01d
/tmp/tst/pro MD5-of-0MD5-of-512CRC32C 000002000000000000000000690e462cbf52c9c399fb7c0bcacef01d
/tmp/tst/word MD5-of-0MD5-of-512CRC32C 000002000000000000000000b1be3e03929521974dc321f9e7f27cc7
Also, you can use "hdfs find" to widen the search:
# hdfs dfs -checksum `hdfs dfs -find /tmp -print`
The above command will list the checksums of all the files. To make duplicates easier to spot, you can also sort by the checksum column so that files with identical content appear next to each other:
# hdfs dfs -checksum `hdfs dfs -find /tmp -print` | awk '{print $3,$1}' | sort
06-29-2016
07:34 PM
@Aman Mundra There are not many details on why the NameNode is failing to start. Can you share the NameNode log from when you try to start the NameNode service? It will help to identify what is causing the NameNode to fail to start. Meanwhile, you can test whether the NameNode can be started manually from the command line. Run one of the following:
/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode
or
/var/lib/ambari-agent/ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode'
06-28-2016
01:43 PM
3 Kudos
To remove an already installed Grafana, the following needs to be done: 1. Stop the AMS service from the Ambari UI. 2. Execute the following curl API commands to delete Grafana:
# curl -u admin:admin -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Stop HDFS via REST"}, "Body": {"ServiceInfo": {"state": "INSTALLED"}}}' http://<ambari-server-hostname>:8080/api/v1/clusters/<CLUSTERNAME>/services/AMBARI_METRICS/components/METRICS_GRAFANA
# curl -u admin:admin -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Stop HDFS via REST"}, "Body": {"ServiceInfo": {"state": "INSTALLED"}}}' http://<ambari-server-hostname>:8080/api/v1/clusters/<CLUSTERNAME>/services/AMBARI_METRICS/
# curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://<ambari-server-hostname>:8080/api/v1/clusters/<CLUSTERNAME>/hosts/<hostname_of_Grafana_host>/host_components/METRICS_GRAFANA
# curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://<ambari-server-hostname>:8080/api/v1/clusters/<CLUSTERNAME>/services/AMBARI_METRICS/components/METRICS_GRAFANA
# curl -u admin:admin -X GET http://<ambari-server-hostname>:8080/api/v1/clusters/<CLUSTERNAME>/services/AMBARI_METRICS/components/METRICS_GRAFANA
3. Start the AMS service from the Ambari UI. Replace the following:
<ambari-server-hostname> = fully qualified domain name of the node running the Ambari Server
<CLUSTERNAME> = name of the cluster
<hostname_of_Grafana_host> = fully qualified domain name of the node where Grafana has to be deleted
06-28-2016
07:41 AM
@Thomas Larsson When the filesystem sees a hard-disk error, it should go into read-only mode. If it does not go into read-only mode, that is likely due to the mount options. The filesystem should be mounted with the option errors=remount-ro, which means that if an error is detected the filesystem will be remounted read-only. If the filesystem goes into read-only mode, HDFS will also identify it; without the OS marking the filesystem as read-only, HDFS will not be able to detect it.
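For illustration, an /etc/fstab entry for a data disk with that option set (the device name and mount point here are hypothetical):
/dev/sdb1  /grid/0  ext4  defaults,noatime,errors=remount-ro  0  2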