Member since 03-22-2019
Posts: 46
Kudos Received: 8
Solutions: 3
My Accepted Solutions
Views | Posted
---|---
5335 | 07-20-2016 07:28 PM
1075 | 07-16-2016 07:19 PM
1035 | 06-30-2016 04:54 AM
09-21-2017
03:31 PM
By default, HiveServer2 and the Hive Metastore are not configured to write a heap dump on an OutOfMemoryError (OOM). When a production cluster hits an OOM, the missing heap dump obstructs root-cause analysis. To enable it, navigate in Ambari to Ambari UI > Hive > Configs > Advanced hive-env > hive-env template and add the following:
if [ "$SERVICE" = "metastore" ]; then
export HADOOP_HEAPSIZE={{hive_metastore_heapsize}} # Setting for HiveMetastore
else
export HADOOP_HEAPSIZE={{hive_heapsize}} # Setting for HiveServer2 and Client
fi
if [ "$SERVICE" = "metastore" ]; then
  # HiveMetastore: GC log, JVM error file and heap dump on OOM
  export HADOOP_CLIENT_OPTS="-Xmx${HADOOP_HEAPSIZE}m -Xloggc:/var/log/hive/gc.log-$SERVICE-$(date +'%Y%m%d%H%M') \
-XX:ErrorFile=/var/log/hive/hive-metastore-error.log-$(date +'%Y%m%d%H%M') -verbose:gc -XX:+PrintGCDetails \
-XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/hive/ \
$HADOOP_CLIENT_OPTS"
elif [ "$SERVICE" = "hiveserver2" ]; then
  # HiveServer2: same options, with its own JVM error file (no duplicated metastore flags)
  export HADOOP_CLIENT_OPTS="-Xmx${HADOOP_HEAPSIZE}m -Xloggc:/var/log/hive/gc.log-$SERVICE-$(date +'%Y%m%d%H%M') \
-XX:ErrorFile=/var/log/hive/hive-server2-error.log-$(date +'%Y%m%d%H%M') -verbose:gc -XX:+PrintGCDetails \
-XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/hive/ \
$HADOOP_CLIENT_OPTS"
fi
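After restarting the Hive services from Ambari, it is worth confirming that the options actually reached the running JVM. The sketch below is a minimal check, assuming a POSIX shell; the helper name has_heapdump_opts is hypothetical, and it only does a plain substring match, so it would not notice a later -XX:-HeapDumpOnOutOfMemoryError that turns the option off again.

```shell
#!/bin/sh
# has_heapdump_opts (hypothetical helper): succeeds only if the JVM command
# line passed as $1 contains both -XX:+HeapDumpOnOutOfMemoryError and a
# -XX:HeapDumpPath=... option. Plain substring match only.
has_heapdump_opts() {
  case " $1 " in
    *" -XX:+HeapDumpOnOutOfMemoryError "*) ;;   # dump on OOM enabled
    *) return 1 ;;
  esac
  case " $1 " in
    *" -XX:HeapDumpPath="*) return 0 ;;         # dump location set
    *) return 1 ;;
  esac
}

# Example (on a HiveServer2 host; the pgrep pattern is an assumption):
# pid=$(pgrep -f org.apache.hive.service.server.HiveServer2 | head -n 1)
# has_heapdump_opts "$(ps -o args= -p "$pid")" && echo "heap dump on OOM enabled"
```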
08-12-2017
02:00 AM
@abilgi I tried the above on two different Ambari versions (2.4.x and 2.5.x), in both Kerberized and non-Kerberized environments, and it does not work: the step to register the remote cluster fails. I see the following in the logs:
12 Aug 2017 01:59:09,098 ERROR [ambari-client-thread-33] BaseManagementHandler:67 - Bad request received: Failed to create new Remote Cluster HDP02. User must be Ambari or Cluster Adminstrator.
2017-08-12T01:52:39.377Z, User(admin), RemoteIp(10.42.80.140), RequestType(POST), url(http://172.26.114.132:8080/api/v1/remoteclusters/HDP02), ResultStatus(400 Bad Request), Reason(Failed to create new Remote Cluster HDP02. User must be Ambari or Cluster Adminstrator.)
Note: The user I used is "admin", which is a cluster administrator. Am I missing something? Also, is there an API way of registering a remote cluster?
07-04-2017
06:31 PM
There are a lot of articles on calculating the NameNode heap size, but none on the DataNode.
1. How do you calculate the DataNode heap size?
2. How do you calculate the size of each object in the DataNode heap?
3. What metadata does the DataNode heap contain? It cannot be the same as the NameNode's (the DataNode does not hold replication details, etc.), and it should include metadata for the stored checksums. So what does DataNode metadata look like, and how does it differ from NameNode metadata?
Labels: Apache Ambari, Apache Hadoop
11-03-2016
04:48 AM
@Saurabh Try running: set hive.exec.scratchdir=/new_dir
07-21-2016
08:13 AM
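@Saurabh Kumar You are welcome. The "Java heap space" error comes from the Java heap of the Solr process: by default, Solr is started with only 512 MB of heap. You can increase it either by editing the Solr config files (SOLR_HEAP in bin/solr.in.sh) or with the -m option on the solr command line. Note that -m sets the heap when Solr is started or restarted, not when a collection is created, so restart Solr with the larger heap first (adding the cloud options you normally start it with) and then create the collection:
/opt/lucidworks-hdpsearch/solr/bin/solr restart -m 2g
/opt/lucidworks-hdpsearch/solr/bin/solr create -c test -d /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs_hdfs/conf -n test -s 2 -rf 2
This should resolve the Java heap space issue.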
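To make the larger heap persistent instead of passing -m on every start, SOLR_HEAP can be set in solr.in.sh, which bin/solr reads at startup. The sketch below is a minimal helper under stated assumptions: the name set_solr_heap is hypothetical, and the path /opt/lucidworks-hdpsearch/solr/bin/solr.in.sh is an assumption (service-style installs may keep the file elsewhere, e.g. /etc/default/solr.in.sh). It uncomments or rewrites an existing SOLR_HEAP line, or appends one, and keeps a .bak copy when it edits in place.

```shell
#!/bin/sh
# set_solr_heap (hypothetical helper): set SOLR_HEAP in a solr.in.sh file,
# uncommenting/rewriting an existing "SOLR_HEAP=" line or appending one.
# sed -i.bak keeps a backup and works with both GNU and BSD sed.
set_solr_heap() {
  file=$1
  heap=$2
  if grep -q '^#\{0,1\}SOLR_HEAP=' "$file"; then
    sed -i.bak "s/^#\{0,1\}SOLR_HEAP=.*/SOLR_HEAP=\"$heap\"/" "$file"
  else
    printf 'SOLR_HEAP="%s"\n' "$heap" >> "$file"
  fi
}

# Example (path is an assumption; restart Solr afterwards):
# set_solr_heap /opt/lucidworks-hdpsearch/solr/bin/solr.in.sh 2g
```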
07-20-2016
07:28 PM
@Saurabh Kumar The error you are getting is: "Unable to create core [test_shard1_replica1] Caused by: Direct buffer memory". It looks like you have enabled direct memory allocation (for the HDFS block cache) in "solrconfig.xml", i.e. <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>. In your "solrconfig.xml", I see the following configuration:
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
<str name="solr.hdfs.home">hdfs://m1.hdp22:8020/user/solr</str>
<str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
<bool name="solr.hdfs.blockcache.enabled">true</bool>
<int name="solr.hdfs.blockcache.slab.count">1</int>
<bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
<int name="solr.hdfs.blockcache.blocksperbank">16384</int>
<bool name="solr.hdfs.blockcache.read.enabled">true</bool>
<bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
<int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
<int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>
</directoryFactory>
I suggest turning off direct memory allocation if you do not plan to use it for now, and then trying to create the collection again. To disable it, edit "solrconfig.xml", look for the property "solr.hdfs.blockcache.direct.memory.allocation", and set its value to "false", i.e. <bool name="solr.hdfs.blockcache.direct.memory.allocation">false</bool>. The final "solrconfig.xml" will therefore look like this:
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
<str name="solr.hdfs.home">hdfs://m1.hdp22:8020/user/solr</str>
<str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
<bool name="solr.hdfs.blockcache.enabled">true</bool>
<int name="solr.hdfs.blockcache.slab.count">1</int>
<bool name="solr.hdfs.blockcache.direct.memory.allocation">false</bool>
<int name="solr.hdfs.blockcache.blocksperbank">16384</int>
<bool name="solr.hdfs.blockcache.read.enabled">true</bool>
<bool name="solr.hdfs.blockcache.write.enabled">false</bool>
<bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
<int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
<int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>
</directoryFactory>
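If you would rather keep direct memory allocation on, the alternative is to give the JVM enough direct memory (-XX:MaxDirectMemorySize) for the block cache. The Solr HDFS documentation gives the cache size as the 8 KB block size times solr.hdfs.blockcache.blocksperbank times solr.hdfs.blockcache.slab.count, so the settings above request 8 KB × 16384 × 1 = 128 MB off-heap. The sketch below (blockcache_mb is a hypothetical helper) only does that arithmetic; keep in mind that, depending on the Solr version and settings, each core may allocate its own cache.

```shell
#!/bin/sh
# blockcache_mb (hypothetical helper): off-heap MB requested by one Solr HDFS
# block cache = 8 KB block size x blocksperbank x slab.count
blockcache_mb() {
  slab_count=$1        # solr.hdfs.blockcache.slab.count
  blocks_per_bank=$2   # solr.hdfs.blockcache.blocksperbank
  echo $(( slab_count * blocks_per_bank * 8 / 1024 ))
}

blockcache_mb 1 16384   # settings from the solrconfig.xml above; prints 128
```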
07-17-2016
08:01 PM
@Saurabh Kumar From the error, it looks like the configuration file "solrconfig.xml" is not properly configured for the "data_driven_schema_configs" config set. Check that "solrconfig.xml" is configured correctly. If you need help, please upload the "solrconfig.xml" you are currently using.
07-17-2016
07:46 PM
@Saurabh Kumar
1. Solr does not follow a master-slave model; SolrCloud uses a leader-replica model, so every Solr node is used for both indexing and querying. Since you have 5 nodes, you can create the Solr collection with 2 shards and a replication factor (RF) of 2, which uses 4 of the nodes for Solr.
2. Each node that will run Solr needs "lucidworks-hdpsearch" installed.
3. Resource usage depends on the size of the index (its current size and estimated growth). For more on resource usage, see: https://wiki.apache.org/solr/SolrPerformanceProblems
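The node count in point 1 follows from simple arithmetic: every replica of every shard is a core that lives on some node, so 2 shards × RF 2 = 4 cores. Assuming the Collections API parameter maxShardsPerNode is left at 1 (its default in the Solr 5.x line, as far as we know), each core needs its own node. A minimal sketch of that calculation, with a hypothetical helper name:

```shell
#!/bin/sh
# nodes_needed (hypothetical helper): minimum number of Solr nodes for a
# collection = ceil(numShards x replicationFactor / maxShardsPerNode)
nodes_needed() {
  shards=$1; rf=$2; max_per_node=$3
  cores=$(( shards * rf ))                               # one core per replica
  echo $(( (cores + max_per_node - 1) / max_per_node ))  # ceiling division
}

nodes_needed 2 2 1   # 2 shards, RF 2, one core per node; prints 4
```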
07-16-2016
07:19 PM
@Ted Yu
Ambari does not automatically adjust memory for any component. Use the companion scripts to calculate and tune the heap memory for each component. You can also try SmartSense, which can identify signs of potential issues and recommend better tuning for HDP components. Neither Ambari nor SmartSense makes any automatic change to the configuration of HDP components: they ship default values, which must be tuned manually for your cluster's usage.
06-30-2016
04:54 AM
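@milind pandit There is no direct utility to find this. Files with different names but the same content will have the same checksum, so you can use the -checksum option of hdfs dfs to find them (as long as the files were written with the same block size and checksum settings, since the HDFS file checksum depends on both). For example:
# hdfs dfs -ls /tmp/tst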
Found 6 items
-rw-r--r-- 3 hdfs hdfs 2044 2016-06-29 21:46 /tmp/tst/okay
-rw-r--r-- 3 hdfs hdfs 2044 2016-06-29 21:46 /tmp/tst/pass
-rw-r--r-- 3 hdfs hdfs 2044 2016-06-29 21:46 /tmp/tst/pass3
-rw-r--r-- 3 hdfs hdfs 1064 2016-06-29 21:46 /tmp/tst/pre
-rw-r--r-- 3 hdfs hdfs 1064 2016-06-29 21:46 /tmp/tst/pro
-rw-r--r-- 3 hdfs hdfs 2044 2016-06-29 21:46 /tmp/tst/word
# hdfs dfs -checksum /tmp/tst/okay
/tmp/tst/okay MD5-of-0MD5-of-512CRC32C 000002000000000000000000b1be3e03929521974dc321f9e7f27cc7
# hdfs dfs -checksum /tmp/tst/pass
/tmp/tst/pass MD5-of-0MD5-of-512CRC32C 000002000000000000000000b1be3e03929521974dc321f9e7f27cc7
# hdfs dfs -checksum /tmp/tst/pre
/tmp/tst/pre MD5-of-0MD5-of-512CRC32C 000002000000000000000000690e462cbf52c9c399fb7c0bcacef01d
# hdfs dfs -checksum /tmp/tst/pro
/tmp/tst/pro MD5-of-0MD5-of-512CRC32C 000002000000000000000000690e462cbf52c9c399fb7c0bcacef01d
From the above, the files "/tmp/tst/okay" and "/tmp/tst/pass" hold the same content under different file names, and both have the same checksum. The same is true of "/tmp/tst/pro" and "/tmp/tst/pre". To get the checksums of all files in a folder (in this case "/tmp/tst"), run:
# hdfs dfs -checksum '/tmp/tst/*'
/tmp/tst/okay MD5-of-0MD5-of-512CRC32C 000002000000000000000000b1be3e03929521974dc321f9e7f27cc7
/tmp/tst/pass MD5-of-0MD5-of-512CRC32C 000002000000000000000000b1be3e03929521974dc321f9e7f27cc7
/tmp/tst/pass3 MD5-of-0MD5-of-512CRC32C 000002000000000000000000b1be3e03929521974dc321f9e7f27cc7
/tmp/tst/pre MD5-of-0MD5-of-512CRC32C 000002000000000000000000690e462cbf52c9c399fb7c0bcacef01d
/tmp/tst/pro MD5-of-0MD5-of-512CRC32C 000002000000000000000000690e462cbf52c9c399fb7c0bcacef01d
/tmp/tst/word MD5-of-0MD5-of-512CRC32C 000002000000000000000000b1be3e03929521974dc321f9e7f27cc7
You can also use "hdfs dfs -find" for a larger search:
# hdfs dfs -checksum `hdfs dfs -find /tmp -print`
This lists the checksums of all the files under /tmp. To bring files with identical content next to each other, print the checksum first and sort on it:
hdfs dfs -checksum `hdfs dfs -find /tmp -print` | awk '{print $3, $1}' | sort
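To go one step further and print only the groups of duplicates, the checksum output can be grouped by its algorithm and checksum columns. This is a minimal sketch: the helper name find_dup_groups is hypothetical, it assumes the three-column output format shown above and paths without whitespace, and it skips any line that does not have exactly three fields.

```shell
#!/bin/sh
# find_dup_groups (hypothetical helper): reads `hdfs dfs -checksum` output
# on stdin (path, algorithm, checksum) and prints one line per checksum
# shared by two or more paths, listing those paths.
# Assumes paths contain no whitespace.
find_dup_groups() {
  awk '
    NF == 3 {
      key = $2 " " $3            # same algorithm + checksum => same content
      if (key in paths) paths[key] = paths[key] " " $1
      else paths[key] = $1
      count[key]++
    }
    END {
      for (k in count)
        if (count[k] > 1) print paths[k]
    }
  ' | sort
}

# Example (on a cluster):
# hdfs dfs -checksum `hdfs dfs -find /tmp -print` | find_dup_groups
```

On the /tmp/tst listing above, this would print "/tmp/tst/okay /tmp/tst/pass /tmp/tst/pass3 /tmp/tst/word" and "/tmp/tst/pre /tmp/tst/pro".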