Member since: 03-22-2019
Posts: 46
Kudos Received: 8
Solutions: 3
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 5163 | 07-20-2016 07:28 PM |
 | 1042 | 07-16-2016 07:19 PM |
 | 1004 | 06-30-2016 04:54 AM |
09-21-2017
03:31 PM
By default, HiveServer2 and the Hive Metastore are not configured to produce a heap dump on OutOfMemoryError (OOM). When a production cluster hits an OOM, the missing heap dump makes root cause analysis much harder. To enable it, navigate in Ambari to: Ambari UI > Hive > Configs > Advanced hive-env > hive-env template, and add the following:
if [ "$SERVICE" = "metastore" ]; then
export HADOOP_HEAPSIZE={{hive_metastore_heapsize}} # Setting for HiveMetastore
else
export HADOOP_HEAPSIZE={{hive_heapsize}} # Setting for HiveServer2 and Client
fi
export HADOOP_CLIENT_OPTS="-Xmx${HADOOP_HEAPSIZE}m -Xloggc:/var/log/hive/gc.log-$SERVICE-`date +'%Y%m%d%H%M'`
-XX:ErrorFile=/var/log/hive/hive-metastore-error.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/hive/
$HADOOP_CLIENT_OPTS"
if [ "$SERVICE" = "hiveserver2" ]; then
export HADOOP_CLIENT_OPTS="-Xmx${HADOOP_HEAPSIZE}m -Xloggc:/var/log/hive/gc.log-$SERVICE-`date +'%Y%m%d%H%M'`
-XX:ErrorFile=/var/log/hive/hive-server2-error.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/hive/ -XX:+PrintGCDateStamps $HADOOP_CLIENT_OPTS"
fi
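After saving the change and restarting HiveServer2 and the Metastore from Ambari, a quick sanity check (a rough sketch; adjust the grep pattern to your environment) is to confirm the new JVM flag is present on the running processes:

# Verify that the heap-dump flag was picked up after the restart
ps -ef | grep -i hiveserver2 | grep HeapDumpOnOutOfMemoryError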
08-12-2017
02:00 AM
@abilgi I tried the above on two different Ambari versions (2.4.x and 2.5.x), in both Kerberized and non-Kerberized environments, and it does not work: the step to register the remote cluster fails. I see the following in the logs:

12 Aug 2017 01:59:09,098 ERROR [ambari-client-thread-33] BaseManagementHandler:67 - Bad request received: Failed to create new Remote Cluster HDP02. User must be Ambari or Cluster Adminstrator.
2017-08-12T01:52:39.377Z, User(admin), RemoteIp(10.42.80.140), RequestType(POST), url(http://172.26.114.132:8080/api/v1/remoteclusters/HDP02), ResultStatus(400 Bad Request), Reason(Failed to create new Remote Cluster HDP02. User must be Ambari or Cluster Adminstrator.)

Note: the user I used is "admin", which is a cluster administrator. Am I missing something? Also, is there an API way of registering a remote cluster, and if so, what is it?
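For reference, the failing call in the log shows the underlying REST endpoint. A minimal sketch of the equivalent curl request is below; the endpoint and user come straight from the log above, but the JSON body fields (ClusterInfo name/url/username/password) are my assumption about the payload, not something confirmed here:

# Hypothetical sketch: register a remote cluster via the endpoint seen in the log.
# The body fields are assumed, not confirmed -- adjust to your Ambari version.
curl -u admin:admin -H 'X-Requested-By: ambari' -X POST \
  'http://172.26.114.132:8080/api/v1/remoteclusters/HDP02' \
  -d '{"ClusterInfo": {"name": "HDP02", "url": "http://<remote-ambari-host>:8080/api/v1/clusters/HDP02", "username": "admin", "password": "<password>"}}'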
07-04-2017
06:31 PM
There are a lot of articles on NameNode heap calculation, but none on the DataNode.
1. How do you calculate the DataNode heap size?
2. How do you calculate the size of each object in the DataNode heap?
3. What does the metadata in the DataNode heap contain? It cannot be the same as the NameNode's (the DataNode does not hold replication details, etc.), and it presumably holds metadata such as block checksums, so what does the DataNode metadata look like, and how does it differ from the NameNode metadata?
Labels: Apache Ambari, Apache Hadoop
11-03-2016
04:48 AM
@Saurabh Try setting the scratch directory explicitly: set hive.exec.scratchdir=/new_dir;
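If you prefer to apply it per invocation rather than inside the session, a small sketch (the HiveServer2 host and port are placeholders) is:

# Pass the scratch directory for a single Hive CLI session
hive --hiveconf hive.exec.scratchdir=/new_dir

# Or via Beeline against HiveServer2 (replace host/port with your own)
beeline -u "jdbc:hive2://<hs2-host>:10000/default" --hiveconf hive.exec.scratchdir=/new_dir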
07-21-2016
08:13 AM
@Saurabh Kumar You are welcome. The "Java heap space" issue is due to the Java heap of the Solr process: by default Solr starts with only 512 MB. You can increase this either by editing the Solr configuration files or via the solr command-line option -m, for example:

/opt/lucidworks-hdpsearch/solr/bin/solr -m 2g create -c test -d /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs_hdfs/conf -n test -s 2 -rf 2

This should resolve the Java heap space issue.
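For a persistent change rather than a one-off flag, the heap can also be set in Solr's include script. A sketch, assuming the HDP Search layout used above (the exact location of solr.in.sh may differ in your install):

# In /opt/lucidworks-hdpsearch/solr/bin/solr.in.sh (path assumed), raise the JVM heap:
SOLR_JAVA_MEM="-Xms2g -Xmx2g"
# Then restart Solr on each node so the new heap size takes effect.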
07-20-2016
07:28 PM
@Saurabh Kumar The error you are getting is:

"Unable to create core [test_shard1_replica1] Caused by: Direct buffer memory"

It looks to me like you have enabled direct memory (used by the HDFS block cache) in solrconfig.xml, i.e.:

<bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>

From your solrconfig.xml, I see the configuration as:

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
<str name="solr.hdfs.home">hdfs://m1.hdp22:8020/user/solr</str>
<str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
<bool name="solr.hdfs.blockcache.enabled">true</bool>
<int name="solr.hdfs.blockcache.slab.count">1</int>
<bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
<int name="solr.hdfs.blockcache.blocksperbank">16384</int>
<bool name="solr.hdfs.blockcache.read.enabled">true</bool>
<bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
<int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
<int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>
</directoryFactory>

I suggest turning off direct memory if you do not plan to use it for now, and then retrying the collection creation. To disable it, edit solrconfig.xml, look for the property solr.hdfs.blockcache.direct.memory.allocation, and set its value to "false", i.e.:

<bool name="solr.hdfs.blockcache.direct.memory.allocation">false</bool>

The final solrconfig.xml will therefore look like:

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
<str name="solr.hdfs.home">hdfs://m1.hdp22:8020/user/solr</str>
<bool name="solr.hdfs.blockcache.enabled">true</bool>
<int name="solr.hdfs.blockcache.slab.count">1</int>
<bool name="solr.hdfs.blockcache.direct.memory.allocation">false</bool>
<int name="solr.hdfs.blockcache.blocksperbank">16384</int>
<bool name="solr.hdfs.blockcache.read.enabled">true</bool>
<bool name="solr.hdfs.blockcache.write.enabled">false</bool>
<bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
<int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
<int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>
</directoryFactory>
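If you later do want to keep direct memory allocation enabled for the block cache, the Solr JVM also needs enough off-heap (direct) memory. A hedged sketch, assuming the same solr.in.sh include script mentioned earlier (the 4g value is just an example):

# In solr.in.sh (path assumed), allow the JVM enough direct memory
# before re-enabling solr.hdfs.blockcache.direct.memory.allocation:
SOLR_OPTS="$SOLR_OPTS -XX:MaxDirectMemorySize=4g"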
07-17-2016
08:01 PM
@Saurabh Kumar From the error, it looks like the configuration file solrconfig.xml is not properly set up for the data_driven_schema_configs schema. Check whether solrconfig.xml is configured correctly. If you need help, please upload the solrconfig.xml you are currently using.
07-17-2016
07:46 PM
@Saurabh Kumar
1. Solr does not follow a master-slave model; it follows a leader-follower model, so in SolrCloud every Solr node is used for both indexing and querying. Given that you have 5 nodes, the Solr collection can be created with 2 shards and a replication factor (RF) of 2, which will use 4 nodes for Solr (see the sketch below).
2. Every node that will run Solr needs "lucidworks-hdpsearch" installed.
3. Resource usage depends on the size of the index (current size and estimated growth). See the following for more on resource sizing: https://wiki.apache.org/solr/SolrPerformanceProblems
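A minimal sketch of creating such a collection, assuming the HDP Search install path and config set used elsewhere in this thread (collection name "test" is just an example):

# Create a collection with 2 shards and replication factor 2 (spread across 4 nodes)
/opt/lucidworks-hdpsearch/solr/bin/solr create -c test \
  -d /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs_hdfs/conf \
  -n test -s 2 -rf 2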
07-16-2016
07:19 PM
1 Kudo
@Ted Yu
Ambari does not automatically adjust memory for any component. You should use the companion scripts to calculate and tune the heap memory for each component (see the sketch below). Also consider SmartSense, which can identify signs of potential issues and provide recommendations for better tuning of HDP components. Neither Ambari nor SmartSense makes automatic adjustments to the configuration of HDP components; there are default values, which must be tuned manually according to cluster usage.
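As an illustration, the HDP companion scripts include a yarn-utils.py helper that recommends YARN/MapReduce memory settings from the node hardware. A hedged sketch of how it is typically invoked (the exact script location and flags depend on your HDP version; the hardware values are examples):

# Recommend YARN/MapReduce memory settings for a node with
# 16 cores, 64 GB RAM, 4 data disks, and HBase installed
python yarn-utils.py -c 16 -m 64 -d 4 -k True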
06-30-2016
04:54 AM
1 Kudo
@milind pandit There is no direct utility to find this. Files with different names but the same content will have the same checksum, so we can verify it using the checksum option of hdfs. For example:

# hdfs dfs -ls /tmp/tst
Found 6 items
-rw-r--r-- 3 hdfs hdfs 2044 2016-06-29 21:46 /tmp/tst/okay
-rw-r--r-- 3 hdfs hdfs 2044 2016-06-29 21:46 /tmp/tst/pass
-rw-r--r-- 3 hdfs hdfs 2044 2016-06-29 21:46 /tmp/tst/pass3
-rw-r--r-- 3 hdfs hdfs 1064 2016-06-29 21:46 /tmp/tst/pre
-rw-r--r-- 3 hdfs hdfs 1064 2016-06-29 21:46 /tmp/tst/pro
-rw-r--r-- 3 hdfs hdfs 2044 2016-06-29 21:46 /tmp/tst/word
# hdfs dfs -checksum /tmp/tst/okay
/tmp/tst/okay MD5-of-0MD5-of-512CRC32C 000002000000000000000000b1be3e03929521974dc321f9e7f27cc7
# hdfs dfs -checksum /tmp/tst/pass
/tmp/tst/pass MD5-of-0MD5-of-512CRC32C 000002000000000000000000b1be3e03929521974dc321f9e7f27cc7
# hdfs dfs -checksum /tmp/tst/pre
/tmp/tst/pre MD5-of-0MD5-of-512CRC32C 000002000000000000000000690e462cbf52c9c399fb7c0bcacef01d
# hdfs dfs -checksum /tmp/tst/pro
/tmp/tst/pro MD5-of-0MD5-of-512CRC32C 000002000000000000000000690e462cbf52c9c399fb7c0bcacef01d
From the above, the files "/tmp/tst/okay" and "/tmp/tst/pass" hold the same content even though the filenames differ, and you can see that both files have the same checksum. The same applies to "/tmp/tst/pro" and "/tmp/tst/pre". To check the checksums of all files in a folder (in this case "/tmp/tst"), you can do the following:

# hdfs dfs -checksum /tmp/tst/*
/tmp/tst/okay MD5-of-0MD5-of-512CRC32C 000002000000000000000000b1be3e03929521974dc321f9e7f27cc7
/tmp/tst/pass MD5-of-0MD5-of-512CRC32C 000002000000000000000000b1be3e03929521974dc321f9e7f27cc7
/tmp/tst/pass3 MD5-of-0MD5-of-512CRC32C 000002000000000000000000b1be3e03929521974dc321f9e7f27cc7
/tmp/tst/pre MD5-of-0MD5-of-512CRC32C 000002000000000000000000690e462cbf52c9c399fb7c0bcacef01d
/tmp/tst/pro MD5-of-0MD5-of-512CRC32C 000002000000000000000000690e462cbf52c9c399fb7c0bcacef01d
/tmp/tst/word MD5-of-0MD5-of-512CRC32C 000002000000000000000000b1be3e03929521974dc321f9e7f27cc7

You can also use "hdfs find" for a larger search:

# hdfs dfs -checksum `hdfs dfs -find /tmp -print`
The above command lists the checksum of every file under /tmp. You can also combine it with sort and uniq, as in:

hdfs dfs -checksum `hdfs dfs -find /tmp -print` | sort | uniq -c | awk '{print $2,$4}'
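A possible refinement, since the pipeline above still prints every file once: putting the checksum first and sorting on it groups files with identical content onto adjacent lines (the /tmp path is just the example from above):

# Print "checksum path" and sort, so duplicate content lines up together
hdfs dfs -checksum $(hdfs dfs -find /tmp -print) | awk '{print $3, $1}' | sort

# Count how many files share each checksum; any count > 1 indicates duplicate content
hdfs dfs -checksum $(hdfs dfs -find /tmp -print) | awk '{print $3}' | sort | uniq -c | sort -rn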