Member since: 02-16-2016
Posts: 9
Kudos Received: 3
Solutions: 1

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1490 | 03-03-2016 01:02 PM
06-28-2016
06:44 AM
1 Kudo
Hi Romain, Even after sharing the dashboard created on one of the existing indexes, it is not visible to the shared users as shown in the video (Hue is LDAP integrated). Any comments? Thanks, Pravdeep
04-04-2016
08:03 AM
Hi All, It turns out this was an RHEL bug on the node where the Reports Manager role is hosted, and not related to capacity, given the HDFS utilisation I stated above. We had increased the heap for this role a number of times, and every time it would still exceed the resident memory threshold. RHEL bug: “Random JVM Hangs on RHEL 6.6 (Kernels 3.14-3.17)” http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_rn_os_ki.html Thanks & Regards, Pravdeep
04-04-2016
07:21 AM
For the viewers of this post, a few of the potential reasons identified for the canary monitor failures:
1. Manual restarts of RegionServers.
2. Fetch failures by the canary in the case of a remote read (usually related to network hitches).
3. An HBase bug affecting lower CDH releases (CDH 5.1 or below).
These are a few of the items we figured out; please feel free to add more causative scenarios. Thanks & Regards, Pravdeep
03-17-2016
09:57 AM
Hi All, In one of our clusters running CDH 5.1, the HBase canary is logging a lot of error messages even though overall HBase health is good and the jobs are running fine. The only notifications we get from CM are RegionServers going in and out of concerning health (one at a time). The HBase canary logs are full of: "ERROR org.apache.hadoop.hbase.tool.Canary: The monitor is running too long (15002) after timeout limit:15000 will be killed itself !!" I understand that the canary keeps checking overall region and RegionServer health in the cluster, but I would appreciate it if someone could share their experience to help pinpoint the reason. Thanks & Regards, Pravdeep
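For reference, a rough sketch (not something CM does itself) of running the HBase canary by hand with a larger per-check timeout than the 15000 ms in the error above, to see whether the checks simply need more time or a particular RegionServer is slow. It assumes the `hbase` CLI is on the PATH of a cluster host; 60000 ms is only an example value, not a recommendation.

```python
#!/usr/bin/env python
# Rough sketch: run the HBase canary manually with a longer per-check timeout
# than the 15000 ms seen in the error, and watch which checks are slow.
# Assumes the `hbase` CLI is on the PATH; 60000 ms is only an example value.
import subprocess

# -t sets the canary's per-check timeout in milliseconds.
rc = subprocess.call(["hbase", "org.apache.hadoop.hbase.tool.Canary", "-t", "60000"])
print("canary exit code: %d" % rc)
```

If a manual run with a larger timeout passes cleanly, the errors are more likely about the timeout setting than about actual region health.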
03-03-2016
01:09 PM
Hello, In a few of our clusters, DataNodes report block counts above the threshold. So far we have checked that data is distributed evenly across the DataNodes and there are no corrupt blocks. Could it be because of too many files smaller than the block size (perhaps in Parquet format), each of which still occupies a block? Please suggest what the reason could be, or how we should go about finding one. Thanks & Regards, Pravdeep
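A minimal sketch of how the small-files theory could be checked, by counting files below the block size under a directory. The `/user` path and the 128 MB block size are placeholders (not from the original post), and it assumes the `hdfs` CLI is available on a cluster host.

```python
#!/usr/bin/env python
# Rough sketch to test the small-files theory: count how many files under a
# path are smaller than the HDFS block size. Paths with spaces are not handled.
import subprocess

BLOCK_SIZE = 128 * 1024 * 1024  # placeholder; adjust to the cluster's dfs.blocksize
PATH = "/user"                  # placeholder path

out = subprocess.check_output(["hdfs", "dfs", "-ls", "-R", PATH]).decode()

small, total = 0, 0
for line in out.splitlines():
    fields = line.split()
    # File lines look like: perms  repl  owner  group  size  date  time  path
    if len(fields) < 8 or line.startswith("d"):
        continue  # skip directories and header/summary lines
    total += 1
    if int(fields[4]) < BLOCK_SIZE:
        small += 1

print("%d of %d files are smaller than one block" % (small, total))
```

A high proportion of sub-block files would point to merging or compacting small files rather than a DataNode-side problem.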
03-03-2016
01:02 PM
Hi, Thanks for your response. We eventually found the reason upon checking the tables under the nav_audit database, which were loaded with getfacl records. We changed the filter regex to stop logging getfacl operations and decreased the retention period for nav audits to 30 from the default of 90 to bring down the disk usage of this database. Thanks & Regards, Pravdeep
03-03-2016
12:57 PM
Hello, HDFS isn't very large at the moment; utilisation stands at 25 TB out of a total of 420 TB. Thanks & Regards, Pravdeep
03-03-2016
07:17 AM
2 Kudos
Hello, We have an issue in one of our environments where the Reports Manager keeps going down with GC pauses, and this is still happening after increasing the heap twice for this service. I am keen to understand which factors contribute to this service's load and lead to such a scenario. Also, is increasing the heap the only resort, or can the GC algorithm it uses by default be revisited? Please let me know your thoughts. Thanks & Regards, Pravdeep
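A small sketch of how the pauses could be quantified before choosing between more heap and a different collector. It assumes GC logging has been enabled for the Reports Manager JVM (e.g. via its Java options in CM); the log path below is a placeholder, not the actual CM location, and the pattern match is only a rough signal.

```python
#!/usr/bin/env python
# Rough sketch: pull pause durations out of a JVM GC log to see how long and
# how frequent the pauses really are. Counts every "<N> secs]" occurrence,
# so it is only an approximation. The log path is a placeholder.
import re

GC_LOG = "/var/log/reports-manager/gc.log"  # placeholder path

pause_re = re.compile(r"(\d+\.\d+) secs\]")
pauses = []
with open(GC_LOG) as f:
    for line in f:
        m = pause_re.search(line)
        if m:
            pauses.append(float(m.group(1)))

if pauses:
    pauses.sort()
    print("GC events: %d, max pause: %.2fs, 95th pct: %.2fs"
          % (len(pauses), pauses[-1], pauses[int(len(pauses) * 0.95)]))
```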
02-17-2016
03:00 AM
Hi,
We have seen a few of our clusters affected because ibdata1 under MySQL grows large enough to occupy the whole disk. On further checking, it was the nav_audit DB among the CM databases that occupied all the space. Could this be because of the BDR setup in the environment, and if that is the case, how can it be avoided? As a workaround, we are just reducing the retention period for nav audits in CM.
Thanks
Pravdeep
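A minimal sketch of how the per-table usage behind the ibdata1 growth could be confirmed, assuming the mysql client is available on the CM database host; the root user is a placeholder, and the query is plain information_schema, not anything CM-specific.

```python
#!/usr/bin/env python
# Rough sketch: list the largest tables across all databases on the host to
# confirm which one (e.g. nav_audit) is filling ibdata1. Assumes the mysql
# client is installed; the root user is a placeholder.
import subprocess

QUERY = (
    "SELECT table_schema, table_name, "
    "ROUND((data_length + index_length)/1024/1024/1024, 2) AS size_gb "
    "FROM information_schema.tables "
    "ORDER BY (data_length + index_length) DESC LIMIT 20;"
)

subprocess.call(["mysql", "-u", "root", "-p", "-e", QUERY])
```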