Member since: 01-26-2016
Posts: 64
Kudos Received: 5
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1192 | 03-07-2020 09:17 PM
 | 3726 | 11-14-2019 08:58 PM
03-07-2020
09:17 PM
There is a work-in-progress Jira that aims to make all Cloudera software (including the CM agent, Hue, etc.) use only FIPS-compliant crypto algorithms; it is just not in place as of this writing.
11-20-2019
04:25 AM
Rightly said. The SSSD config change above will help here. Along with that SSSD change and restart, don't forget to restart the involved Hadoop daemons (NodeManager, etc.). This is needed to rebuild the in-memory cache, which holds the UID -> username mapping for up to 4 hours without invalidation [1]. [1]: Ref: org.apache.hadoop.fs.CommonConfigurationKeys
----
public static final String HADOOP_SECURITY_UID_NAME_CACHE_TIMEOUT_KEY =
"hadoop.security.uid.cache.secs";
public static final long HADOOP_SECURITY_UID_NAME_CACHE_TIMEOUT_DEFAULT =
4*60*60; // 4 hours
----
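For illustration, here is a minimal, self-contained sketch of how such a time-bounded cache behaves (this is not Hadoop's actual implementation, and the class and method names are made up for the example): an entry resolved once is served from memory until the timeout elapses, which is why restarting the daemon is the only way to pick up a changed mapping sooner.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of a UID -> username cache with a fixed TTL,
// mirroring hadoop.security.uid.cache.secs (default 4 hours).
// Names here are illustrative, not Hadoop's real classes.
public class UidNameCache {
    static final long CACHE_TIMEOUT_SECS = 4 * 60 * 60; // 4 hours

    private static class Entry {
        final String username;
        final long loadedAtMillis;
        Entry(String username, long loadedAtMillis) {
            this.username = username;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<Integer, Entry> cache = new HashMap<>();

    // Returns the cached name if still fresh; otherwise re-resolves.
    public String lookup(int uid, long nowMillis,
                         java.util.function.IntFunction<String> resolver) {
        Entry e = cache.get(uid);
        if (e == null || nowMillis - e.loadedAtMillis > CACHE_TIMEOUT_SECS * 1000) {
            e = new Entry(resolver.apply(uid), nowMillis);
            cache.put(uid, e);
        }
        return e.username;
    }

    public static void main(String[] args) {
        UidNameCache c = new UidNameCache();
        // First lookup resolves "olduser"; the backend then changes to "newuser".
        System.out.println(c.lookup(1000, 0L, uid -> "olduser"));
        // Within the TTL, the stale cached value is still returned.
        System.out.println(c.lookup(1000, 60_000L, uid -> "newuser"));
        // After the TTL expires, the new mapping is finally picked up.
        System.out.println(c.lookup(1000, (CACHE_TIMEOUT_SECS + 1) * 1000, uid -> "newuser"));
    }
}
```

This is why the SSSD-side fix alone is not enough: any daemon that already cached the old mapping keeps serving it until the TTL expires or the process restarts.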
11-14-2019
09:16 PM
1 Kudo
Yes, it was picking up OpenJDK 9, which was in the classpath. I pointed it to the right JDK 11 and it is working now.
09-04-2019
07:20 AM
Without reviewing the logs, if I can guess at the issue: if you are submitting the Spark job from an Oozie Shell action, then I would suspect the problem is Oozie's behavior of setting the environment variable HADOOP_CONF_DIR behind the scenes. There is an internal Jira that tracks this behavior/work. The KB [1] explains this a bit (even though it is reported for hive-site.xml, I think it may influence the HBase client configuration as well). Try working around the problem by following the instructions in the KB [1] and see if it helps. Thanks, Lingesh [1]: https://community.cloudera.com/t5/Customer/How-to-run-a-Spark2-job-using-Oozie-Shell-Action-which/ta-p/73185
03-04-2019
03:18 AM
The error snippets posted do not indicate a FATAL event that could interrupt the main thread of the HMaster or RS. Do you see any FATAL events in their respective role logs before the crash? If not, check the standard output logs of these roles and see if they record any OOM situation.
03-04-2019
02:47 AM
1 Kudo
Normally, region splits in HBase are lightweight (the major delay can be attributed to a reference/link file creation RPC call to the NameNode) and hence should be pretty fast unless the NN is experiencing performance issues. If a client accesses the region during this timeframe, it may experience the said exception, but that should be transient, transparent, and non-fatal to the client application. Do you see any fatal errors in your client application? Have you customized the retry attempts in your client?
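To illustrate why the exception is normally transient: the HBase client retries such conditions internally, governed by settings like hbase.client.retries.number and hbase.client.pause. The sketch below is a generic, self-contained retry loop (not the HBase client API itself; the names and backoff policy are illustrative assumptions) showing the pattern:

```java
import java.util.concurrent.Callable;

// Illustrative retry helper for a transient "region is being split/moved"
// condition; the real HBase client retries internally, controlled by
// hbase.client.retries.number and hbase.client.pause.
public class TransientRetry {
    public static <T> T callWithRetries(Callable<T> op, int maxAttempts, long basePauseMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {   // real code would catch only retriable exceptions
                last = e;
                Thread.sleep(basePauseMs * attempt);  // simple linear backoff
            }
        }
        throw last; // retries exhausted: only now does the caller see a failure
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Fails twice (as if the region were mid-split), then succeeds.
        String result = callWithRetries(() -> {
            if (++calls[0] < 3) throw new IllegalStateException("region not serving");
            return "ok after " + calls[0] + " attempts";
        }, 5, 1L);
        System.out.println(result);
    }
}
```

If an application has set the retry count very low (or to zero), a split that would otherwise be invisible can surface as a fatal error, which is why the retry configuration is worth checking.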
02-08-2019
05:06 AM
Not sure in which version of CDH you hit this issue. Note that the hbase:namespace table is a system table required for HBase to function properly. This table is not handled with a higher priority than other tables, as noted in HBASE-14190. If the HBase Master tries to split WALs on dead or ghost RegionServers, it might get stuck trying to split those WALs on invalid RegionServers. The Master can also get stuck on startup trying to split corrupt 83-byte or smaller WAL files, in which case simply sidelining those blocking WAL files would help. So increasing hbase.master.namespace.init.timeout may or may not help, depending on whether the Master is stuck on other tasks like WAL splitting. Another workaround is to recreate the hbase:namespace table, in which case it will get loaded quickly.
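If you do want to try raising that timeout, it goes in hbase-site.xml on the Master. The value below (10 minutes, in milliseconds) is purely an example, not a recommendation:

```xml
<property>
  <name>hbase.master.namespace.init.timeout</name>
  <!-- How long (ms) the Master waits for hbase:namespace to come online.
       600000 = 10 minutes, shown only as an example value. -->
  <value>600000</value>
</property>
```

Remember that this only buys the Master more time; if it is stuck on WAL splitting, the underlying blockage still needs to be cleared.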
02-08-2019
04:50 AM
If the exported HFiles are getting deleted on the target, and you can also confirm it is the Master's HFileCleaner thread that is deleting them, then there is some problem at the initial stage of ExportSnapshot, where the snapshot manifest/references are copied over. Check whether any warnings/errors are reported in the console logs. Also check that the manifest exists on the target cluster.
02-06-2019
01:47 AM
Thanks for sharing the steps to resolve the issue. Yes, indeed every NN/DN in each cluster should have access to the other cluster's nodes and vice versa, since ExportSnapshot is essentially an HDFS distcp operation: the majority of the work involves copying the HFiles (associated with the snapshot) in a distributed fashion from source to target. It would be helpful if you could share the complete stack trace of the exception, which would also help in understanding the flow during the failure. Again, thanks for taking the time to post the solution.