Member since
01-26-2016
64
Posts
5
Kudos Received
2
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1434 | 03-07-2020 09:17 PM |
| | 4524 | 11-14-2019 08:58 PM |
03-07-2020
09:17 PM
There is a work-in-progress Jira that aims to make all Cloudera software (including the CM agent, Hue, etc.) use only FIPS-compliant crypto algorithms. It is just not in place as of this writing.
11-20-2019
04:25 AM
Rightly said. The SSSD config change above will help here. Along with the SSSD change and restart, don't forget to restart the involved Hadoop daemons (NodeManager, etc.). This is needed to rebuild the in-memory cache, which holds the UID -> username mapping for up to 4 hours without invalidation [1].

[1]: Ref: org.apache.hadoop.fs.CommonConfigurationKeys
----
public static final String HADOOP_SECURITY_UID_NAME_CACHE_TIMEOUT_KEY =
"hadoop.security.uid.cache.secs";
public static final long HADOOP_SECURITY_UID_NAME_CACHE_TIMEOUT_DEFAULT =
4*60*60; // 4 hours
----
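To illustrate why a daemon restart matters here, the following is a minimal sketch of a time-bounded UID -> username cache. It is not Hadoop's actual implementation; the class name `UidNameCacheSketch`, the `resolver` callback (a stand-in for the OS/SSSD lookup) and the injectable `clock` are all hypothetical names made up for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;
import java.util.function.LongSupplier;

/** Illustrative only: a UID -> username cache whose entries are trusted until a timeout elapses. */
public class UidNameCacheSketch {
    private static final class Entry {
        final String name;
        final long loadedAtMillis;

        Entry(String name, long loadedAtMillis) {
            this.name = name;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<Integer, Entry> cache = new HashMap<>();
    private final long timeoutMillis;
    private final IntFunction<String> resolver; // hypothetical stand-in for the OS/SSSD lookup
    private final LongSupplier clock;           // injectable so expiry can be shown without waiting

    public UidNameCacheSketch(long timeoutSecs, IntFunction<String> resolver, LongSupplier clock) {
        this.timeoutMillis = timeoutSecs * 1000L;
        this.resolver = resolver;
        this.clock = clock;
    }

    /** Returns the cached name; the resolver is consulted only on a miss or after expiry. */
    public synchronized String getUserName(int uid) {
        long now = clock.getAsLong();
        Entry e = cache.get(uid);
        if (e == null || now - e.loadedAtMillis >= timeoutMillis) {
            e = new Entry(resolver.apply(uid), now);
            cache.put(uid, e);
        }
        return e.name;
    }

    public static void main(String[] args) {
        long[] now = {0L};
        String[] backend = {"olduser"};
        // 4 * 60 * 60 mirrors the default quoted above
        UidNameCacheSketch c = new UidNameCacheSketch(4 * 60 * 60, uid -> backend[0], () -> now[0]);

        System.out.println(c.getUserName(1001)); // olduser
        backend[0] = "newuser";                  // simulate the SSSD-side fix
        now[0] = 60 * 60 * 1000L;                // one hour later
        System.out.println(c.getUserName(1001)); // olduser (stale entry still served)
        now[0] = 4 * 60 * 60 * 1000L;            // timeout reached
        System.out.println(c.getUserName(1001)); // newuser
    }
}
```

In this sketch a stale name keeps being served until the entry ages out; restarting the process discards the map entirely, which is the effect the daemon restart has on the real cache.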
11-14-2019
09:05 PM
Thanks @samarsimha for sharing the feedback. Just to be on the same page, how did you resolve the issue? Were you on any of the affected JDK versions? Your answer may help other users who face similar issues.
11-14-2019
08:58 PM
1 Kudo
What is your Java/JDK version in use here? The error java.util.zip.ZipException: zip END header not found seems suspicious. Similar bugs have been reported in certain versions of OpenJDK 9 at https://bugs.openjdk.java.net/browse/JDK-8172872 or https://bugs.openjdk.java.net/browse/JDK-8170276
09-04-2019
07:20 AM
Without reviewing the logs, I can only guess at the issue. If you are submitting the Spark job from an Oozie Shell action, I would suspect the problem is Oozie's behavior of setting the environment variable HADOOP_CONF_DIR behind the scenes. There is an internal Jira that tracks this behavior. The KB [1] explains a bit about this (even though it is reported for hive-site.xml, I think it may influence the HBase client conf as well). Try working around the problem by following the instructions in the KB [1] and see if it helps.

Thanks
Lingesh

[1]: https://community.cloudera.com/t5/Customer/How-to-run-a-Spark2-job-using-Oozie-Shell-Action-which/ta-p/73185
03-04-2019
03:18 AM
The error snippets posted do not indicate a FATAL event that could interrupt the main thread of the HMaster or RS. Do you see any FATAL events for their respective roles before the crash? If not, check the standard output logs of these roles and see if they record any OOM situation.
03-04-2019
02:47 AM
1 Kudo
Normally a region split in HBase is lightweight (the major delay can be attributed to the RPC call to the NameNode that creates the reference link file) and hence should be pretty fast unless the NN is undergoing a performance issue. If a client accesses this region during this timeframe, it may experience the said exception, but that should be transient, transparent and non-fatal to the client application. Do you see any fatal errors in your client application? Do you have customized retry attempts in your client?
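To make "transient and transparent" concrete, here is a generic retry loop, offered as a rough sketch only. It is not the HBase client's internal code (the HBase client has its own retry settings, such as hbase.client.retries.number and hbase.client.pause); `TransientRetrySketch`, `TransientException` and `callWithRetries` are hypothetical names used for illustration.

```java
import java.util.concurrent.Callable;

/** Illustrative only: retry an operation that may fail transiently, pausing longer each time. */
public class TransientRetrySketch {

    /** Hypothetical marker for a retriable failure, e.g. a region briefly offline mid-split. */
    public static class TransientException extends Exception {
        public TransientException(String msg) {
            super(msg);
        }
    }

    public static <T> T callWithRetries(Callable<T> op, int maxAttempts, long pauseMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        TransientException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (TransientException e) {
                last = e; // remember the failure and retry
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis * attempt); // simple linear backoff
                }
            }
        }
        throw last; // all attempts failed; only now does the caller see the error
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String row = callWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new TransientException("region is splitting");
            }
            return "row-value";
        }, 5, 10L);
        System.out.println(row + " after " + calls[0] + " attempts"); // row-value after 3 attempts
    }
}
```

With a loop like this, the application only sees an error if the region stays unavailable for longer than the whole retry budget, which is why customized (especially reduced) retry settings are worth checking.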
02-08-2019
05:06 AM
Not sure on which version of CDH you hit this issue. Note that the hbase:namespace table is a system table required for HBase to function properly. This table is not handled with a higher priority than other tables, as noted in HBASE-14190. If the HBase Master tries to split WALs on dead or ghost Region Servers, it might get stuck trying to split these WALs on invalid Region Servers. The HBase Master can also get stuck on startup trying to split corrupt WAL files of 83 bytes or smaller, in which case just sidelining those blocking WAL files would help. So increasing hbase.master.namespace.init.timeout may or may not help, depending on whether the Master is stuck on other tasks such as WAL splitting. Another workaround is to recreate the hbase:namespace table, in which case it will get loaded quickly.
02-08-2019
04:50 AM
If the exported HFiles are getting deleted in the target and you can confirm that it is the Master's HFileCleaner thread deleting them, then there is some problem at the initial stage of ExportSnapshot, where the snapshot manifest/references are copied over. Check whether any warnings/errors are reported in the console logs. Also, check that the manifest exists in the target cluster.
02-06-2019
01:56 AM
NotServingRegionException indicates that the queried region is not ONLINE anywhere in the cluster, meaning none of the RegionServers serves the region. However, the 'hbase:meta' table somehow contains an invalid record pointing to a specific RegionServer which does not actually host the region. If the region is stuck in a transition state (like FAILED_CLOSE or FAILED_OPEN) for any reason, it should have been highlighted by the HMaster in its WebUI (HMaster WebUI > search for the "Regions in Transition" section). If nothing is reported for this region in the HMaster WebUI, you can simply try assigning the region (command: "assign '<REGION_ENCODED_NAME>'") from an hbase shell (launch hbase shell as the 'hbase' user or a user with sufficient privileges) and see if any of the RSs can open it. If the region assignment fails, review the corresponding RS logs and investigate the reason (the active HMaster logs will help to identify the RS where the region assignment was placed).