Member since: 01-26-2016
Posts: 64
Kudos Received: 5
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1454 | 03-07-2020 09:17 PM
 | 4548 | 11-14-2019 08:58 PM
03-07-2020
09:17 PM
There is a work-in-progress Jira that aims to make all Cloudera software (including the CM agent, Hue, etc.) use only FIPS-compliant crypto algorithms. However, it is not in place as of this writing.
11-20-2019
04:25 AM
Rightly said. The SSSD config change above will help here. Along with that SSSD change and restart, don't forget to restart the involved Hadoop daemons (NodeManager, etc.). This is needed to rebuild the in-memory cache that holds the UID -> username mapping for up to 4 hours without invalidation [1] [1]: Ref: org.apache.hadoop.fs.CommonConfigurationKeys
----
public static final String HADOOP_SECURITY_UID_NAME_CACHE_TIMEOUT_KEY =
    "hadoop.security.uid.cache.secs";
public static final long HADOOP_SECURITY_UID_NAME_CACHE_TIMEOUT_DEFAULT =
    4*60*60; // 4 hours
----
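If waiting out the four-hour window after every change is not practical, the same cache TTL can be shortened via core-site.xml using the property shown above. A minimal sketch (the 300-second value is purely illustrative):

----
<!-- core-site.xml: shrink the UID -> username cache TTL (value here is illustrative) -->
<property>
  <name>hadoop.security.uid.cache.secs</name>
  <value>300</value> <!-- default is 14400, i.e. 4 hours -->
</property>
----

A shorter TTL trades fewer stale mappings for more frequent UID lookups, so avoid very small values on busy clusters.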
11-14-2019
09:05 PM
Thanks @samarsimha for sharing the feedback. Just to be on the same page, how did you resolve the issue? Were you on one of the affected JDK versions? Your answer may help other users who face a similar issue.
11-14-2019
08:58 PM
1 Kudo
What is the Java/JDK version in use here? The error java.util.zip.ZipException: zip END header not found seems suspicious. Similar bugs were reported against certain versions of OpenJDK 9 at https://bugs.openjdk.java.net/browse/JDK-8172872 and https://bugs.openjdk.java.net/browse/JDK-8170276
09-04-2019
07:20 AM
Without reviewing the logs, if I may guess at the issue: if you are submitting the Spark job from an Oozie Shell action, then I would suspect the problem is Oozie's behavior of setting the environment variable HADOOP_CONF_DIR behind the scenes. There is an internal Jira that tracks this behavior/work. The KB [1] explains this a bit (even though it was reported for hive-site.xml, I think it may affect the HBase client configuration as well). Try working around the problem by following the instructions in the KB [1] and see if it helps. Thanks Lingesh [1]: https://community.cloudera.com/t5/Customer/How-to-run-a-Spark2-job-using-Oozie-Shell-Action-which/ta-p/73185
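For reference, the general shape of such a workaround is to repoint HADOOP_CONF_DIR inside the shell-action script before invoking spark-submit. A rough sketch only; the gateway conf path is an assumption for a typical CDH node, and the exact steps in the linked KB take precedence:

----
#!/bin/bash
# Oozie exports HADOOP_CONF_DIR pointing at the launcher's own conf dir,
# which can break client config lookup inside the shell action.
# Repoint it at the gateway client configuration (path is an assumption):
export HADOOP_CONF_DIR=/etc/spark/conf/yarn-conf
spark-submit ...
----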
03-04-2019
03:18 AM
The error snippets posted do not indicate a FATAL event that could interrupt the main thread of the HMaster or RS. Do you see any FATAL events in their respective role logs before the crash? If not, check the standard output logs of these roles and see if they record any OOM condition.
03-04-2019
02:47 AM
1 Kudo
Normally, region splits in HBase are lightweight (the major delay is usually attributable to a reference-file creation RPC call to the NameNode) and hence should be pretty fast unless the NN is having performance issues. If a client accesses the region during this timeframe, it may experience the said exception, but that should be transient, transparent, and non-fatal to the client application. Do you see any fatal errors in your client application? Have you customized the retry attempts in your client?
02-08-2019
05:06 AM
I am not sure which version of CDH you hit this issue on. Note that the hbase:namespace table is a system table required for HBase to function properly. This table is not handled with a higher priority than other tables, as noted in HBASE-14190. If the HBase Master tries to split WALs on dead or ghost RegionServers, it might get stuck trying to split those WALs on invalid RegionServers. The Master can also get stuck on startup trying to split corrupt 83-byte or smaller WAL files, in which case simply sidelining those blocking WAL files would help. So increasing hbase.master.namespace.init.timeout may or may not help, depending on whether the Master is stuck on other tasks like WAL splitting. Another workaround is to recreate the hbase:namespace table, in which case it will load quickly.
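If you do try raising the timeout, it goes into the HMaster's hbase-site.xml (via the Master's safety valve in CM). A sketch; the 40-minute value is purely illustrative:

----
<!-- hbase-site.xml (HMaster): allow more time for hbase:namespace assignment
     at startup; value below is illustrative, in milliseconds -->
<property>
  <name>hbase.master.namespace.init.timeout</name>
  <value>2400000</value> <!-- 40 minutes -->
</property>
----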
02-08-2019
04:50 AM
If the exported HFiles are getting deleted on the target, and you can also confirm it is the Master's HFileCleaner thread deleting them, then there is some problem at the initial stage of ExportSnapshot where the snapshot manifest/references are copied over. Check whether any warnings/errors are reported in the console logs. Also, check that the manifest exists in the target cluster.
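To check the manifest on the target, listing the snapshot directory is usually enough. A sketch, assuming the default hbase.rootdir of /hbase and with the snapshot name as a placeholder:

----
# On the target cluster: the snapshot's manifest/reference files
# should appear under the .hbase-snapshot directory
# (assumes hbase.rootdir is /hbase)
hdfs dfs -ls -R /hbase/.hbase-snapshot/<SNAPSHOT_NAME>
----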
02-06-2019
01:56 AM
NotServingRegionException indicates that the queried region is not ONLINE anywhere in the cluster, meaning none of the RegionServers serve the region. However, the 'hbase:meta' table somehow contains a stale record pointing to a specific RegionServer which does not actually host the region. If the region is stuck in a transition state (like FAILED_CLOSE or FAILED_OPEN) for any reason, it should be highlighted by the HMaster in its WebUI (HMaster WebUI > look for the "Regions in Transition" section). If nothing is reported for this region in the HMaster WebUI, you can simply try assigning the region (command: "assign '<REGION_ENCODED_NAME>'") from an hbase shell (launch hbase shell as the 'hbase' user or a user with sufficient privileges) and see whether any RS can open it. If the region assignment fails, review the corresponding RS logs to investigate the reason (the active HMaster's logs will identify the RS where the region assignment was placed)
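The manual assignment step looks roughly like this in practice (the region encoded name is a placeholder, to be taken from the HMaster WebUI or hbase:meta):

----
# Launch hbase shell as a privileged user, then assign the region
sudo -u hbase hbase shell
hbase(main):001:0> assign '<REGION_ENCODED_NAME>'
----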