Member since
08-16-2016
642
Posts
131
Kudos Received
68
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3955 | 10-13-2017 09:42 PM
 | 7412 | 09-14-2017 11:15 AM
 | 3766 | 09-13-2017 10:35 PM
 | 5991 | 09-13-2017 10:25 PM
 | 6556 | 09-13-2017 10:05 PM
07-08-2017
09:49 AM
2 Kudos
I did this test and I was able to connect to both the statestore and catalogd over SSL, but only because I was using the FQDN (hostname -f). The issue is that catalogd and the statestore are using the short name post-upgrade for the statestore subscription. This feels like either a bug was introduced, or this was the intended behavior and it was "fixed", but now you need this configuration setting to get SSL for Impala to work. Cloudera, please fix the code or update the Impala SSL docs to reflect the need for this setting.
07-08-2017
09:45 AM
1 Kudo
I think a new "feature" was added in this latest release. We hit the same issue. We have SSL enabled for Impala with Kerberos. SSL worked for other services, like the UIs and impalad subscribing to the statestore, but catalogd continued to fail to subscribe to the statestore with the same error. Cloudera Support kindly pointed out that it wasn't trying to communicate using the FQDN, just the short hostname, and they provided this fix. We applied the change and Impala is operational again.

Cloudera Manager > Impala > Configuration

For Catalog: Catalog Server Command Line Argument Advanced Configuration Snippet (Safety Valve)

For StateStore: Statestore Command Line Argument Advanced Configuration Snippet (Safety Valve)

In each, configure: --hostname=hostname.example.com
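Before applying the safety valve, it's worth confirming what the short name and FQDN actually are on the Catalog and Statestore hosts, since the value passed to --hostname should match `hostname -f`:

```shell
# Compare the short hostname with the fully qualified one;
# the --hostname safety-valve value should be the FQDN.
short=$(hostname)
fqdn=$(hostname -f)
echo "short name: $short"
echo "FQDN:       $fqdn"
```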
07-06-2017
10:50 AM
There are a few SO entries, and the message triggers thoughts of IPv6. Impala and Hadoop do not like IPv6. Check to see if it is enabled. Either disable it or try to ensure that the Impala Statestore binds to an IPv4 address. I'll poke around to see if the latter is even possible.
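On Linux, a minimal way to check (and, if you decide to, disable) IPv6 at runtime looks like this; the sysctl key is standard, but persisting the change belongs in /etc/sysctl.conf:

```shell
# 0 = IPv6 enabled, 1 = disabled
cat /proc/sys/net/ipv6/conf/all/disable_ipv6

# Disable IPv6 at runtime (root required); add the same key to
# /etc/sysctl.conf so the setting survives a reboot
# sysctl -w net.ipv6.conf.all.disable_ipv6=1
```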
07-06-2017
10:40 AM
2 Kudos
It is for the CDSW master IP.
07-06-2017
09:29 AM
Presumably, Kerberos is enabled or you wouldn't be getting this error at all. All users must have a valid ticket from a KDC. This typically means running kinit prior to running any commands or jobs. You can also get a ticket using a keytab file, which is just a stored version of the user's password. The ticket is stored in the ticket cache on the system; by default it is /tmp/krb5cc_<userid>. The client will check here first for a ticket.

I would venture that some other process is getting a ticket and storing it in the ticket cache, and the other processes are able to use it. This is likely since you are using the 'hdfs' account that the HDFS processes are running under. I strongly encourage you not to operate in this fashion. Instead of using the 'hdfs' account, update the Superuser Group setting in CM to include a group that you wish to have HDFS superuser access, which I assume is why you are using 'hdfs' in the first place.
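For completeness, a typical ticket workflow looks like the following (the principal, realm, and keytab path are made-up examples; these commands need a reachable KDC):

```shell
# Interactive: prompts for the user's password
kinit alice@EXAMPLE.COM

# Non-interactive: authenticate with a keytab instead
kinit -kt /etc/security/keytabs/alice.keytab alice@EXAMPLE.COM

# Inspect the ticket cache (defaults to /tmp/krb5cc_$(id -u))
klist
```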
07-05-2017
11:40 PM
1 Kudo
This is a garbage collection (GC) pause. What triggers a GC will depend on the type of GC in use for HS2. An obvious trigger that you can detect is that the heap was too full. Go to the HiveServer2 role and click Chart Library. It typically defaults to the Process Resources charts; if not, go there. You should see the JVM Heap Memory Usage and GC pause charts. If the heap is constantly high (70% or above), then that is the likely reason. In that case the solution could be as simple as increasing the HS2 heap. Note: heap usage will grow as load grows, so you could just have more concurrent or larger queries being processed by HS2.
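If you'd rather watch the heap from the command line than in CM, jstat against the HS2 process gives the same picture (the pgrep pattern here is an assumption about how HiveServer2 appears in the process list on your hosts):

```shell
# Sample heap occupancy and GC time for HiveServer2 once per second, 10 samples.
# O = old-gen occupancy %, FGCT = cumulative full-GC seconds
HS2_PID=$(pgrep -f HiveServer2 | head -1)
jstat -gcutil "$HS2_PID" 1000 10
```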
07-05-2017
07:52 PM
1 Kudo
1. Yes, remove the corrupt files. Try the normal way first: hdfs dfs -rm... If that doesn't work, use hdfs fsck with -move or -delete. The first moves the files to /lost+found; the latter removes them from the cluster. But to do that you need to know which files.

2. Use the command hdfs fsck <path> -list-corruptfileblocks -files -locations

3. Oh, I didn't notice this was in the HBase board. Can you expand on HBase's role in this issue? That will affect the above answers (you don't want to be deleting HBase files through HDFS). HBase has its own version of fsck; please run that and provide the output. The balancer will not handle missing or under-replicated blocks; it only deals with existing blocks. HDFS should repair itself, but if this has to do with corrupt regions in HBase, then HDFS likely won't, as HBase is more aware of the actual data. Here is a Cloudera doc on the topic of HBase and corrupt regions: https://www.cloudera.com/documentation/enterprise/5-4-x/topics/admin_hbck_poller.html
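Putting steps 1 and 2 together, the sequence would look roughly like this (paths are placeholders; if HBase owns the data, run the hbck check first and do not delete its files through HDFS):

```shell
# Step 2 first: identify corrupt files and where their blocks live
hdfs fsck /path/to/check -list-corruptfileblocks -files -locations

# Step 1: remove them. Normal delete first...
hdfs dfs -rm /path/to/corrupt/file

# ...or let fsck quarantine or purge what's left
hdfs fsck /path/to/check -move    # moves corrupt files to /lost+found
# hdfs fsck /path/to/check -delete  # permanently removes them

# If HBase is involved, check region consistency instead of touching HDFS
hbase hbck
```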
07-05-2017
04:18 PM
1 Kudo
@littlewolf you should try the CM API; it has a Java API as well. I don't know the specific metric you are looking for, but it may be there. I have used it to gather all queries issued to Hive and Impala to do some usage analysis. If it isn't in CM, I would look at collecting it from the source. Most of the services and components expose metrics through JMX, and host stats can be collected through some of the typical methods. https://cloudera.github.io/cm_api/
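As a sketch of pulling those Impala queries over the REST endpoint that the cm_api clients wrap (the host, credentials, cluster name, service name, and API version below are all placeholders you'd adjust for your deployment):

```shell
# Fetch Impala queries since a start time from the CM REST API.
# -u takes a CM admin user; the response is JSON.
curl -s -u admin:admin \
  "http://cm-host.example.com:7180/api/v13/clusters/Cluster%201/services/impala/impalaQueries?from=2017-07-01T00:00:00"
```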
07-04-2017
06:06 PM
That link contains the proper instructions. Effectively, the hostnames you installed CDH with (CM agent on up) will be what is used by CM (and most likely Director). Any change after that will require these steps.
07-04-2017
06:03 PM
Yes, the missing blocks (with replication factor 1) mean that those files are now corrupt and unrecoverable. The 1169 are listed as missing and under-replicated. This means that they need to be replicated from the other replicas of those blocks on the cluster. By default the minimum replication factor is 1 and the replication factor is 3. This means that if there are 2 replicas for a block, it will eventually be re-replicated, but not immediately. I believe the default is that it will replicate them after 1 hour (CDH) or 8 hours (Apache Hadoop). This provides some leeway for node outages without flooding the cluster with replication operations. The cluster should recover. Please post back if it does not.
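To watch the recovery progress described above, the fsck summary counters are the quickest check (run against a live cluster; the path is a placeholder):

```shell
# Overall block health; watch the under-replicated count trend toward 0
hdfs fsck / | grep -Ei 'missing|under.?replicated|corrupt'

# Cluster-wide capacity and per-DataNode status
hdfs dfsadmin -report
```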