Member since: 01-29-2016
Posts: 17
Kudos Received: 11
Solutions: 4

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 4677 | 07-17-2018 09:18 AM
 | 4279 | 06-18-2018 10:27 AM
 | 7860 | 11-30-2016 08:32 AM
 | 9405 | 08-18-2016 02:24 PM
07-17-2018 09:18 AM · 1 Kudo
Hi @mcalnd, We recently found a bug in 5.15, tracked here: https://issues.apache.org/jira/browse/IMPALA-7298. I wonder if your error is a different symptom of the same bug. In krb5.conf, are the following flags set to true: "rdns=true" and "dns_canonicalize_hostname=true"? If they're false, then you're hitting the known bug described in the JIRA above. - Sailesh
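(For reference, a minimal sketch of what those settings look like in krb5.conf; the [libdefaults] section is the standard place for them, but verify against your own file.)

```
[libdefaults]
    # Both flags should be true to avoid the name-canonicalization
    # path that triggers IMPALA-7298; defaults vary by krb5 version,
    # so set them explicitly.
    rdns = true
    dns_canonicalize_hostname = true
```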
06-18-2018 10:27 AM
Hi @buddelflinktier, As of 5.15, Impala also uses KRPC for the TransmitData() RPC. So you need to configure the 'trusted_subnets' flag the same way in your Impala configuration as well (Impala doesn't pick it up from the Kudu configuration). - Sailesh
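A minimal sketch of the flag as it would appear in the impalad startup options (e.g. via a Cloudera Manager safety valve); the subnet values below are placeholders, so reuse whatever you already set for Kudu:

```
# Illustrative only: mirror the value from your Kudu configuration.
--trusted_subnets=10.0.0.0/8,172.16.0.0/12,192.168.0.0/16
```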
06-01-2018 01:31 PM
@vvinaga From the logs, it looks like Impala cannot talk to the HDFS NameNode. Could you check whether HDFS is configured correctly to use Kerberos?
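As a quick sanity check, a Kerberized HDFS deployment should carry the standard Hadoop security properties in core-site.xml; this is an illustrative sketch, not your exact configuration:

```
<!-- Standard Hadoop security settings on a Kerberized cluster
     (illustrative values). -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
```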
11-30-2016 08:32 AM · 1 Kudo
Hi Pettax, You should be able to find the safety valve in Cloudera Manager under the HDFS service. The S3A connector used by Impala is managed by the HDFS service, so it will be under the title: "Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml". Let me know if you have any other issues. - Sailesh
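The snippet pasted into that safety valve typically looks something like the sketch below; the property names are the standard S3A credential keys, and the values are placeholders:

```
<!-- Pasted into the core-site.xml safety valve; values are placeholders. -->
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_AWS_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_AWS_SECRET_KEY</value>
</property>
```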
09-23-2016 10:00 AM
Hi Cibi, Sorry for the delay; I missed the notification. The Kerberos knobs seem correct. Could you attach your catalogd, statestored, and some impalad logs? - Sailesh
09-13-2016 09:07 PM
Hi Cibi, There is a 'renew_lifetime' option in the krb5.conf file and a 'kerberos_reinit_interval' Impala startup flag. Which one are you looking at?
09-12-2016 10:37 PM
Hi Jais, Is there any more information regarding the errors mentioned? Also, are you encountering any problems using Impala? If not, it would be worth posting on the Cloudera Manager board as well: https://community.cloudera.com/t5/Cloudera-Manager-Installation/bd-p/CMInstall - Sailesh
09-12-2016 07:05 PM
Hi Cibi, What is your 'kerberos_reinit_interval' flag set to? Also, do you see this only with the catalogd, or with the statestored and impalad as well? - Sailesh
08-22-2016 11:32 AM · 1 Kudo
Hi Pettax, I would say option 1 and option 3 would be very similar in performance and allow for the best distribution of data across the cluster. I wouldn't opt for option 2.
08-18-2016 02:24 PM · 1 Kudo
Hi, Looking at your Hive question, parquet.block.size and dfs.blocksize should be honored, so I'm not sure what's going wrong there. The Hive folks should be able to help you with that. I can help you with the Impala side, however.

1GB Parquet files with 4 row groups (256MB each) should work just fine with respect to performance. The key is that row group boundaries should preferably fall at block boundaries, i.e. the beginning or end of a row group shouldn't cross a block boundary. It will still work if one does, but this causes some remote reads, which slow down the scan. Having 4 Impala nodes for this setting would be ideal, so there is a higher chance that each row group is scanned by a different impalad.

The easier option is to generate the data as follows: the fastest scans for Parquet files in Impala come from having one row group per file, where the file completely fits in a block (so 256MB or less is preferable). This, however, wouldn't give you a tremendous boost over multiple row groups per file (probably ~5% in the average case). In any case, if you're able to fix the Hive Parquet file generation issue, you should start seeing faster scans through Impala.
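To make the one-row-group-per-file layout concrete, here is a sketch of Hive session settings that aim for it; the table names are placeholders, and 268435456 bytes (256MB) assumes your block size matches the discussion above:

```
-- Aim for one ~256MB row group per file, sized to fit a single HDFS block.
SET dfs.blocksize=268435456;       -- block size for files this job writes
SET parquet.block.size=268435456;  -- target Parquet row group size

-- Rewrite the table so new files pick up the settings
-- (table names are placeholders).
INSERT OVERWRITE TABLE my_parquet_table
SELECT * FROM my_source_table;
```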