Member since: 01-29-2016
Posts: 17
Kudos Received: 11
Solutions: 4

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 4677 | 07-17-2018 09:18 AM
 | 4279 | 06-18-2018 10:27 AM
 | 7860 | 11-30-2016 08:32 AM
 | 9405 | 08-18-2016 02:24 PM
07-17-2018 09:18 AM · 1 Kudo
Hi @mcalnd, We recently found a bug in 5.15, tracked here: https://issues.apache.org/jira/browse/IMPALA-7298. I wonder if your error is a different symptom of the same bug. In krb5.conf, are the following flags set to true: "rdns=true" and "dns_canonicalize_hostname=true"? If they're false, then you're hitting the known bug described in the JIRA above. - Sailesh
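(For reference, a minimal sketch of what those settings look like in krb5.conf; the [libdefaults] section is the standard place for them, but verify against your own file.)

```
[libdefaults]
    # Both flags should be true to avoid the name-canonicalization
    # path that triggers IMPALA-7298; defaults vary by krb5 version,
    # so set them explicitly.
    rdns = true
    dns_canonicalize_hostname = true
```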
06-18-2018 10:27 AM
Hi @buddelflinktier, As of 5.15, Impala also uses KRPC for the TransmitData() RPC. So you need to configure the 'trusted_subnets' flag the same way in your Impala configuration as well (Impala doesn't pick it up from the Kudu configuration). - Sailesh
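A minimal sketch of the flag as it would appear in the impalad startup options (e.g. via a Cloudera Manager safety valve); the subnet values below are placeholders, so reuse whatever you already set for Kudu:

```
# Illustrative only: mirror the value from your Kudu configuration.
--trusted_subnets=10.0.0.0/8,172.16.0.0/12,192.168.0.0/16
```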
06-01-2018 01:31 PM
@vvinaga From the logs, it looks like Impala cannot talk to the HDFS NameNode. Could you check whether HDFS is configured correctly to use Kerberos?
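As a quick sanity check, a Kerberized HDFS deployment should carry the standard Hadoop security properties in core-site.xml; this is an illustrative sketch, not your exact configuration:

```
<!-- Standard Hadoop security settings on a Kerberized cluster
     (illustrative values). -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
```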
11-30-2016 08:32 AM · 1 Kudo
Hi Pettax, You should be able to find the safety valve in Cloudera Manager under the HDFS service. The S3A connector used by Impala is managed by the HDFS service, so it will be under the title: "Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml". Let me know if you have any other issues. - Sailesh
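The snippet pasted into that safety valve typically looks something like the sketch below; the property names are the standard S3A credential keys, and the values are placeholders:

```
<!-- Pasted into the core-site.xml safety valve; values are placeholders. -->
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_AWS_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_AWS_SECRET_KEY</value>
</property>
```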
09-23-2016 10:00 AM
Hi Cibi, Sorry for the delay; I missed the notification. The Kerberos knobs seem correct. Could you attach your catalogd, statestored, and some impalad logs? - Sailesh
09-13-2016 09:07 PM
Hi Cibi, There is a 'renew_lifetime' option in the krb5.conf file and a 'kerberos_reinit_interval' Impala startup flag. Which one are you looking at?
09-12-2016 10:37 PM
Hi Jais, Is there any more information regarding the errors mentioned? Also, are you encountering any problems using Impala? If not, it would be worth posting on the Cloudera Manager board as well: https://community.cloudera.com/t5/Cloudera-Manager-Installation/bd-p/CMInstall - Sailesh
09-12-2016 07:05 PM
Hi Cibi, What is your 'kerberos_reinit_interval' flag set to? Also, do you see this only with the catalogd, or with the statestored and impalad as well? - Sailesh
08-22-2016 11:32 AM · 1 Kudo
Hi Pettax, I would say option 1 and option 3 would be very similar in performance and allow for the best distribution of data across the cluster. I wouldn't opt for option 2.
08-18-2016 02:24 PM · 1 Kudo
Hi, Looking at your Hive question, parquet.block.size and dfs.blocksize should be honored, so I'm not sure what's going wrong there. The Hive folks should be able to help you with that. I can help you with the Impala side, however.

1GB Parquet files with 4 row groups (256MB each) should work just fine with respect to performance. The key is that row group boundaries should preferably fall at block boundaries, i.e. the beginning or end of a row group shouldn't cross a block boundary. It will still work if one does, but this causes some remote reads, which slow down the scan. Having 4 Impala nodes for this setting would be ideal, so there is a higher chance that each row group is scanned by a different impalad.

The easier option is to generate the data as follows: the fastest scans for Parquet files in Impala come from having one row group per file, where the file completely fits in a block (so 256MB or less is preferable). This, however, wouldn't give you a tremendous boost over multiple row groups per file (probably ~5% in the average case). In any case, if you're able to fix the Hive Parquet file generation issue, you should start seeing faster scans through Impala.
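To make the one-row-group-per-file layout concrete, here is a sketch of Hive session settings that aim for it; the table names are placeholders, and 268435456 bytes (256MB) assumes your block size matches the discussion above:

```
-- Aim for one ~256MB row group per file, sized to fit a single HDFS block.
SET dfs.blocksize=268435456;       -- block size for files this job writes
SET parquet.block.size=268435456;  -- target Parquet row group size

-- Rewrite the table so new files pick up the settings
-- (table names are placeholders).
INSERT OVERWRITE TABLE my_parquet_table
SELECT * FROM my_source_table;
```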