About thewayofthinkin

thewayofthinkin · 02-14-2020

Wire compatibility
- Preserves compatibility with Hadoop 2 clients
- Distcp/WebHDFS compatibility preserved

thewayofthinkin · 02-13-2020

@attilabukor Hi, thank you for your comment.

Yesterday I got a comment from HaoHao about this issue. The failure to create a Kudu table on CDH 6.3.2 is related to the remote HMS configuration in hive-site.xml: KuduTable.java (on CDH 6.3.2) has logic that validates whether `hmsuris` is null or empty, and if it is, it raises an exception and the statement fails. This has been fixed on the master branch, but I'm not sure whether the fix will be delivered with CDH 6.3.3. Can you confirm whether the fix ships with CDH 6.3.3?

Here is the bug report ticket: https://issues.apache.org/jira/browse/IMPALA-8974

Gatsby
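To see whether a given hive-site.xml actually configures a remote metastore, a small Python sketch can read the metastore URI property and apply the same null-or-empty test described in the post. It assumes that `hmsuris` in KuduTable.java is populated from the `hive.metastore.uris` property; the helper names `metastore_uris` and `check_remote_hms` are hypothetical and not part of Impala, and the host in the sample is a placeholder (9083 is the usual HMS Thrift port).

```python
import xml.etree.ElementTree as ET
from typing import List


def metastore_uris(hive_site_xml: str) -> List[str]:
    """Return the comma-separated entries of hive.metastore.uris,
    or an empty list when the property is missing or blank."""
    root = ET.fromstring(hive_site_xml)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == "hive.metastore.uris":
            value = prop.findtext("value") or ""
            return [uri.strip() for uri in value.split(",") if uri.strip()]
    return []


def check_remote_hms(hive_site_xml: str) -> List[str]:
    """Hypothetical pre-flight check for the null-or-empty condition described
    in the post, to run before CREATE TABLE ... STORED AS KUDU."""
    uris = metastore_uris(hive_site_xml)
    if not uris:
        raise ValueError("hive.metastore.uris is null or empty")
    return uris


# Placeholder configuration; substitute the hive-site.xml Impala actually uses.
sample = """<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hms-host.example.com:9083</value>
  </property>
</configuration>"""

print(check_remote_hms(sample))
```

An empty `<configuration>` or a blank `<value>` makes `check_remote_hms` raise, which is the situation the post says triggers the exception in CDH 6.3.2.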

thewayofthinkin · 02-11-2020

My current environment is CDH 6.3.2 (Impala v3.2.0-cdh6.3.2, Kudu 1.10.0-cdh6.3.2).

Creating a table with Kudu storage fails with an IllegalArgumentException, although the same statement worked with Kudu 1.7.0-cdh5.16.2:

CREATE TABLE test_mlee (
  id BIGINT,
  name STRING,
  PRIMARY KEY (id)
)
PARTITION BY HASH PARTITIONS 16
STORED AS KUDU;

ERROR: IllegalArgumentException: null

Any comments are appreciated. Thank you in advance.

thewayofthinkin · 02-04-2020

Have you checked the tested/supported OpenJDK list?
https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_java_requirements.html#concept_hzw_zyl_rcb

"JDK 8u40, 8u45, 8u60, and 8u242 are not supported due to JDK issues impacting CDH functionality."
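A quick way to screen a host against the releases quoted above is to parse the JDK 8 version string (as printed by `java -version`, e.g. `1.8.0_242`) and compare the update number with that list. This is a minimal Python sketch assuming the `1.8.0_<update>` format; the function names are hypothetical, it only knows the four releases quoted here, and the Cloudera page remains the authority for anything else.

```python
import re
from typing import Optional

# JDK 8 update releases that the Cloudera page quoted above lists as unsupported.
UNSUPPORTED_JDK8_UPDATES = frozenset({40, 45, 60, 242})


def jdk8_update(version: str) -> Optional[int]:
    """Return the update number from a JDK 8 version such as '1.8.0_242'
    or '1.8.0_242-b08'; return None for anything that is not JDK 8."""
    match = re.fullmatch(r"1\.8\.0_(\d+)(?:-\S+)?", version.strip())
    return int(match.group(1)) if match else None


def is_unsupported_for_cdh(version: str) -> bool:
    """True only for the JDK 8 updates listed in UNSUPPORTED_JDK8_UPDATES."""
    update = jdk8_update(version)
    return update is not None and update in UNSUPPORTED_JDK8_UPDATES


for v in ["1.8.0_232", "1.8.0_242", "1.8.0_242-b08", "11.0.6"]:
    print(v, "unsupported" if is_unsupported_for_cdh(v) else "not in the list")
```

"Not in the list" only means the version is not one of these four; it does not by itself mean the version is supported.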

thewayofthinkin · 02-04-2020

Hello,

Thank you for reading this question. Recently, one of our Hadoop clusters was upgraded to Hadoop 3.0 (CDH 6.3.2). I'm wondering whether data copied from the Hadoop 3.0 cluster (using distcp) can be used by a Hadoop 2.6 cluster. I was able to distcp from the Hadoop 3.0 cluster to Hadoop 2.6, but then I found this document:
https://hadoop.apache.org/docs/r3.0.3/hadoop-distcp/DistCp.html#Copying_Between_Versions_of_HDFS

Can you give me some comments on this? Thank you very much in advance.
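The DistCp page linked above recommends addressing the remote cluster as `webhdfs://<namenode_hostname>:<http_port>` when the two clusters run different major Hadoop versions, and plain `hdfs://` when they share a major version. The Python sketch below only encodes that URI choice and builds the `hadoop distcp` argument list; the function names, host name, ports, and paths are placeholders (check `dfs.namenode.http-address` on the real cluster), and it says nothing about whether the copied data is readable by Hadoop 2.6.

```python
from typing import List


def remote_uri(remote_major: int, local_major: int, host: str,
               rpc_port: int, http_port: int, path: str) -> str:
    """Choose the URI for the remote cluster following the DistCp guide's
    'Copying Between Versions of HDFS' advice."""
    if remote_major != local_major:
        # Different major versions (e.g. 3.x vs 2.x): WebHDFS via the NameNode HTTP port.
        return f"webhdfs://{host}:{http_port}{path}"
    # Same major version: native hdfs:// over RPC for better performance.
    return f"hdfs://{host}:{rpc_port}{path}"


def distcp_command(src: str, dst: str) -> List[str]:
    """Argument list for `hadoop distcp`, e.g. to pass to subprocess.run()."""
    return ["hadoop", "distcp", src, dst]


# Run on the Hadoop 2.6 (CDH 5.16.2) cluster, pulling from the Hadoop 3.0 (CDH 6.3.2) cluster.
# Host name, ports and paths are placeholders.
src = remote_uri(remote_major=3, local_major=2, host="cdh6-nn.example.com",
                 rpc_port=8020, http_port=9870, path="/data/events")
print(distcp_command(src, "/data/events"))
```

Because WebHDFS supports both reads and writes, the guide notes that DistCp can run on either the source or the destination cluster; this example assumes it runs on the older one.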

thewayofthinkin · 09-07-2017

Hello, have you encountered this problem?

$ beeline -u jdbc:hive2://
Could not find valid SPARK_HOME while searching ['/home', '/usr/bin']
/usr/bin/beeline: line 32: /bin/spark-class: No such file or directory
/usr/bin/beeline: line 32: exec: /bin/spark-class: cannot execute: No such file or directory

thewayofthinkin · 04-24-2017

Yep, you're right.

thewayofthinkin · 03-21-2017

Alex,

Thank you again. The subquery approach has been recommended to our team as a long-term solution. However, as a short-term solution that avoids regression impact, we chose to use a view limited to specific partitions. If I remember correctly, in MySQL the data in `table A` can be limited by the `ON clause` before joining, so the number of candidates for the join is reduced.

Thank you for your valuable comment.
Gatsby

thewayofthinkin · 03-20-2017

Alex,

First of all, thank you very much for your explanation. You're right: the second query selects a partition in table A. I'm fully aware of the difference between the first query (using the ON clause) and the second one (using the WHERE clause), as you explained. The reason I tried these variations was to find a way to limit table A's data before joining the two tables (yes, the second query doesn't work this way). In MySQL, table A could be limited by the `ON clause`, but with Impala I don't know how to do it. Do you think using a subquery is the best way?

Thank you,
Gatsby

thewayofthinkin · 03-20-2017

Hi Henry,

I have a question for you about `partition pruning`.

Let's say there are two tables, A and B, each partitioned by yearweek. Here is the query I'd like to run (yes, I need a left join to get the result I want):

SELECT *
FROM A LEFT OUTER JOIN B
  ON A.account_id = B.account_id
 AND A.yearweek = 201710
 AND B.yearweek = 201710

Even this doesn't select a specific partition in `table A`:

SELECT *
FROM A LEFT OUTER JOIN B
  ON A.account_id = B.account_id
 AND B.yearweek = 201710
WHERE A.yearweek = 201710

Like you said, `A.yearweek = 201710` in the ON clause couldn't select the partition yearweek=201710. This might be because the filter is applied from left to right. To select a specific partition of `table A`, I used a `dynamic partition` and rewrote the query like this:

SELECT *
FROM (SELECT * FROM A WHERE yearweek = 201710) a
LEFT OUTER JOIN B b
  ON a.account_id = b.account_id
 AND b.yearweek = 201710

Do you think this is the best I can do? Or is there a way to limit the data in `table A` by using the ON clause? And is there any reference you would recommend for understanding how JOIN works in Impala and Hive?

Thank you very much in advance.
Gatsby
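The reason a predicate on `A` inside the ON clause cannot limit `A` is LEFT OUTER JOIN semantics: every row of the left table appears in the result, and a failed ON condition only leaves the B columns NULL. The Python sketch below uses the standard-library `sqlite3` module (not Impala) with tiny made-up stand-ins for tables A and B to compare the three query shapes from the post on row results alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Tiny made-up stand-ins for tables A and B (both "partitioned" by yearweek).
conn.executescript("""
    CREATE TABLE A (account_id INTEGER, yearweek INTEGER);
    CREATE TABLE B (account_id INTEGER, yearweek INTEGER);
    INSERT INTO A VALUES (1, 201709), (1, 201710), (2, 201710);
    INSERT INTO B VALUES (1, 201710), (2, 201709);
""")

# Query 1: A.yearweek in the ON clause. Every A row is kept; the predicate only
# decides whether B columns are filled in, so A's 201709 row still comes back.
on_clause = conn.execute("""
    SELECT A.account_id, A.yearweek, B.yearweek
    FROM A LEFT OUTER JOIN B
      ON A.account_id = B.account_id
     AND A.yearweek = 201710 AND B.yearweek = 201710
    ORDER BY A.yearweek, A.account_id
""").fetchall()

# Query 2: A.yearweek in the WHERE clause, applied to the join result.
where_clause = conn.execute("""
    SELECT A.account_id, A.yearweek, B.yearweek
    FROM A LEFT OUTER JOIN B
      ON A.account_id = B.account_id AND B.yearweek = 201710
    WHERE A.yearweek = 201710
    ORDER BY A.yearweek, A.account_id
""").fetchall()

# Query 3: filter A in an inline view first, then join.
inline_view = conn.execute("""
    SELECT a.account_id, a.yearweek, b.yearweek
    FROM (SELECT * FROM A WHERE yearweek = 201710) a
    LEFT OUTER JOIN B b
      ON a.account_id = b.account_id AND b.yearweek = 201710
    ORDER BY a.yearweek, a.account_id
""").fetchall()

print(on_clause)     # A's 201709 row is still present
print(where_clause)  # only A's 201710 rows
print(inline_view)   # same rows as where_clause
```

Since query 1 must return every row of A, an engine cannot use `A.yearweek = 201710` in the ON clause to skip A's other partitions, while queries 2 and 3 return only A's 201710 rows. SQLite has no partitions, so this shows only the row semantics; in Impala, comparing the `partitions=` figure on the SCAN HDFS node for A in each query's EXPLAIN output is one way to check whether pruning actually happened.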

Last Visited	04-28-2020 07:04 PM

Member Since	12-30-2015 01:25 PM
Posts	73
Kudos received	3

Cloudera Community

Re: Distcp between Hadoop 2.6 ( CDH 5.16.2 ) and H...

Re: Failed to create table with kudu storage ( CDH...

Re: Impersonation issue after migrating from Oracl...

Re: ExecPlanRequest rpc exception error while COMP...

Re: Distcp between Hadoop 2.6 ( CDH 5.16.2 ) and H...

Re: Failed to create table with kudu storage ( CDH...

Failed to create table with kudu storage ( CDH 6.3...

Re: Impersonation issue after migrating from Oracl...

Distcp between Hadoop 2.6 ( CDH 5.16.2 ) and Haddo...

beeline doesn't start due to missing /bin/spark-cl...

Re: Need help with Impala 2.8 on CDH 5.10 upgrade

Re: Impala runtime filter not working as expected

Re: Impala runtime filter not working as expected

Re: Impala runtime filter not working as expected