Member since
11-17-2017
76
Posts
7
Kudos Received
6
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 6031 | 05-11-2020 01:31 AM |
| | 1949 | 04-14-2020 03:48 AM |
| | 6001 | 02-04-2020 01:29 AM |
| | 2160 | 10-17-2019 01:26 AM |
| | 8067 | 09-24-2019 01:46 AM |
02-15-2021
03:01 AM
Hi @jayGenesis , Impala supports simple bind authentication in CDH 6.3. The documentation for reference: LDAP BaseDN (--ldap_baseDN)
Replaces the username with a distinguished name (DN) of the form: uid=userid,ldap_baseDN. (This is equivalent to a Hive option).
LDAP Pattern (--ldap_bind_pattern)
This is the most general option: it replaces the username with the string ldap_bind_pattern, where all instances of the string #UID are replaced with userid. For example, an ldap_bind_pattern of "user=#UID,OU=foo,CN=bar" with a username of henry will construct a bind name of "user=henry,OU=foo,CN=bar". With the mentioned base DN configured, Impala will therefore send a bind request to the LDAP server with the user DN uid=<username>,ou=users,dc=ldap,dc=xxx,dc=com and that user's password; if that user does not exist, authentication will fail. Does the mentioned user exist in the LDAP directory?
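To make the substitution rules concrete, here is a minimal Python sketch of how the two bind-name forms are constructed. This is illustrative only; the actual substitution happens inside Impala, and the function names here are my own.

```python
def build_bind_name(pattern: str, userid: str) -> str:
    """--ldap_bind_pattern form: every #UID in the pattern becomes the userid."""
    return pattern.replace("#UID", userid)

def build_basedn_name(userid: str, ldap_basedn: str) -> str:
    """--ldap_baseDN form: uid=userid,ldap_baseDN."""
    return f"uid={userid},{ldap_basedn}"

print(build_bind_name("user=#UID,OU=foo,CN=bar", "henry"))
# user=henry,OU=foo,CN=bar
print(build_basedn_name("henry", "ou=users,dc=ldap,dc=xxx,dc=com"))
# uid=henry,ou=users,dc=ldap,dc=xxx,dc=com
```

The second line shows exactly the DN that the LDAP server must be able to resolve for the bind to succeed.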
05-11-2020
01:31 AM
1 Kudo
Hi @parthk, This is a tough question, because multiple components come into the picture when discussing S3 access:
First and foremost, S3 itself: S3 Select only supports the CSV and JSON formats at the moment, while Impala/Hive generally favor the columnar storage formats Parquet/ORC. With just a couple of fields to filter on, a partition strategy should achieve similar results with Parquet/ORC; I have not tested this, it would need a perf test on the datasets.
Secondly, Impala/Hive connect to S3 with the Hadoop S3A client, which is in the hadoop-aws module. An experimental S3 Select feature doc can be found here.
Lastly, the third-party component has to support it as well. I spent some time on the AWS Hive S3 Select support and it seems to be a closed-source INPUTFORMAT solution; I could not find 'com.amazonaws.emr.s3select.hive.S3SelectableTextInputFormat' anywhere. Digging a bit more, I found that upstream Hive does not support S3 Select either; the upstream Jira is HIVE-21112.
I hope this 10,000-foot view helps; it is hard to answer questions about the future.
04-14-2020
03:48 AM
Hi @Jumo, CDH 6.3.x is packaged with Impala 3.2; the packaging details can be found on this page. The 2.6.9 Impala ODBC driver can be used with CDH 6.3.x. I understand that the recommendation can be confusing, and I have reached out internally to update the documentation.
02-04-2020
01:29 AM
1 Kudo
Hi @kentlee406, From the images it looks like Kudu is not installed on the QuickStart VM:
- the Kudu service does not appear in the cluster services list
- Impala cannot see any Kudu service on the config page
Could you try adding the Kudu service to the cluster? Please see the steps in our documentation here.
10-17-2019
01:26 AM
Hi @ChineduLB, UDFs let you code your own application logic for processing column values during an Impala query. Adding a refresh/invalidate to a UDF could cause unexpected behavior during value processing. The general recommendation is to execute INVALIDATE METADATA/REFRESH after the ingestion has finished; this way Impala users do not have to worry about stale metadata. There is a blog post on how to handle "Fast Data" and make it available to Impala in batches: https://blog.cloudera.com/how-to-ingest-and-query-fast-data-with-impala-without-kudu/ Additionally, INVALIDATE METADATA/REFRESH can be executed from beeline as well; you just need to connect from beeline to Impala, this blog post has the details: https://www.ericlin.me/2017/04/how-to-use-beeline-to-connect-to-impala/
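As a sketch of the "refresh after ingestion" recommendation, a post-ingestion hook in the pipeline could generate the REFRESH statements to run once a batch has landed, instead of refreshing from inside a UDF. This is a minimal Python sketch with hypothetical table and partition names; executing the statements (via impala-shell, beeline, or a client library) is left to the pipeline.

```python
def refresh_statements(table, partitions=None):
    """Return the Impala REFRESH statements for a table.

    partitions: optional list of (column, value) pairs; when given,
    one partition-scoped REFRESH is emitted per pair, which is cheaper
    than refreshing the whole table.
    """
    if not partitions:
        return [f"REFRESH {table}"]
    return [
        f"REFRESH {table} PARTITION ({col}='{val}')"
        for col, val in partitions
    ]

# After ingesting one day of data into a hypothetical sales.events table:
for stmt in refresh_statements("sales.events", [("day", "2019-10-17")]):
    print(stmt)
# REFRESH sales.events PARTITION (day='2019-10-17')
```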
09-24-2019
01:46 AM
1 Kudo
@eMazarakis, later releases do not support the asterisk either; it is treated as a literal. The expressions that are available can be found here in the section 'To drop or alter multiple partitions'. Previously I was referring to the intention behind "part_col='201801*'": it suggests that the desired outcome of this expression is to remove all data for January 2018 in one operation. As that is not possible in CDH 5.9, I was proposing a different partition strategy if multiple partitions have to be dropped frequently and the size of the data allows it. For example, if only one analytic query is executed on the data after ingestion, the days have to be dropped one by one, which is 32 operations (31 drops plus the query). With a table partitioned by month instead, the number of operations could be reduced to 2.
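The operation counts above can be sketched in Python, assuming a hypothetical table partitioned by day (part_col='YYYYMMDD') versus one partitioned by month (part_col='YYYYMM'):

```python
from calendar import monthrange

def daily_drop_statements(table, col, year, month):
    """One ALTER TABLE ... DROP PARTITION statement per day of the month."""
    days = monthrange(year, month)[1]  # 31 for January
    return [
        f"ALTER TABLE {table} DROP PARTITION ({col}='{year}{month:02d}{d:02d}')"
        for d in range(1, days + 1)
    ]

daily = daily_drop_statements("historical_data", "part_col", 2018, 1)
monthly = ["ALTER TABLE historical_data DROP PARTITION (part_col='201801')"]

# 31 drops + 1 analytic query = 32 operations, vs 1 drop + 1 query = 2
print(len(daily) + 1, len(monthly) + 1)  # 32 2
```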
09-17-2019
01:12 AM
@eMazarakis, apologies, I meant that part_col='201801*' intends to remove a whole month. If possible, it might be worth reconsidering the partition strategy, or the drop operation could be done separately in Hive. From CDH 5.10+ partition expressions can be specified; please see my response here for details.
09-16-2019
01:45 AM
Hi @eMazarakis, Thank you for the additional information. Altering multiple partitions was implemented in IMPALA-1654; this feature is available from Impala 2.8+, which is part of CDH 5.10+. Although I am not aware of the workflow and the amount of data behind a partition, this specific expression, part_col='201801*', removes a whole month. If these requests are frequent and the workload/workflow allows it, re-partitioning based on months could be a feasible workaround.
09-05-2019
01:59 AM
Hi @eMazarakis, Multiple partitions can be dropped with the following syntax: alter table historical_data drop partition (year = 1996 and month between 1 and 6); Please see our ALTER TABLE Statement documentation for more details; the multiple partition drop can be found in the section 'To drop or alter multiple partitions'.