Member since: 11-17-2017
Posts: 76
Kudos Received: 7
Solutions: 6
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3646 | 05-11-2020 01:31 AM
 | 1200 | 04-14-2020 03:48 AM
 | 4804 | 02-04-2020 01:29 AM
 | 1315 | 10-17-2019 01:26 AM
 | 4481 | 09-24-2019 01:46 AM
02-15-2021
03:01 AM
Hi @jayGenesis, Impala supports simple bind authentication in CDH 6.3. From the documentation, for reference:

LDAP BaseDN (--ldap_baseDN)
Replaces the username with a distinguished name (DN) of the form: uid=userid,ldap_baseDN. (This is equivalent to a Hive option.)

LDAP Pattern (--ldap_bind_pattern)
This is the most general option; it replaces the username with the string ldap_bind_pattern, where all instances of the string #UID are replaced with userid. For example, an ldap_bind_pattern of "user=#UID,OU=foo,CN=bar" with a username of henry will construct a bind name of "user=henry,OU=foo,CN=bar".

This means that with the mentioned base DN configured, Impala sends a bind request to the LDAP server with the user DN uid=<username>,ou=users,dc=ldap,dc=xxx,dc=com and that user's password; if the user does not exist, authentication fails. Does the mentioned user exist in the LDAP directory? A rough flag sketch is included below.
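For illustration, a minimal sketch of the relevant Impala daemon startup flags, assuming the base DN from your question (the ldap_uri value and hostname are placeholders, not taken from your setup):

# Sketch of Impala daemon flags for LDAP simple bind (values are assumptions)
--enable_ldap_auth=true
--ldap_uri=ldap://ldap.xxx.com
--ldap_baseDN=ou=users,dc=ldap,dc=xxx,dc=com
# With this baseDN, a login as "henry" binds as uid=henry,ou=users,dc=ldap,dc=xxx,dc=com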
05-11-2020
01:31 AM
1 Kudo
Hi @parthk, This is a tough question, because when discussing S3 access multiple components come into the picture.

First and foremost, S3 itself: S3 Select only supports the CSV and JSON formats at the moment, while Impala/Hive generally favor the columnar storage formats Parquet/ORC. With just a couple of fields to filter on, a partitioning strategy with Parquet/ORC could plausibly give similar results; I have not tested this, and it would need performance testing on the datasets. A rough sketch of that approach follows below.

Secondly, Impala/Hive connect to S3 with the Hadoop S3A client, which lives in the hadoop-aws module. An experimental S3 Select feature doc can be found here.

Lastly, the third-party component has to support it as well. I spent some time on the AWS Hive S3 Select support and it seems to be a closed-source INPUTFORMAT solution; I could not find 'com.amazonaws.emr.s3select.hive.S3SelectableTextInputFormat' anywhere. Digging a bit more, I found that upstream Hive does not support S3 Select either; the upstream Jira is HIVE-21112.

I hope this 10,000-foot view helps, it is hard to answer questions about the future.
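As an illustration of the partitioning idea (not a tested recommendation; the table, columns, and bucket name are made up):

-- Hypothetical partitioned Parquet layout on S3; partition pruning limits
-- how much data Impala reads when filtering on event_date.
CREATE TABLE events (
  user_id BIGINT,
  payload STRING
)
PARTITIONED BY (event_date STRING)
STORED AS PARQUET
LOCATION 's3a://your-bucket/events/';

-- Only the matching partition directories are scanned.
SELECT user_id, payload
FROM events
WHERE event_date = '2020-05-01';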
04-14-2020
03:48 AM
Hi @Jumo, CDH 6.3.x is packaged with Impala 3.2; the packaging details can be found on this page. The 2.6.9 Impala ODBC driver can be used with CDH 6.3.x. I understand that the recommendation can be confusing, and I have reached out internally to update the documentation.
02-04-2020
04:51 AM
Hi @jaya123, In the past when I have seen ImpalaThriftAPICallFailed it was due to a connection timeout between the client and Impala. Through ODBC/JDBC the connection can become inactive while Impala is executing the query, and if the client tries to use a closed connection the call fails. The client might be trying to close the connection as well. Possible causes of the connection termination:
- A load balancer terminates the idle connection
- The driver closes the connection because SocketTimeout is reached
The TRACE level driver logs can help identify how and when the connection was terminated, so the next steps could be:
1. Enable TRACE level driver logging; the log level and the log path have to be configured, please see our documentation here. This configuration is often client specific. A hedged example is sketched below.
2. Open the connection logs and look for the ImpalaThriftAPICallFailed message.
3. Check the earlier messages; there could be other errors, or the connection was probably closed just before the client request. The timestamps should help identify which timeout was reached; SocketTimeout is 30s by default.
The timeouts could be reached because of slow query execution or because the client did not close the query. As the dashboard is refreshing, it is probably because the client does not close the query. Just in case, the query speed should be checked; if that is fine, then the socket timeout could be increased a bit to give the client time to close the query.
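For reference, a rough sketch of what the DSN settings might look like with the Cloudera Impala ODBC driver; the key names, values, and path below are assumptions from memory, so please verify them against the driver documentation for your version:

# Hypothetical odbc.ini / DSN settings -- verify the exact names in the driver docs
LogLevel=6                      # highest verbosity (TRACE) in the Simba-based drivers (assumption)
LogPath=/tmp/impala-odbc-logs   # directory where the trace logs are written
SocketTimeout=120               # default is 30 seconds; raise only if query speed itself is fine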
02-04-2020
01:29 AM
1 Kudo
Hi @kentlee406, From the images it looks like Kudu is not installed on the QuickStart VM:
- The Kudu service cannot be seen in the cluster services list
- Impala cannot see any Kudu service on the config page
Could you try adding the Kudu service to the cluster? Please see the steps in our documentation here.
11-11-2019
08:54 AM
1 Kudo
Hi @mrmikewhitman, Based on the error message it appears to be a certificate issue. I would start by verifying that the certificate is valid with openssl, and by checking whether impala-shell can connect to Impala with it; a hedged example is below. Additionally, with a proxy installed there are further requirements, please see them here.
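Something along these lines, assuming defaults (the hostname, port, and CA file path are placeholders; adjust them for your cluster):

# Inspect the certificate chain presented by the Impala daemon (21000 is the default impala-shell port)
openssl s_client -connect impalad-host.example.com:21000 -CAfile /path/to/ca_cert.pem

# Try an SSL connection with impala-shell using the same CA certificate
impala-shell --ssl --ca_cert=/path/to/ca_cert.pem -i impalad-host.example.com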
11-11-2019
05:58 AM
Hi @Asad, Impala does not fully support Unicode characters at the moment; please see the 'Character sets' chapter of our documentation here for more information. Could you advise whether the data is stored in UTF-8?
10-17-2019
01:26 AM
Hi @ChineduLB, UDFs let you code your own application logic for processing column values during an Impala query. Adding a refresh/invalidate to a UDF could cause unexpected behavior during value processing. The general recommendation for INVALIDATE METADATA / REFRESH is to execute it after the ingestion has finished; see the sketch below. This way the Impala user does not have to worry about the staleness of the metadata. There is a blog post on how to handle "Fast Data" and make it available to Impala in batches: https://blog.cloudera.com/how-to-ingest-and-query-fast-data-with-impala-without-kudu/ Additionally, I just wanted to mention that INVALIDATE METADATA / REFRESH can be executed from beeline as well; you just need to connect from beeline to Impala. This blog post has the details: https://www.ericlin.me/2017/04/how-to-use-beeline-to-connect-to-impala/
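A minimal sketch of the recommended pattern, assuming a hypothetical table my_db.events that an external job has just written to:

-- After the ingestion job has finished loading new files into an existing table:
REFRESH my_db.events;

-- After a table or partition has been created outside of Impala (e.g. by Hive):
INVALIDATE METADATA my_db.events;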
10-11-2019
06:59 AM
Hi @Shruhti, This is indeed odd; my first assumption would be that the 'select 1' queries are triggered silently by a client application such as a BI tool, maybe to check or keep the connection alive. It might be worth checking the TRACE level driver logs; that could verify whether the queries are coming from a tool/application. This can be done by changing the driver log level, which is described here for ODBC. Additionally, the query profile contains a Network Address as well; this should help confirm whether the source of the query is valid.
10-11-2019
06:32 AM
Hi @Nisha2019, This example looks like a snippet from our documentation here. Just above this example DESCRIBE statement there is a sample CREATE TABLE query that generates this table schema; please see it below. As for ingesting data into these tables, Impala currently does not support writing data with complex type columns; Loading Data Containing Complex Types describes this in more detail. Additionally, some more information can be found in the Complex type considerations chapter. Hive does not support inserting values into a Parquet complex type one-by-one either, but there are two solutions:
1. Create a temporary table with the values, then transform it into the Parquet complex type with Hive; please see our documentation here for sample queries: Constructing Parquet Files with Complex Columns Using Hive. A hedged sketch follows after the CREATE TABLE statement below.
2. Use an INSERT INTO ... SELECT <values> query to insert records one by one; reference queries can be found in the description of IMPALA-3938. Please note that this will generate a separate file for each record, which may occasionally need to be compacted.
CREATE TABLE struct_demo
(
id BIGINT,
name STRING,
-- A STRUCT as a top-level column. Demonstrates how the table ID column
-- and the ID field within the STRUCT can coexist without a name conflict.
employee_info STRUCT < employer: STRING, id: BIGINT, address: STRING >,
-- A STRUCT as the element type of an ARRAY.
places_lived ARRAY < STRUCT <street: STRING, city: STRING, country: STRING >>,
-- A STRUCT as the value portion of the key-value pairs in a MAP.
memorable_moments MAP < STRING, STRUCT < year: INT, place: STRING, details: STRING >>,
-- A STRUCT where one of the fields is another STRUCT.
current_address STRUCT < street_address: STRUCT <street_number: INT, street_name: STRING, street_type: STRING>, country: STRING, postal_code: STRING >
)
STORED AS PARQUET;
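For option 1, a minimal sketch run in Hive (not Impala), using a simplified target table with a single STRUCT column; the staging table and all names below are made up for illustration:

-- Flat staging table holding the raw values (hypothetical names)
CREATE TABLE employee_staging (id BIGINT, employer STRING, emp_id BIGINT, address STRING);
INSERT INTO employee_staging VALUES (1, 'Acme', 100, '123 Main St');

-- Target Parquet table with a STRUCT column, readable by Impala afterwards
CREATE TABLE employee_parquet (
  id BIGINT,
  employee_info STRUCT<employer: STRING, id: BIGINT, address: STRING>
)
STORED AS PARQUET;

-- Hive constructs the STRUCT values; Impala cannot write complex-typed columns
INSERT INTO TABLE employee_parquet
SELECT id, named_struct('employer', employer, 'id', emp_id, 'address', address)
FROM employee_staging;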