Support Questions

Find answers, ask questions, and share your expertise

Impala SQL for KUDU does not work

avatar
Contributor

CDH 5.14

 

CREATE TABLE kudu_db.fact_patientencounter_kudu
PRIMARY KEY (pk_PatientEncounterID int, fk_AgencyID int)
PARTITION BY HASH(fk_AgencyID) PARTITIONS 4
STORED AS KUDU
AS SELECT pk_PatientEncounterID, fk_AgencyID FROM default.fact_patientencounter;

 

Why does this CREATE TABLE statement not work? This documentation says it should work - https://kudu.apache.org/docs/kudu_impala_integration.html

 

CREATE TABLE AS SELECT

You can create a table by querying any other table or tables in Impala, using a CREATE TABLE …​ AS SELECT statement. The following example imports all rows from an existing table old_table into a Kudu table new_table. The names and types of columns in new_table will determined from the columns in the result set of the SELECT statement. Note that you must additionally specify the primary key and partitioning.

CREATE TABLE new_table
PRIMARY KEY (ts, name)
PARTITION BY HASH(name) PARTITIONS 8
STORED AS KUDU
AS SELECT ts, name, value FROM old_table;

 

1 ACCEPTED SOLUTION

avatar

Looks like https://issues.apache.org/jira/browse/IMPALA-6454 which was recently fixed.

 

The problem is with the casing of primary key or partition column names. Should work if you convert all column names to lower case in your CTAS.

View solution in original post

7 REPLIES 7

avatar
Contributor

Hard to say for certain without the error message, but it's likely that the parser doesn't like the column types being supplied in the PRIMARY KEY clause.

avatar
Contributor

This is the actual query

CREATE TABLE kudu_db.fact_patientencounter_kudu
PRIMARY KEY (pk_PatientEncounterID, fk_AgencyID)
PARTITION BY HASH(fk_AgencyID) PARTITIONS 3
STORED AS KUDU
AS SELECT pk_PatientEncounterID, fk_AgencyID FROM default.fact_patientencounter

 

I dont see any errors in the log.

 

This is displayed


ERROR: AnalysisException: null
CAUSED BY: IllegalStateException: null

avatar
Contributor

Let me ask the question better, where should I be looking for logs?

 

cdh1 and cdh2 are masters

cdh3-6 are tablet servers

 

Running from impalad on cdh4.

 

The master logs dont show any errors for this transaction.

avatar
Contributor

I had to do this in many steps, why does CTAS option not work?

 

DROP TABLE IF EXISTS kudu_db.fact_patientencounter_kudu;

CREATE TABLE kudu_db.fact_patientencounter_kudu
(
pk_PatientEncounterID int,
fk_AgencyID int,
measure_ChuteTime_TimeInSeconds bigint,
measure_Response_TimeInSeconds bigint,
PRIMARY KEY(pk_PatientEncounterID, fk_AgencyID)
)
PARTITION BY HASH(fk_AgencyID) PARTITIONS 3
STORED AS KUDU;

 

INSERT INTO kudu_db.fact_patientencounter_kudu
SELECT pk_PatientEncounterID, fk_AgencyID, measure_ChuteTime_TimeInSeconds, measure_Response_TimeInSeconds FROM default.fact_patientencounter;

avatar
Contributor

Hi AKB,

 

Nothing is expected to show up in Kudu logs, because the failure is occurring in the Impala query analysis stage.  I haven't been able to reproduce something similar using CDH 5.14, for instance the following queries complete successfully:

 

CREATE TABLE fact_patientencounter
(
pk_PatientEncounterID int,
fk_AgencyID int,
measure_ChuteTime_TimeInSeconds bigint,
measure_Response_TimeInSeconds bigint
)
STORED AS PARQUET;

CREATE TABLE fact_patientencounter_kudu
PRIMARY KEY (pk_PatientEncounterID, fk_AgencyID)
PARTITION BY HASH(fk_AgencyID) PARTITIONS 3
STORED AS KUDU
AS SELECT pk_PatientEncounterID, fk_AgencyID FROM fact_patientencounter;

@tmarshalldo you have any ideas about where that null analysis exception might be coming from?

avatar

Looks like https://issues.apache.org/jira/browse/IMPALA-6454 which was recently fixed.

 

The problem is with the casing of primary key or partition column names. Should work if you convert all column names to lower case in your CTAS.

avatar
Contributor

DROP TABLE IF EXISTS kudu_db.fact_patientencounter_kudu;

CREATE TABLE kudu_db.fact_patientencounter_kudu
PRIMARY KEY (pk_patientencounterid, fk_agencyid)
PARTITION BY HASH(fk_agencyid) PARTITIONS 3
STORED AS KUDU
AS SELECT pk_patientencounterid, fk_agencyid FROM default.fact_patientencounter

 

The above does work. Thanks for the quick replies on this.