Reply
Explorer
Posts: 20
Registered: ‎03-31-2017

Joining Two Tables - One with Complex Types

[ Edited ]

I am attempting to join two tables - one of which contains complex types. I am not trying to join on the complex type field.

 

For example, table t1 has fields (id int, first_name string, last_name string) and table t2 has fields (idNumber int, scores array<int>). I would like a table with the columns t1.id, t1.first_name, t1.last_name, t2.scores. Is it possible to join t1 to t2 on t1.id=t2.idNumber and get the resulting table with those four columns?

 

The only way I know how is using a CREATE TABLE AS (SELECT ...), but I can't include complex types in the select list.

 

Cloudera Employee
Posts: 287
Registered: ‎10-16-2013

Re: Joining Two Tables - One with Complex Types

It sounds like you are trying to CREATE a new table with complex types and then INSERT into that new table (or use a shorthand like CREATE TABLE AS SELECT). I'm afraid Impala currently cannot INSERT into tables with complex types.

 

See "Limitations and Restrictions for Complex Types" here:

https://www.cloudera.com/documentation/enterprise/latest/topics/impala_complex_types.html

 

If you really don't want to INSERT but want to perform a query against the join result, then I think Impala should be able to do that in most cases.

 

Highlighted
Explorer
Posts: 20
Registered: ‎03-31-2017

Re: Joining Two Tables - One with Complex Types

[ Edited ]

That's exactly what I'm trying to do. Really, I just want to perform a join of these two datasets and write them to a parquet file for later processing.

 

I currently have a Spark job running to perform this, but was hoping to compare performance between the join in Spark and the join in Impala.

Announcements