11-08-2017 12:04 PM - edited 11-08-2017 12:06 PM
I am attempting to join two tables - one of which contains complex types. I am not trying to join on the complex type field.
For example, table t1 has fields (id int, first_name string, last_name string) and table t2 has fields (idNumber int, scores array<int>). I would like a table with the columns t1.id, t1.first_name, t1.last_name, t2.scores. Is it possible to join t1 to t2 on t1.id=t2.idNumber and get the resulting table with those four columns?
The only way I know how is using a CREATE TABLE AS (SELECT ...), but I can't include complex types in the select list.
11-08-2017 04:28 PM
It sounds like you are trying to CREATE a new table with complex types and then INSERT into that new table (or use a shorthand like CREATE TABLE AS SELECT). I'm afraid Impala currently cannot INSERT into tables with complex types.
See "Limitations and Restrictions for Complex Types" here:
https://www.cloudera.com/documentation/enterprise/latest/topics/impala_complex_types.html
If you really don't want to INSERT but want to perform a query against the join result, then I think Impala should be able to do that in most cases.
11-09-2017 08:56 AM - edited 11-09-2017 08:57 AM
That's exactly what I'm trying to do. Really, I just want to perform a join of these two datasets and write them to a parquet file for later processing.
I currently have a Spark job running to perform this, but was hoping to compare performance between the join in Spark and the join in Impala.