Support Questions

Tomas79 · ‎07-26-2016

Hi,

how can I access the position of the element (in my case of the MAP) in the array column?

TABLE DDL:

CREATE TABLE mytable (

id int,

location ARRAY< MAP< string, string> > )

When I query the table with this SQL I got a position of the KEY-VALUE pair INSIDE the MAP, but what I really want to access is the position in the location ARRAY, e.g. 1st location, 2nd location and so on.

select c.id, a.pos, a.key, a.value from mytable c left join c.locations a order by c.id, a.pos;

Thanks

alex.behm · ‎07-29-2016

Still investigating and working on the JIRA. In the meantimg, I think the query that you mean is this:

select t.id, l.pos as location_number, m.key, m.value from mytable t join t.locations l join l.item m order by id, l.pos;

The bug is that those queries returning wrong results are not semantically correct but Impala runs them anyway and gives "arbitrary" results.

View solution in original post

alex.behm · ‎07-26-2016

Hi Thomas,

in your example query the table alias 'a' has a 'pos' pseudo-column that refers to the element-position within the ARRAY, so I think it's indeed what you want.

However, I think there is a bug here that may lead to confusion.

The query you wrote "should" be illegal because you should not be able to access a.key and a.value without referencing the nested map in the FROM clause. I will follow-up with a JIRA, stay tuned.

Can you try running this and see if you get the expected results?

select c.id, a.pos from mytable c left join c.`location` a order by c.id, a.pos;

Alternatively try this query if you also want to explode the items within the nested MAP:

select c.id, a.pos, a.key, a.value from mytable c left join c.`location` a, a.item order by c.id, a.pos;

Tomas79 · ‎07-27-2016

Hi,

thanks for the response, yes it seems to be a bug. You can test it yourself, here are the scripts to reproduce:

(in Hive, "i" is just a dummy table):

create table mytable( id int, locations array< map<string,string>>) stored as parquet;

insert into table mytable select 10001, array( map("Location1.City","Bratislava","Location1.Country","SK","Location1.LAT","41"), map("Location2.City","Kosice","Location2.Country","SK","Location2.LAT","42") ) from i limit 1;

insert into table mytable select 10002, array( map("Location1.City","Wien","Location1.Country","AT","Location1.LAT","40"), map("Location2.City","Graz","Location2.Country","AT","Location2.LAT","40") ) from i limit 1;

Query results in Impala:

These results are wrond, what I need is a location_number -

0 for Location1.City

0 for Location1.Country

0 for Location1.LAT

1 for Location2.City

1 for Location2.Country

1 for Location2.LAT

... and so on for every ID in the table.

What you suggested (adding l.item), returned a wrong result set, because City in the result set is twice, but LAT is missing - this seems to be definitely a BUG:

Tomas79 · ‎07-27-2016

In Hive the solution is:

select id, ix, locations.key, locations.val from ( select id, ix, locs from mytable lateral view posexplode( locations ) x as ix,locs ) tab lateral view explode( locs ) locations as key,val;

which returns the correct result set:

Tomas79 · ‎07-29-2016

Any update on this? Have you created the JIRA?
thanks

alex.behm · ‎07-29-2016

Still investigating and working on the JIRA. In the meantimg, I think the query that you mean is this:

select t.id, l.pos as location_number, m.key, m.value from mytable t join t.locations l join l.item m order by id, l.pos;

The bug is that those queries returning wrong results are not semantically correct but Impala runs them anyway and gives "arbitrary" results.

alex.behm · ‎07-29-2016

Filed the JIRA: https://issues.cloudera.org/browse/IMPALA-3938

Thanks a lot for your detailed report and easy reproduction!

Cloudera Community

Support Questions

Impala Complex types: position in ARRAY of MAP