Support Questions

Find answers, ask questions, and share your expertise

Convert columns to rows

avatar
New Contributor

Hi Team,

 

I want to convert columns as rows as per user requirement. I know that this can be done using UNION ALL but I have 60 columns need to be converted. If I go with UNION ALL then I would need to write 60 UNION ALLs which will impact the performance.

Is there any direct way of converting columns to rows like using any functions etc.,

 

Untitled.png

3 ACCEPTED SOLUTIONS

avatar

Hi, you dont have to union 60 times,  you can do this:

select 
t.rowid,
t.orderdate,
t.shipmode,
t.customername,
t.state,
m.metric,
case m.metric
when 'sales' then t.sales
when 'quantity' then t.quantity
end value from mytable t
cross join (
select 'sales' metric union all
select 'quantity' metric
) m

 

 

View solution in original post

avatar
New Contributor

Hello Tomas79,

 

Yes, it looks good.

Since we go with cross join doesn't it effect the performance?

If I have 100000 records then if I go with 10 UNION ALL then the count of records will go to 1million.

Is there any suggestion please to improve the performance with this approach?

View solution in original post

avatar
You can materialize this metric table with 60 rows, and then Impala will broadcast it to every node (or Hive will run a map-side join) so it will not affect the shuffling over network.
However the result set will be N times larger, but that's the point of the query right? (produce from 100k with 10 metrics a 1M table)

View solution in original post

3 REPLIES 3

avatar

Hi, you dont have to union 60 times,  you can do this:

select 
t.rowid,
t.orderdate,
t.shipmode,
t.customername,
t.state,
m.metric,
case m.metric
when 'sales' then t.sales
when 'quantity' then t.quantity
end value from mytable t
cross join (
select 'sales' metric union all
select 'quantity' metric
) m

 

 

avatar
New Contributor

Hello Tomas79,

 

Yes, it looks good.

Since we go with cross join doesn't it effect the performance?

If I have 100000 records then if I go with 10 UNION ALL then the count of records will go to 1million.

Is there any suggestion please to improve the performance with this approach?

avatar
You can materialize this metric table with 60 rows, and then Impala will broadcast it to every node (or Hive will run a map-side join) so it will not affect the shuffling over network.
However the result set will be N times larger, but that's the point of the query right? (produce from 100k with 10 metrics a 1M table)