Member since
10-02-2018
3
Posts
0
Kudos Received
0
Solutions
10-04-2018
04:49 PM
Hey @Naresh P R , thanks for the response. HIVE-6348 should solve the issue by removing the order by from the subquery, however the ticket seems to be treating it as an optimization. Would you know why the data corruption is happening?
... View more
10-03-2018
01:13 AM
EDIT : A more minimal failure case at the bottom
I have a table with definition
CREATE TABLE `testtable`(
`color_gbakc2` string,
`noq_empty_gbakc3` bigint,
`color_gbakc1` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://<ip>:8020/apps/hive/warehouse/testtable'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',
'numFiles'='1',
The data in it looks like
select * from testTable
testtable.color_gbakc2,testtable.noq_empty_gbakc3,testtable.color_gbakc1
"",45456,
Black,15681,Black
Blue,6203,Blue
Multi,6196,Multi
Red,7716,Red
Silver,5408,Silver
White,908,White
Yellow,7599,Yellow
I have a working query
SELECT t_6.noq_empty noq_empty,
t_6.color color
FROM
(SELECT t_5.color_gbakc2 color,
t_5.noq_empty_gbakc3 noq_empty
FROM
(SELECT testtable.color_gbakc2 color_gbakc2,
testtable.noq_empty_gbakc3 noq_empty_gbakc3,
testtable.color_gbakc1 color_gbakc1
FROM testtable testtable
WHERE testtable.color_gbakc2 IN ('Red',
'Blue',
'Green') ) t_5
WHERE t_5.color_gbakc2 IN ('Red',
'Blue')
ORDER BY noq_empty ASC ) t_6
It returns the expected results
noq_empty
color
6203
Blue
7716
Red
But when I add an order by clause I get unexpected result
SELECT t_6.noq_empty noq_empty,
t_6.color color
FROM
(SELECT t_5.color_gbakc2 color,
t_5.noq_empty_gbakc3 noq_empty
FROM
(SELECT testtable.color_gbakc2 color_gbakc2,
testtable.noq_empty_gbakc3 noq_empty_gbakc3,
testtable.color_gbakc1 color_gbakc1
FROM testtable testtable
WHERE testtable.color_gbakc2 IN ('Red',
'Blue',
'Green') ) t_5
WHERE t_5.color_gbakc2 IN ('Red',
'Blue')
ORDER BY noq_empty ASC ) t_6
ORDER BY color
I get un expected results. Please see the attachment. Running HDP 2.6.5. Is this a known issue? ******EDIT 1******** NOT WORKING SELECT
t6.amt amt, t6.color color
FROM
(SELECT t5.color color,
t5.c1 amt
FROM
(SELECT
t1.c1 c1,
t1.c2 AS color from
(SELECT 7716 AS c1, "Red" AS c2
UNION SELECT 6203 AS c1, "Blue" AS c2) t1
WHERE t1.c2 IN ('Red',
'Blue',
'Green')) t5
WHERE t5.color IN ('Red',
'Blue')
ORDER BY amt ASC) t6
ORDER BY color
Changing the order of columns in outer most select WORKS SELECT
t6.color color,t6.amt amt
FROM
(SELECT t5.color color,
t5.c1 amt
FROM
(SELECT
t1.c1 c1,
t1.c2 AS color from
(SELECT 7716 AS c1, "Red" AS c2
UNION SELECT 6203 AS c1, "Blue" AS c2) t1
WHERE t1.c2 IN ('Red',
'Blue',
'Green')) t5
WHERE t5.color IN ('Red',
'Blue')
ORDER BY amt ASC) t6
ORDER BY color
... View more
Labels:
- Labels:
-
Apache Hive
-
Apache Tez