
Hive Warehouse Connector returns an incorrect result in a select query

New Contributor

Hello All,

I'm using the Hive Warehouse Connector to read data from an external Hive table. The table is partitioned by a date column dt, as in this example:

CREATE EXTERNAL TABLE db.test
( col1 string, col2 string)
PARTITIONED BY (dt date)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\073'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/mypath/db/test';

This table contains only one partition, as shown here:

$ hdfs dfs -ls /mypath/db/test
Found 1 items
drwx------   - amine hadoop          0 2021-05-04 10:41 /mypath/db/test/20210210

When I select the dt column using the sql function, it returns the correct result, as shown here:

import com.hortonworks.hwc.HiveWarehouseSession
val hive=HiveWarehouseSession.session(spark).build()

hive.sql("select distinct dt from db.test").show
+----------+
|        dt|
+----------+
|2021-02-10|
+----------+

But when I read the same data using the table function, it returns a wrong result:

import com.hortonworks.hwc.HiveWarehouseSession
val hive=HiveWarehouseSession.session(spark).build()
hive.table("db.test").select("dt").distinct().show()

+----------+
|        dt|
+----------+
|2021-02-09|
+----------+

As you can see, the returned result is 2021-02-09 instead of 2021-02-10. It looks like the date is being shifted back by one day.

Does anybody have any idea why I'm getting this error?

Note: I'm using this version: hive-warehouse-connector-assembly-1.0.0.7.1.5.56-1.jar

Thanks, 

Regards,

Amine.

2 REPLIES

Super Collaborator

Hi @AmineCHERIFI,

I suspect a timezone mismatch is causing the issue. To check further, could you please share the sample data you created and the table structure? We will try to reproduce it internally.
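
A quick check, as a minimal sketch (these are standard Spark settings, nothing HWC-specific; this assumes the shift happens during a timezone conversion), is to compare the session timezone Spark uses for dates with the driver JVM's default in the same spark-shell:

import java.util.TimeZone

// Session timezone Spark SQL uses when converting dates and timestamps
println(spark.conf.get("spark.sql.session.timeZone"))

// Default timezone of the driver JVM
println(TimeZone.getDefault.getID)

If these disagree with each other, or with the HiveServer's timezone, a DATE value can land one day off when it crosses the connector boundary.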

Note: have you tried the same logic without HWC? Please test and share the results as well.

HWC is not required for reading or writing external tables.
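
For example, a minimal sketch of the same check without HWC (assuming the table is also visible in Spark's own catalog; otherwise the raw partition files can be read directly):

// Same distinct-dt check through Spark's catalog, bypassing HWC
spark.sql("select distinct dt from db.test").show()

// Or read the partition files directly, bypassing the metastore
// (assuming ';'-delimited text files laid out under dt=... directories)
spark.read.option("sep", ";").csv("/mypath/db/test").select("dt").distinct().show()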

New Contributor

Hello,

Yes, I tested without HWC and I get the correct date value. And yes, HWC is not required for external tables, but since we are building a common library for all Hive tables, we preferred to use HWC.

Here are the steps to reproduce the problem:

1. I create a CSV file, file.csv, with these values:
a;1
b;2
c;3

which I save under /mypath/db/test/dt=2021-02-10.

2. Create the external table (note: the partition also has to be registered in the metastore, which is assumed here; see the sketch after step 3):
CREATE EXTERNAL TABLE db.test
( col1 string, col2 string)
PARTITIONED BY (dt date)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\073'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/mypath/db/test';


3. Start a spark-shell with HWC and run:
import com.hortonworks.hwc.HiveWarehouseSession
val hive=HiveWarehouseSession.session(spark).build()
hive.table("db.test").select("dt").distinct().show()

+----------+
|        dt|
+----------+
|2021-02-09|
+----------+
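
One step is implicit between steps 2 and 3: the new partition has to be registered in the metastore before any query can see it. A minimal sketch of how that might be done from the same spark-shell (assuming this HWC version's executeUpdate call; running MSCK REPAIR TABLE db.test from beeline would achieve the same):

import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()

// Register the partition directory created in step 1
// (assumption: the original repro did this, it just isn't shown above)
hive.executeUpdate("ALTER TABLE db.test ADD PARTITION (dt='2021-02-10')")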