Created 01-31-2022 03:05 AM
Hello All,
I'm using the Hive Warehouse Connector (HWC) to read data from an external Hive table. The table is partitioned by a date column dt, as in this example:
CREATE EXTERNAL TABLE db.test
( col1 string, col2 string)
PARTITIONED BY (dt date)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\073'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/mypath/db/test';
This table contains only one partition, as shown here:
$ hdfs dfs -ls /mypath/db/test
Found 1 items
drwx------ - amine hadoop 0 2021-05-04 10:41 /mypath/db/test/20210210
When I select the dt column using the sql function, it returns the correct result, as shown here:
import com.hortonworks.hwc.HiveWarehouseSession
val hive=HiveWarehouseSession.session(spark).build()
hive.sql("select distinct dt from db.test").show
+----------+
|        dt|
+----------+
|2021-02-10|
+----------+
But when I read the same data using the table function, it returns a wrong result:
import com.hortonworks.hwc.HiveWarehouseSession
val hive=HiveWarehouseSession.session(spark).build()
hive.table("db.test").select("dt").distinct().show()
+----------+
|        dt|
+----------+
|2021-02-09|
+----------+
As you can see, the returned result is 2021-02-09 instead of 2021-02-10.
It looks like it's doing date minus one day.
Does anybody have any idea why I'm getting this error?
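One comparison that might help locate the shift (just a guess on my part): cast dt to a string inside the Hive query, so that any date conversion happens on the Hive side rather than in Spark:
import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
// The value arrives as plain text, so no DATE conversion is done
// by the Spark-side reader.
hive.sql("select distinct cast(dt as string) from db.test").show
If this prints 2021-02-10 while hive.table prints 2021-02-09, the off-by-one would seem to happen during the date conversion on the Spark side.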
Note: I'm using this version: hive-warehouse-connector-assembly-1.0.0.7.1.5.56-1.jar
Thanks,
Regards,
Amine.
Created on 02-08-2022 04:12 AM - edited 02-08-2022 04:13 AM
I suspect a timezone issue is causing this. To check further, could you please share the sample data you created and the table structure? We will try to reproduce it internally.
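In the meantime, a quick check on your side might narrow it down: print the driver JVM timezone and Spark's session timezone, since both can take part in DATE conversions (these are standard Spark/Java calls, nothing HWC-specific):
// Default timezone of the driver JVM.
println(java.util.TimeZone.getDefault.getID)
// Spark SQL's session timezone (falls back to the JVM timezone
// when not set explicitly).
println(spark.conf.get("spark.sql.session.timeZone"))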
Note: Have you tried the same logic without HWC? Please test and share the results as well.
HWC is not required for reading/writing external tables.
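For example, an external table can be read through Spark's own catalog (a minimal sketch, assuming your Spark session has access to the Hive metastore):
// Read the external table directly with the plain Spark session,
// without going through HWC.
spark.sql("select distinct dt from db.test").show
// Equivalent DataFrame form:
spark.table("db.test").select("dt").distinct().show()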
Created 02-08-2022 05:16 AM
Hello,
Yes, I tested without HWC and I get the correct date value.
And yes, HWC is not required for external tables, but as we are building a common library for all our Hive tables, we preferred to use HWC.
Here are the steps to reproduce the problem:
1- Create a CSV file file.csv with these values:
a;1
b;2
c;3
and save it under the path /mypath/db/test/dt=2021-02-10.
2- Create an external table:
CREATE EXTERNAL TABLE db.test
( col1 string, col2 string)
PARTITIONED BY (dt date)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\073'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/mypath/db/test';
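Note: the dt=2021-02-10 directory created by hand is not yet known to the metastore, so the partition has to be registered before the query in step 3 returns anything. One way, from the spark-shell session of step 3:
// MSCK scans the table location for dt=<value> directories and
// registers any missing partitions in the metastore.
hive.executeUpdate("msck repair table db.test")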
3- Start a spark-shell with HWC and run:
import com.hortonworks.hwc.HiveWarehouseSession
val hive=HiveWarehouseSession.session(spark).build()
hive.table("db.test").select("dt").distinct().show()
+----------+
|        dt|
+----------+
|2021-02-09|
+----------+
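One more check tied to the timezone suspicion above (an assumption on my part, not a confirmed cause): align Spark's session timezone with UTC before reading, to see whether the shift disappears:
// spark.sql.session.timeZone is a standard Spark SQL setting; if the
// off-by-one comes from a timezone conversion on the Spark side,
// forcing UTC here may change the result.
spark.conf.set("spark.sql.session.timeZone", "UTC")
hive.table("db.test").select("dt").distinct().show()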