Created 01-31-2022 03:05 AM
Hello All,
I'm using the Hive Warehouse Connector (HWC) to read data from an external Hive table. The table is partitioned by a date column dt, as in this example:
CREATE EXTERNAL TABLE db.test
( col1 string, col2 string)
PARTITIONED BY (dt date)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\073'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/mypath/db/test';
This table contains only one partition, as shown here:
$ hdfs dfs -ls /mypath/db/test
Found 1 items
drwx------ - amine hadoop 0 2021-05-04 10:41 /mypath/db/test/20210210
When I select the dt column using the sql function, it returns the correct result, as shown here:
import com.hortonworks.hwc.HiveWarehouseSession
val hive=HiveWarehouseSession.session(spark).build()
hive.sql("select distinct dt from db.test").show
+----------+
|        dt|
+----------+
|2021-02-10|
+----------+
But when I read the same data using the table function, it returns a wrong result:
import com.hortonworks.hwc.HiveWarehouseSession
val hive=HiveWarehouseSession.session(spark).build()
hive.table("db.test").select("dt").distinct().show()
+----------+
|        dt|
+----------+
|2021-02-09|
+----------+
As you can see, the returned result is 2021-02-09 instead of 2021-02-10.
It looks like it's doing date minus one day.
Does anybody have any idea why I'm getting this error?
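One comparison that might help locate the shift (just a guess on my part): cast dt to a string inside the Hive query, so that any date conversion happens on the Hive side rather than in Spark:
import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
// The value arrives as plain text, so no DATE conversion is done
// by the Spark-side reader.
hive.sql("select distinct cast(dt as string) from db.test").show
If this prints 2021-02-10 while hive.table prints 2021-02-09, the off-by-one would seem to happen during the date conversion on the Spark side.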
Note: I'm using this version: hive-warehouse-connector-assembly-1.0.0.7.1.5.56-1.jar
Thanks,
Regards,
Amine.
Created on 02-08-2022 04:12 AM - edited 02-08-2022 04:13 AM
I suspect a timezone issue is causing this. To check further, could you please share the sample data you created and the table structure? We will try to reproduce it internally.
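In the meantime, a quick check on your side might narrow it down: print the driver JVM timezone and Spark's session timezone, since both can take part in DATE conversions (these are standard Spark/Java calls, nothing HWC-specific):
// Default timezone of the driver JVM.
println(java.util.TimeZone.getDefault.getID)
// Spark SQL's session timezone (falls back to the JVM timezone
// when not set explicitly).
println(spark.conf.get("spark.sql.session.timeZone"))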
Note: Have you tried the same logic without HWC? Please test and share the results as well.
HWC is not required for reading/writing external tables.
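For example, an external table can be read through Spark's own catalog (a minimal sketch, assuming your Spark session has access to the Hive metastore):
// Read the external table directly with the plain Spark session,
// without going through HWC.
spark.sql("select distinct dt from db.test").show
// Equivalent DataFrame form:
spark.table("db.test").select("dt").distinct().show()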
Created 02-08-2022 05:16 AM
Hello,
Yes, I tested without HWC and I get the correct date value.
And yes, HWC is not required for external tables, but as we are building a common library for all our Hive tables, we preferred to use HWC.
Here are the steps to reproduce the problem:
1- Create a CSV file file.csv with these values:
a;1
b;2
c;3
and save it under the path /mypath/db/test/dt=2021-02-10.
2- Create an external table:
CREATE EXTERNAL TABLE db.test
( col1 string, col2 string)
PARTITIONED BY (dt date)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\073'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/mypath/db/test';
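Note: the dt=2021-02-10 directory created by hand is not yet known to the metastore, so the partition has to be registered before the query in step 3 returns anything. One way, from the spark-shell session of step 3:
// MSCK scans the table location for dt=<value> directories and
// registers any missing partitions in the metastore.
hive.executeUpdate("msck repair table db.test")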
3- Start a spark-shell with HWC and run:
import com.hortonworks.hwc.HiveWarehouseSession
val hive=HiveWarehouseSession.session(spark).build()
hive.table("db.test").select("dt").distinct().show()
+----------+
|        dt|
+----------+
|2021-02-09|
+----------+
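One more check tied to the timezone suspicion above (an assumption on my part, not a confirmed cause): align Spark's session timezone with UTC before reading, to see whether the shift disappears:
// spark.sql.session.timeZone is a standard Spark SQL setting; if the
// off-by-one comes from a timezone conversion on the Spark side,
// forcing UTC here may change the result.
spark.conf.set("spark.sql.session.timeZone", "UTC")
hive.table("db.test").select("dt").distinct().show()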