Member since
01-31-2022
02-08-2022
05:16 AM
Hello,

Yes, I tested without HWC and I get the correct date value. You are right that HWC is not required for an external table, but since we are building a common library for all Hive tables, we preferred to use HWC everywhere.

Here are the steps to reproduce the problem:

1- Create a CSV file file.csv with these values:

a;1
b;2
c;3

and save it under the path /mypath/db/test/dt=2021-02-10.

2- Create an external table:

CREATE EXTERNAL TABLE db.test
( col1 string, col2 string)
PARTITIONED BY (dt date)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\073'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/mypath/db/test';

3- Start a Spark shell with HWC and read the table:

import com.hortonworks.hwc.HiveWarehouseSession
val hive=HiveWarehouseSession.session(spark).build()
hive.table("db.test").select("dt").distinct().show()
+----------+
|        dt|
+----------+
|2021-02-09|
+----------+
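For anyone reproducing the steps above outside Hive: the '\073' in the DDL is an octal escape for the semicolon that separates the two columns in file.csv. A minimal sketch of that decoding in plain Scala follows. The names DelimiterSketch and parseLine are made up for illustration and are not part of Hive or HWC.

```scala
object DelimiterSketch {
  // '\073' in the DDL is an octal escape: 7 * 8 + 3 = 59, the code point of ';'.
  val delimiter: Char = Integer.parseInt("073", 8).toChar

  // Split one line of file.csv into (col1, col2).
  // Hypothetical helper, assumes exactly two fields per line.
  def parseLine(line: String): (String, String) = {
    val fields = line.split(delimiter)
    (fields(0), fields(1))
  }

  def main(args: Array[String]): Unit = {
    println(delimiter) // prints ;
    Seq("a;1", "b;2", "c;3").map(parseLine).foreach(println)
  }
}
```

This only illustrates how the text rows map to col1 and col2. The dt value is not in the file at all, it comes from the partition.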
01-31-2022
03:05 AM
Hello all,

I'm using the Hive Warehouse Connector (HWC) to read data from an external Hive table. The table is partitioned by a date column dt, as in this example:

CREATE EXTERNAL TABLE db.test
( col1 string, col2 string)
PARTITIONED BY (dt date)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\073'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/mypath/db/test';

The table contains only one partition, as shown here:

$ hdfs dfs -ls /mypath/db/test
Found 1 items
drwx------ - amine hadoop 0 2021-05-04 10:41 /mypath/db/test/20210210

When I select the dt column through the sql function, it returns the correct result:

import com.hortonworks.hwc.HiveWarehouseSession
val hive=HiveWarehouseSession.session(spark).build()
hive.sql("select distinct dt from db.test").show
+----------+
|        dt|
+----------+
|2021-02-10|
+----------+

But when I read the same data through the table function, it returns a wrong result:

import com.hortonworks.hwc.HiveWarehouseSession
val hive=HiveWarehouseSession.session(spark).build()
hive.table("db.test").select("dt").distinct().show()
+----------+
|        dt|
+----------+
|2021-02-09|
+----------+

As you can see, the returned result is 2021-02-09 instead of 2021-02-10. It looks like one day is being subtracted from the date.

Does anybody have an idea why I'm getting this wrong value?

Note: I'm using this version: hive-warehouse-connector-assembly-1.0.0.7.1.5.56-1.jar

Thanks,
Regards,
Amine.
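One possible explanation, which nothing in this thread confirms, is a time zone conversion somewhere on the read path: if a date is turned into midnight in one zone and then read back as a calendar date in a zone that is behind it, the result falls on the previous day. The sketch below only demonstrates that arithmetic with java.time. It does not call HWC, and the Europe/Paris zone and the reinterpret helper are assumptions chosen for illustration.

```scala
import java.time.{LocalDate, ZoneId, ZoneOffset}

object DateShiftSketch {
  // Interpret `date` as midnight in `writerZone`, then read the resulting
  // instant back as a calendar date in `readerZone`.
  def reinterpret(date: LocalDate, writerZone: ZoneId, readerZone: ZoneId): LocalDate =
    date.atStartOfDay(writerZone).toInstant.atZone(readerZone).toLocalDate

  def main(args: Array[String]): Unit = {
    val partition = LocalDate.parse("2021-02-10")
    val paris = ZoneId.of("Europe/Paris") // assumed zone, UTC+1 in February

    // Same zone on both sides: the date is preserved.
    println(reinterpret(partition, ZoneOffset.UTC, ZoneOffset.UTC)) // 2021-02-10

    // Midnight in Paris is 23:00 UTC on the previous day.
    println(reinterpret(partition, paris, ZoneOffset.UTC)) // 2021-02-09
  }
}
```

If this is what is happening, comparing the JVM default time zone of the Spark driver and executors with the one used by the Hive side would be a reasonable first check.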
Labels:
- Apache Hive
- Apache Spark