Created on 08-01-2019 01:33 PM - last edited on 08-02-2019 09:43 AM by VidyaSargur
Hello,
I am trying to load tables from Kudu into HDFS using Spark2, and I have noticed that timestamps are off by 8 hours between Kudu and HDFS.
# Read the Kudu table into a DataFrame, then save it as a Parquet table
df = (
    spark_session.read.format('org.apache.kudu.spark.kudu')
    .option('kudu.master', 'dcaldd163:7051,dcaldd162:7051,dcaldd161:7051')
    .option('kudu.table', 'impala::DB.kudu_table_name')
    .load()
)
df.write.format('parquet').mode('overwrite').saveAsTable('db_name.kudu_table_name')
I have tried setting the timezone locally for the Spark2 session, but that does not solve the issue.
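For illustration, an 8-hour gap is what one would see if the same instant were displayed as UTC on one side and in a UTC-8 zone on the other. The sketch below uses plain Python rather than Spark, and it is only a guess at the cause, not a confirmed diagnosis. The sample timestamp, the fixed UTC-8 offset, and the helper name `render_in_zone` are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sample value; the real table data is not shown in this post.
stored_utc = datetime(2019, 8, 1, 13, 33, 0, tzinfo=timezone.utc)

# Assumed local zone: a fixed UTC-8 offset, chosen only because it
# matches the 8-hour gap described above.
local_zone = timezone(timedelta(hours=-8))


def render_in_zone(ts, zone):
    """Return the naive wall-clock time of the same instant in the given zone."""
    return ts.astimezone(zone).replace(tzinfo=None)


as_utc = render_in_zone(stored_utc, timezone.utc)
as_local = render_in_zone(stored_utc, local_zone)

# Difference between the two wall-clock renderings, in hours.
gap_hours = (as_utc - as_local).total_seconds() / 3600
print(as_utc, as_local, gap_hours)
```

If this is what is happening, the stored instant would be the same on both sides and only the rendering would differ, which would be worth checking before changing any data.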
Can someone give a hint on how to solve this issue?