
Kudu to HDFS data load timestamp issue.

SOLVED

New Contributor

Hello,

I am trying to load tables from Kudu to HDFS using Spark2, and I have noticed that timestamps are off by 8 hours between Kudu and HDFS.

 

df = spark_session.read.format('org.apache.kudu.spark.kudu') \
    .option('kudu.master', 'dcaldd163:7051,dcaldd162:7051,dcaldd161:7051') \
    .option('kudu.table', 'impala::DB.kudu_table_name') \
    .load()

 

df.write.format("parquet").mode('overwrite').saveAsTable("db_name.kudu_table_name")

 

I have tried setting the time zone locally for the session in Spark2, but it still does not solve the issue.
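
Specifically, the session-level setting I tried was along these lines (a sketch; as far as I understand, spark.sql.session.timeZone only governs Spark SQL conversions, not the JVM default time zone used by other code paths):

# Session-local time zone for Spark SQL; the JVM default (user.timezone)
# is a separate setting and is not changed by this.
spark_session.conf.set("spark.sql.session.timeZone", "UTC")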

 

Can someone give a hint on how to solve this issue? 


2 REPLIES

Re: Kudu to HDFS data load timestamp issue.

Explorer

Hello @GopiG,
Have you tried setting the executor and driver parameters in spark-defaults.conf?

spark.driver.extraJavaOptions -Duser.timezone=UTC
spark.executor.extraJavaOptions -Duser.timezone=UTC


You can set the default time zone to UTC or to any other zone you want, such as GMT+8.
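
To confirm the options actually took effect, one quick check is to print the JVM default time zone from the driver (a sketch that goes through PySpark's internal _jvm gateway, so treat it as a debugging aid only):

# Should print "UTC" if -Duser.timezone=UTC was picked up by the driver JVM.
print(spark.sparkContext._jvm.java.util.TimeZone.getDefault().getID())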

Cheers.

Re: Kudu to HDFS data load timestamp issue.

New Contributor

Thank you so much for your response. 

 

Unfortunately the solution did not work for me. 

Cloudera version -> CDH-5.16.1-1.cdh5.16.1.p0.3
Spark version -> 2.3.0


Instead of making changes in the spark-defaults.conf file, I passed the executor and driver parameters with the spark2-submit command.

I have tried it with the UTC, UTC+8, GMT+8, and America/Los_Angeles time zones, but none of them changed the time portion of the timestamps.
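
One thing I am not sure about: as far as I know, java.util.TimeZone silently falls back to GMT for IDs it cannot parse, and UTC+8 is not a standard ID (GMT+8 and region names like America/Los_Angeles are), so that variant may not have done what it looks like. A quick way to see how each ID resolves (a sketch via PySpark's internal _jvm gateway):

# Unparseable IDs come back as "GMT" rather than raising an error.
jvm_tz = spark.sparkContext._jvm.java.util.TimeZone
for tz_id in ["UTC", "UTC+8", "GMT+8", "America/Los_Angeles"]:
    print("{} -> {}".format(tz_id, jvm_tz.getTimeZone(tz_id).getID()))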

I have copied the entire spark2-submit command for your reference.

 

===========================================================================

command = "spark2-submit --deploy-mode cluster --master yarn --executor-memory " + executor_memory + \
" --name " + job_name + " --executor-cores " + executor_cores + " --driver-memory " + driver_memory \
+ " --conf spark.dynamicAllocation.initialExecutors=" + num_executors \
+ " --conf spark.dynamicAllocation.minExecutors=2" \
+ " --conf spark.dynamicAllocation.maxExecutors=" + str(max_executor) \
+ " --py-files " + utils_file + "," + module_name \
+ " --conf spark.dynamicAllocation.executorIdleTimeout=10" \
+ " --conf spark.serializer=org.apache.spark.serializer.KryoSerializer" \
+ " --conf spark.task.maxFailures=14" \
+ " --conf spark.port.maxRetries=50" \
+ " --conf spark.yarn.max.executor.failures=14" \
+ " --conf spark.executor.memoryOverhead=2000" \
+ " --conf spark.yarn.maxAppAttempts=1" \
+ " --packages org.apache.kudu:kudu-spark2_2.11:1.6.0 "

command += " --files {4},{1},{5},{7} --conf spark.executor.extraJavaOptions=\'-Dlog4j.configuration={6} -Duser.timezone=UTC+8\' --conf spark.driver.extraJavaOptions=\'-Dlog4j.configuration={6} -Duser.timezone=UTC+8\' {0} {3} {2}".format(PROCESS_HANDLER_FILE_PATH, CONFIG_FILE_PATH, job_name, os.path.basename(CONFIG_FILE_PATH), process_csv, log4j_file, os.path.basename(log4j_file), module_base_table_path)

===========================================================================

 

After submitting the above command, I could see the parameters being set properly in the Spark properties in YARN. The lines below are copied from the Spark properties while the job was running.

 

spark.executor.extraJavaOptions -Dlog4j.configuration=spark2_log4j.properties -Duser.timezone=UTC+8
spark.driver.extraJavaOptions -Dlog4j.configuration=spark2_log4j.properties -Duser.timezone=UTC+8

 

Appreciate your response.