08-05-2021
12:11 PM
Hi, I have a Spark program written in Scala that continuously consumes data from a Kafka topic, processing and aggregating it. The program uses Structured Streaming and is supposed to insert the data into a Hive table (non-ACID), partitioned on the 'startdate' field from the data. I tried the following to insert data into the Hive table:

val query = stream_df_agg_tr1.writeStream
  .format("orc")
  .partitionBy("startdate")
  .option("checkpointLocation", "<>:8020/user/sm/hivecheckpt8")
  .outputMode(OutputMode.Append)
  .start("<>/warehouse/tablespace/external/hive/activecore.db/applicationstats5minute")
query.awaitTermination()

I also tried specifying the table path with .option("path", <>). I have the following properties set (tried both inside the Spark program and as spark-shell arguments):

spark.hadoop.hive.exec.dynamic.partition=true
spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict

Still, I do not see the partitions being created automatically. Once, I pre-created one date-based partition before inserting data; in that case the data landed in the proper partition, but even that stopped working afterward. Can you please advise how to solve this issue?
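In case it helps to see it in context, this is roughly how the properties are set in the program itself (a minimal sketch; the app name and session setup are illustrative, not the exact code from my job):

import org.apache.spark.sql.SparkSession

// Sketch of the session setup (app name is illustrative).
val spark = SparkSession.builder()
  .appName("KafkaToHiveAggregation")
  // Dynamic partitioning properties, passed through to the Hive/Hadoop config:
  .config("spark.hadoop.hive.exec.dynamic.partition", "true")
  .config("spark.hadoop.hive.exec.dynamic.partition.mode", "nonstrict")
  .enableHiveSupport() // so Spark can talk to the Hive metastore
  .getOrCreate()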