Dynamic partitioning not working from Spark streaming to Hive (ORC format)

New Contributor

Hi,

I have a Spark program, written in Scala, that continuously consumes data from a Kafka topic, processing and aggregating it. The program uses Structured Streaming and is supposed to insert the data into a Hive (non-ACID) table partitioned on the 'startdate' field from the data.

I tried the following to insert data into the Hive table:

val query = stream_df_agg_tr1.writeStream
  .format("orc")
  .partitionBy("startdate")
  .option("checkpointLocation", "<>:8020/user/sm/hivecheckpt8")
  .outputMode(OutputMode.Append)
  .start("<>/warehouse/tablespace/external/hive/activecore.db/applicationstats5minute")
query.awaitTermination()

I also tried specifying the table path with .option("path", <>). I have the following properties set (tried both inside the Spark program and as spark-shell arguments):

spark.hadoop.hive.exec.dynamic.partition=true
spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict

 
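For reference, a minimal sketch of setting these same properties when the session is built, equivalent to passing them as --conf arguments (the app name here is illustrative):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("kafka-to-hive-aggregation")  // illustrative name
  .enableHiveSupport()                   // required so Spark can talk to the Hive metastore
  .config("spark.hadoop.hive.exec.dynamic.partition", "true")
  .config("spark.hadoop.hive.exec.dynamic.partition.mode", "nonstrict")
  .getOrCreate()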

Still, I do not see the partitions being created automatically. Once, I pre-created a date-based partition before inserting data, and in that case the data did land in the proper partition, but even that stopped working afterward.
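(One way to narrow this down, offered as a sketch rather than anything from the thread: the file sink used by start(<path>) writes partition directories but does not register them with the Hive metastore, which would be consistent with a pre-created partition working. The table name activecore.applicationstats5minute below is inferred from the warehouse path above.)

// If ORC files exist under .../applicationstats5minute/startdate=.../ on disk
// but this returns no rows, the partitions were never registered in the metastore.
spark.sql("SHOW PARTITIONS activecore.applicationstats5minute").show(false)

// MSCK REPAIR TABLE scans the table location and registers any partition
// directories it finds there.
spark.sql("MSCK REPAIR TABLE activecore.applicationstats5minute")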

Can you please advise how to solve this issue?

1 ACCEPTED SOLUTION

Moderator

Hi @Sriparna,

Thank you for reaching out to the Cloudera Community.

I understand that you would like to consume from a Kafka topic into a Hive table using Structured Streaming and have run into some issues.

 

I've found a Community Article that looks related. Have you seen this?

 

Please note that if you are considering Continuous Processing, it is not supported, as it is still in an experimental state.
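In case it is useful, here is a minimal sketch of the usual micro-batch alternative to the plain file sink, assuming the stream_df_agg_tr1 DataFrame from the question and inferring the table name from the warehouse path (foreachBatch is available from Spark 2.4 onwards). Writing each micro-batch with insertInto goes through the Hive-aware writer, so new startdate partitions get registered in the metastore:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.streaming.OutputMode

val query = stream_df_agg_tr1.writeStream
  .outputMode(OutputMode.Append)
  .option("checkpointLocation", "<>:8020/user/sm/hivecheckpt8")
  .foreachBatch { (batchDF: DataFrame, batchId: Long) =>
    // insertInto uses Hive dynamic partitioning (nonstrict mode, as set in
    // the question), so startdate partitions are created and registered
    // automatically. Note that insertInto matches columns by position and
    // expects the partition column to be the last column of batchDF.
    batchDF.write.mode("append").insertInto("activecore.applicationstats5minute")
  }
  .start()
query.awaitTermination()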

 

Hope these pointers take you closer to a solution!


Best regards,

Ferenc


Ferenc Erdelyi, Technical Solutions Manager


3 REPLIES


Community Manager

@Sriparna Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.



Regards,

Vidya Sargur,
Community Manager



New Contributor

Hi @Sriparna!

 

Hope you are doing well! I recently ran into the same issue you did. I have tried many different approaches, but none of them worked. Could you share your solution, or what you did to make it work? Could it be a configuration issue with Spark? Thanks a lot in advance!

 

Best,

Dunk