Support Questions

Writing Structured Streaming to Partitioned Hive table using toTable

dataGihan · 02-08-2022

We are trying to write a Spark Structured Streaming query to a partitioned table. Previously we did this with the Hive Warehouse Connector (HWC), but now we are trying to use the toTable() function added in Spark 3.1.

df.writeStream.format("parquet").option("checkpointLocation", "path/").toTable("database.new_table")

The code above works: the data is written to new_table in the database. Note that new_table is created by this process. However, when we tried to write to a partitioned Hive table with the code below, nothing was written to the table, although the query ran without any error.

df.writeStream.format("parquet").option("checkpointLocation", "path/").partitionBy("col_1", "col_2").toTable("database.new_partitioned_table")

col_1 and col_2 are columns of df.

If anyone knows where we went wrong, or where we can find more reference material on the toTable() function, please let me know.
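One possible cause, which this thread does not confirm: for a table in the Hive metastore, Spark 3.1's toTable() appears to hand the stream to the ordinary file sink at the table's location. That sink writes col_1=<value>/col_2=<value> directories (plus a _spark_metadata log) but does not add the new partitions to the metastore, so queries through the table see no rows. If that is what happened, registering the partitions afterwards with MSCK REPAIR TABLE should make the data visible. The sketch below assumes a Hive-enabled SparkSession named `spark`; `register_streamed_partitions` and `_partitions` are hypothetical helper names.

```python
# Sketch, not a confirmed fix: register partitions that a file-sink stream wrote
# under the table location but never added to the Hive metastore.
# Assumptions: `spark` is a Hive-enabled SparkSession; helper names are hypothetical.


def _partitions(spark, table):
    """Return the partition specs the metastore currently knows about."""
    # SHOW PARTITIONS yields one string column, e.g. "col_1=a/col_2=b".
    return {row[0] for row in spark.sql(f"SHOW PARTITIONS {table}").collect()}


def register_streamed_partitions(spark, table):
    """Run MSCK REPAIR TABLE and return the partitions it newly registered."""
    before = _partitions(spark, table)
    # Scan the table directory and add any col=value folders missing from the metastore.
    spark.sql(f"MSCK REPAIR TABLE {table}")
    after = _partitions(spark, table)
    return sorted(after - before)


# Example usage (needs a running Spark session):
# new_parts = register_streamed_partitions(spark, "database.new_partitioned_table")
```

For a stream that keeps running, the repair would have to be repeated (for example on a schedule). Because it lists directories instead of reading _spark_metadata, files left behind by a failed, uncommitted batch may also become visible.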

Anrygzhang · 08-17-2022

I have the same question: how do I use Spark 3 Structured Streaming to write to a Hive table with dynamic partitions?
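A workaround that avoids the file sink, sketched here as an assumption rather than a tested answer for either poster's setup: use foreachBatch (available in PySpark since 2.4) and write each micro-batch with the batch DataFrameWriter, whose insertInto() goes through the normal Hive insert path and registers new dynamic partitions. The sketch assumes database.new_partitioned_table already exists as a Hive table partitioned by col_1 and col_2, and that df's columns are in the table's order with col_1 and col_2 last, because insertInto() matches columns by position. `write_batch`, `start_partitioned_stream`, `TARGET_TABLE` and `CHECKPOINT_DIR` are hypothetical names.

```python
# Sketch of a foreachBatch workaround (not the toTable() path from the question).
# Assumptions: the target table already exists, is partitioned by col_1 and col_2,
# and df's columns follow the table's order with col_1, col_2 last.

TARGET_TABLE = "database.new_partitioned_table"
CHECKPOINT_DIR = "path/"  # placeholder, as in the question


def write_batch(batch_df, batch_id):
    """Append one micro-batch with the batch writer, which updates the metastore."""
    # insertInto matches columns by position, not by name.
    batch_df.write.mode("append").insertInto(TARGET_TABLE)


def start_partitioned_stream(spark, df):
    """Start a streaming query that writes every micro-batch via write_batch."""
    # Let partition values come from the data (dynamic partitioning).
    spark.conf.set("hive.exec.dynamic.partition", "true")
    spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")
    return (
        df.writeStream
        .option("checkpointLocation", CHECKPOINT_DIR)
        .foreachBatch(write_batch)
        .start()
    )
```

On its own, foreachBatch gives at-least-once guarantees: if a batch is replayed after a failure, the same batch_id can be appended twice, so deduplicate on batch_id if that matters.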
