Reply
New Contributor
Posts: 5
Registered: ‎06-16-2017

Spark Streaming + Kafka + Multiple database tables for output

Hi all,

 

I am just starting to work with Spark Streaming and Kafka and i am trying to find what is the best approach for this kind of scenario:

 

- spark application is running getting Kafka messages

- kafka messages are nested JSON

- after processing the message in spark, some dataframes are created to handle attributes that are objects

 

and here is my doubt, how to dump this rdd's into several database tables keeping in mind that if for some reason one of the rdd gives an error we should rollback all the information.

 

Use staging tables and at the end of processing mark data as valid the data? 

 

What do you think is the best approach?

 

Best regards.

Contributor
Posts: 25
Registered: ‎06-13-2017

Re: Spark Streaming + Kafka + Multiple database tables for output

Hi, You might be able to do this if your destination database and its JDBC driver supports transactions (rollbacks and commits).

 

 

New Contributor
Posts: 5
Registered: ‎06-16-2017

Re: Spark Streaming + Kafka + Multiple database tables for output

Hi vanhalen,

 

I think the JDBC allows transactions, the problem i see is if i can broadcast the transaction to the spark workers. What i am trying to do is to abort everything inserted if something fails.

Contributor
Posts: 25
Registered: ‎06-13-2017

Re: Spark Streaming + Kafka + Multiple database tables for output

I don't think this can be done in Spark. You have to use JDBC API style syntax (import java.sql.* ) for this and wrap your DML statements inside a transaction ie. setAutocommit = false, commit if everything is OK, rollback everything if any one of the DML statements fails.

New Contributor
Posts: 5
Registered: ‎06-16-2017

Re: Spark Streaming + Kafka + Multiple database tables for output

I am trying a different approach but at the end it's similar to what you said.

I will expand the json and create a single dataframe, then dump the dataframe to database and execute a single statement using transactions.

 

What do you think?

 

Thanks.

Best regards.

Highlighted
Contributor
Posts: 25
Registered: ‎06-13-2017

Re: Spark Streaming + Kafka + Multiple database tables for output

Sounds good to me.