Spark - JDBC intermediate Commits - urgent
Labels: Apache Spark
Created 09-12-2018 12:11 PM
All,
I am writing from Hive to an RDBMS (SQL Server) using Spark, and the process runs at great speed.
But there is a big issue: each task does not commit until it completes, which fills up the database transaction log and can impact other running jobs.
I need some way to commit at regular intervals (every 10,000 rows or so).
Can someone please suggest how this can be done?
Spark version: 2.2
SQL Server 2016
Thanks
freakabhi
Created 09-14-2018 08:51 AM
Hi @Abhijeet Rajput,
Did you try something like this?

dfOrders.write.mode("overwrite").format("jdbc")
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .option("url", "jdbc:sqlserver://server.westus.cloudapp.azure.com;databaseName=TestDB")
  .option("dbtable", "TestDB.dbo.orders")
  .option("user", "myuser")
  .option("password", "MyComplexPassword!001")
  .option("batchsize", "200000")
  .save()
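One caveat: as far as I know, the batchsize option only controls how many rows are sent to the server per round trip; Spark's built-in JDBC writer still commits each partition in a single transaction, so the transaction-log pressure may remain. If intermediate commits are truly required, one option is to bypass the built-in writer and manage the connection yourself with foreachPartition. A minimal sketch (the connection string, credentials, table, and column list below are placeholders, not tested values):

```scala
import java.sql.DriverManager
import org.apache.spark.sql.Row

val batchSize = 10000  // commit after this many rows per partition

dfOrders.foreachPartition { rows: Iterator[Row] =>
  // One connection per partition; commits are controlled manually.
  val conn = DriverManager.getConnection(
    "jdbc:sqlserver://server.westus.cloudapp.azure.com;databaseName=TestDB",
    "myuser", "MyComplexPassword!001")
  conn.setAutoCommit(false)
  val stmt = conn.prepareStatement(
    "INSERT INTO TestDB.dbo.orders (col1, col2) VALUES (?, ?)")
  try {
    var count = 0
    rows.foreach { row =>
      stmt.setObject(1, row.get(0))
      stmt.setObject(2, row.get(1))
      stmt.addBatch()
      count += 1
      if (count % batchSize == 0) {
        // Flush and commit every batchSize rows, keeping transactions small.
        stmt.executeBatch()
        conn.commit()
      }
    }
    stmt.executeBatch()  // flush the final partial batch
    conn.commit()
  } finally {
    stmt.close()
    conn.close()
  }
}
```

Note that this trades atomicity for smaller transactions: if a task fails and is retried, already-committed rows from the failed attempt remain in the table, so you may need an idempotent insert or a staging table.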
Thanks
Vikas Srivastava
