Spark - JDBC intermediate Commits - urgent

Contributor

All,

I am working on writing from Hive to an RDBMS (SQL Server) using Spark, and the process runs with great speed.

But there is a big issue: each task does not commit until it completes, which fills up the database transaction log and can impact other jobs running against the database.

I need some way to commit at a regular interval (every 10000 K records or so).

Can someone please suggest how this can be done?

Spark version: 2.2

SQL Server 2016

Thanks

freakabhi

1 REPLY

Contributor

Hi @Abhijeet Rajput,

Have you tried something like this?

dfOrders.write.mode("overwrite").format("jdbc")
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .option("url", "jdbc:sqlserver://server.westus.cloudapp.azure.com;databaseName=TestDB")
  .option("dbtable", "TestDB.dbo.orders")
  .option("user", "myuser")
  .option("batchsize", "200000")
  .option("password", "MyComplexPassword!001")
  .save()
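
One thing to keep in mind: as far as I know, the batchsize option only controls how many rows go into each JDBC batch, while Spark's built-in JDBC writer in 2.x still opens one transaction per partition and commits at the end of that partition. So if the real goal is smaller transactions, one option is to repartition the DataFrame before the write so each partition (and therefore each commit) covers roughly the number of rows you want. A rough sketch, reusing the same connection options (the 10000-row target and the extra count() pass are just illustrative):

// Rough sketch: bound the size of each transaction by controlling partition size.
// Spark's JDBC writer commits once per partition, so smaller partitions mean
// smaller transactions in SQL Server.
val targetRowsPerCommit = 10000L
val totalRows = dfOrders.count()                    // extra pass over the data
val numPartitions = math.max(1, (totalRows / targetRowsPerCommit).toInt)

dfOrders
  .repartition(numPartitions)
  .write
  .mode("overwrite")
  .format("jdbc")
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .option("url", "jdbc:sqlserver://server.westus.cloudapp.azure.com;databaseName=TestDB")
  .option("dbtable", "TestDB.dbo.orders")
  .option("user", "myuser")
  .option("password", "MyComplexPassword!001")
  .option("batchsize", "10000")                     // rows sent per JDBC batch within a partition
  .save()

The trade-off is that more partitions means more tasks and more JDBC connections, and once data is committed in pieces a failed job can leave the target table partially written, so this is mainly worth it when transaction-log pressure is the bigger problem.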

Thanks

Vikas Srivastava