We recently upgraded to CDH 6.2.1, and spark-shell now seems to revert to the SQL Server JDBC driver that ships in the CDH parcels directory during execution. We start spark-shell like so:

spark-shell --jars /myjars/msql-jdbc-8.2.0.jre8 --conf spark.driver.extraClassPath=myjars/msql-jdbc-8.2.0.jre8 --conf spark.executor.extraClassPath=myjars/msql-jdbc-8.2.0.jre8

Then we run Scala code to connect to SQL Server (only the part showing it connected is included, so the driver config is good): ...
scala> val data = spark.read.jdbc(url, dbtbl, conProps)
data: org.apache.spark.sql.DataFrame = [id: bigint, data_yymm: int ... 4 more fields]
scala> data.limit(2).show

Then, when the actual read is attempted, the following error is thrown at the console:

java.io.InvalidClassException: com.microsoft.sqlserver.jdbc.SQLServerException; local class incompatible: stream classdesc serialVersionUID = 6017853943264163411, local class serialVersionUID = -1236842957120113434

The executor fails with the stderr below:

20/04/04 10:00:32 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
com.microsoft.sqlserver.jdbc.SQLServerException: The authenticationScheme NTLM is not valid.

NTLM is enabled in the connection string, and the example above shows the initial connection succeeding and retrieving the table structure. Doesn't this indicate that during execution the job reverts to the older MSSQL jar in the CDH parcels directory, which doesn't support NTLM? Are we missing other override startup parameters now required by CDH 6.2 to force the executors to use the provided jar? Note that this same code works fine on CDH 5.14. Thanks in advance.
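Edit: for anyone debugging something similar, one way to confirm which jar a class is actually served from is to ask the JVM for its CodeSource. This is a sketch, not CDH-specific guidance; `jarOf` is a helper name we made up, and the commented mapPartitions variant assumes a live `spark` session so the same lookup runs on the executors rather than the driver.

```scala
// Report the jar (or other location) a class was loaded from.
// Returns a placeholder when the class comes from the bootstrap classloader,
// whose CodeSource is null.
def jarOf(className: String): String =
  Option(Class.forName(className).getProtectionDomain.getCodeSource)
    .map(_.getLocation.toString)
    .getOrElse("<bootstrap classloader>")

// On the driver (scala-library is a safe, always-present example class):
println(jarOf("scala.util.Random"))

// To see what the EXECUTORS resolve, run the lookup inside a task, e.g.:
//   spark.range(1).rdd.mapPartitions { _ =>
//     Iterator(jarOf("com.microsoft.sqlserver.jdbc.SQLServerException"))
//   }.collect().foreach(println)
```

If the driver prints your jar under /myjars but the executor task prints a path under the CDH parcels directory, that would confirm the executors are picking up the parcel's older driver.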