Support Questions
Find answers, ask questions, and share your expertise

AttributeError in Spark 2.3

Hi,

The code below is not working in Spark 2.3, but it works in 1.7.

Can someone modify the code for Spark 2.3?

import os

from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

conf = (SparkConf()
        .setAppName("data_import")
        .set("spark.dynamicAllocation.enabled", "true")
        .set("spark.shuffle.service.enabled", "true"))

sc = SparkContext(conf=conf)
sqlctx = HiveContext(sc)

df = sqlctx.load(
    source="jdbc",
    url="jdbc:sqlserver://10.24.40.29;database=CORE;username=user1;password=Passw0rd",
    dbtable="test")

## this is how to write to an ORC file
df.write.format("orc").save("/tmp/orc_query_output")

## this is how to write to a hive table
df.write.mode('overwrite').format('orc').saveAsTable("test")

Error: AttributeError: 'HiveContext' object has no attribute 'load'

1 ACCEPTED SOLUTION

@Debananda Sahoo

In Spark 2 you should use a SparkSession instead of a SparkContext/HiveContext; the old `HiveContext.load` method no longer exists. To read from a JDBC data source, use the following code:

from pyspark.sql import SparkSession
from pyspark.sql import Row

spark = SparkSession \
    .builder \
    .appName("data_import") \
    .config("spark.dynamicAllocation.enabled", "true") \
    .config("spark.shuffle.service.enabled", "true") \
    .enableHiveSupport() \
    .getOrCreate()

jdbcDF2 = spark.read \
    .jdbc("jdbc:sqlserver://10.24.40.29;database=CORE;username=user1;password=Passw0rd", "test")

More information and examples at this link:

https://spark.apache.org/docs/2.1.0/sql-programming-guide.html#jdbc-to-other-databases

Please let me know if that works for you.

HTH

*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.


Thanks Felix for your quick response. It worked. Thanks a lot.