Support Questions

Find answers, ask questions, and share your expertise

AttributeError in Spark 2.3


Hi,

The code below is not working in Spark 2.3, but it works in 1.7.

Can someone modify the code for Spark 2.3?

import os

from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

conf = (SparkConf()
        .setAppName("data_import")
        .set("spark.dynamicAllocation.enabled", "true")
        .set("spark.shuffle.service.enabled", "true"))

sc = SparkContext(conf=conf)
sqlctx = HiveContext(sc)

df = sqlctx.load(
    source="jdbc",
    url="jdbc:sqlserver://10.24.40.29;database=CORE;username=user1;password=Passw0rd",
    dbtable="test")

## this is how to write to an ORC file
df.write.format("orc").save("/tmp/orc_query_output")

## this is how to write to a hive table
df.write.mode('overwrite').format('orc').saveAsTable("test")

Error : AttributeError: 'HiveContext' object has no attribute 'load'

1 ACCEPTED SOLUTION


@Debananda Sahoo

In Spark 2 you should use a SparkSession instead of a SparkContext. To read a JDBC data source, use the following code:

from pyspark.sql import SparkSession
from pyspark.sql import Row

spark = SparkSession \
    .builder \
    .appName("data_import") \
    .config("spark.dynamicAllocation.enabled", "true") \
    .config("spark.shuffle.service.enabled", "true") \
    .enableHiveSupport() \
    .getOrCreate()

jdbcDF2 = spark.read \
    .jdbc("jdbc:sqlserver://10.24.40.29;database=CORE;username=user1;password=Passw0rd", "test")

More information and examples on this link:

https://spark.apache.org/docs/2.1.0/sql-programming-guide.html#jdbc-to-other-databases

Please let me know if that works for you.

HTH

*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.


2 REPLIES



Thanks Felix for your quick response. It worked. Thanks a lot.