Created 07-17-2018 12:35 PM
Hi,
The code below is not working in Spark 2.3, but it works in 1.7.
Can someone modify the code for Spark 2.3?
import os
from pyspark import SparkConf,SparkContext
from pyspark.sql import HiveContext
conf = (SparkConf()
        .setAppName("data_import")
        .set("spark.dynamicAllocation.enabled", "true")
        .set("spark.shuffle.service.enabled", "true"))
sc = SparkContext(conf = conf)
sqlctx = HiveContext(sc)
df = sqlctx.load(
    source="jdbc",
    url="jdbc:sqlserver://10.24.40.29;database=CORE;username=user1;password=Passw0rd",
    dbtable="test")

## this is how to write to an ORC file
df.write.format("orc").save("/tmp/orc_query_output")

## this is how to write to a hive table
df.write.mode('overwrite').format('orc').saveAsTable("test")
Error: AttributeError: 'HiveContext' object has no attribute 'load'
Created 07-17-2018 12:51 PM
In Spark 2 you should use a SparkSession instead of a SparkContext. To read a JDBC data source, use the following code:
from pyspark.sql import SparkSession
from pyspark.sql import Row

spark = SparkSession \
    .builder \
    .appName("data_import") \
    .config("spark.dynamicAllocation.enabled", "true") \
    .config("spark.shuffle.service.enabled", "true") \
    .enableHiveSupport() \
    .getOrCreate()

jdbcDF2 = spark.read \
    .jdbc("jdbc:sqlserver://10.24.40.29;database=CORE;username=user1;password=Passw0rd", "test")
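For the write half of your original script, the DataFrame API calls are unchanged in Spark 2; a minimal sketch, assuming the jdbcDF2 DataFrame above and that you have write access to /tmp and the Hive warehouse:

```
## write the DataFrame out as an ORC file on HDFS
jdbcDF2.write.format("orc").save("/tmp/orc_query_output")

## overwrite (or create) a Hive table stored as ORC
jdbcDF2.write.mode("overwrite").format("orc").saveAsTable("test")
```
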
More information and examples on this link:
https://spark.apache.org/docs/2.1.0/sql-programming-guide.html#jdbc-to-other-databases
Please let me know if that works for you.
HTH
*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
Created 07-17-2018 01:03 PM
Thanks Felix for your quick response. It worked. Thanks a lot.
Created 07-18-2018 11:59 AM
@Felix Albani There is still an issue. The tables exist in Hive, but I am not able to access them. It shows the error below when I do a select * from the table.
java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1531811351810_0064_1_00, diagnostics=[Task failed, taskId=task_1531811351810_0064_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:
java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs://sandbox-hdp.hortonworks.com:8020/apps/hive/warehouse/t_currency/part-00000-2feb31ba-70a4-40a0-a64f-e976b8dd587a-c000.snappy.parquet
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.io.IOException: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs://sandbox-hdp.hortonworks.com:8020/apps/hive/warehouse/t_currency/part-00000-2feb31ba-70a4-40a0-a64f-e976b8dd587a-c000.snappy.parquet
    at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:196)
    at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:135)
    at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:101)
    at org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:149)
    at org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:80)
    at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:674)
    at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:633)
    at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
    at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:405)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:124)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
Created 07-18-2018 01:17 PM
@Deb This looks to be related to Parquet being encoded differently by Spark than by Hive. Have you tried reading a different, non-Parquet table?
Try adding the following configuration when writing the Parquet table:
.config("spark.sql.parquet.writeLegacyFormat","true")
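If it helps, this is where that option would go in the SparkSession builder from the earlier snippet (same app name and Hive support as above; this is a configuration sketch, not a full script):

```
spark = SparkSession \
    .builder \
    .appName("data_import") \
    .config("spark.sql.parquet.writeLegacyFormat", "true") \
    .enableHiveSupport() \
    .getOrCreate()
```

With this set, tables you write from Spark use the legacy Parquet format that Hive's Parquet reader expects. Note it only affects tables written after the option is set; existing Parquet files would need to be rewritten.
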
If that does not work please open a new thread on this issue and we can follow up on this new thread.
Thanks!