<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: AttributeError in Spark in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/AttributeError-in-Spark/m-p/185734#M80829</link>
    <description>&lt;P&gt;Thanks Felix for your quick response. It worked. Thanks a lot.&lt;/P&gt;</description>
    <pubDate>Tue, 17 Jul 2018 20:03:42 GMT</pubDate>
    <dc:creator>debananda_sahoo</dc:creator>
    <dc:date>2018-07-17T20:03:42Z</dc:date>
    <item>
      <title>AttributeError in Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/AttributeError-in-Spark/m-p/185732#M80827</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;The code below is not working in Spark 2.3, but it works in 1.7.&lt;/P&gt;&lt;P&gt;Can someone modify the code for Spark 2.3?&lt;/P&gt;&lt;PRE&gt;import os
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

conf = (SparkConf()
    .setAppName("data_import")
    .set("spark.dynamicAllocation.enabled", "true")
    .set("spark.shuffle.service.enabled", "true"))
sc = SparkContext(conf=conf)
sqlctx = HiveContext(sc)

df = sqlctx.load(
    source="jdbc",
    url="jdbc:sqlserver://10.24.40.29;database=CORE;username=user1;password=Passw0rd",
    dbtable="test")

## this is how to write to an ORC file
df.write.format("orc").save("/tmp/orc_query_output")

## this is how to write to a hive table
df.write.mode('overwrite').format('orc').saveAsTable("test")&lt;/PRE&gt;&lt;P&gt;Error: AttributeError: 'HiveContext' object has no attribute 'load'&lt;/P&gt;</description>
      <pubDate>Tue, 17 Jul 2018 19:35:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/AttributeError-in-Spark/m-p/185732#M80827</guid>
      <dc:creator>debananda_sahoo</dc:creator>
      <dc:date>2018-07-17T19:35:32Z</dc:date>
    </item>
    <item>
      <title>Re: AttributeError in Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/AttributeError-in-Spark/m-p/185733#M80828</link>
      <description>&lt;P&gt; &lt;A href="https://community.hortonworks.com/questions/202618/attributeerror-in-spark-23.html#"&gt;@Debananda Sahoo&lt;/A&gt;&lt;/P&gt;&lt;P&gt;In Spark 2 you should use a SparkSession instead of a SparkContext. To read a JDBC data source, use the following code:&lt;/P&gt;&lt;PRE&gt;from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("data_import") \
    .config("spark.dynamicAllocation.enabled", "true") \
    .config("spark.shuffle.service.enabled", "true") \
    .enableHiveSupport() \
    .getOrCreate()

jdbcDF2 = spark.read \
    .jdbc("jdbc:sqlserver://10.24.40.29;database=CORE;username=user1;password=Passw0rd", "test")&lt;/PRE&gt;&lt;P&gt;More information and examples can be found at this link:&lt;/P&gt;&lt;P&gt;&lt;A href="https://spark.apache.org/docs/2.1.0/sql-programming-guide.html#jdbc-to-other-databases" target="_blank"&gt;https://spark.apache.org/docs/2.1.0/sql-programming-guide.html#jdbc-to-other-databases&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Please let me know if that works for you.&lt;/P&gt;&lt;P&gt;HTH&lt;/P&gt;&lt;P&gt;*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.&lt;/P&gt;</description>
      <pubDate>Tue, 17 Jul 2018 19:51:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/AttributeError-in-Spark/m-p/185733#M80828</guid>
      <dc:creator>falbani</dc:creator>
      <dc:date>2018-07-17T19:51:59Z</dc:date>
    </item>
    <item>
      <title>Re: AttributeError in Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/AttributeError-in-Spark/m-p/185734#M80829</link>
      <description>&lt;P&gt;Thanks Felix for your quick response. It worked. Thanks a lot.&lt;/P&gt;</description>
      <pubDate>Tue, 17 Jul 2018 20:03:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/AttributeError-in-Spark/m-p/185734#M80829</guid>
      <dc:creator>debananda_sahoo</dc:creator>
      <dc:date>2018-07-17T20:03:42Z</dc:date>
    </item>
    <item>
      <title>Re: AttributeError in Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/AttributeError-in-Spark/m-p/185735#M80830</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/11048/falbani.html" nodeid="11048"&gt;@Felix Albani&lt;/A&gt; There is still an issue. The tables exist in Hive, but I am not able to access them. I get the error below when I run a select * from the table.&lt;/P&gt;&lt;PRE&gt;java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1531811351810_0064_1_00, diagnostics=[Task failed, taskId=task_1531811351810_0064_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs://sandbox-hdp.hortonworks.com:8020/apps/hive/warehouse/t_currency/part-00000-2feb31ba-70a4-40a0-a64f-e976b8dd587a-c000.snappy.parquet
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.io.IOException: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs://sandbox-hdp.hortonworks.com:8020/apps/hive/warehouse/t_currency/part-00000-2feb31ba-70a4-40a0-a64f-e976b8dd587a-c000.snappy.parquet
	at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:196)
	at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.&amp;lt;init&amp;gt;(TezGroupedSplitsInputFormat.java:135)
	at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:101)
	at org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:149)
	at org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:80)
	at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:674)
	at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:633)
	at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
	at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:405)
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:124)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)&lt;/PRE&gt;</description>
      <pubDate>Wed, 18 Jul 2018 18:59:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/AttributeError-in-Spark/m-p/185735#M80830</guid>
      <dc:creator>debananda_sahoo</dc:creator>
      <dc:date>2018-07-18T18:59:25Z</dc:date>
    </item>
    <item>
      <title>Re: AttributeError in Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/AttributeError-in-Spark/m-p/185736#M80831</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/75157/debanandasahoo.html" nodeid="75157"&gt;@Deb&lt;/A&gt; This looks to be related to the Parquet encoding Spark uses being different from the one Hive expects. Have you tried reading a different, non-Parquet table?&lt;/P&gt;&lt;P&gt;Try adding the following configuration when writing the Parquet table:&lt;/P&gt;&lt;P&gt;.config("spark.sql.parquet.writeLegacyFormat","true")&lt;/P&gt;&lt;P&gt;If that does not work, please open a new thread on this issue and we can follow up there.&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Wed, 18 Jul 2018 20:17:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/AttributeError-in-Spark/m-p/185736#M80831</guid>
      <dc:creator>falbani</dc:creator>
      <dc:date>2018-07-18T20:17:57Z</dc:date>
    </item>
  </channel>
</rss>

