<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: pyspark + SparkSql + transactional orc table throws NumberFormatException in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/pyspark-SparkSql-transactional-orc-table-throws/m-p/189529#M73473</link>
    <description>&lt;P&gt;According to &lt;A href="https://issues.apache.org/jira/browse/SPARK-15348" target="_blank"&gt;https://issues.apache.org/jira/browse/SPARK-15348&lt;/A&gt;, Spark does not currently support transactional Hive tables.&lt;/P&gt;</description>
    <pubDate>Thu, 08 Mar 2018 07:16:14 GMT</pubDate>
    <dc:creator>xin_wang</dc:creator>
    <dc:date>2018-03-08T07:16:14Z</dc:date>
    <item>
      <title>pyspark + SparkSql + transactional orc table throws NumberFormatException</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/pyspark-SparkSql-transactional-orc-table-throws/m-p/189528#M73472</link>
      <description>&lt;P&gt;&lt;STRONG&gt;Versions:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;HDP-2.6.1&lt;/P&gt;&lt;P&gt;Hive 1.2.1000.2.6.1.0-129&lt;/P&gt;&lt;P&gt;Spark-2.1.1&lt;/P&gt;&lt;P&gt;Python 2.7.13&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;This issue occurs only with transactional Hive tables.&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;In HDFS, the data files of a transactional Hive table are created under a delta directory, as shown below:&lt;/P&gt;&lt;PRE&gt;/user/acid_table/load_date=2018-01-14/delta_0018772_0018772_0000/bucket_00000&lt;/PRE&gt;&lt;P&gt;A NumberFormatException is thrown on the delta directory:&lt;/P&gt;&lt;PRE&gt;Caused by: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "0018773_0000"
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
.....
INFO PerfLogger: &amp;lt;PERFLOG method=OrcGetSplits from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl&amp;gt;
Traceback (most recent call last):
  File "/home/../ex.py", line 24, in &amp;lt;module&amp;gt;
    sc1.sql("select * from default.acid_table").toPandas()
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 1585, in toPandas
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 391, in collect
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o71.collectToPython.
: java.lang.RuntimeException: serious problem
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048)
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;Code:&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;from pyspark.sql import SparkSession

hiveContext = SparkSession.builder.enableHiveSupport().getOrCreate()
hiveContext.sql("select * from default.acid_table").toPandas()&lt;/PRE&gt;&lt;P&gt;Everything works fine when the '0000' suffix is removed from the delta directory name.&lt;/P&gt;&lt;P&gt;Please advise.&lt;/P&gt;</description>
      <pubDate>Mon, 15 Jan 2018 06:48:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/pyspark-SparkSql-transactional-orc-table-throws/m-p/189528#M73472</guid>
      <dc:creator>manigandaprakas</dc:creator>
      <dc:date>2018-01-15T06:48:31Z</dc:date>
    </item>
    <item>
      <title>Re: pyspark + SparkSql + transactional orc table throws NumberFormatException</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/pyspark-SparkSql-transactional-orc-table-throws/m-p/189529#M73473</link>
      <description>&lt;P&gt;According to &lt;A href="https://issues.apache.org/jira/browse/SPARK-15348" target="_blank"&gt;https://issues.apache.org/jira/browse/SPARK-15348&lt;/A&gt;, Spark does not currently support transactional Hive tables.&lt;/P&gt;</description>
      <pubDate>Thu, 08 Mar 2018 07:16:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/pyspark-SparkSql-transactional-orc-table-throws/m-p/189529#M73473</guid>
      <dc:creator>xin_wang</dc:creator>
      <dc:date>2018-03-08T07:16:14Z</dc:date>
    </item>
    <item>
      <title>Re: pyspark + SparkSql + transactional orc table throws NumberFormatException</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/pyspark-SparkSql-transactional-orc-table-throws/m-p/189530#M73474</link>
      <description>&lt;P&gt;You will have to wait for the next release of HDP for Spark to support Hive ACID tables.&lt;/P&gt;</description>
      <pubDate>Thu, 08 Mar 2018 23:57:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/pyspark-SparkSql-transactional-orc-table-throws/m-p/189530#M73474</guid>
      <dc:creator>TimothySpann</dc:creator>
      <dc:date>2018-03-08T23:57:02Z</dc:date>
    </item>
    <item>
      <title>Re: pyspark + SparkSql + transactional orc table throws NumberFormatException</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/pyspark-SparkSql-transactional-orc-table-throws/m-p/189531#M73475</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/9304/tspann.html" nodeid="9304"&gt;@Timothy Spann&lt;/A&gt;&lt;/P&gt;&lt;P&gt;So is this feature now supported in HDP 3.0?&lt;/P&gt;</description>
      <pubDate>Wed, 19 Sep 2018 13:55:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/pyspark-SparkSql-transactional-orc-table-throws/m-p/189531#M73475</guid>
      <dc:creator>mvince</dc:creator>
      <dc:date>2018-09-19T13:55:03Z</dc:date>
    </item>
  </channel>
</rss>