Created 01-14-2018 10:48 PM
Versions:
HDP-2.6.1
Hive 1.2.1000.2.6.1.0-129
Spark-2.1.1
Python 2.7.13
This issue occurs only with transactional Hive tables.
In HDFS, data files for a transactional Hive table are written under delta directories, named delta_<minTxnId>_<maxTxnId>(_<statementId>), as shown below:
/user/acid_table/load_date=2018-01-14/delta_0018772_0018772_0000/bucket_00000
A NumberFormatException is thrown while parsing the delta directory name:
Caused by: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "0018773_0000"
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	.....
INFO PerfLogger: <PERFLOG method=OrcGetSplits from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl>
Traceback (most recent call last):
  File "/home/../ex.py", line 24, in <module>
    sc1.sql("select * from default.acid_table").toPandas()
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 1585, in toPandas
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 391, in collect
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o71.collectToPython.
: java.lang.RuntimeException: serious problem
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048)
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
Code:
from pyspark.sql import SparkSession

hiveContext = SparkSession.builder.enableHiveSupport().getOrCreate()
hiveContext.sql("select * from default.acid_table").toPandas()
Everything works fine once the '0000' statement-id suffix is removed from the delta directory name, as the sketch below illustrates.
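For reference, here is a minimal Python sketch of what we believe is happening (an assumption on our part: the split generator in Spark's bundled Hive 1.2 parses delta directory names as exactly delta_<minTxnId>_<maxTxnId> via Long.parseLong, so the trailing statement-id segment ends up inside the second token; parse_long below is a hypothetical stand-in, not Hive code):

def parse_long(s):
    # Rough stand-in for java.lang.Long.parseLong: digits only, no underscores.
    if not s.isdigit():
        raise ValueError('For input string: "%s"' % s)
    return int(s, 10)

dir_name = "delta_0018772_0018772_0000"
min_txn, _, rest = dir_name[len("delta_"):].partition("_")

parse_long(min_txn)  # 18772 -- parses fine
parse_long(rest)     # rest is "0018772_0000" -> ValueError, analogous to the
                     # NumberFormatException in the stack trace above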
Please suggest.
Created 03-07-2018 11:16 PM
According to https://issues.apache.org/jira/browse/SPARK-15348, Spark does not currently support transactional Hive tables.
Created 03-08-2018 03:57 PM
You will have to wait for the next release of HDP for Spark to support Hive ACID tables.
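In the meantime, one interim workaround to consider (a sketch under assumptions: HiveServer2 is reachable, and the host/port in the JDBC URL below are placeholders) is to have Hive itself major-compact the affected partitions. Major compaction rewrites the delta_* directories into a base_<txnId> directory, which has no statement-id suffix, so the older parser should handle it once the cleaner removes the old deltas:

import subprocess

# Issue the compaction through beeline/HiveServer2; Spark's own session
# cannot run ACID compaction statements. The JDBC URL is a placeholder.
subprocess.check_call([
    "beeline",
    "-u", "jdbc:hive2://your-hiveserver2-host:10000/default",
    "-e", "ALTER TABLE default.acid_table "
          "PARTITION (load_date='2018-01-14') COMPACT 'major';",
])

This is a stopgap at best: new writes create fresh deltas, so the read breaks again until the next compaction.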
Created 09-19-2018 06:55 AM
So is this feature now supported in HDP 3.0?