Spark not reading data from a Hive managed table. Meanwhile, Hive can query the data in the table just fine.
Labels: Apache Hive, Apache NiFi, Apache Spark
New Contributor
Created ‎02-02-2018 09:28 PM
// This query:
sqlContext.sql("select * from retail_invoice").show

// gives this output:
+---------+---------+-----------+--------+-----------+---------+----------+-------+
|invoiceno|stockcode|description|quantity|invoicedate|unitprice|customerid|country|
+---------+---------+-----------+--------+-----------+---------+----------+-------+
+---------+---------+-----------+--------+-----------+---------+----------+-------+

// The Hive DDL for the table in HiveView 2.0:
CREATE TABLE `retail_invoice`(
  `invoiceno` string,
  `stockcode` string,
  `description` string,
  `quantity` int,
  `invoicedate` string,
  `unitprice` double,
  `customerid` string,
  `country` string)
CLUSTERED BY (
  stockcode)
INTO 2 BUCKETS
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://hadoopsilon2.zdwinsqlad.local:8020/apps/hive/warehouse/retail_invoice'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"country\":\"true\",\"quantity\":\"true\",\"customerid\":\"true\",\"description\":\"true\",\"invoiceno\":\"true\",\"unitprice\":\"true\",\"invoicedate\":\"true\",\"stockcode\":\"true\"}}',
  'numFiles'='2',
  'numRows'='541909',
  'orc.bloom.filter.columns'='StockCode, InvoiceDate, Country',
  'rawDataSize'='333815944',
  'totalSize'='5642889',
  'transactional'='true',
  'transient_lastDdlTime'='1517516006')
I can query the data in Hive just fine. The data is inserted from NiFi using the PutHiveStreaming processor.
We have tried recreating the table, but the same problem occurs. I haven't found any odd-looking configuration.
Any ideas on what could be going on here?
1 REPLY
Contributor
Created ‎02-04-2018 05:18 AM
Your table is an ACID table, i.e. transactions are enabled on it (note 'transactional'='true' in TBLPROPERTIES). Spark doesn't support reading Hive ACID tables. Take a look at SPARK-15348 and SPARK-16996.
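To make the diagnosis above easier to repeat, here is a minimal sketch of how one might spot a transactional table from its DDL text. It is plain Scala with no Spark dependency. `AcidCheck` and `isTransactional` are hypothetical names made up for illustration, not part of Spark or Hive, and the DDL string is an abbreviated stand-in for the one posted in the question.

```scala
// Sketch only: AcidCheck / isTransactional are hypothetical helpers,
// not Spark or Hive APIs.
object AcidCheck {
  // Matches a table property such as 'transactional'='true' (case-insensitive).
  private val Transactional = """(?i)'transactional'\s*=\s*'(\w+)'""".r

  /** Returns true if the DDL text marks the table as transactional (ACID). */
  def isTransactional(ddl: String): Boolean =
    Transactional.findFirstMatchIn(ddl).exists(_.group(1).equalsIgnoreCase("true"))

  def main(args: Array[String]): Unit = {
    // Abbreviated, assumed DDL for illustration; not the full table definition.
    val ddl =
      """CREATE TABLE `retail_invoice`(`invoiceno` string)
        |TBLPROPERTIES ('numFiles'='2', 'transactional'='true')""".stripMargin
    println(isTransactional(ddl)) // prints: true
  }
}
```

If the check returns true for a table that comes back empty in Spark, the ACID limitation described above is a likely cause.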
