Support Questions

Find answers, ask questions, and share your expertise

Spark not reading data from a Hive managed table. Meanwhile, Hive can query the data in the table just fine.

New Contributor
// This query:
sqlContext.sql("select * from retail_invoice").show

// gives this output:

+---------+---------+-----------+--------+-----------+---------+----------+-------+
|invoiceno|stockcode|description|quantity|invoicedate|unitprice|customerid|country|
+---------+---------+-----------+--------+-----------+---------+----------+-------+
+---------+---------+-----------+--------+-----------+---------+----------+-------+

// The Hive DDL for the table in HiveView 2.0:
CREATE TABLE `retail_invoice`(
  `invoiceno` string, 
  `stockcode` string, 
  `description` string, 
  `quantity` int, 
  `invoicedate` string, 
  `unitprice` double, 
  `customerid` string, 
  `country` string)
CLUSTERED BY ( 
  stockcode) 
INTO 2 BUCKETS
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://hadoopsilon2.zdwinsqlad.local:8020/apps/hive/warehouse/retail_invoice'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"country\":\"true\",\"quantity\":\"true\",\"customerid\":\"true\",\"description\":\"true\",\"invoiceno\":\"true\",\"unitprice\":\"true\",\"invoicedate\":\"true\",\"stockcode\":\"true\"}}', 
  'numFiles'='2', 
  'numRows'='541909', 
  'orc.bloom.filter.columns'='StockCode, InvoiceDate, Country', 
  'rawDataSize'='333815944', 
  'totalSize'='5642889', 
  'transactional'='true', 
  'transient_lastDdlTime'='1517516006')

I can query the data in Hive just fine. The data is inserted from NiFi using the PutHiveStreaming processor.

We have tried recreating the table, but the same problem arises. I haven't found any odd-looking configuration settings.

Any ideas on what could be going on here?

1 REPLY

Contributor

@Matt Krueger

Your table is ACID, i.e. transactional ('transactional'='true' in the TBLPROPERTIES). Spark does not support reading Hive ACID tables: the rows written by PutHiveStreaming land in ACID delta files, which Spark's Hive reader ignores, so the table looks empty from Spark. Take a look at SPARK-15348 and SPARK-16996.
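
If you need the data in Spark right away, one common workaround (just a sketch; the table name retail_invoice_flat below is made up) is to have Hive copy the rows into a plain, non-transactional ORC table and point Spark at that copy:

-- Run this in Hive (e.g. Beeline or the Hive view), not in Spark.
-- On HDP 2.x, CTAS creates a plain managed ORC table without 'transactional'='true',
-- so Spark's built-in Hive reader can see its files.
CREATE TABLE retail_invoice_flat
STORED AS ORC
AS SELECT * FROM retail_invoice;

After that, sqlContext.sql("select * from retail_invoice_flat").show should return rows. Keep in mind the copy is a snapshot, so it has to be refreshed after new streaming ingests; proper ACID support in Spark is what the tickets above track.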