Created on 04-09-2020 08:36 AM - last edited on 04-09-2020 08:40 AM by VidyaSargur
As the title says, I'm running into an issue when using Presto to read from a Hive 3 environment when the table is populated with ORC data by NiFi's PutHive3Streaming processor.
Presto can read Hive 3 ORC ACID tables that were populated via the command line or by other NiFi processors. I also tried writing the data with PutHive3Streaming from a later version of NiFi (1.11.4), with the same result.
Error:
io.prestosql.spi.PrestoException: Error opening Hive split hdfs://path/to/bucket (offset=0, length=29205493): rowsInRowGroup must be greater than zero
Versions:
Nifi HDF 1.9
PrestoSQL 331/332
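The thread does not establish the cause, but the error text hints that Presto's ORC reader found a row-group size of zero, which in the ORC format corresponds to the footer field `rowIndexStride` (rows per row group). One way to test that hypothesis is to compare the footer of a file written by PutHive3Streaming with one Presto can read. The following is a minimal Python sketch under stated assumptions: it parses only the file tail as laid out in the ORC specification, handles only NONE and ZLIB compression, reads the whole file into memory, and the script name `orc_stride.py` is hypothetical. Copy a bucket file locally first (for example with `hdfs dfs -get`).

```python
import sys
import zlib

# ORC CompressionKind enum values (ORC specification); others are not handled here.
NONE, ZLIB = 0, 1


def read_varint(buf, pos):
    """Decode a protobuf base-128 varint at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_fields(buf):
    """Minimal protobuf decoder: field number -> list of raw values.

    Varints become ints, length-delimited fields become bytes,
    fixed 32/64-bit fields are skipped (stored as None).
    """
    fields, pos = {}, 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 1:
            value, pos = None, pos + 8
        elif wire_type == 5:
            value, pos = None, pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.setdefault(number, []).append(value)
    return fields


def decompress(buf, kind):
    """Undo ORC chunked compression (NONE and ZLIB only in this sketch)."""
    if kind == NONE:
        return buf
    if kind != ZLIB:
        raise NotImplementedError(f"compression kind {kind} not handled")
    out, pos = bytearray(), 0
    while pos < len(buf):
        # 3-byte little-endian chunk header: (chunk_length << 1) | is_original
        header = buf[pos] | (buf[pos + 1] << 8) | (buf[pos + 2] << 16)
        length, original = header >> 1, header & 1
        chunk = buf[pos + 3:pos + 3 + length]
        # ORC's ZLIB codec writes raw deflate streams (no zlib header).
        out += chunk if original else zlib.decompress(chunk, -15)
        pos += 3 + length
    return bytes(out)


def row_index_stride(data):
    """Return the footer's rowIndexStride (field 8), or None if it is absent."""
    ps_len = data[-1]  # last byte of the file is the PostScript length
    postscript = parse_fields(data[-1 - ps_len:-1])
    footer_len = postscript[1][0]           # PostScript.footerLength
    kind = postscript.get(2, [NONE])[0]     # PostScript.compression
    start = len(data) - 1 - ps_len - footer_len
    footer = parse_fields(decompress(data[start:start + footer_len], kind))
    return footer.get(8, [None])[0]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python orc_stride.py <local copy of a bucket file>
    with open(sys.argv[1], "rb") as f:
        print(row_index_stride(f.read()))
```

If the PutHive3Streaming file reports 0 while a readable file reports a positive value (10000 is the usual ORC default), that would support the hypothesis. This is only a diagnostic aid, not a fix.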
Created 04-13-2020 06:23 AM
@Eric_B Are the tables Presto cannot read owned by NiFi? The error you shared looks like a permissions issue on the underlying files. Also, if you can, please share screenshots of your processor configurations.
Created 04-13-2020 08:05 AM
Thanks for responding!
I also suspected a file permissions issue at first, but I ran some tests.
Test 1: I chown'd/chmod'd the underlying files to match ORC files that Presto could read (those not written by PutHive3Streaming). It didn't work.
Test 2: I ran NiFi's SelectHive3QL (which supports inserts). It wrote the data with file permissions and ownership similar to those of the PutHive3Streaming files, yet Presto is able to read that data.
Were you able to get this to work?
Additionally, here's a snippet of my PutHive3Streaming configuration (minus specifics such as table, paths, and databases). I'm using an AvroReader as the record reader.