So the title basically states it, but I'm currently running into an issue when leveraging Presto to ready from a Hive3 environment if the table is populated with ORC data by Nifi's PutHive3Streaming processor.
Presto is able to read ORC ACID tables if Hive 3 and populated via command line or other nifi processors. I attempted to write data using PutHive3Streaming from later versions of Nifi (1.11.4) to no avail.
io.prestosql.spi.PrestoException: Error opening Hive split hdfs://path/to/bucket (offset=0, length=29205493): rowsInRowGroup must be greater than zero
Nifi HDF 1.9
@Eric_B Are the tables Presto cannot read owned by NiFi? The error you share seems like a permissions issue to the underlying files. Also if you can, please share screen shots of your processor configurations.
If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post.
Thanks for responding!
I did think this was a file permissions issue on the start, but I ran some tests.
Test 1: I chown'd/chmod'd the underlining files to match ORC files that presto could read from (those not written by PutHive3Streaming). Didn't work.
Test 2: I ran Nifi's SelectHive3QL (which supports inserts). This wrote the data with file permissions and ownership similar to the other processor. Presto is able to read that data.
Were you able to get to work?
Additionally here's a snippet of puthive3streaming (minus the specifics like table, pathways, dbs). Using an avroreader to write.