About aaronmdunlap

aaronmdunlap · ‎09-06-2017

This made a *huge* difference. I'll accept the top-level answer, but parallel processing made a big difference in this case. The processor itself is still fairly slow, but that may be a function of the action that it's taking. I'm wondering if moving the data into memory prior to processing would make any difference. Thanks for the heads up though!

aaronmdunlap · ‎09-05-2017

I'm taking raw pipe-delimited text files, converting them to Avro and then converting them to ORC files (because ORC files are awesome), and everything is working swimmingly, except the conversion from Avro to ORC is extremely slow, which is causing my processing to back up infinitely. Is there a better method to convert raw text into an ORC file in NiFi or some kind of efficiency that can be gained to allow the data to flow through much faster?

aaronmdunlap · ‎08-24-2017

You're absolutely correct, I hadn't even looked at that. I was using ver 2.3 of the Hive JDBC driver. Downgrading to 2.2 worked like a charm. Thanks!

aaronmdunlap · ‎08-24-2017

Attempting to instantiate a JDBC connection to HiveServer2 Interactive results in the attached error. Error text: Could not open client transport with JDBC Uri: jdbc:hive2://127.0.0.1:36467/poc: Failed to open new session: java.lang.IllegalArgumentException: hive configuration hive.server2.thrift.resultset.default.fetch.size does not exists. Failed to open new session: java.lang.IllegalArgumentException: hive configuration hive.server2.thrift.resultset.default.fetch.size does not exists. Failed to open new session: java.lang.IllegalArgumentException: hive configuration hive.server2.thrift.resultset.default.fetch.size does not exists. java.lang.IllegalArgumentException: hive configuration hive.server2.thrift.resultset.default.fetch.size does not exists. java.lang.IllegalArgumentException: hive configuration hive.server2.thrift.resultset.default.fetch.size does not exists.

aaronmdunlap · ‎08-07-2017

I have several datasets that together can be used to build a hierarchy, and in a typical RDBMS we would be able to use a recursive query or a more proprietary method (CONNECT BY) to build the hierarchy. Unfortunately the datasets are so huge that performance is terrible, and the work would be much better served in a Hadoop environment. It seems that most of the Apache stack does not yet support recursion in WITH statements, so is there a more programmatic method of building a hierarchy that people typically use in this situation?
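To make the question concrete, here is a minimal sketch of one programmatic workaround: unrolling the recursion into a loop that expands the hierarchy one level per pass. It is plain Python over an in-memory edge list, not the method any particular engine uses. The function name `build_hierarchy` and the `(child, parent)` input layout are assumptions made for illustration. In a Hive or Spark setting, each pass of the loop would roughly correspond to one join between the current frontier and the edge table.

```python
def build_hierarchy(edges):
    """Expand a parent/child edge list level by level.

    edges: iterable of (child, parent) pairs; parent is None for a root.
    Returns a dict mapping each reachable node to (level, path_from_root).
    The input layout is hypothetical, chosen only for this sketch.
    """
    children = {}
    roots = []
    for child, parent in edges:
        if parent is None:
            roots.append(child)
        else:
            children.setdefault(parent, []).append(child)

    # Level 0: the roots (the "anchor" part of a recursive query).
    result = {root: (0, [root]) for root in roots}
    frontier = list(roots)
    level = 0

    # Each pass plays the role of one recursive step: join the current
    # frontier against the edge list to find the next level down.
    while frontier:
        level += 1
        next_frontier = []
        for parent in frontier:
            for child in children.get(parent, ()):
                if child in result:
                    # Already placed: guards against cycles and duplicates.
                    continue
                result[child] = (level, result[parent][1] + [child])
                next_frontier.append(child)
        frontier = next_frontier

    return result


if __name__ == "__main__":
    sample = [("ceo", None), ("vp", "ceo"), ("eng", "vp"), ("ops", "ceo")]
    for node, (depth, path) in build_hierarchy(sample).items():
        print(depth, " > ".join(path))
```

The loop ends when a pass adds no new nodes, so the number of passes equals the depth of the hierarchy rather than the number of rows.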

Last Visited	‎09-14-2017 06:25 PM

Member Since	‎07-31-2017 08:44 PM
Posts	14
Kudos received	2

Cloudera Community

Re: In NiFi, the ConvertAvroToORC processor is ext...

In NiFi, the ConvertAvroToORC processor is extreme...

Re: LLAP JDBC Connection Fails with "hive.server2....

LLAP JDBC Connection Fails with "hive.server2.thri...

Recursive query or better way to build hierarchica...