Member since 07-31-2017
14 Posts, 2 Kudos Received, 0 Solutions
09-06-2017
05:51 PM
This made a *huge* difference. I'll accept the top-level answer, but enabling parallel processing is what really helped in this case. The processor itself is still fairly slow, but that may be a function of the action it's taking. I'm wondering whether moving the data into memory before processing would make any difference. Thanks for the heads-up, though!
09-05-2017
07:58 PM
I'm taking raw pipe-delimited text files, converting them to Avro, and then converting them to ORC files (because ORC files are awesome). Everything is working swimmingly except the Avro-to-ORC conversion, which is extremely slow and causes my flow to back up indefinitely. Is there a better way to convert raw text into ORC files in NiFi, or some efficiency I can gain so the data flows through much faster?
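For context on the first step of this pipeline, here is a minimal sketch (plain Python, not NiFi) of parsing pipe-delimited text into records and grouping them into batches. Columnar formats such as ORC generally work better when each write covers many rows rather than many tiny files, so batching is one place efficiency can be gained. The column names in `FIELDS` and the batch size are hypothetical; the post does not give the real schema.

```python
import csv
import io
from typing import Dict, Iterator, List

# Hypothetical schema for illustration only; the real column names are unknown.
FIELDS = ["id", "name", "amount"]


def parse_pipe_delimited(text: str) -> Iterator[Dict[str, str]]:
    """Parse pipe-delimited lines into dicts keyed by FIELDS."""
    reader = csv.reader(io.StringIO(text), delimiter="|")
    for row in reader:
        if not row:
            continue  # csv.reader yields [] for blank lines; skip them
        yield dict(zip(FIELDS, row))


def batch(records: Iterator[Dict[str, str]], size: int) -> Iterator[List[Dict[str, str]]]:
    """Group records into lists of at most `size`, so each write handles many rows."""
    buf: List[Dict[str, str]] = []
    for rec in records:
        buf.append(rec)
        if len(buf) == size:
            yield buf
            buf = []
    if buf:
        yield buf  # emit the final partial batch
```

For example, `list(batch(parse_pipe_delimited("1|alice|10.5\n2|bob|3\n"), 1000))` produces a single batch holding both records; each batch would then be handed to whatever writer produces the ORC output.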
Labels: Apache NiFi
08-24-2017
05:20 PM
You're absolutely correct; I hadn't even looked at that. I was using version 2.3 of the Hive JDBC driver, and downgrading to 2.2 worked like a charm. Thanks!
08-24-2017
04:32 PM
Attempting to open a JDBC connection to HiveServer2 Interactive results in the attached error. Error text: Could not open client transport with JDBC Uri: jdbc:hive2://127.0.0.1:36467/poc: Failed to open new session: java.lang.IllegalArgumentException: hive configuration hive.server2.thrift.resultset.default.fetch.size does not exists.
Failed to open new session: java.lang.IllegalArgumentException: hive configuration hive.server2.thrift.resultset.default.fetch.size does not exists.
java.lang.IllegalArgumentException: hive configuration hive.server2.thrift.resultset.default.fetch.size does not exists.
Labels: Apache Hive
08-07-2017
01:08 PM
I have several datasets that together can be used to build a hierarchy. In a typical RDBMS we would use a recursive query or a more proprietary method (e.g., Oracle's CONNECT BY) to build it. Unfortunately, the datasets are so large that performance is terrible, and the workload would be much better served in a Hadoop environment. It seems most of the Apache stack does not yet support recursive WITH statements, so is there a more programmatic method people typically use to build a hierarchy in this situation?
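One common workaround when recursive CTEs are unavailable is to emulate the recursion with a loop: start from the top-level rows (the anchor) and repeatedly join the current level to the child-to-parent table until an iteration adds no new rows. In Spark this would be a driver-side loop of DataFrame joins; the sketch below shows the same logic with plain Python dicts on a small in-memory edge list. The `(child, parent)` input layout and the slash-separated path format are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple


def build_hierarchy(
    edges: List[Tuple[str, Optional[str]]]
) -> Dict[str, Tuple[str, int, str]]:
    """Map each node to (root, level, path) via level-by-level expansion.

    edges: (child, parent) pairs; parent is None for top-level nodes.
    Each pass of the while loop plays the role of one self-join.
    """
    # Index children by parent (the "join key" side of the edge table).
    children: Dict[str, List[str]] = {}
    for child, parent in edges:
        if parent is not None:
            children.setdefault(parent, []).append(child)

    # Anchor step: top-level nodes, like the anchor member of a recursive CTE.
    result: Dict[str, Tuple[str, int, str]] = {
        c: (c, 0, c) for c, p in edges if p is None
    }
    frontier = list(result)

    # Recursive step: join the current frontier to its children until nothing new appears.
    while frontier:
        next_frontier: List[str] = []
        for node in frontier:
            root, level, path = result[node]
            for child in children.get(node, []):
                if child not in result:  # skip already-seen nodes (guards against cycles)
                    result[child] = (root, level + 1, f"{path}/{child}")
                    next_frontier.append(child)
        frontier = next_frontier
    return result
```

With `edges = [("A", None), ("B", "A"), ("C", "A"), ("D", "B")]`, node `D` maps to `("A", 2, "A/B/D")`. In a distributed version, the number of join passes equals the depth of the hierarchy, so this approach suits wide, shallow trees better than very deep ones.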
Labels: Apache Hadoop, Apache Hive, Apache Spark