Member since: 05-03-2016
Posts: 13
Kudos Received: 12
Solutions: 1

My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 5586 | 06-17-2016 08:54 PM |
06-30-2017
09:16 PM
In theory, Hive streaming should run as fast as you can write to HDFS, with the overhead coming from committing transactions. So to improve performance, reduce the number of transactions happening against a table. Here are a few knobs to turn:

1) Check the NiFi version. There was a bug fix in NiFi 1.2 / HDF 3.0, and the processor gained a new config property to set the number of records per transaction.

2) Set the number of records per transaction high to improve throughput.

3) Increase the number of transactions per batch, but not too high. If the data stream does not have enough data to use the transactions in the batch quickly, they will be created and time out needlessly. While you are streaming, use the SHOW TRANSACTIONS command to monitor whether transactions are timing out (see the sketch after this list). https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowTransactions

4) On the Hive side, increase the number of threads doing compaction (hive.compactor.worker.threads). Data is streamed into new delta files, and Hive runs minor and major compactions to merge the new data into the existing ORC files.
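For the monitoring in point 3, a minimal sketch that runs SHOW TRANSACTIONS over Hive JDBC — the HiveServer2 URL is a placeholder, the standard org.apache.hive.jdbc.HiveDriver is assumed to be on the classpath, and column names are read from the result set metadata rather than hard-coded:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

public class ShowTransactionsMonitor {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Placeholder HiveServer2 JDBC URL; adjust host, port, and database.
        String url = "jdbc:hive2://hive-host:10000/default";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TRANSACTIONS")) {
            ResultSetMetaData meta = rs.getMetaData();
            // Print every column of every row so aborted or long-open
            // transactions are easy to spot while streaming is running.
            while (rs.next()) {
                StringBuilder row = new StringBuilder();
                for (int i = 1; i <= meta.getColumnCount(); i++) {
                    row.append(meta.getColumnLabel(i)).append('=')
                       .append(rs.getString(i)).append(' ');
                }
                System.out.println(row.toString().trim());
            }
        }
    }
}
```

If aborted or long-lived entries keep showing up, that is a sign the transactions-per-batch setting is too high for the incoming data rate.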
06-30-2017
08:22 PM
1 & 2) Trying running "analyze table" to generate row and data size statistics. https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables%E2%80%93ANALYZE 3) ORC files are compressed with zlib by default. zlib offers a higher level of compression than snappy. If you don't want compression you have to set orc.compress to "NONE" 4) I believe this is referencing the hive compression feature. Text files can be gzipped or bzipped and still read by Hive. https://cwiki.apache.org/confluence/display/Hive/CompressedStorage
06-17-2016
08:56 PM
I should add that my odbc.ini and odbcinst.ini files are under /Library/ODBC too. I am using System DSNs only.
06-17-2016
08:54 PM
8 Kudos
I ran into this same problem with Excel 2016 on El Capitan. Jiaxing Liang is right in that OS X's sandboxing is blocking access to libhortonworkshiveodbc.dylib. You can verify this by launching /Applications/Utilities/Console and filtering on 'sandboxd'. Activity Monitor can also display whether sandboxing is enabled: View -> Columns -> Sandbox. As a workaround, I copied the Hortonworks Hive ODBC driver from the default install location of /opt/hortonworks to /Library/ODBC/hortonworks, then updated the odbc.ini and odbcinst.ini files to reference the new driver location:

# Driver: The location where the ODBC driver is installed to.
Driver=/Library/ODBC/hortonworks/hiveodbc/lib/universal/libhortonworkshiveodbc.dylib

The ErrorMessagesPath in the hortonworks.hiveodbc.ini file was being blocked too, so that needed updating as well:

ErrorMessagesPath=/Library/ODBC/hortonworks/hiveodbc/ErrorMessages/
06-08-2016
08:59 AM
When working with the NiFi REST API and no UI, is it possible to exclude the processor's config -> descriptors field from the JSON response? I'm simply looking to avoid the large volume of useless data during development.
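For reference, a rough sketch of the client-side alternative (dropping component.config.descriptors after fetching), assuming the standard GET /nifi-api/processors/{id} endpoint, Jackson for JSON handling, and placeholder host and UUID values — the full payload still comes over the wire, which is what I'd like to avoid:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class FetchProcessorWithoutDescriptors {
    public static void main(String[] args) throws Exception {
        // Placeholder NiFi host and processor UUID.
        String url = "http://nifi-host:8080/nifi-api/processors/"
                + "00000000-0000-0000-0000-000000000000";
        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create(url)).GET().build(),
                HttpResponse.BodyHandlers.ofString());

        // Drop component.config.descriptors before working with the entity.
        ObjectMapper mapper = new ObjectMapper();
        JsonNode entity = mapper.readTree(response.body());
        JsonNode config = entity.path("component").path("config");
        if (config instanceof ObjectNode) {
            ((ObjectNode) config).remove("descriptors");
        }
        System.out.println(mapper.writerWithDefaultPrettyPrinter()
                .writeValueAsString(entity));
    }
}
```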
Labels:
- Apache NiFi
05-27-2016
07:30 PM
Two things I would validate (a quick sketch of both checks follows):

1) Make sure java.library.path includes the directory containing mqjbnd. I believe System.loadLibrary uses java.library.path rather than LD_LIBRARY_PATH. I'm not sure about WMQ, but if the newly loaded library needs to load another library, then LD_LIBRARY_PATH will need to be set as well.

2) Make sure the bit versions of the JVM and MQ client are the same. For example, if you have a 64-bit JVM, make sure it is loading the 64-bit WMQ libraries and not the 32-bit ones.
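A minimal sketch covering both checks — run it with -Djava.library.path pointing at the WMQ native library directory; the /opt/mqm path in the comment is a placeholder, and sun.arch.data.model is a vendor-specific JVM property:

```java
public class MqLibraryCheck {
    public static void main(String[] args) {
        // Run with the native library directory on java.library.path, e.g.:
        //   java -Djava.library.path=/opt/mqm/java/lib64 MqLibraryCheck
        // (/opt/mqm/java/lib64 is a placeholder install path.)
        System.out.println("java.library.path = "
                + System.getProperty("java.library.path"));

        // Report the JVM's bitness so it can be matched against the
        // WMQ client libraries (sun.arch.data.model is JVM-specific).
        System.out.println("JVM bitness = "
                + System.getProperty("sun.arch.data.model")
                + " (os.arch = " + System.getProperty("os.arch") + ")");

        // Throws UnsatisfiedLinkError if mqjbnd cannot be found on
        // java.library.path or if its bitness does not match the JVM.
        System.loadLibrary("mqjbnd");
        System.out.println("mqjbnd loaded successfully");
    }
}
```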
05-24-2016
01:18 AM
Thanks for the reply, Andrew. I ended up using your alternative solution; the search-and-filter approach was too tedious and couldn't guarantee accuracy.
05-23-2016
11:35 AM
The majority of commercial products used to move files (video in the media industry) use a UDP data channel with a TCP control channel to guarantee delivery and reduce the overhead of the TCP protocol. If a PutUDP was paired with ListenUDP could/should these processors be made to use a TCP control channel to group the UDP packets? Feels like we would be going beyond the intent of what a UDP processor should do.
05-23-2016
11:20 AM
In the NiFi REST API, is it possible to define scope and parameters in the search-results endpoint? I would like to be able to limit searches to root process groups and refine the search to match only on specific fields such as the name and not property values. Is this possible? Or are there alternatives to achieve this with the REST API when the UUIDs are unknown?
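For reference, a rough sketch of post-filtering the generic results by name on the client side — it assumes the GET /nifi-api/flow/search-results?q=<term> endpoint, Jackson for JSON handling, placeholder host and search term values, and result field names (searchResultsDTO, processorResults, name, groupId) taken from the 1.x API:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class SearchProcessorsByName {
    public static void main(String[] args) throws Exception {
        // Placeholder NiFi host and search term.
        String term = "GetFile";
        String url = "http://nifi-host:8080/nifi-api/flow/search-results?q="
                + URLEncoder.encode(term, StandardCharsets.UTF_8);
        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create(url)).GET().build(),
                HttpResponse.BodyHandlers.ofString());

        // Keep only processor results whose name (not a property value)
        // actually contains the search term.
        JsonNode results = new ObjectMapper().readTree(response.body())
                .path("searchResultsDTO").path("processorResults");
        for (JsonNode result : results) {
            if (result.path("name").asText().contains(term)) {
                System.out.println(result.path("id").asText()
                        + " in group " + result.path("groupId").asText()
                        + " -> " + result.path("name").asText());
            }
        }
    }
}
```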
Labels:
- Apache NiFi
05-15-2016
12:53 PM
1 Kudo
Does NiFi offer any UDP acceleration when transferring large files over the WAN? Something similar to Aspera's FASP or open source variants such as Tsunami UDP. I would be interested in this feature in the site-to-site protocol and for moving files from edge sources, potentially with MiNiFi.
Labels:
- Apache NiFi