Member since: 05-17-2016
Posts: 190
Kudos Received: 46
Solutions: 11

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1372 | 09-07-2017 06:24 PM
 | 1777 | 02-24-2017 06:33 AM
 | 2548 | 02-10-2017 09:18 PM
 | 7048 | 01-11-2017 08:55 PM
 | 4660 | 12-15-2016 06:16 PM
02-07-2018
02:52 PM
@Felix Albani Thank you for your feedback... I have made the correction.
05-11-2017
03:26 PM
Thanks @Matt Burgess. I wanted to be sure whether using "replace" on the template was a dirty fix.
07-31-2019
07:21 PM
@amcbarnett : I am trying to aggregate data using "state, count(distinct val) ... group by state", but I want only the non-null values of val (a String column) to be counted.
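Roughly the query shape I'm after (a sketch only; the table and column names are illustrative, assuming a SQL-style engine):

select state, count(distinct val) as distinct_vals
from my_table
where val is not null   -- drop null vals before aggregating
group by state;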
02-21-2017
02:04 PM
Thanks @Andy LoPresto. This helps.
02-13-2017
05:57 PM
1 Kudo
For IO
The throughput or latency one can expect to see varies greatly depending on how the system is configured. Given that there are pluggable approaches to most of the major NiFi subsystems, performance depends on the implementation. But for something concrete and broadly applicable, consider the out-of-the-box default implementations. These are all persistent with guaranteed delivery, using local disk. So, being conservative, assume roughly a 50 MB per second read/write rate on modest disks or RAID volumes within a typical server. For a large class of dataflows, NiFi should then be able to efficiently reach 100 MB per second or more of throughput, because roughly linear growth is expected for each physical partition and content repository added to NiFi. This will bottleneck at some point on the FlowFile repository and provenance repository. We plan to provide a benchmarking and performance test template to include in the build, which allows users to easily test their system, identify where the bottlenecks are, and see at which point they might become a factor. The template should also make it easy for system administrators to make changes and verify the impact.

For CPU
The Flow Controller acts as the engine dictating when a particular processor is given a thread to execute. Processors are written to return the thread as soon as they are done executing a task. The Flow Controller can be given a configuration value indicating the available threads for the various thread pools it maintains. The ideal number of threads depends on the host system resources: the number of cores, whether the system is running other services as well, and the nature of the processing in the flow. For typical IO-heavy flows, it is reasonable to make many dozens of threads available.

For RAM
NiFi lives within the JVM and is thus limited to the memory space the JVM affords it. JVM garbage collection becomes a very important factor, both in restricting the total practical heap size and in optimizing how well the application runs over time. NiFi jobs can also be I/O intensive when reading the same content regularly, so configure a large enough disk to optimize performance.

See:
https://community.hortonworks.com/questions/22685/capacity-planning-for-nifi-cluster.html
https://community.hortonworks.com/questions/4098/nifi-sizing.html
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#configuration-best-practices
https://community.hortonworks.com/content/kbentry/7882/hdfnifi-best-practices-for-setting-up-a-high-perfo.html
https://community.hortonworks.com/content/kbentry/9785/nifihdf-dataflow-optimization-part-2-of-2.html
http://apache-nifi.1125220.n5.nabble.com/Nifi-Benchmark-Performance-tests-td1099.html
http://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.1.1/bk_dataflow-overview/content/performance-expectations-and-characteristics-of-nifi.html
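Since throughput scales with content repository partitions, one practical knob is spreading the repositories across disks in nifi.properties. A minimal sketch, using the stock property names; the mount points are illustrative:

# Spread the content repository across physical disks; each extra
# partition adds roughly linear read/write capacity.
nifi.content.repository.directory.default=/repo0/content_repository
nifi.content.repository.directory.disk1=/repo1/content_repository
nifi.content.repository.directory.disk2=/repo2/content_repository
# Keep the FlowFile and provenance repositories on their own disks so
# they become the bottleneck as late as possible.
nifi.flowfile.repository.directory=/repo3/flowfile_repository
nifi.provenance.repository.directory.default=/repo4/provenance_repository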
10-18-2018
01:27 AM
It was an access issue on the buckets. Setting the right permissions on the bucket fixed it.
01-30-2017
06:35 PM
@Bryan Bende : Thanks for pointing out the Jira.
02-01-2017
03:21 AM
1 Kudo
@Vaibhav Kumar
The recommendations from my colleagues are valid: you have strings in the header row of your CSV documents. You can certainly filter out some known entity, but there is a more advanced version of the Pig CSV loader called CSVExcelStorage. It is part of the Piggybank library that comes bundled with HDP, hence the register command. You can pass different control parameters to it. The Mortar blog is an excellent source of information on working with Pig: http://help.mortardata.com/technologies/pig/csv

grunt> register /usr/hdp/current/pig-client/piggybank.jar;
grunt> a = load 'BJsales.csv' using org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'NO_MULTILINE', 'NOCHANGE', 'SKIP_INPUT_HEADER') as (Num:int, time:int, BJsales:float);
grunt> describe a;
a: {Num: int,time: int,BJsales: float}
grunt> b = limit a 5;
grunt> dump b;
output:
(1,1,200.1)
(2,2,199.5)
(3,3,199.4)
(4,4,198.9)
(5,5,199.0)
Notice I am not filtering any relation; I'm telling the loader to skip the header outright. That saves a few keystrokes and doesn't waste cycles processing anything extra.
12-15-2016
06:16 PM
2 Kudos
Thanks @Karthik Narayanan. I was able to resolve the issue. Before diving into the solutions, I should state the following: with NiFi 1.0 and 1.1, LZO compression cannot be achieved using the PutHDFS processor. The only supported compressions are the ones listed in the compression codec drop-down. With the LZO-related classes present in core-site.xml, the NiFi processor fails to run. The suggestion from the previous HCC post was to remove those classes, but they needed to be retained so that NiFi's copy and HDP's copy of core-site.xml stay in sync.
NiFi 1.0
I built the hadoop-lzo jar from source, added it to the NiFi lib directory, and restarted NiFi. This resolved the issue, and I am able to proceed using PutHDFS without it erroring out.
NiFi 1.1
Configure the processor's additional classpath (the Additional Classpath Resources property) to point at the jar file. No restart required.
Note: this does not provide LZO compression; it just lets the processor run without ERROR even when the LZO classes are present in core-site.xml.
UNSATISFIED LINK ERROR WITH SNAPPY
I also had an issue with the Snappy compression codec in NiFi. I was able to resolve it by setting the path to the .so file. This did not work on the ambari-vagrant boxes, but I was able to get it working on an OpenStack cloud instance; the issue on the VirtualBox machines could be systemic.
To resolve the link error, I copied the .so files from the HDP cluster and recreated the symlinks. And, as @Karthik Narayanan suggested, I added the java library path pointing to the directory containing the .so files. Below is the list of .so files and links.
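The layout looks roughly like this (an illustrative sketch only; file names follow a typical HDP native-library install, and the directory and versions are hypothetical):

$ ls -l /opt/nifi/native    # illustrative directory
libgplcompression.so -> libgplcompression.so.0.0.0
libgplcompression.so.0 -> libgplcompression.so.0.0.0
libgplcompression.so.0.0.0
libhadoop.so -> libhadoop.so.1.0.0
libhadoop.so.1.0.0
libsnappy.so -> libsnappy.so.1.1.4
libsnappy.so.1 -> libsnappy.so.1.1.4
libsnappy.so.1.1.4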
And below is the bootstrap configuration change
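In essence it is a single java.arg entry in bootstrap.conf (the arg index and path are illustrative):

# Point the JVM at the directory holding the native .so files
java.arg.15=-Djava.library.path=/opt/nifi/native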
11-18-2016
01:19 PM
Thanks @Matt Burgess. Currently I am handling this with JavaScript, a similar approach to what you described. I wanted to confirm there is no other way. For simpler structures I managed to extract the key values using regex, but for deeply nested keys I was forced to use ExecuteScript.
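For reference, my ExecuteScript body looks roughly like this (a minimal ECMAScript sketch; the JSON path and attribute name are illustrative, not from the actual flow):

// Pull one deeply nested key out of a JSON flowfile and surface it
// as an attribute. The path and attribute name below are hypothetical.
var flowFile = session.get();
if (flowFile != null) {
    var InputStreamCallback = Java.type("org.apache.nifi.processor.io.InputStreamCallback");
    var IOUtils = Java.type("org.apache.commons.io.IOUtils");
    var StandardCharsets = Java.type("java.nio.charset.StandardCharsets");
    var value = null;
    // Read and parse the flowfile content, then walk to the nested key
    session.read(flowFile, new InputStreamCallback(function (inputStream) {
        var json = JSON.parse(IOUtils.toString(inputStream, StandardCharsets.UTF_8));
        value = json.outer.inner.deeplyNestedKey;   // illustrative path
    }));
    flowFile = session.putAttribute(flowFile, "extracted.value", String(value));
    session.transfer(flowFile, REL_SUCCESS);
}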