Member since
01-14-2019
144
Posts
48
Kudos Received
17
Solutions
My Accepted Solutions
Title | Views | Posted
--- | --- | ---
  | 907 | 10-05-2018 01:28 PM
  | 780 | 07-23-2018 12:16 PM
  | 1101 | 07-23-2018 12:13 PM
  | 6269 | 06-25-2018 03:01 PM
  | 3360 | 06-20-2018 12:15 PM
06-25-2018
07:42 PM
1 Kudo
You can use a regular expression to isolate the header line (for example, search the entire content, anchor at the beginning with "^", and stop at the first newline) and replace it with what you'd like to have. You'll have to test out different regex strings to see what works for you, but this should get you started.
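A minimal sketch of that idea in Python (the sample content, header text, and replacement string are all hypothetical; in NiFi you'd put the same regex and replacement into a ReplaceText processor):

```python
import re

# Hypothetical sample content: a header line followed by data rows.
content = "old_header_a,old_header_b\n1,2\n3,4\n"

# Match everything from the start of the content up to (but not
# including) the first newline, and replace it with the new header.
fixed = re.sub(r"^[^\n]*", "col_a,col_b", content, count=1)
print(fixed)
```

The `count=1` keeps the substitution from touching anything past the first match.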
06-25-2018
03:01 PM
@RAUI In Ambari, you can go to the NiFi service and select More Actions->Restart All. This will restart all the nodes of NiFi in your cluster at the same time.
06-21-2018
11:50 AM
I think you've seen this blog post already but just in case you haven't: https://bryanbende.com/development/2016/09/15/apache-nifi-and-apache-kafka You'll want to understand whether the bottleneck is on the Kafka side or on the NiFi side so you can understand where to appropriately tune. How many NiFi nodes do you have? How many Kafka nodes? How many partitions for your Kafka topic? The blog post above goes into detail on how to match partitions with NiFi nodes and concurrent tasks as well.
06-21-2018
11:34 AM
How many rows of data do you have? Can you test this issue out with a new set of tables with a few rows in them? Can you try removing the rest of the columns to remove any extra variables?
06-20-2018
01:36 PM
What datatype is mdse_item_i originally? Can you paste the output here for when you get the 7 distinct values versus the 10 distinct values? I'd like to see what the difference is.
06-20-2018
12:15 PM
2 Kudos
Is your data on HDFS? If so, you would use the GetHDFS processor to load your file into a FlowFile. If your data is on your local NiFi node, you would use the GetFile processor instead.

Next, if you want to split by newline, you could use the SplitText processor to split your file into multiple FlowFiles. If you only want to split on your '#@' and '#$', you can use the SplitContent processor. That processor splits on a sequence of text characters (set the 'Byte Sequence Format' to 'text'), so you can put in '#@' to split on. I'm not sure exactly how you'd like to divide your data, but that should give you a starting point. You can chain multiple SplitContent processors together to split on multiple character sequences. Ultimately, your one file on disk will be converted into multiple FlowFiles in NiFi.

Take a look at the SplitContent processor for more info: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.6.0/org.apache.nifi.processors.standard.SplitContent/index.html
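Outside of NiFi, the chained-split idea looks roughly like this (the sample data and separators here are made up for illustration):

```python
# Hypothetical file content mixing two record separators.
data = "rec1#@rec2#@rec3#$rec4"

# First split on '#@' (like one SplitContent processor), then split
# each resulting piece on '#$' (like a second, chained SplitContent).
flowfiles = []
for part in data.split("#@"):
    flowfiles.extend(part.split("#$"))

print(flowfiles)  # ['rec1', 'rec2', 'rec3', 'rec4']
```

Each element of the final list corresponds to one FlowFile coming out of the second processor.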
06-19-2018
11:07 PM
1 Kudo
Here it is off of the NiFi Git repository - the code as well as everything that makes the project is open source so feel free to use it. https://git-wip-us.apache.org/repos/asf?p=nifi-site.git;a=tree;f=src/images;h=4319258d1204c08c31497c4494f46ddfd0a09e2f;hb=HEAD
06-19-2018
10:56 PM
1 Kudo
Yes, if what you are asking is to add an extra piece of data to a NiFi FlowFile then you can do that. What I am not sure of is the format of the data in your FlowFile - is it JSON, CSV, something else? If it is a human-readable format, you can use the ReplaceText processor to add more data into your FlowFile content. You'll need to modify your destination table schema and add another column to it assuming you're using Hive to read the data. The ReplaceText processor accepts statements in NiFi expression language so you'll want to read up on that to find out how to best find your string location and then insert text into it. https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html
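As a rough illustration of the kind of edit ReplaceText can make (the CSV content and the appended value below are hypothetical; in NiFi you'd configure the regex and replacement on the processor rather than write Python):

```python
import re

# Hypothetical CSV FlowFile content; append a constant extra column
# to every line, the way a ReplaceText processor could with a
# line-oriented regex replacement.
content = "a,1\nb,2\nc,3"

# With re.MULTILINE, "$" matches at the end of each line.
updated = re.sub(r"$", ",new_value", content, flags=re.MULTILINE)
print(updated)
```

If you add a column this way, remember the point above: the Hive table schema needs a matching extra column.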
06-18-2018
02:45 PM
From the `top` documentation:

> %CPU -- CPU Usage: The percentage of your CPU that is being used by the process. By default, top displays this as a percentage of a single CPU. On multi-core systems, you can have percentages that are greater than 100%. For example, if 3 cores are at 60% use, top will show a CPU use of 180%.

Yes, MapReduce utilizes more than one core on your machine. It is parallelized at the node level as well as at the process level to take advantage of as many cores as possible. The processing of each row of data is independent of all other rows, so the data can be split up in as many ways as you have processing capacity.
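The multi-core arithmetic from that excerpt, spelled out:

```python
# top sums per-core usage across cores, so 3 cores each at 60%
# show up as a single 180% figure for the process.
cores = 3
per_core_pct = 60
total_pct = cores * per_core_pct
print(total_pct)  # 180
```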
06-18-2018
02:13 PM
It looks like the application you've written uses almost 500 MB of driver memory. If your goal is to utilize all of the CPU cores your nodes provide, you'll have to either change the way your application works (to reduce the driver RAM) or reduce the executor memory so that you can use all of the threads your cluster offers.
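A quick way to see the trade-off (all of the node and executor numbers below are made-up assumptions, not values from your cluster):

```python
# Hypothetical sizing: how many executors fit on one node, and
# whether RAM or cores is the binding constraint.
node_ram_gb = 64
node_cores = 16
executor_ram_gb = 8      # e.g. spark.executor.memory
executor_cores = 2       # e.g. spark.executor.cores

executors_by_ram = node_ram_gb // executor_ram_gb    # 8
executors_by_cores = node_cores // executor_cores    # 8
executors_per_node = min(executors_by_ram, executors_by_cores)

print(executors_per_node * executor_cores)  # 16 cores in use

# Shrinking executor memory raises executors_by_ram, which helps
# whenever RAM (not cores) is what limits executors_per_node.
```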