Member since: 12-13-2016
Posts: 72
Kudos Received: 7
Solutions: 1

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 6547 | 12-27-2017 05:06 AM |
11-06-2022
05:01 AM
Hi @varun_rathinam. Were you able to solve the above error by any chance? This may happen for any of the following reasons: (1) authentication failed due to invalid credentials with brokers older than 1.0.0, (2) a firewall blocking Kafka TLS traffic (e.g. it may only allow HTTPS traffic), (3) a transient network issue.
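To rule out reason (2), a quick check is whether a plain TLS handshake to the broker's TLS port succeeds from the client host. A minimal Python sketch, assuming a hypothetical broker address and port 9093:

```python
import socket
import ssl

# Hypothetical broker host/port; replace with your actual Kafka TLS listener.
BROKER_HOST = "kafka-broker.example.com"
BROKER_PORT = 9093

context = ssl.create_default_context()
try:
    with socket.create_connection((BROKER_HOST, BROKER_PORT), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=BROKER_HOST) as tls:
            # Reaching this point means the firewall is passing TLS traffic to the port.
            print("TLS handshake OK, negotiated", tls.version())
except ssl.SSLError as exc:
    # TLS reached the broker but the handshake itself failed (e.g. certificate issues).
    print("TLS handshake failed:", exc)
except OSError as exc:
    # A timeout or connection reset here points at a firewall or network problem.
    print("Could not reach the broker port:", exc)
```

If the handshake succeeds but the client still fails, that points back at credentials (reason 1) or a transient issue (reason 3).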
10-12-2020
09:33 AM
@varun_rathinam You can use the MergeContent processor if it fits your use case. It is a better way to handle the small file issue in NiFi. Please refer to the link below for more details. https://community.cloudera.com/t5/Support-Questions/Merge-Content-for-small-content-issue/td-p/167660 BR, Akash
05-04-2020
07:25 AM
@varun_rathinam Accessing json in an array object via EvaluateJsonPath can be quite confusing. I also notice the structure of your json is a bit confusing, with the same values in both objects. I have adjusted id2 to cc and dd for testing so that I can tell the id1 and id2 values apart.

The solution you want is (see the template for the exact string values): notice we use the normal tree for each json object ($.object), then access the array (0, 1), then access the array's objects. Also notice it is possible to access the json object array with or without a . before the [.

Reference: https://community.cloudera.com/t5/Support-Questions/how-to-extract-fields-in-flow-file-which-are-surrounded-by/m-p/208635

You can also find the template from my testing of your issue on my GitHub: https://github.com/steven-dfheinz/NiFi-Templates/blob/master/NiFI_EvaluateJsonPath_Demo.xml

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post.

Thanks, Steven @ DFHZ
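A minimal sketch of the kind of paths being discussed, assuming a hypothetical two-element object array; the real field names and values come from Varun's flow file, so treat these as placeholders:

```python
import json

# Hypothetical flowfile content resembling the structure discussed above;
# the real field names and values come from the original flow.
flowfile_content = json.loads("""
{
  "object": [
    {"id1": "aa", "id2": "cc"},
    {"id1": "bb", "id2": "dd"}
  ]
}
""")

# EvaluateJsonPath-style expressions and their plain-Python equivalents:
#   $.object[0].id1   -> first element of the array, field id1
#   $.object.[1].id2  -> same idea; a '.' before the '[' is also accepted
print(flowfile_content["object"][0]["id1"])  # aa
print(flowfile_content["object"][1]["id2"])  # dd
```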
03-05-2020
10:47 AM
@varun_rathinam Observations from your configuration:

1. You are using the "Defragment" merge strategy, which tells me that somewhere upstream in your dataflow you are splitting some FlowFile into fragments and then using this processor to merge those fragments back into the original FlowFile. Correct? When using Defragment you cannot use multiple MergeContent processors in series, as I mentioned earlier, because the Defragment strategy expects to find all fragments from the fragment count before merging them.

2. When using the Defragment strategy, it is the fragment.count attribute on the FlowFiles that dictates when the bin should be merged, not the minimum number of entries.

3. Each FlowFile that has a unique value in fragment.identifier will be allocated to a different bin. Setting the number of bins to "1" will never work, no matter which merge strategy you choose. When the MergeContent processor executes, it first checks whether a free bin is available (if not, it merges the oldest bin, or routes the oldest bin's FlowFiles to failure in the case of Defragment, to free up a bin), then it looks at the FlowFiles in the inbound connection at that exact moment in time and starts allocating them to existing bins or new bins. So at a minimum you should always have at least "2" bins; the default is "5". Having multiple bins does not mean that all of those available bins will be used.

4. I see you changed Maximum Number of Entries from the default 1000 to 100000. Is this because you know each of the FlowFiles you split will produce up to 100,000 FlowFiles? As I mentioned, ALL FlowFiles allocated to bins have their attributes held in heap memory. Adding to that, if you have multiple bins being filled because you have unique fragment.identifiers being defragmented, you could have even more than 100,000 FlowFiles' worth of attributes in heap memory. So your NiFi JVM heap being set at only 2GB may lead to Out Of Memory (OOM) conditions with such a dataflow design (see the rough estimate below).

Also want to add that wherever you do the original splitting of your FlowFile in your dataflow will also have an impact on heap memory, because the FlowFile attributes for every FlowFile produced during the split are held in heap memory until every new split FlowFile is committed to a downstream connection. NiFi connections between processors have swapping enabled by default to help reduce heap usage when queues get large, but the same does not apply within the internals of a processor's execution. As I mentioned before, MergeContent does not load FlowFile content into heap memory, so the size of your FlowFiles does not impact heap here.

So you really want to step back, look at your use case again, and ask yourself: "Do I really need to split my source FlowFile and merge it back into the original FlowFile to satisfy my use case?" NiFi has numerous record-based processors for working with records, avoiding the need to split them in many use cases.

Hope this helps, Matt
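A rough back-of-envelope sketch of the heap pressure described above; the per-FlowFile attribute size and the number of concurrently filling bins are illustrative assumptions, not measured values:

```python
# Rough estimate of heap consumed by FlowFile attributes held in MergeContent bins.
# The per-FlowFile attribute size and bin count below are illustrative assumptions.
ATTR_BYTES_PER_FLOWFILE = 4 * 1024      # assume ~4 KB of attributes per FlowFile
MAX_ENTRIES_PER_BIN = 100_000           # "Maximum Number of Entries" from the config above
CONCURRENT_BINS = 3                     # several unique fragment.identifiers binned at once
JVM_HEAP_BYTES = 2 * 1024 ** 3          # the 2 GB NiFi heap mentioned above

attr_heap = ATTR_BYTES_PER_FLOWFILE * MAX_ENTRIES_PER_BIN * CONCURRENT_BINS
print(f"~{attr_heap / 1024 ** 2:.0f} MB of attributes "
      f"in a {JVM_HEAP_BYTES / 1024 ** 3:.0f} GB heap")
```

Even with conservative numbers, the attribute maps alone can consume a large fraction of a 2 GB heap, before counting the heap held by the upstream split processor.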
12-15-2019
09:26 PM
Sure, thanks @MattWho. It works!
02-06-2018
01:29 PM
2 Kudos
@Varun R
After the SplitText processor, use an ExtractText processor and add a new property with a matching regex; the extracted value will then be added as a flowfile attribute. Example: after SplitText you have each address in a flowfile like this: http://aaa.com/q="bigdata"&api_key="", and you want to know which query param value (e.g. bigdata) was used. Then in ExtractText add a new property query_values with the value
q="(.*?)"
Output flowfile: once the flowfile is processed by the ExtractText processor, it matches the regex and adds the attribute query_values to the flowfile. This way you will know which query param values were used to get the response. If the answer helped to resolve your issue, click on the Accept button below to accept the answer; that would be a great help to community users looking for solutions to these kinds of issues.
07-21-2017
09:19 PM
1 Kudo
@Varun Please see below for the settings that control the number of reducers.

mapred.reduce.tasks = -1 -- this lets Tez determine the number of reducers to initiate.

hive.tez.auto.reducer.parallelism = true; -- when this is set to TRUE, Hive will estimate data sizes and set parallelism estimates, and Tez will sample source vertices' output sizes and adjust the estimates at run time. This is the first property that determines the initial number of reducers once Tez starts the query.

hive.tez.min.partition.factor = 0.25; -- when auto parallelism is enabled, this puts a lower limit on the number of reducers that Tez specifies.

hive.tez.max.partition.factor = 2.0; -- this over-partitions data in shuffle edges.

hive.exec.reducers.max -- the maximum number of reducers, 1099 by default.

hive.exec.reducers.bytes.per.reducer = 256 MB, which is 268435456 bytes.

Now, to calculate the number of reducers, we put it all together with the formula below. From the Explain plan we also need the size of the reducer stage output; let's assume 200,000 bytes.

Max(1, Min(hive.exec.reducers.max [1099], Reducer Stage estimate / hive.exec.reducers.bytes.per.reducer)) x hive.tez.max.partition.factor [2]
= Max(1, Min(1099, 200000 / 268435456)) x 2
= Max(1, Min(1099, 0.00074505805)) x 2
= Max(1, 0.0007) x 2
= 1 x 2
= 2

Tez will spawn 2 reducers. In this case we can legitimately make Tez initiate a higher number of reducers by lowering the value of hive.exec.reducers.bytes.per.reducer, for example to 10432 bytes (roughly 10 KB):

= Max(1, Min(1099, 200000 / 10432)) x 2
= Max(1, 19) x 2
= 38

Please note that a higher number of reducers doesn't necessarily mean better performance.
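The same arithmetic as a small Python sketch; the 200,000-byte reducer stage estimate is the assumed figure from the explanation above, and the helper name is just illustrative:

```python
# Estimate the number of Tez reducers using the formula discussed above.
def estimated_reducers(stage_output_bytes,
                       bytes_per_reducer,
                       max_reducers=1099,
                       max_partition_factor=2.0):
    base = max(1, min(max_reducers, stage_output_bytes / bytes_per_reducer))
    return int(base * max_partition_factor)

# Default hive.exec.reducers.bytes.per.reducer (256 MB): a tiny output yields 2 reducers.
print(estimated_reducers(200_000, 268_435_456))   # 2

# Lowering bytes.per.reducer to ~10 KB forces more reducers for the same output.
print(estimated_reducers(200_000, 10_432))        # 38
```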
04-25-2017
03:30 PM
1 Kudo
Thanks all for your responses. Once again I reassigned ownership, and it works!
hdfs dfs -chown -R admin:hadoop /user/admin
03-22-2017
07:48 PM
1 Kudo
Varun, I do not believe this functionality exists for Zeppelin at this time. You would need an additional query to complete this task.
02-16-2017
03:04 PM
@Anshuman Ghosh No inconvenience at all. We just want to keep it as easy as possible for community members to find similar issues and solutions. I would have moved it myself if I could have. 🙂