Description

This article compares the performance of a Python script and a Groovy script in NiFi. The analysis shows how the choice of scripting language affects performance, which in turn affects how many concurrent tasks the processor should be given. In the test flow, a generate processor supplies the input as flow file attributes. The Groovy and Python scripts read those attributes and replace the placeholders in target_file_cfg with the actual values. Sample code for both scripts is shown below.

Objective

Translate the Python script to a Groovy script, analyze the performance of both, and set the number of concurrent tasks accordingly.

Use Case

In the following input, the placeholders in the target file name are replaced with the actual values.

input, target_file_cfg: /movement/%C/%D/MOVE%V/%F

output, target-filename: /movement/JP/None/MOVE055/xxxxxx.xxxx.x

Input

target_file_cfg: /movement/%C/%D/MOVE%V/%F

country: JP

dns-country-code: JP

dns-site-number: 055

source-file-country-code: WM

source-filename: xxxxxx.xxxx.x

Output

target-filename: /movement/JP/None/MOVE055/xxxxxx.xxxx.x
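
The mapping from sample input to sample output can be tried outside NiFi with a small standalone sketch. This is not the script used in the flow: the function name resolve_target_filename and the PLACEHOLDERS table are hypothetical, and the mapping of %C to the country attribute is an assumption, since the sample scripts in this article do not show a %C branch.

```python
# Placeholder-to-attribute table for the sample input only.
# Assumption: %C maps to "country"; the article's scripts do not show this branch.
PLACEHOLDERS = {
    "%C": "country",
    "%D": "division",
    "%V": "dns-site-number",
    "%F": "source-filename",
}


def resolve_target_filename(template, attributes):
    """Replace each known placeholder with the matching attribute value."""
    result = template
    for token, attr_name in PLACEHOLDERS.items():
        if token not in result:
            continue
        # A missing attribute becomes the text "None", as in the sample output.
        value = str(attributes.get(attr_name))
        if token == "%V":
            value = value[:3]  # first three characters of the site number
        result = result.replace(token, value)
    return result


sample_attributes = {
    "target_file_cfg": "/movement/%C/%D/MOVE%V/%F",
    "country": "JP",
    "dns-country-code": "JP",
    "dns-site-number": "055",
    "source-file-country-code": "WM",
    "source-filename": "xxxxxx.xxxx.x",
}

resolved = resolve_target_filename(
    sample_attributes["target_file_cfg"], sample_attributes
)
print(resolved)  # /movement/JP/None/MOVE055/xxxxxx.xxxx.x
```

The sample input has no division attribute, which is why %D resolves to None in the output.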

 

Current approach

The current version of the Python script is shown in the following code block (sample code).

import java.io
import re

# Helper kept from the original script; it is not used in this sample.
def hasnumbers(inputString):
    return any(char.isdigit() for char in inputString)

flowFile = session.get()
if flowFile is not None:
    # Read the attributes that feed the placeholders.
    targetFileCfg = flowFile.getAttribute("target_file_cfg")
    originalFileName = flowFile.getAttribute("source-filename")
    sourceFileCountryCode = flowFile.getAttribute("source-file-country-code")
    storeNumber = flowFile.getAttribute("dns-site-number")
    storeNumberShort = flowFile.getAttribute("dns-site-number")
    storeNumberSegment = storeNumber[:3]
    countryCode = flowFile.getAttribute("dns-country-code")
    hostname = flowFile.getAttribute("dns-hostname")
    # region and division are used below but were never assigned in the sample;
    # they are read here from the same attributes the Groovy version uses.
    region = flowFile.getAttribute("region")
    division = flowFile.getAttribute("division")

    if re.search(r"%n", targetFileCfg) is not None:
        targetFileCfg = re.sub(r"%n", storeNumber, targetFileCfg)
    if re.search(r"%#", targetFileCfg) is not None:
        targetFileCfg = re.sub(r"%#", storeNumberShort, targetFileCfg)
    if re.search(r"%V", targetFileCfg) is not None:
        targetFileCfg = re.sub(r"%V", storeNumberSegment, targetFileCfg)
    if re.search(r"%N", targetFileCfg) is not None:
        targetFileCfg = re.sub(r"%N", hostname, targetFileCfg)
    if re.search(r"%Z", targetFileCfg) is not None:
        if region == "XX":
            targetFileCfg = re.sub(r"%Z", countryCode, targetFileCfg)
        else:
            targetFileCfg = re.sub(r"%Z", region, targetFileCfg)
    if re.search(r"%r", targetFileCfg) is not None:
        targetFileCfg = re.sub(r"%r", countryCode, targetFileCfg)
    if re.search(r"%F", targetFileCfg) is not None:
        targetFileCfg = re.sub(r"%F", originalFileName, targetFileCfg)
    if re.search(r"%D", targetFileCfg) is not None:
        # Attribute values are strings: clamp a negative number to 0 and let a
        # missing attribute become the text "None", as in the sample output.
        divisionText = str(division)
        if re.match(r"-\d+$", divisionText):
            divisionText = "0"
        targetFileCfg = re.sub(r"%D", divisionText.zfill(2), targetFileCfg)

    # Any placeholder that is still unresolved routes the flow file to failure.
    if re.search(r"%", targetFileCfg) is not None:
        session.transfer(flowFile, REL_FAILURE)
    else:
        flowFile = session.putAttribute(flowFile, "target-filename", targetFileCfg)
        session.transfer(flowFile, REL_SUCCESS)
    session.commit()
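
The script above runs one search and one substitution per placeholder. As a possible alternative, not the approach measured in this article, the same idea can be sketched as a single regular-expression pass with a lookup table. The names TOKEN, substitute, and values are hypothetical, and the sketch runs outside NiFi on plain strings.

```python
import re

# Matches "%" plus the character after it, or a lone trailing "%".
TOKEN = re.compile(r"%.?", re.DOTALL)


def substitute(template, values):
    """Return (resolved_text, unresolved_tokens) after one pass over template."""
    unresolved = []

    def replace_token(match):
        token = match.group(0)
        if token in values:
            return values[token]
        unresolved.append(token)
        return token  # leave unknown placeholders in place

    return TOKEN.sub(replace_token, template), unresolved


values = {"%V": "055", "%F": "xxxxxx.xxxx.x"}
print(substitute("/movement/MOVE%V/%F", values))
print(substitute("/movement/%Q/%F", values))
```

A non-empty list of unresolved tokens corresponds to the failure route in the script, and an empty list to the success route.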

Groovy Script conversion

The Groovy script converted from the above Python script is shown in the following code block (sample code).

flowFile = session.get()
if (!flowFile)
    return

// Read the attributes that feed the placeholders.
targetFileCfg = flowFile.getAttribute("target_file_cfg");
originalFileName = flowFile.getAttribute("source-filename");
sourceFileCountryCode = flowFile.getAttribute("source-file-country-code");
storeNumber = flowFile.getAttribute("dns-site-number");
storeNumberShort = flowFile.getAttribute("dns-site-number");
// take(3) mirrors the Python slice [:3] and, unlike [0..2], does not throw on short values.
storeNumberSegment = storeNumber.take(3);
countryCode = flowFile.getAttribute("dns-country-code");
region = flowFile.getAttribute("region");
dusCC = flowFile.getAttribute("country");
division = flowFile.getAttribute("division");
hostname = flowFile.getAttribute("dns-hostname");
// [1..4] throws on values shorter than five characters, such as the sample 055;
// drop(1).take(4) gives the same result for longer values.
storeNumberShort = storeNumberShort.drop(1).take(4);

if (targetFileCfg.contains("%n")) {
    targetFileCfg = targetFileCfg.replaceFirst("%n", storeNumber);
}
if (targetFileCfg.contains("%#")) {
    targetFileCfg = targetFileCfg.replaceFirst("%#", storeNumberShort);
}
if (targetFileCfg.contains("%V")) {
    targetFileCfg = targetFileCfg.replaceFirst("%V", storeNumberSegment);
}
if (targetFileCfg.contains("%N")) {
    targetFileCfg = targetFileCfg.replaceFirst("%N", hostname);
}
if (targetFileCfg.contains("%Z")) {
    if (region == "XX") {
        targetFileCfg = targetFileCfg.replaceFirst("%Z", countryCode);
    }
    else {
        targetFileCfg = targetFileCfg.replaceFirst("%Z", region);
    }
}
if (targetFileCfg.contains("%r")) {
    targetFileCfg = targetFileCfg.replaceFirst("%r", countryCode);
}
if (targetFileCfg.contains("%F")) {
    targetFileCfg = targetFileCfg.replaceFirst("%F", originalFileName);
}

// Route to failure or success, never both: a flow file can be transferred only once.
if (targetFileCfg.contains("%")) {
    session.transfer(flowFile, REL_FAILURE)
}
else {
    flowFile = session.putAttribute(flowFile, "target-filename", targetFileCfg)
    session.transfer(flowFile, REL_SUCCESS)
}
session.commit()
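
One detail to check when translating between the two languages is string slicing. Python slices are clamped to the length of the string, whereas a Groovy range subscript such as value[1..4] raises an exception when the range runs past the end of the string. The following minimal sketch shows the Python side of that behavior; the function name site_number_parts is hypothetical, and the [1:5] slice is only an illustration of the Python counterpart of a [1..4] range.

```python
def site_number_parts(store_number):
    """Return (segment, short) using slices that tolerate short input."""
    segment = store_number[:3]   # at most the first three characters
    short = store_number[1:5]    # characters 1 to 4, fewer if the value is short
    return segment, short


print(site_number_parts("055"))      # ('055', '55')
print(site_number_parts("0551234"))  # ('055', '5512')
```

With the three-character sample value 055, both slices succeed in Python, so a Groovy translation needs a length-tolerant equivalent to behave the same way.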

NiFi Flow

[Image: NiFi Flow]

Metrics

[Image: Metrics]

Conclusion

Based on the above metrics, the Python script can be replaced by the Groovy script.

Groovy was built for the JVM, on which NiFi runs, and integrates with Java more cleanly than Python. Try it.

 

Disclaimer:  This article is contributed by an external user. The steps may not be verified by Cloudera and may not be applicable for all the use cases and is very specific to a particular distribution. Please follow with caution and at your own risk. If needed, raise a support case to get the confirmation.
