Member since
05-19-2021
11-11-2021
08:42 PM
Description
This article compares the performance of a Python script and a Groovy script in NiFi. The analysis shows how the choice of scripting language affects performance, which in turn affects how concurrent tasks should be set. In the flow, a generate processor supplies the input as FlowFile attributes. The Groovy and Python scripts read those attributes and replace the placeholders in target_file_cfg with the actual values. Sample code for both scripts follows.
Objective
Translate the Python script to Groovy, analyze the performance of both scripts, and set the concurrent tasks accordingly.
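One rough way to reason about the concurrent task setting is Little's law: the average number of tasks in flight equals the arrival rate multiplied by the time spent per FlowFile. The sketch below applies that rule. The function name and the numbers in the usage note are hypothetical, and the estimate ignores scheduling overhead and contention between threads, so treat it as a starting point to be checked against measured metrics.

```python
def estimate_concurrent_tasks(flowfiles_per_second, millis_per_flowfile):
    """Little's law: average tasks in flight = arrival rate x time per item."""
    busy_millis_per_second = flowfiles_per_second * millis_per_flowfile
    # Ceiling division by 1000 ms, kept in integers to avoid float rounding.
    return max(1, -(-busy_millis_per_second // 1000))
```

For example, at a hypothetical 500 FlowFiles per second, a script that takes 12 ms per FlowFile would need about 6 concurrent tasks, while one that takes 2 ms would need only 1.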
Use Case
The input is a target file name template whose placeholders are replaced with the actual values.
Input target_file_cfg: /movement/%C/%D/MOVE%V/%F
Output target-filename: /movement/JP/None/MOVE055/xxxxxx.xxxx.x
Input
target_file_cfg: /movement/%C/%D/MOVE%V/%F
country: JP
dns-country-code: JP
dns-site-number: 055
source-file-country-code: WM
source-filename: xxxxxx.xxxx.x
Output
target-filename: /movement/JP/None/MOVE055/xxxxxx.xxxx.x
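The substitution can be tried outside NiFi with a plain dictionary standing in for the FlowFile attributes. The following is a minimal sketch, not the script used in the flow: the function name fill_template is hypothetical, the placeholder table covers only a subset of the tokens (%n, %V, %N, %r, %F), and it replaces all tokens in a single regex pass.

```python
import re

def fill_template(template, attributes):
    """Return the filled template, or None if any placeholder is unresolved."""
    site_number = attributes.get("dns-site-number")
    # Placeholder table (assumption: only a subset of the tokens is covered).
    values = {
        "%n": site_number,
        "%V": site_number[:3] if site_number else None,
        "%N": attributes.get("dns-hostname"),
        "%r": attributes.get("dns-country-code"),
        "%F": attributes.get("source-filename"),
    }

    def substitute(match):
        value = values.get(match.group(0))
        # Leave the token in place when no value is available for it.
        return match.group(0) if value is None else value

    filled = re.sub(r"%.", substitute, template)
    # A remaining "%" means at least one placeholder could not be resolved.
    return None if "%" in filled else filled


attributes = {
    "dns-site-number": "055",
    "dns-country-code": "JP",
    "source-filename": "xxxxxx.xxxx.x",
}
resolved = fill_template("/movement/MOVE%V/%F", attributes)
unresolved = fill_template("/movement/%C/%D/MOVE%V/%F", attributes)
```

With these attributes, resolved is /movement/MOVE055/xxxxxx.xxxx.x, while unresolved is None because the sketch has no entry for %C or %D.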
Current approach
The Python script, in its current version, can be viewed in the following code block. (Sample Python code).
import java.io
import re

def hasnumbers(inputString):
    return any(char.isdigit() for char in inputString)

flowFile = session.get()
if flowFile is not None:
    # Read the template and the attributes used to fill its placeholders
    targetFileCfg = flowFile.getAttribute("target_file_cfg")
    originalFileName = flowFile.getAttribute("source-filename")
    sourceFileCountryCode = flowFile.getAttribute("source-file-country-code")
    storeNumber = flowFile.getAttribute("dns-site-number")
    storeNumberShort = flowFile.getAttribute("dns-site-number")
    storeNumberSegment = storeNumber[:3]
    countryCode = flowFile.getAttribute("dns-country-code")
    hostname = flowFile.getAttribute("dns-hostname")
    # region and division are used below, so read them here as the Groovy version does
    region = flowFile.getAttribute("region")
    division = flowFile.getAttribute("division")

    # Replace each placeholder that appears in the template
    if re.search(r"%n", targetFileCfg) is not None:
        targetFileCfg = re.sub(r"%n", storeNumber, targetFileCfg)
    if re.search(r"%#", targetFileCfg) is not None:
        targetFileCfg = re.sub(r"%#", storeNumberShort, targetFileCfg)
    if re.search(r"%V", targetFileCfg) is not None:
        targetFileCfg = re.sub(r"%V", storeNumberSegment, targetFileCfg)
    if re.search(r"%N", targetFileCfg) is not None:
        targetFileCfg = re.sub(r"%N", hostname, targetFileCfg)
    if re.search(r"%Z", targetFileCfg) is not None:
        if region == "XX":
            targetFileCfg = re.sub(r"%Z", countryCode, targetFileCfg)
        else:
            targetFileCfg = re.sub(r"%Z", region, targetFileCfg)
    if re.search(r"%r", targetFileCfg) is not None:
        targetFileCfg = re.sub(r"%r", countryCode, targetFileCfg)
    if re.search(r"%F", targetFileCfg) is not None:
        targetFileCfg = re.sub(r"%F", originalFileName, targetFileCfg)
    if re.search(r"%D", targetFileCfg) is not None:
        # Attribute values are strings: clamp negative numbers to "0", then zero-pad
        if division is not None and division.startswith("-"):
            division = "0"
        targetFileCfg = re.sub(r"%D", str(division).zfill(2), targetFileCfg)

    # Any remaining "%" means a placeholder could not be resolved
    if re.search(r"%", targetFileCfg) is not None:
        session.transfer(flowFile, REL_FAILURE)
    else:
        flowFile = session.putAttribute(flowFile, "target-filename", targetFileCfg)
        session.transfer(flowFile, REL_SUCCESS)
    session.commit()
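The script depends on the session, REL_SUCCESS and REL_FAILURE variables that NiFi provides, so it cannot be run on its own. One way to check the routing logic before deploying is to pass in small stand-ins for those objects. The sketch below assumes hypothetical StubFlowFile and StubSession classes that mimic only the calls the script makes, and a reduced function, build_target_filename, that handles just %V and %F.

```python
REL_SUCCESS = "success"
REL_FAILURE = "failure"


class StubFlowFile(object):
    """Hypothetical stand-in for a FlowFile: only getAttribute is mimicked."""

    def __init__(self, attributes):
        self.attributes = dict(attributes)

    def getAttribute(self, name):
        return self.attributes.get(name)


class StubSession(object):
    """Hypothetical stand-in for the session: records transfers in a list."""

    def __init__(self, flow_file):
        self.flow_file = flow_file
        self.transfers = []

    def get(self):
        return self.flow_file

    def putAttribute(self, flow_file, name, value):
        # Return an updated copy, since the script reassigns the result.
        updated = dict(flow_file.attributes)
        updated[name] = value
        return StubFlowFile(updated)

    def transfer(self, flow_file, relationship):
        self.transfers.append((relationship, flow_file))


def build_target_filename(session):
    """Reduced version of the script logic: handles only %V and %F."""
    flowFile = session.get()
    if flowFile is None:
        return
    targetFileCfg = flowFile.getAttribute("target_file_cfg")
    storeNumber = flowFile.getAttribute("dns-site-number")
    originalFileName = flowFile.getAttribute("source-filename")
    targetFileCfg = targetFileCfg.replace("%V", storeNumber[:3])
    targetFileCfg = targetFileCfg.replace("%F", originalFileName)
    if "%" in targetFileCfg:
        session.transfer(flowFile, REL_FAILURE)
    else:
        flowFile = session.putAttribute(flowFile, "target-filename", targetFileCfg)
        session.transfer(flowFile, REL_SUCCESS)


session = StubSession(StubFlowFile({
    "target_file_cfg": "/movement/MOVE%V/%F",
    "dns-site-number": "055",
    "source-filename": "xxxxxx.xxxx.x",
}))
build_target_filename(session)
relationship, routed = session.transfers[0]
```

After the call, relationship holds the relationship the FlowFile was routed to and routed.getAttribute("target-filename") holds the generated name, so both can be asserted in an ordinary unit test.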
Groovy Script conversion
The Groovy script converted from the Python script above is shown in the following code block. (Sample code)
def flowFile = session.get()
if (!flowFile) {
    return
}

// Read the template and the attributes used to fill its placeholders
def targetFileCfg = flowFile.getAttribute("target_file_cfg")
def originalFileName = flowFile.getAttribute("source-filename")
def sourceFileCountryCode = flowFile.getAttribute("source-file-country-code")
def storeNumber = flowFile.getAttribute("dns-site-number")
def storeNumberShort = flowFile.getAttribute("dns-site-number")
// take(3) mirrors Python's storeNumber[:3] and tolerates shorter values
def storeNumberSegment = storeNumber.take(3)
def countryCode = flowFile.getAttribute("dns-country-code")
def region = flowFile.getAttribute("region")
def dusCC = flowFile.getAttribute("country")
def division = flowFile.getAttribute("division")
def hostname = flowFile.getAttribute("dns-hostname")
// Slice only when the value is long enough; [1..4] throws on shorter strings such as "055"
if (storeNumberShort.length() > 4) {
    storeNumberShort = storeNumberShort[1..4]
}

// replace() substitutes every literal occurrence, matching re.sub in the Python script
if (targetFileCfg.contains("%n")) {
    targetFileCfg = targetFileCfg.replace("%n", storeNumber)
}
if (targetFileCfg.contains("%#")) {
    targetFileCfg = targetFileCfg.replace("%#", storeNumberShort)
}
if (targetFileCfg.contains("%V")) {
    targetFileCfg = targetFileCfg.replace("%V", storeNumberSegment)
}
if (targetFileCfg.contains("%N")) {
    targetFileCfg = targetFileCfg.replace("%N", hostname)
}
if (targetFileCfg.contains("%Z")) {
    if (region == "XX") {
        targetFileCfg = targetFileCfg.replace("%Z", countryCode)
    } else {
        targetFileCfg = targetFileCfg.replace("%Z", region)
    }
}
if (targetFileCfg.contains("%r")) {
    targetFileCfg = targetFileCfg.replace("%r", countryCode)
}
if (targetFileCfg.contains("%F")) {
    targetFileCfg = targetFileCfg.replace("%F", originalFileName)
}

if (targetFileCfg.contains("%")) {
    // Unresolved placeholders remain, so route to failure only
    session.transfer(flowFile, REL_FAILURE)
} else {
    // Without this else branch the FlowFile would be transferred twice
    flowFile = session.putAttribute(flowFile, "target-filename", targetFileCfg)
    session.transfer(flowFile, REL_SUCCESS)
}
session.commit()
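One detail to check when converting between the two languages is how many occurrences each call replaces. Python's re.sub replaces every match by default, Java's String.replaceFirst replaces only the first match, and String.replace replaces every literal occurrence. The sketch below shows the two behaviours on the Python side. The template and file names are hypothetical and chosen only to make the difference visible.

```python
import re

template = "/archive/%F/%F"  # hypothetical template with a repeated token
filename = "report.txt"

# Default behaviour: every occurrence is replaced (like String.replace).
replaced_all = re.sub(r"%F", filename, template)

# count=1 limits the substitution to the first match (like String.replaceFirst).
replaced_first = re.sub(r"%F", filename, template, count=1)

# A callable replacement is inserted as-is. A plain string replacement would
# have its backslash escapes processed, turning "\n" into a newline.
literal = re.sub(r"%F", lambda match: r"dir\name.txt", "%F")
```

Here replaced_all is /archive/report.txt/report.txt and replaced_first is /archive/report.txt/%F. If a template can contain the same token twice, the converted script should use the call that matches the original behaviour, otherwise the two scripts are not doing the same work when their performance is compared.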
NiFi Flow
[Figure: NiFi Flow]
Metrics
[Figure: Metrics]
Conclusion
Based on the metrics above, the Python script can be replaced by the Groovy script.
Groovy was built for the JVM and integrates with Java more cleanly than Python does. Try it.
Disclaimer: This article is contributed by an external user. The steps may not be verified by Cloudera and may not be applicable for all the use cases and is very specific to a particular distribution. Please follow with caution and at your own risk. If needed, raise a support case to get the confirmation.
05-19-2021
11:59 PM
@SC Thanks for the input.