
Out of memory heap error while processing bulk file (1.26.0)

Contributor

I have a text file which needs to be fetched from a remote server through SFTP. The file will have 26k+ rows with "|"-separated values like below:

I have used the below processors to do the data manipulation, but I am facing a "Processing halted.Out of memory heap" error after SplitJson.

[Screenshot: VikasNifi_1-1732798103698.png]

Please help with a better approach to handle this case.

7 REPLIES

Super Guru

Hi,

It seems like you are running out of heap memory when adding new attributes through the EvaluateJsonPath processor. Attributes are stored in the heap, so you should avoid storing large data in flowfile attributes when you are going to have that many flowfiles, otherwise you will run into exactly this kind of issue.

Can you please elaborate on what you are trying to accomplish after converting Avro to JSON? What you are doing does not quite make sense to me, because you are merging towards the end, which means you might not even keep the attributes you are extracting, depending on how you set the Attribute Strategy in the MergeRecord processor.

Contributor

@SAMSAL 

Below is what I am trying to do:
 
Input file sample (it will have 40k+ records, with header and data segments, and "|"-separated columns):
 
DATE|CUST_ID|CUST_NAME|PAYMENT_AMOUNT|PAYMENT_TYPE|CARD_TYPE
2024-11-29|123|Test1|0.0100|Credit Card|Visa
2024-11-29|456|Test2|10.00|Credit Card|Master
2024-11-29|789|Test3|500.00|Credit Card|American Express
**********************************************************************************
Expected output: this should go into a file with fixed-length columns and be sent to the remote server using the PutSFTP processor.
 
20241129Visa                Test1               0.01      Credit    
20241129Master              Test2               10.00     Credit    
20241129American Express    Test3               500.00    Credit     
*************************************************************************************
I am storing the input data in a database table using the PutDatabaseRecord processor and retrieving it using the ExecuteSQL processor (the SQL query pads the columns to a fixed length).
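
For illustration, the fixed-width padding in that SQL query looks roughly like the following (the table and column names here are assumptions, not the actual schema, and RPAD may be spelled differently depending on your database):

-- Illustrative sketch only: table/column names are assumed; the padding widths
-- match the expected output above. RPAD pads each value on the right with spaces.
SELECT REPLACE(payment_date, '-', '')   AS "date",
       RPAD(card_type,      20, ' ')    AS card_type,
       RPAD(cust_name,      20, ' ')    AS cust_name,
       RPAD(payment_amount, 10, ' ')    AS payment_amount,
       RPAD(payment_type,   10, ' ')    AS payment_type
FROM   payment_staging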
 
ConvertAvroToJSON output:
[
{
"date": "20241129", 
"card_type": "Master              ", 
"payment_amount": "10.00      ", 
"payment_type": "Credit    ",
"cust_name": "Test1               "  
},
{
"date": "20241129", 
"card_type": "Visa                ", 
"payment_amount": "0.01      ", 
"payment_type": "Credit    ",
"cust_name": "Test2               "
}
]
 
SplitJson to split the JSON, and EvaluateJsonPath to get each attribute value
UpdateAttribute to place the values in the required output pattern ( (date)(card_type)(cust_name)(payment_amount)(payment_type) )
Merge all the content into a single file and put it in a file.

Super Guru

Hi @Vikas-Nifi ,

I think you can avoid a lot of overhead, such as writing the data to the DB just to do the transformation and assign the fixed widths (unless you actually need to store the data in the DB). You can use processors like QueryRecord and UpdateRecord to do the needed transformation in bulk instead of one record at a time, one field at a time. In QueryRecord you can use SQL-like functions, based on the Apache Calcite SQL syntax, to transform or derive new columns just as if you were writing a MySQL query (a rough query along those lines is sketched after the walkthrough below). In UpdateRecord you can use NiFi RecordPath to traverse fields and apply functions in bulk rather than one record at a time. There is also a FreeFormTextRecordSetWriter service that you can use to produce a custom output format. For example, in the following dataflow I'm using a ConvertRecord processor with a CSVReader and a FreeFormTextRecordSetWriter to produce the desired output:

[Screenshot: SAMSAL_0-1732995025918.png]

The GenerateFlowFile processor is used to create the input CSV in a flowfile:

[Screenshot: SAMSAL_1-1732995108765.png]

The ConvertRecord processor is configured as follows:

[Screenshot: SAMSAL_2-1732995152930.png]

For the CSVReader you can use the default configuration.

The FreeFormTextRecordSetWriter is configured as follows:

[Screenshot: SAMSAL_3-1732995248562.png]

In the Text property you can use the column/field names as listed in the input and provided by the reader. You can also use NiFi Expression Language to apply the proper formatting and transformation to the written data, as follows:

${DATE:replace('-',''):append(${CARD_TYPE}):padRight(28,' ')}${CUST_NAME:padRight(20,' ')}${PAYMENT_AMOUNT:padRight(10,' ')}${PAYMENT_TYPE:padRight(10,' ')}

This will produce the following output:

20241129Visa                Test1               0.01      Credit Card
20241129Master              Test2               10.0      Credit Card
20241129American Express    Test3               500.0     Credit Card
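
As a rough sketch of the QueryRecord route mentioned above (the field names assume the reader exposes the input columns as named in the header, and function availability can vary with the Calcite version bundled in your NiFi release), a query along these lines could derive the fixed-width columns without going through the database:

-- FLOWFILE is the table name QueryRecord exposes for the incoming flowfile content.
-- Field names and padding widths are assumptions based on the sample input above.
SELECT REPLACE("DATE", '-', '')                                                  AS record_date,
       SUBSTRING(CARD_TYPE || '                    ' FROM 1 FOR 20)              AS card_type,
       SUBSTRING(CUST_NAME || '                    ' FROM 1 FOR 20)              AS cust_name,
       SUBSTRING(CAST(PAYMENT_AMOUNT AS VARCHAR) || '          ' FROM 1 FOR 10)  AS payment_amount,
       SUBSTRING(PAYMENT_TYPE || '          ' FROM 1 FOR 10)                     AS payment_type
FROM FLOWFILE

The writer side would then only need to concatenate the already padded fields.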

I know this is not 100% what you need, but it should give you an idea of what you need to do to get the desired output.

Hope that helps and if it does, please accept the solution.

Let me know if you have any other questions.

Thanks