Created 05-19-2017 09:17 AM
Hi,
I am trying to fetch 2,500,000 records from a MongoDB collection with:
1. these parameters in bootstrap.conf
# JVM memory settings
java.arg.2=-Xms6144m
java.arg.3=-Xmx6144m
2. the following processor properties
SSL Context Service No value set
Client Auth NONE
With 4 GB: started the GetMongo processor at 09:05, error at 09:24
2017-05-19 09:24:40,150 ERROR [Timer-Driven Process Thread-4] o.a.nifi.processors.mongodb.GetMongo GetMongo[id=4ee5171c-1006-115b-5dc0-6ef54c1e9a73] GetMongo[id=4ee5171c-1006-115b-5dc0-6ef54c1e9a73] failed to process due to java.lang.OutOfMemoryError: Java heap space; rolling back session: java.lang.OutOfMemoryError: Java heap space
2017-05-19 09:24:40,167 ERROR [Timer-Driven Process Thread-4] o.a.nifi.processors.mongodb.GetMongo java.lang.OutOfMemoryError : Java heap space
With 6 GB: started the GetMongo processor at 09:42, error at 10:28
2017-05-19 10:28:50,336 ERROR [NiFi logging handler] org.apache.nifi.StdErr
2017-05-19 10:28:50,337 ERROR [NiFi logging handler] org.apache.nifi.StdErr Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "pool-2-thread-1"
Do you have any suggestions to help me?
Thanks
Created 05-19-2017 12:51 PM
Try setting the Batch Size property to 1000, and see if that helps.
Created 05-19-2017 02:53 PM
I've set it to 100, 1000, and 2000. After about 10 minutes, the processor successfully reads the whole collection, but in one shot, whatever the value of the property. Is that normal?
Created 05-19-2017 05:56 PM
So, you are saying it works with Batch Size set, but it does not matter what you set the value to?
Created 05-22-2017 10:04 AM
Exactly.
The queue after GetMongo holds 2,500,000 events (1.3 GB). The following processor (MergeContent) cannot empty this queue.
I don't understand why.
Created 05-22-2017 11:53 AM
How are the properties set in the MergeContent processor?
Created 05-22-2017 01:14 PM
Thanks for your feedback.
Here are the properties (I also tried 200 MB before, but it didn't work).
Merge Strategy Bin-Packing Algorithm
Merge Format Binary Concatenation
Attribute Strategy Keep Only Common Attributes
Correlation Attribute Name No value set
Minimum Number of Entries 1
Maximum Number of Entries No value set
Minimum Group Size 20 MB
Maximum Group Size 20 MB
Max Bin Age 5 min
Maximum number of Bins 100
Delimiter Strategy Text
Header No value set
Footer No value set
Demarcator
Compression Level 1
Keep Path false
After an hour, NiFi fails with an OutOfMemoryError:
2017-05-22 12:35:44,770 WARN [NiFi Web Server-22-acceptor-0@2c439296-ServerConnector@ccf1486{HTTP/1.1,[http/1.1]}{0.0.0.0:28080}] o.eclipse.jetty.server.AbstractConnector java.lang.OutOfMemoryError: Java heap space
2017-05-22 12:35:44,784 WARN [NiFi Web Server-21] org.eclipse.jetty.servlet.ServletHandler Error for /nifi-api/flow/controller/bulletins
java.lang.OutOfMemoryError: Java heap space
    at java.lang.StringBuilder.toString(StringBuilder.java:407) ~[na:1.8.0_66]
    at java.net.Inet4Address.numericToTextFormat(Inet4Address.java:373) ~[na:1.8.0_66]
    at java.net.Inet4Address.getHostAddress(Inet4Address.java:328) ~[na:1.8.0_66]
    at org.eclipse.jetty.server.Request.getRemoteAddr(Request.java:1193) ~[na:na]
    at javax.servlet.ServletRequestWrapper.getRemoteAddr(ServletRequestWrapper.java:275) ~[javax.servlet-api-3.1.0.jar:3.1.0]
    at org.apache.nifi.web.filter.RequestLogger.doFilter(RequestLogger.java:62) ~[classes/:na]
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1676) ~[na:na]
    at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:316) ~[spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
Created 05-22-2017 03:01 PM
Try putting a larger range between the minimum and maximum group size, like 25 MB and 50 MB.
How much memory have you allocated to the NiFi JVM? The default is 512 MB; it is set in the bootstrap.conf file.
# JVM memory settings
java.arg.2=-Xms512m, change to 2g or 4g if you have the memory available on your system
java.arg.3=-Xmx512m, change to 2g or 4g if you have the memory available on your system
Since you are dealing with 1.3 GB in the MergeContent processor, make sure to allocate at least double that to the NiFi JVM, because the MergeContent processor uses JVM memory to build its merged flow files. In addition, I would set the number of Concurrent Tasks to 3.
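The "at least double the queued data" rule of thumb above can be checked with a small script. This is only a sketch: the function names `parse_heap_bytes` and `heap_is_sufficient` are made up for illustration, the factor of 2 is the suggestion from this reply rather than a documented NiFi guarantee, and 1.3 GB is interpreted here as 1.3 × 1024³ bytes.

```python
import re

# Multipliers for the size suffixes accepted by -Xms / -Xmx.
_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def parse_heap_bytes(conf_text, flag="-Xmx"):
    """Return the heap size in bytes set by `flag` in bootstrap.conf text, or None."""
    pattern = re.compile(
        r"^\s*java\.arg\.\d+=" + re.escape(flag) + r"(\d+)([kKmMgG]?)\s*$",
        re.MULTILINE,
    )
    match = pattern.search(conf_text)
    if match is None:
        return None
    value, unit = match.groups()
    return int(value) * _UNITS.get(unit.lower(), 1)


def heap_is_sufficient(conf_text, queued_bytes, factor=2.0):
    """True if the maximum heap is at least `factor` times the queued data size."""
    max_heap = parse_heap_bytes(conf_text)
    return max_heap is not None and max_heap >= factor * queued_bytes


conf = """# JVM memory settings
java.arg.2=-Xms6144m
java.arg.3=-Xmx6144m
"""
queued = int(1.3 * 1024**3)  # assumption: 1.3 GB read as 1.3 GiB

print(parse_heap_bytes(conf))            # 6442450944
print(heap_is_sufficient(conf, queued))  # True
```

Passing this check does not guarantee that the flow will run without an OutOfMemoryError; it only confirms that the configured heap meets the rule of thumb.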
Created 05-23-2017 06:23 AM
Hi,
Memory settings are already at 8 GB for both.
This morning I tried setting the Minimum and Maximum Number of Entries, and it works!
Thanks for your help, Wynner.
Regards
Parameters set (so the merge creates about 1,000 files):
Minimum Number of Entries 1
Maximum Number of Entries 2500
Minimum Group Size 0 B
Maximum Group Size No value set
Max Bin Age 5 min
Maximum number of Bins 100
Delimiter Strategy Text
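The "about 1,000 files" figure follows from simple arithmetic on the numbers in this thread. The sketch below is a simplified model, not NiFi's actual bin-packing code: it assumes every bin closes exactly at Maximum Number of Entries, reads 1.3 GB and 20 MB as decimal units, and uses hypothetical helper names.

```python
def merged_file_count(total_entries, max_entries_per_bin):
    """Merged outputs produced if every bin closes at max_entries_per_bin."""
    if max_entries_per_bin <= 0:
        raise ValueError("max_entries_per_bin must be positive")
    # Ceiling division: a partially filled last bin still yields one file.
    return -(-total_entries // max_entries_per_bin)


def average_entry_bytes(total_bytes, total_entries):
    """Average size of one queued flow file."""
    return total_bytes / total_entries


total_entries = 2_500_000
total_bytes = 1.3e9  # assumption: 1.3 GB read as 1.3 * 10**9 bytes

print(merged_file_count(total_entries, 2500))  # 1000

avg = average_entry_bytes(total_bytes, total_entries)
print(avg)         # 520.0 bytes per flow file on average
print(avg * 2500)  # 1300000.0 bytes, about 1.3 MB per merged file

# For comparison, a 20 MB bin would need roughly this many average-size
# entries before it reached its size threshold.
print(int(20_000_000 / avg))  # 38461
```

So with Maximum Number of Entries at 2500, each merged file is roughly 1.3 MB, far smaller than the 20 MB bins of the earlier configuration.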