The data is streamed so the file size should be irrelevant.
Perhaps too many reducer slots are configured, i.e. the machine doesn't have enough RAM to run that many reducers concurrently. (There's also a --reducers CLI option, which always rounds up to a multiple of the number of Solr shards.)
To check whether the JVM -Xmx settings are applied correctly, you could, for example, add the registerJVMMetrics and startReportingMetricsToSLF4J commands to your morphline:
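A minimal sketch of how those two commands might sit in a morphline config file. The morphline id, the importCommands pattern, the logger name, and the 10-second frequency are placeholder assumptions, not values from your setup:

```
morphlines : [
  {
    id : morphline1                      # placeholder id
    importCommands : ["org.kitesdk.**"]  # assumption: adjust to the commands you actually use
    commands : [
      # register JVM metrics (e.g. memory usage, garbage collection)
      { registerJVMMetrics {} }

      # periodically write all registered metrics to an SLF4J logger
      {
        startReportingMetricsToSLF4J {
          logger : "org.kitesdk.morphline.jvm"  # placeholder logger name
          frequency : "10 seconds"              # placeholder reporting interval
        }
      }

      # ... your existing commands follow here
    ]
  }
]
```

If the settings took effect, the maximum heap figure among the logged memory metrics should be in line with the -Xmx value you intended.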
http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#/registerJVMMetrics
http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#startReportingMetricsT...

Wolfgang.