
MapReduce job failed - unable to find split record boundary

New Contributor

Hello,

Rather new to Hadoop, so this might seem like a simple question with a straightforward answer. 🙂

We are presently facing the following error:

2021-06-19 03:31:07.614: Watching for process completion/termination.
2021-06-19 03:31:31.997: Task ReceivingAgent failed: RuntimeException: MapReduce job failed - please look in application logs for details. Cause:Task failed task_1621791333957_1544505_m_000016
Job failed as tasks failed. failedMaps:1 failedReduces:0
AttemptID:attempt_1621791333957_1544505_m_000015_0 Info:Error: com.podiumdata.base.error.PodiumFault: utils.error.code.HADOOP_SPLIT - error processing hadoop file split : unable to find split record boundary for position:2013265920 after 2625536 bytes.
you have specified a simple terminated record with no field enclosures
check your specified recordTerminator:ANY_NEWLINE and your specified fixedFieldCount:4
    at com.nvs.utils.stream.hadoop.CsvSplitInputStream.throwUnableToLocateRecordBoundary(CsvSplitInputStream.java:49)
    at com.nvs.utils.stream.hadoop.PrologueEpilogueSplitInputStream.fillToEndOfRecordBoundary(PrologueEpilogueSplitInputStream.java:228)
    at com.nvs.utils.stream.hadoop.PrologueEpilogueSplitInputStream.fillPrologue(PrologueEpilogueSplitInputStream.java:174)
    at com.nvs.utils.stream.hadoop.PrologueEpilogueSplitInputStream.<init>(PrologueEpilogueSplitInputStream.java:65)
    at com.nvs.utils.stream.hadoop.CsvSplitInputStream.<init>(CsvSplitInputStream.java:28)
    at com.nvs.utils.stream.hadoop.CsvSplitInputStream.newInstance(CsvSplitInputStream.java:21)
    at com.nvs.utils.stream.hadoop.SplitInputStream.newNoncompressedInstance(SplitInputStream.java:82)
    at com.nvs.utils.stream.hadoop.SplitInputStream.newInstance(SplitInputStream.java:63)
    at com.podiumdata.coop.service.impl.mapreduce.InputStreamReceivingAgentMapper.getSplitInputStream(InputStreamReceivingAgentMapper.java:160)
    at com.podiumdata.coop.service.impl.mapreduce.InputStreamReceivingAgentMapper.allocateRecordCutter(InputStreamReceivingAgentMapper.java:113)
    at com.podiumdata.coop.service.impl.mapreduce.InputStreamReceivingAgentMapper.allocateRecordButcher(InputStreamReceivingAgentMapper.java:107)
    at com.podiumdata.coop.service.impl.mapreduce.InputStreamReceivingAgentMapper.allocateRecordTransformerCore(InputStreamReceivingAgentMapper.java:75)
    at com.podiumdata.coop.service.impl.mapreduce.InputStreamReceivingAgentMapper.map(InputStreamReceivingAgentMapper.java:67)
    at com.podiumdata.coop.service.impl.mapreduce.InputStreamReceivingAgentMapper.map(InputStreamReceivingAgentMapper.java:36)

The same error occurs every day when we attempt to send the CSV file for the date in question.
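
In case it helps with the diagnosis, below is the kind of quick sanity check I was planning to run against a local copy of the file around the reported offset. It is only a sketch on my side: the file path, the comma delimiter and the single-byte encoding are assumptions, since I have not shared the exact feed layout here.

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Scans a window around the offset reported in the error (position:2013265920)
// and prints every line whose field count differs from the expected 4.
// Assumptions: comma-separated fields, no enclosures, mostly single-byte
// characters, and a local copy of the CSV at the given path.
public class SplitBoundaryCheck {

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "feed.csv"; // hypothetical local copy
        long targetOffset = 2_013_265_920L;  // "position" from the error message
        long window = 2_625_536L;            // bytes the job scanned before giving up
        int expectedFields = 4;              // fixedFieldCount from the error message

        try (FileInputStream in = new FileInputStream(path)) {
            long skipped = in.skip(Math.max(0L, targetOffset - 1024)); // byte-accurate skip to just before the split
            BufferedReader reader =
                    new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
            reader.readLine(); // drop the (most likely partial) first line
            long consumed = 0;
            String line;
            while ((line = reader.readLine()) != null && consumed < window) {
                consumed += line.length() + 1; // rough byte count: 1-byte chars plus '\n'
                int fields = 1;
                for (int i = 0; i < line.length(); i++) {
                    if (line.charAt(i) == ',') {
                        fields++;
                    }
                }
                if (fields != expectedFields) {
                    System.out.println("Suspicious line near byte " + (skipped + consumed)
                            + " (" + fields + " fields): " + line);
                }
            }
        }
    }
}

The idea is simply to spot a record with too few or too many delimiters, or an unexpected line break, right around the byte range where the split reader gives up.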

I have been trying to read up on how Hadoop processes records that are split across block boundaries, but it is still not really clear to me. 😉
I would be grateful if anyone could help me understand the possible root causes of this kind of issue.
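
For what it is worth, my current (possibly wrong) mental model of how a split-aware, newline-terminated reader behaves is the toy example below. It is only my simplified illustration of the general approach used by readers such as Hadoop's LineRecordReader, not the Podium/Qlik reader from the stack trace; please correct me if I have misunderstood it.

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Toy illustration of split handling for newline-terminated records:
// every reader except the one that owns the first split skips the partial
// record at the start of its split, and every reader deliberately reads past
// the end of its split to finish the record it started. Each record is
// therefore handled exactly once - provided a record terminator can actually
// be found near the boundary, which seems to be what fails in the error above.
public class SplitLineReaderSketch {

    public static void main(String[] args) throws IOException {
        byte[] data = "a,1\nb,2\nc,3\nd,4\n".getBytes();
        // Pretend the "file" is cut into two splits in the middle of "b,2\n".
        System.out.println(readSplit(data, 0, 5));               // [a,1, b,2]
        System.out.println(readSplit(data, 5, data.length - 5)); // [c,3, d,4]
    }

    static List<String> readSplit(byte[] data, long start, long length) throws IOException {
        InputStream in = new ByteArrayInputStream(data);
        in.skip(start);
        long end = start + length;
        long pos = start;
        List<String> records = new ArrayList<>();

        if (start != 0) {
            // Not the first split: the reader of the previous split will emit
            // the record we landed in the middle of, so skip to the next newline.
            pos += skipPastNewline(in);
        }
        while (pos < end) {
            // Read one full record, crossing the split end on purpose if the
            // record happens to straddle the boundary.
            StringBuilder record = new StringBuilder();
            int b;
            while ((b = in.read()) != -1 && b != '\n') {
                record.append((char) b);
            }
            pos += record.length() + 1;
            if (record.length() > 0) {
                records.add(record.toString());
            }
            if (b == -1) {
                break; // end of data
            }
        }
        return records;
    }

    static long skipPastNewline(InputStream in) throws IOException {
        long consumed = 0;
        int b;
        while ((b = in.read()) != -1) {
            consumed++;
            if (b == '\n') {
                break;
            }
        }
        return consumed;
    }
}

If that model is roughly right, then the mapper that failed above would be the one that could not find any newline within about 2.6 MB after its split start, which makes me suspect a damaged or unusually long record around that offset.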

Thanks a bunch for any assistance,
Tomasz

1 ACCEPTED SOLUTION

Moderator

Hello @Tomek ,

The exception is coming from Podium Data: based on the stack trace, the failure occurs in the Podium Data code, whose source code we do not have access to. Please reach out to Qlik Support.

Kind regards,

Ferenc


Ferenc Erdelyi, Technical Solutions Manager


2 REPLIES


New Contributor

Hello Ferenc,

Thank you for the update and the information provided.
That makes sense; we will reach out to Qlik.

Have a great day!
Tom