Appending flowfiles to the same AVRO file results in java.io.IOException: Invalid sync!

Hi,

I'm trying to append flowfiles to a single Avro-format file in HDFS, but it keeps failing.

The last part of the flow looks like this:

-> ConvertJSONToAvro -> MergeContent (buffering 100 flowfiles) -> UpdateAttribute (set the filename) -> PutHDFS (Conflict resolution strategy: Append)

The first 100 flowfiles are written to the file successfully. The file is OK and I can display its content. When the next set of 100 flowfiles is appended, the file becomes corrupt. The following error message is shown when I display the file content:

Fatal internal error
org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync!
    at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
    at org.apache.hadoop.fs.shell.Display$AvroFileInputStream.read(Display.java:303)
    at java.io.InputStream.read(InputStream.java:179)
    at java.io.InputStream.read(InputStream.java:101)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:94)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:62)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:122)
    at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:107)
    at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:102)
    at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
    at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
    at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
    at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
    at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:119)
    at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
    at org.apache.hadoop.fs.FsShell.run(FsShell.java:297)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
    at org.apache.hadoop.fs.FsShell.main(FsShell.java:356)
Caused by: java.io.IOException: Invalid sync!
    at org.apache.avro.file.DataFileStream.nextRawBlock(DataFileStream.java:293)
    at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:198)
    ... 18 more

I have tried different settings in both MergeContent and PutHDFS but none of them seems to work.
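
One possible explanation, sketched here under assumptions rather than confirmed against this flow: an Avro container file starts with a header holding a randomly generated 16-byte sync marker, and every data block must end with that same marker. If each merged batch is a complete Avro file with its own header and marker, a byte-level append places a second header where the reader expects a block. The Python sketch below uses a deliberately simplified toy format (fixed-width lengths, no schema or metadata, hypothetical helper names `write_container` and `read_container`), not the real Avro encoding, just to show how such an append could produce an "Invalid sync!" failure.

```python
import io
import os
import struct

MAGIC = b"Obj\x01"   # magic bytes used by real Avro container files
SYNC_SIZE = 16       # real Avro sync markers are also 16 bytes


def write_container(records):
    """Toy writer (assumption: simplified layout, not real Avro encoding).

    Layout: magic, random sync marker, then one block made of
    record count, payload size, payload, and the sync marker again.
    """
    sync = os.urandom(SYNC_SIZE)
    payload = b"".join(struct.pack(">I", len(r)) + r for r in records)
    block = struct.pack(">II", len(records), len(payload)) + payload
    return MAGIC + sync + block + sync


def read_container(data):
    """Toy reader: every block must end with the sync marker from the header."""
    stream = io.BytesIO(data)
    if stream.read(len(MAGIC)) != MAGIC:
        raise IOError("Not a container file")
    sync = stream.read(SYNC_SIZE)
    records = []
    while True:
        head = stream.read(8)
        if not head:
            return records
        if len(head) < 8:
            raise IOError("Truncated block header")
        count, size = struct.unpack(">II", head)
        payload = stream.read(size)
        # The marker is checked before the payload is decoded.
        if stream.read(SYNC_SIZE) != sync:
            raise IOError("Invalid sync!")
        offset = 0
        for _ in range(count):
            (length,) = struct.unpack_from(">I", payload, offset)
            records.append(payload[offset + 4:offset + 4 + length])
            offset += 4 + length


first = write_container([b"a", b"b"])   # first merged batch
second = write_container([b"c"])        # second batch, with its own header

print(read_container(first))            # the first batch alone reads fine
try:
    read_container(first + second)      # byte-level append of a whole file
except IOError as err:
    print("append failed:", err)
```

In this toy model the appended file only becomes readable again if all records are written under a single header and sync marker, for example by calling `write_container` once with the combined records. Whether the same reasoning applies to PutHDFS in Append mode depends on what MergeContent actually emits, which is not verified here.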

Is it possible to append JSON-structured data to an Avro file in HDFS successfully?

If so, do you have any ideas on what a working configuration might look like?

Regards,

Staffan
