
AvroRuntimeException "Invalid sync!" when loading data from MySQL to HDFS in Avro format under the same filename


Hi Guys,

I have built the following NiFi flow to capture data changes in MySQL and load them into HDFS, writing to a single file per day named in ddMMyyyy format:

QueryDatabaseTable -> UpdateAttribute (attribute name: filename, value: ${now():format("ddMMyyyy")}) -> PutHDFS

The flow works fine when loading data from MySQL for the first time. However, after a new insert or update in MySQL, opening the file with the command hdfs dfs -text gives the following error:

org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync!
    at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
    at org.apache.hadoop.fs.shell.Display$AvroFileInputStream.read(Display.java:302)
    at java.io.InputStream.read(InputStream.java:179)
    at java.io.InputStream.read(InputStream.java:101)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:87)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:61)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:121)
    at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:106)
    at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:101)
    at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
    at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
    at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
    at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
    at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:118)
    at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
    at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.fs.FsShell.main(FsShell.java:372)
Caused by: java.io.IOException: Invalid sync!
    at org.apache.avro.file.DataFileStream.nextRawBlock(DataFileStream.java:297)
    at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:198)
    ... 18 more

However, if I insert a ConvertAvroToJSON processor in between, it works fine and the output is in JSON format. The error above only occurs with Avro format. Can any of you suggest a solution?

2 REPLIES

@Parth Karkhanis

Could you try adding a SplitAvro processor to your flow after the QueryDatabaseTable processor, and configure it to create small flowfiles instead of one big Avro file? Then try running your commands again.


Hi Shu,

I get the same error after adding SplitAvro between QueryDatabaseTable and UpdateAttribute. Is this not possible with a single Avro file? I know it works fine with individual small flowfiles.
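
A possible explanation, assuming PutHDFS is appending each new flowfile to the existing daily file (for example with its Conflict Resolution Strategy set to append; the thread does not confirm this): every Avro flowfile is a complete container file with its own header and its own random 16-byte sync marker. Appending the raw bytes of one container to another leaves a second header in the middle of the file, and a reader that expects the first file's sync marker after each block fails there. The sketch below is a simplified, stdlib-only Python illustration of the Avro container layout, not NiFi or the Avro library itself; write_container and read_container are hypothetical helpers written for this example.

```python
import io
import os

MAGIC = b"Obj\x01"  # the four magic bytes that start an Avro container file


def write_long(out, n):
    """Write n as an Avro long: zigzag encoding, then a base-128 varint."""
    n = (n << 1) ^ (n >> 63)
    while n & ~0x7F:
        out.write(bytes([(n & 0x7F) | 0x80]))
        n >>= 7
    out.write(bytes([n]))


def read_long(inp):
    """Read one zigzag/varint encoded Avro long."""
    shift = result = 0
    while True:
        b = inp.read(1)
        if not b:
            raise EOFError("unexpected end of data")
        result |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)


def write_container(records, sync=None):
    """Hypothetical helper: one container with schema "bytes" and one block."""
    sync = sync or os.urandom(16)  # each container gets its own sync marker
    out = io.BytesIO()
    out.write(MAGIC)
    meta = {b"avro.schema": b'"bytes"', b"avro.codec": b"null"}
    write_long(out, len(meta))
    for key, value in meta.items():
        write_long(out, len(key))
        out.write(key)
        write_long(out, len(value))
        out.write(value)
    write_long(out, 0)  # end of the metadata map
    out.write(sync)
    body = io.BytesIO()
    for rec in records:
        write_long(body, len(rec))
        body.write(rec)
    data = body.getvalue()
    write_long(out, len(records))  # number of objects in the block
    write_long(out, len(data))     # size of the block in bytes
    out.write(data)
    out.write(sync)                # every block ends with the sync marker
    return out.getvalue()


def read_container(data):
    """Hypothetical helper: read all records, checking sync after each block."""
    inp = io.BytesIO(data)
    if inp.read(4) != MAGIC:
        raise ValueError("Not an Avro data file")
    while True:  # skip the metadata map
        count = read_long(inp)
        if count == 0:
            break
        if count < 0:
            read_long(inp)  # negative count is followed by a byte size
            count = -count
        for _ in range(count):
            inp.read(read_long(inp))  # key
            inp.read(read_long(inp))  # value
    sync = inp.read(16)
    records = []
    while inp.tell() < len(data):
        count = read_long(inp)
        block = io.BytesIO(inp.read(read_long(inp)))
        if inp.read(16) != sync:
            raise ValueError("Invalid sync!")
        for _ in range(count):
            records.append(block.read(read_long(block)))
    return records


first = write_container([b"row1"])   # stands in for the first day's load
second = write_container([b"row2"])  # stands in for a later insert/update

try:
    read_container(first + second)   # raw byte append of two containers
except ValueError as err:
    print("appended file:", err)

# Re-writing the records into a single container keeps one header and one sync.
merged = write_container(read_container(first) + read_container(second))
print("merged file:", read_container(merged))
```

If this is what is happening, it would also explain why the JSON output works: plain JSON text has no container header or sync marker to conflict. Options worth testing are to write each flowfile under a unique filename, or to combine the records into one Avro container before PutHDFS (NiFi's MergeContent processor has an Avro merge format) rather than appending raw bytes.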