AvroRuntimeException "Invalid sync!" when loading data from MySQL to HDFS in Avro format with the same filename
Labels: Apache NiFi
Created ‎08-16-2018 11:15 AM
Hi Guys,
I have built the following NiFi flow to load data from MySQL into HDFS. It captures changed data and writes it to HDFS, creating only one file per day, named in ddMMyyyy format:
QueryDatabaseTable -> UpdateAttribute (attribute name: filename, value: ${now():format("ddMMyyyy")}) -> PutHDFS
The flow works fine when loading data from MySQL for the first time, but after a new insert or update, opening the file with the command hdfs dfs -text gives the following error:
org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync!
    at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
    at org.apache.hadoop.fs.shell.Display$AvroFileInputStream.read(Display.java:302)
    at java.io.InputStream.read(InputStream.java:179)
    at java.io.InputStream.read(InputStream.java:101)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:87)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:61)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:121)
    at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:106)
    at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:101)
    at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
    at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
    at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
    at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
    at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:118)
    at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
    at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.fs.FsShell.main(FsShell.java:372)
Caused by: java.io.IOException: Invalid sync!
    at org.apache.avro.file.DataFileStream.nextRawBlock(DataFileStream.java:297)
    at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:198)
    ... 18 more
However, if I insert a ConvertAvroToJson processor in between, it works fine and gives output in JSON format; the error only occurs for Avro format. Can anyone suggest a solution to resolve this error?
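A possible explanation, sketched under an assumption the post does not confirm: if PutHDFS is appending each new flow file to the existing daily file, then two complete Avro container files end up back to back. Every Avro container file carries its own header and its own randomly generated 16-byte sync marker, so a reader that is still expecting the first file's marker fails when it reaches the second file's header. The Python below is a toy model of that layout (string records, no compression), not the real Avro library; all function names are hypothetical.

```python
import io
import os

MAGIC = b"Obj\x01"  # first four bytes of every Avro container file


class InvalidSync(Exception):
    """Raised when a block is not followed by the expected sync marker."""


def write_long(out, n):
    """Write an Avro long: zigzag encoding, then a base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    while z > 0x7F:
        out.write(bytes([(z & 0x7F) | 0x80]))
        z >>= 7
    out.write(bytes([z]))


def read_long(inp):
    """Read one zigzag varint long from the stream."""
    shift, z = 0, 0
    while True:
        b = inp.read(1)
        if not b:
            raise EOFError("unexpected end of data")
        z |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1)


def write_bytes(out, data):
    write_long(out, len(data))
    out.write(data)


def read_bytes(inp):
    return inp.read(read_long(inp))


def write_container(records):
    """Build one toy container: header, sync marker, a single data block."""
    out = io.BytesIO()
    sync = os.urandom(16)  # a fresh random marker per file, as in Avro
    out.write(MAGIC)
    meta = {"avro.schema": b'"string"', "avro.codec": b"null"}
    write_long(out, len(meta))
    for key, value in meta.items():
        write_bytes(out, key.encode())
        write_bytes(out, value)
    write_long(out, 0)  # end of the metadata map
    out.write(sync)
    body = io.BytesIO()
    for record in records:
        write_bytes(body, record.encode())
    data = body.getvalue()
    write_long(out, len(records))  # number of objects in the block
    write_long(out, len(data))     # block size in bytes
    out.write(data)
    out.write(sync)
    return out.getvalue()


def read_container(blob):
    """Read records, checking the sync marker after every block."""
    inp = io.BytesIO(blob)
    if inp.read(4) != MAGIC:
        raise ValueError("not an Avro container file")
    while True:  # skip the metadata map
        count = read_long(inp)
        if count == 0:
            break
        for _ in range(count):
            read_bytes(inp)
            read_bytes(inp)
    sync = inp.read(16)
    records = []
    while inp.tell() < len(blob):
        count = read_long(inp)
        size = read_long(inp)
        block = io.BytesIO(inp.read(size))
        for _ in range(count):
            records.append(read_bytes(block).decode())
        if inp.read(16) != sync:
            raise InvalidSync("Invalid sync!")
    return records


if __name__ == "__main__":
    first = write_container(["row 1", "row 2"])
    second = write_container(["row 3"])
    print(read_container(first))   # each file is readable on its own
    print(read_container(second))
    try:
        read_container(first + second)  # simulates appending file 2 to file 1
    except InvalidSync as exc:
        print("appended file:", exc)
```

In this model each file reads cleanly by itself, while the concatenation fails at the point where the second header begins. That would also be consistent with JSON output working, since JSON text has no sync markers to validate.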
Created ‎08-16-2018 11:31 AM
Could you try introducing a SplitAvro processor in your flow after the QueryDatabaseTable processor, and configure it to create small chunks of flow files instead of one big Avro file? Then try running your commands again.
Created ‎08-21-2018 07:02 AM
Hi Shu,
I got the same error after adding SplitAvro between QueryDatabaseTable and UpdateAttribute. Is it not possible with a single Avro file? I know it works fine with individual small flow files.
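Since individual flow files are reported to open fine, one option worth trying is to keep the date in the filename but add a unique suffix, so that each flow file lands in its own Avro file instead of sharing one name. The sketch below only illustrates the naming scheme in Python; the function name and the .avro extension are hypothetical choices, and in NiFi the same idea would be expressed in the UpdateAttribute value, for instance by combining the date with the flow file's uuid attribute.

```python
import uuid
from datetime import datetime


def daily_unique_filename(now=None):
    """Return a name like 16082018_<random-uuid>.avro (hypothetical scheme)."""
    now = now or datetime.now()
    # %d%m%Y mirrors the ddMMyyyy pattern used in the flow
    return f"{now.strftime('%d%m%Y')}_{uuid.uuid4()}.avro"


if __name__ == "__main__":
    print(daily_unique_filename())
    print(daily_unique_filename())  # a different name on every call
```

All files for a given day still share the date prefix, so they can be listed or read together with a glob on that prefix.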
