Created 08-16-2018 11:15 AM
Hi Guys,
I have built the following NiFi flow to capture data changes from MySQL and load them into HDFS, writing everything into a single file named ddMMyyyy each day:
QueryDatabaseTable -> UpdateAttribute (attribute name: filename, value: ${now():format("ddMMyyyy")}) -> PutHDFS.
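(For clarity, that expression just names the file after the current date. The NiFi Expression Language format follows Java's SimpleDateFormat; a rough Python equivalent would be:)

from datetime import datetime

# NiFi EL ${now():format("ddMMyyyy")} -> day, month, four-digit year
daily_name = datetime.now().strftime("%d%m%Y")  # e.g. "16082018"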
The flow works fine on the first load from MySQL, but after a new insert or update, trying to open the file with hdfs dfs -text gives the following error:
org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync!
	at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
	at org.apache.hadoop.fs.shell.Display$AvroFileInputStream.read(Display.java:302)
	at java.io.InputStream.read(InputStream.java:179)
	at java.io.InputStream.read(InputStream.java:101)
	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:87)
	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:61)
	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:121)
	at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:106)
	at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:101)
	at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
	at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
	at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
	at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
	at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:118)
	at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
	at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at org.apache.hadoop.fs.FsShell.main(FsShell.java:372)
Caused by: java.io.IOException: Invalid sync!
	at org.apache.avro.file.DataFileStream.nextRawBlock(DataFileStream.java:297)
	at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:198)
	... 18 more
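For what it's worth, I can reproduce the same class of failure outside Hadoop just by concatenating two complete Avro container files. A minimal sketch, assuming PutHDFS is appending each run's output to the same daily file (the third-party fastavro package here is only for illustration, not part of the flow, and the exact error message can differ from reader to reader):

import io
import fastavro

schema = {"type": "record", "name": "Row",
          "fields": [{"name": "id", "type": "int"}]}

def container(rows):
    # Each QueryDatabaseTable run emits a complete Avro container:
    # its own header, embedded schema, and a random 16-byte sync marker.
    buf = io.BytesIO()
    fastavro.writer(buf, schema, rows)
    return buf.getvalue()

first_run = container([{"id": 1}])
second_run = container([{"id": 2}])

# Appending raw bytes -- what the daily file ends up containing:
appended = io.BytesIO(first_run + second_run)

try:
    for record in fastavro.reader(appended):
        print(record)
except Exception as exc:
    # The reader fails once it reaches the second container's header,
    # where it expects the first container's sync marker -- the same
    # class of failure as "Invalid sync!" from hdfs dfs -text.
    print("read failed:", exc)

If that is what is happening, the appended daily file is simply not one valid Avro container, no matter which tool reads it.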
However, if I insert a ConvertAvroToJSON processor in between, it works fine and produces output in JSON format; the error only occurs with the Avro format. Can anyone suggest a solution to this error?
Created 08-16-2018 11:31 AM
Could you try introducing a SplitAvro processor into your flow after the QueryDatabaseTable processor, configuring it to produce small flowfile chunks instead of one big Avro file, and then run your command again?
Created 08-21-2018 07:02 AM
Hi Shu,
I got the same error after adding SplitAvro between QueryDatabaseTable and UpdateAttribute. Is it not possible with a single Avro file? I know it works fine with individual small flowfiles.