Cannot read sequence file which was created by NiFi CreateHadoopSequenceFile processor.

Hi!

I have a dataflow that builds a sequence file from multiple files and loads it into HDFS.

60522-nifi-hdfs.png

Unfortunately, I cannot read the generated file correctly in Spark.

For example, I generate 5 txt files:

1.txt
1
2.txt
2
22
3.txt
3
33
333
4.txt
4
44
444
4444
5.txt
5
55
555
5555
55555

and create a new sequence file from them.

After that I try to read the resulting file:

60523-nifi-hdfs2.png

The output contains what look like corrupted or garbage characters (they are zero bytes).

How can I get rid of those unnecessary bytes?

Some additional screenshots are attached.

60521-nifi-hdfs5.png

60520-nifi-hdfs4.png

60519-nifi-hdfs3.png

60518-nifi-hdfs2.png

1 ACCEPTED SOLUTION

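It was entirely my fault! I used the wrong method (getBytes) to get the bytes from the BytesWritable object. getBytes returns the whole backing buffer, which can be longer than the actual data, so the extra zero bytes were padding. The copyBytes method exists for exactly this purpose: it returns only the valid bytes.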
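
For anyone hitting the same problem, here is a rough illustration of why the zero bytes show up. It is only a sketch: PaddedBytes is a hypothetical pure-Python stand-in written for this post, not Hadoop's BytesWritable, and the 1.5x growth factor is an assumption made for the example. The point is that the backing buffer can be larger than the record, so anything that reads the whole buffer also reads the padding.

```python
class PaddedBytes:
    """Hypothetical stand-in for a BytesWritable-style value (not the real Hadoop class)."""

    def __init__(self) -> None:
        self._buf = bytearray()  # backing buffer, may be longer than the data
        self._size = 0           # number of valid bytes in the buffer

    def set(self, data: bytes) -> None:
        # Assumption for this sketch: grow with 1.5x headroom so the buffer can be reused.
        if len(data) > len(self._buf):
            self._buf = bytearray(len(data) * 3 // 2)
        self._buf[:len(data)] = data
        self._size = len(data)

    def get_bytes(self) -> bytes:
        # Mirrors the idea of getBytes: the whole backing buffer, padding included.
        return bytes(self._buf)

    def get_length(self) -> int:
        # Mirrors the idea of getLength: how many bytes are actually valid.
        return self._size

    def copy_bytes(self) -> bytes:
        # Mirrors the idea of copyBytes: a copy of the valid bytes only.
        return bytes(self._buf[:self._size])


value = PaddedBytes()
value.set(b"4\n44\n444\n4444\n")            # contents of 4.txt from the question

raw = value.get_bytes()                      # includes trailing zero bytes
trimmed = value.copy_bytes()                 # exactly the record contents
sliced = raw[:value.get_length()]            # same result without copy_bytes
```

In the real Java API the equivalent choices are copyBytes(), or getBytes() combined with getLength() to cut the buffer down to the valid range.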
