Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

Cannot read sequence file which was created by NiFi CreateHadoopSequenceFile processor.


Hi!

I have a dataflow in which I create a sequence file from multiple files and load it into HDFS.

60522-nifi-hdfs.png

Unfortunately, I cannot read the generated file correctly in Spark.

For example, I generate 5 txt files:

1.txt:
  1
2.txt:
  2
  22
3.txt:
  3
  33
  333
4.txt:
  4
  44
  444
  4444
5.txt:
  5
  55
  555
  5555
  55555

and create a new sequence file from those files.

After that I try to read the resulting file:

60523-nifi-hdfs2.png

We can see that the output contains corrupted or junk characters (they are zero bytes).

How can I get rid of those unnecessary bytes?

Some additional screenshots are attached.

60521-nifi-hdfs5.png

60520-nifi-hdfs4.png

60519-nifi-hdfs3.png

60518-nifi-hdfs2.png

1 ACCEPTED SOLUTION


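It was entirely my fault! I used the wrong method (getBytes) to get the bytes from the BytesWritable object. getBytes returns the whole backing buffer, which can be longer than the valid data, so the unused trailing zero bytes ended up in my output. The copyBytes method exists for exactly this purpose: it returns only the valid bytes.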
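
For anyone who runs into the same thing, here is a rough sketch of why the zero bytes appear. It does not use Hadoop at all. PaddedBuffer is a hypothetical stand-in class written only for this illustration, and its 1.5x growth factor is an assumption. It exposes the same three method names that BytesWritable has (getBytes, getLength, copyBytes) so the difference between them is easy to see.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in (not a Hadoop class): imitates a growable byte
// container that keeps a backing array separate from its valid length.
final class PaddedBuffer {
    private byte[] bytes = new byte[0];
    private int size = 0;

    // Store a value; when the backing array is too small, allocate a new one
    // with spare room (the 1.5x factor is an assumption for this sketch).
    void set(byte[] data) {
        if (data.length > bytes.length) {
            bytes = new byte[data.length * 3 / 2];
        }
        System.arraycopy(data, 0, bytes, 0, data.length);
        size = data.length;
    }

    // Returns the whole backing array, including unused trailing bytes.
    byte[] getBytes() { return bytes; }

    // Number of valid bytes at the start of the backing array.
    int getLength() { return size; }

    // Returns a fresh array holding exactly the valid bytes.
    byte[] copyBytes() { return Arrays.copyOf(bytes, size); }
}

class PaddedBufferDemo {
    public static void main(String[] args) {
        PaddedBuffer value = new PaddedBuffer();
        value.set("4\n44\n444\n4444\n".getBytes(StandardCharsets.UTF_8));

        // Wrong: decodes the spare capacity too, so the text ends in zero characters.
        String padded = new String(value.getBytes(), StandardCharsets.UTF_8);

        // Right: decode only the valid bytes.
        String exact = new String(value.copyBytes(), StandardCharsets.UTF_8);

        // Also right, and avoids the extra copy: limit the read to getLength().
        String sliced = new String(value.getBytes(), 0, value.getLength(), StandardCharsets.UTF_8);

        System.out.println("padded length: " + padded.length());
        System.out.println("exact length:  " + exact.length());
        System.out.println("sliced equals exact: " + sliced.equals(exact));
    }
}
```

With a real BytesWritable the same idea should apply: either call copyBytes(), or read only the first getLength() bytes of the array returned by getBytes().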
