Archives of Support Questions (Read Only)

rfatkullin · ‎02-13-2018

Hi!

I have a dataflow in which I create a sequence file from multiple files and load it to hdfs.

Unfortunately I cannot correctly read the generated file in Spark.

For example, I generate 5 txt files:

1.txt
1
2.txt
2
22
3.txt
3
33
333
4.txt
4
44
444
4444
5.txt
5
55
555
5555
55555

and create from those files the new sequence file.

After that I try to read the resulting file:

We can see there are corrupted or trash characters in output (they are zero bytes).

How I can get rid from those unnecessary bytes?

Some additional screenshots are attached.

rfatkullin · ‎02-16-2018

It is totally my fault! I have used wrong method (getBytes method) to get bytes from BytesWritable class object. There is copyBytes method for that purpose.

View solution in original post

rfatkullin · ‎02-16-2018

It is totally my fault! I have used wrong method (getBytes method) to get bytes from BytesWritable class object. There is copyBytes method for that purpose.

Cloudera Community

Archives of Support Questions (Read Only)

Cannot read sequence file which was created by NiFi CreateHadoopSequenceFile processor.