Member since: 11-27-2017
Posts: 7
Kudos Received: 0
Solutions: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1442 | 02-16-2018 06:18 AM |
02-16-2018 06:18 AM
It is totally my fault! I used the wrong method (getBytes) to get the bytes from a BytesWritable object: getBytes returns the whole backing buffer, which can be longer than the valid data. There is a copyBytes method for that purpose, which returns exactly getLength() bytes.
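The difference can be sketched with a hypothetical stand-in for BytesWritable (the class name and the 1.5x growth factor are illustrative assumptions, not Hadoop's exact internals; the point is that the backing buffer over-allocates with a zero-filled tail):

```python
# Hypothetical stand-in for Hadoop's BytesWritable. The 1.5x capacity
# growth is an assumption used only to illustrate why getBytes() can
# expose trailing zero bytes while copyBytes() does not.
class FakeBytesWritable:
    def __init__(self, data: bytes):
        self._length = len(data)
        capacity = len(data) * 3 // 2 + 1      # over-allocate
        self._buf = data + b"\x00" * (capacity - len(data))

    def get_bytes(self) -> bytes:
        # Returns the entire backing buffer, zero padding included.
        return self._buf

    def get_length(self) -> int:
        return self._length

    def copy_bytes(self) -> bytes:
        # Returns exactly get_length() valid bytes -- the safe choice.
        return self._buf[: self._length]

w = FakeBytesWritable(b"333")
print(w.get_bytes())    # b'333\x00\x00' -- the "trash" zero bytes
print(w.copy_bytes())   # b'333'
```

This is why decoding the result of getBytes as a string shows spurious NUL characters: the padding is part of the buffer, not part of the record.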
02-13-2018 01:33 PM
Hi! I have a dataflow in which I create a sequence file from multiple files and load it into HDFS. Unfortunately, I cannot correctly read the generated file in Spark. For example, I generate 5 txt files:

- 1.txt: 1
- 2.txt: 2, 22
- 3.txt: 3, 33, 333
- 4.txt: 4, 44, 444, 4444
- 5.txt: 5, 55, 555, 5555, 55555

and create a new sequence file from those files. When I then try to read the resulting file, there are corrupted or "trash" characters in the output (they are zero bytes). How can I get rid of those unnecessary bytes? Some additional screenshots are attached.
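The symptom can be illustrated with hypothetical values as they might look after deserialization, each carrying trailing zero bytes from the writable's buffer (the sample byte strings below are assumptions for illustration, not actual output from this flow):

```python
# Hypothetical record values with trailing zero-byte padding, as can
# happen when the full backing buffer is decoded instead of only the
# valid length.
raw_values = [b"1\x00", b"2\n22\x00\x00", b"3\n33\n333\x00\x00\x00"]

# Quick workaround: strip trailing NUL bytes before decoding.
cleaned = [v.rstrip(b"\x00").decode("utf-8") for v in raw_values]
print(cleaned)   # ['1', '2\n22', '3\n33\n333']
```

Note that stripping NULs is only safe for text payloads; binary data can legitimately end in zero bytes, so the robust fix is to copy only the valid length of the value when deserializing.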
11-27-2017 05:26 PM
Sorry. I have read the MergeContent documentation and realized my mistake. Thank you!
11-27-2017 05:02 PM
Hello @Abdelkrim Hadjidj. There are many different file types: png, bmp, pdf, etc. I think it is a bad idea, for example, to merge two pdf files directly. My understanding is that Sequence Files were designed to store small files efficiently. It is strange that the CreateHadoopSequenceFile processor has no capability to accumulate small files into a bigger one.
11-27-2017 04:09 PM
I need to store a lot of small files (of different types) in HDFS so that the data can be processed with Spark. I chose the Hadoop Sequence File format for storage in HDFS, and NiFi to merge, convert, and put the files into HDFS. I found out how to load files and convert them to a Sequence File, but I am stuck at the merge stage. How can I merge several small Sequence Files into one bigger one? The MergeContent processor just concatenates content without handling the Hadoop Sequence File structure. A screenshot of my NiFi project is attached.
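Why plain byte-level concatenation cannot merge such files can be sketched with a toy header-bearing container format (entirely hypothetical and far simpler than the real SequenceFile layout, but it shows the same failure mode):

```python
# Toy container format (hypothetical): a magic header followed by
# length-prefixed records. Real SequenceFiles are more complex, but
# they also begin with a header that a reader expects only once.
MAGIC = b"SEQ!"

def write_container(records):
    out = bytearray(MAGIC)
    for r in records:
        out += len(r).to_bytes(4, "big") + r
    return bytes(out)

def read_container(data):
    assert data[:4] == MAGIC, "missing header"
    records, pos = [], 4
    while pos < len(data):
        n = int.from_bytes(data[pos:pos + 4], "big")
        pos += 4
        records.append(data[pos:pos + n])
        pos += n
    return records

f1 = write_container([b"1"])
f2 = write_container([b"2", b"22"])

# Naive byte-level merge (what a content-agnostic concatenation does):
# the second file's b"SEQ!" header lands mid-stream and is misread as
# a huge length prefix, corrupting everything after it.
print(read_container(f1 + f2))

# A format-aware merge must parse both files and rewrite one container.
proper = write_container(read_container(f1) + read_container(f2))
print(read_container(proper))   # [b'1', b'2', b'22']
```

This is the sense in which MergeContent "just merges content": it has no knowledge of the container structure, so the merged bytes are not a valid single file of the format.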
Labels:
- Apache Hadoop
- Apache NiFi