Created 12-05-2017 10:53 AM
I have a folder named in YYYYMMdd format that contains files named in YYYYMMddHHmm format.
For example:
/day/20171202/201712020000 .. 201712022359
Can we zip the folder 20171202 to compress it and put the archive back in the same location?
Created on 12-05-2017 02:45 PM - edited 08-17-2019 08:05 PM
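You can do that with the GetHDFS, GetFTP, or GetSFTP processors. In GetHDFS, set
Keep Source File: false (this is the default)
so once a Get processor is configured and running, it picks up every file in that directory and deletes it from the source. (GetFTP and GetSFTP control this with their Delete Original property, which defaults to true.)
GetHDFS configs: [screenshot not preserved]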
Then use a PutHDFS processor (or PutFTP/PutSFTP for FTP/SFTP targets, which do not have the Compression codec property) and change these properties:
Compression codec: BZIP
Directory: <same-directory-path-as-gethdfs-directory-info>
PutHDFS configs: [screenshot not preserved]
Here PutHDFS is configured with the same directory as GetHDFS and with Compression codec set to BZIP, so the files are compressed as they are written back into the HDFS directory.
FLOW:
GetHDFS (gets the files from the HDFS directory and deletes them from the source directory) --success--> PutHDFS (compresses the files and stores them back in the same source directory)
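To see why this flow produces one .bz2 per file, here is a rough local-disk sketch in Python. It assumes PutHDFS compresses each FlowFile on its own and appends the codec's .bz2 extension to the file name; Python's bz2 module stands in for the Hadoop BZIP codec and a temporary directory stands in for HDFS. `compress_each_file` is a hypothetical helper, not a NiFi API.

```python
import bz2
import tempfile
from pathlib import Path


def compress_each_file(directory: Path) -> list[Path]:
    """Replace every file in `directory` with a bz2-compressed copy named <file>.bz2.

    Local-disk imitation of GetHDFS (Keep Source File = false) feeding
    PutHDFS (Compression codec = BZIP) that writes back to the same directory.
    """
    written = []
    # Take the listing up front so the new .bz2 files are not processed again.
    sources = sorted(p for p in directory.iterdir() if p.is_file())
    for src in sources:
        data = src.read_bytes()                  # GetHDFS reads the file
        dst = src.with_name(src.name + ".bz2")   # codec extension is appended
        dst.write_bytes(bz2.compress(data))      # PutHDFS writes it compressed
        src.unlink()                             # source copy is gone (Keep Source File = false)
        written.append(dst)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        day = Path(tmp, "day", "20171202")
        day.mkdir(parents=True)
        for name in ("201712020000", "201712020001"):
            (day / name).write_text(f"data for {name}\n")
        for path in compress_each_file(day):
            print(path.relative_to(tmp).as_posix())
```

Because every file is compressed separately, this pattern does not produce a single /day/20171202.zip; that needs something that archives the whole folder.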
If you want to merge the files, use a MergeContent processor before the PutHDFS processor.
Use the reference below to configure the MergeContent processor.
Created 12-06-2017 04:33 PM
I tried the above as you said.
What I am getting is /day/20171202/YYYYMMddHHmm.bz2
What I am looking for is /day/20171202.zip
Can you help me, please?
Created on 12-06-2017 09:53 PM - edited 08-17-2019 08:05 PM
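Method1:
Use an ExecuteProcess processor with the following properties:
Command: zip
Command Arguments: -rm /day/${now():format('yyyyMMdd')}.zip /day/${now():format('yyyyMMdd')}
The -r option zips the folder recursively and -m deletes the original files (and the folder, once it is empty) after they are in the archive, so this single processor both creates the zip and removes the source folder.
I have written the arguments with Expression Language, so ${now():format('yyyyMMdd')} resolves to today's date; change the arguments to suit your requirements.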
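To make Method1's behaviour concrete, here is a minimal Python sketch of `zip -rm /day/${now():format('yyyyMMdd')}.zip /day/${now():format('yyyyMMdd')}` on a local (non-HDFS) directory. It assumes `strftime('%Y%m%d')` is a fair stand-in for `now():format('yyyyMMdd')`, uses the `zipfile` module in place of the zip binary, and stores entry names relative to the parent directory, which may differ from the entry names the zip command records. `zip_and_move` and its `when` parameter are hypothetical.

```python
import shutil
import tempfile
import zipfile
from datetime import datetime
from pathlib import Path
from typing import Optional


def zip_and_move(root: Path, when: Optional[datetime] = None) -> Path:
    """Zip root/<yyyyMMdd> into root/<yyyyMMdd>.zip, then delete the folder.

    A rough stand-in for `zip -rm`: -r recurses into the folder and -m
    removes the originals once they are in the archive.
    """
    stamp = (when or datetime.now()).strftime("%Y%m%d")   # now():format('yyyyMMdd')
    folder = root / stamp
    if not folder.is_dir():
        raise FileNotFoundError(folder)
    archive = root / f"{stamp}.zip"
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.rglob("*")):
            # Entry names are relative to root, e.g. "20171202/201712020000".
            zf.write(path, arcname=path.relative_to(root).as_posix())
    shutil.rmtree(folder)   # the "-m" (move) part: the source folder is removed
    return archive


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "20171202").mkdir()
        (root / "20171202" / "201712020000").write_text("sample\n")
        print(zip_and_move(root, datetime(2017, 12, 2)).name)
```

Passing `when` explicitly is how this sketch targets a day other than today. With `now()` in the NiFi arguments, the processor only zips the current day's folder, so a folder such as 20171202 would be picked up only on that date unless the expression is changed.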
(or)
Method2:
Zip the folder with an ExecuteProcess processor, then use an ExecuteStreamCommand processor to delete the source directory.
Configure the ExecuteProcess processor as below:
Command: zip
Command Arguments: -r /day/${now():format('yyyyMMdd')}.zip /day/${now():format('yyyyMMdd')}
This processor uses Expression Language with the zip command, passing the desired zip file name and the source folder path.
Then connect the ExecuteProcess processor's success relationship to an ExecuteStreamCommand processor that deletes the source directory.
ExecuteStreamCommand configs: [screenshot not preserved]
To remove the directory we use a simple shell script, del.sh:
#!/bin/bash
rm -rf "$1"
The script expects one argument, which we pass through the Command Arguments property as
/day/${now():format('yyyyMMdd')}
so this processor removes the directory.
Make sure the nifi user has permission to delete these directories and to execute del.sh (chmod +x del.sh).
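Method2 splits the work in two: ExecuteProcess creates the archive, and only its success relationship leads to the delete step. Here is a minimal Python sketch of that ordering on a local directory, under these assumptions: `shutil.make_archive` stands in for `zip -r`, `shutil.rmtree` plays the role of del.sh, and the `testzip()` check is an extra safeguard that is not part of the original flow. `zip_folder`, `delete_source`, and `run_method2` are hypothetical names.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def zip_folder(folder: Path) -> Path:
    """Step 1 (ExecuteProcess + zip -r): write <folder>.zip next to the folder."""
    # make_archive appends ".zip" itself and stores entries as "<folder name>/...".
    archive = shutil.make_archive(str(folder), "zip",
                                  root_dir=folder.parent, base_dir=folder.name)
    return Path(archive)


def delete_source(folder: Path) -> None:
    """Step 2 (ExecuteStreamCommand + del.sh): remove the source directory."""
    shutil.rmtree(folder)


def run_method2(folder: Path) -> Path:
    archive = zip_folder(folder)
    # Only continue to the delete step if the archive reads back cleanly,
    # similar to ExecuteStreamCommand only running on the success relationship.
    with zipfile.ZipFile(archive) as zf:
        if zf.testzip() is not None:
            raise RuntimeError(f"corrupt archive {archive}; keeping {folder}")
    delete_source(folder)
    return archive


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp, "day", "20171202")
        folder.mkdir(parents=True)
        (folder / "201712020000").write_text("sample\n")
        print(run_method2(folder).name)   # prints 20171202.zip
```

Keeping the delete as a separate step means a failed zip leaves the source folder untouched, which is the main practical difference from Method1's single `zip -rm` call.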
Choose whichever method fits your case best.
Created on 12-07-2017 10:33 AM - edited 08-17-2019 08:05 PM
I tried the above.
I am getting an error: [error screenshot not preserved]
I have the directory and files, but I still get this error.
Created on 12-07-2017 09:56 PM - edited 08-17-2019 08:05 PM
I think you are using Windows. Windows does not ship a zip utility by default; the zip utility is available on Linux, which is where I tested this.
To resolve this, download
https://www.microsoft.com/en-us/download/details.aspx?id=17657 and run the .exe file.
In the ExecuteProcess processor use:
Command: C:\Program Files (x86)\Windows Resource Kits\Tools\compress.exe (the path where compress.exe was installed)
Command Arguments: C:\<input directory> C:\<output-directory.zip>
ExecuteProcess configs: [screenshot not preserved]
So the ExecuteProcess processor creates the zip file.
In your case, the input directory would be
C:\day\${now():format('yyyyMMdd')}
and the output file
C:\day\${now():format('yyyyMMdd')}.zip
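As a quick way to check what those two arguments expand to for a particular day, here is a small Python sketch. It assumes the Expression Language is evaluated before the arguments are handed to compress.exe, and it uses `PureWindowsPath` so the example runs on any OS; `windows_zip_paths` is a hypothetical helper.

```python
from datetime import date
from pathlib import PureWindowsPath


def windows_zip_paths(day: date, base: str = r"C:\day") -> tuple[PureWindowsPath, PureWindowsPath]:
    r"""Return (input directory, output .zip path) for `day`.

    Mirrors C:\day\${now():format('yyyyMMdd')} and C:\day\${now():format('yyyyMMdd')}.zip,
    with strftime("%Y%m%d") standing in for NiFi's format('yyyyMMdd').
    """
    stamp = day.strftime("%Y%m%d")
    folder = PureWindowsPath(base, stamp)
    return folder, folder.with_name(stamp + ".zip")


if __name__ == "__main__":
    src, dst = windows_zip_paths(date(2017, 12, 2))
    print(src)   # C:\day\20171202
    print(dst)   # C:\day\20171202.zip
```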
Then use an ExecuteStreamCommand processor to delete the input (source) directory.
For this processor, create a .bat file that deletes the directory it is given, remove_dir.bat:
@RD /S /Q %1
The script takes one argument and deletes that directory; we pass our input directory as the argument.
What are /S and /Q? From the RD help:
RD [/S] [/Q] [drive:]path
/S  Removes all directories and files in the specified directory in addition to the directory itself. Used to remove a directory tree.
/Q  Quiet mode, do not ask if ok to remove a directory tree with /S.
ExecuteStreamCommand configs:
Command Arguments: "C:\day\${now():format('yyyyMMdd')}"
Command Path: C:<delete-directory.bat file path>
For testing I used the configs below: [screenshot not preserved]
This processor deletes the input directory.
Created on 12-08-2017 03:17 PM - edited 08-17-2019 08:05 PM
The directory needs to be on the local file system, not in HDFS, for the zip command to work.
Make sure zip is installed on your node.
To check whether zip is installed, run
# zip
If it prints zip's version and usage text, zip is installed on the node.
If it is not installed, install it with
# yum install zip
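If you would rather check for zip from a script than by hand, Python's `shutil.which` does the same PATH lookup the shell does. A minimal sketch; `find_zip` is a hypothetical helper, and it is worth running it as the same user NiFi runs as, since that user's PATH may differ from your login shell's.

```python
import shutil
from typing import Optional


def find_zip(search_path: Optional[str] = None) -> Optional[str]:
    """Return the full path of an executable `zip`, or None if none is found.

    `search_path` overrides the PATH environment variable (same os.pathsep-
    separated format); leave it as None to search the real PATH.
    """
    return shutil.which("zip", path=search_path)


if __name__ == "__main__":
    location = find_zip()
    if location is None:
        print("zip not found on PATH; on RHEL/CentOS install it with: yum install zip")
    else:
        print("zip found at", location)
```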
If you want to zip the HDFS files, follow the steps below:
Use a GetHDFS processor to pick up your files from HDFS, with the same GetHDFS configs as in my first answer,
then use a MergeContent processor with Merge Format set to ZIP and Correlation Attribute Name set to path. [screenshot not preserved]
Every FlowFile from GetHDFS carries a path attribute, so we use path as the Correlation Attribute Name in the MergeContent processor.
The processor waits up to 1 minute (Max Bin Age) and merges all FlowFiles that have the same path attribute.
Change the Keep Path property to suit your requirements:
Property | Default Value | Description
Keep Path | false | If using the Zip or Tar Merge Format, specifies whether or not the FlowFiles' paths should be included in their entry names; if using other merge strategy, this value is ignored
You can adjust the other configs to suit your requirements by following the reference below on configuring the MergeContent processor.
Then, in the PutHDFS processor, use the configs from my first answer but change
Compression codec: NONE
because the zipping is already done in the MergeContent processor, so there is no need to compress again in PutHDFS.
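Conceptually, MergeContent with Correlation Attribute Name = path puts FlowFiles with the same path value into the same bin and, with Merge Format ZIP, writes each bin out as one zip. The Python sketch below models only that grouping and the Keep Path choice; it ignores bin timing (Max Bin Age), entry limits, and how NiFi names the merged file. `FlowFile` and `merge_by_path` are hypothetical toy names, and the path values in the example are illustrative.

```python
import io
import zipfile
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FlowFile:
    """Toy stand-in for a NiFi FlowFile: a dict of attributes plus content."""
    attributes: dict
    content: bytes


def merge_by_path(flowfiles, keep_path=False):
    """Bin FlowFiles on their `path` attribute and zip each bin.

    Returns {path value: zip archive bytes}. With keep_path=False each entry
    is named by the `filename` attribute alone; with keep_path=True the path
    is prefixed, as in "20171202/201712020000".
    """
    bins = defaultdict(list)
    for ff in flowfiles:
        bins[ff.attributes["path"]].append(ff)   # Correlation Attribute Name = path

    merged = {}
    for path, members in bins.items():
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            for ff in members:
                name = ff.attributes["filename"]
                if keep_path:
                    name = f"{path.rstrip('/')}/{name}"
                zf.writestr(name, ff.content)
        merged[path] = buf.getvalue()
    return merged


if __name__ == "__main__":
    incoming = [
        FlowFile({"path": "20171202", "filename": "201712020000"}, b"a\n"),
        FlowFile({"path": "20171202", "filename": "201712020001"}, b"b\n"),
        FlowFile({"path": "20171203", "filename": "201712030000"}, b"c\n"),
    ]
    for path, blob in merge_by_path(incoming).items():
        with zipfile.ZipFile(io.BytesIO(blob)) as zf:
            print(path, zf.namelist())
```

Since the archive is already zipped here, writing it with Compression codec NONE in PutHDFS avoids compressing it a second time.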
Created 12-08-2017 02:26 PM
Thanks for the detailed explanation.
I am using NiFi 1.1 in a Linux environment.
Also, the /day folder is in HDFS, which runs on Linux.
I was also wondering why zip was not working.
Created 12-21-2017 02:06 PM