01-09-2019
08:40 AM
Hi, thanks for your support. We have performed the scenario exactly as you described: we increased the Connection Timeout and Data Timeout and set Remote Poll Batch Size = 5000, but we are facing the same problem. The amount of data pushed to HDFS scales with the batch size: with Remote Poll Batch Size = 1000 about 1.2 GB is pushed, with 5000 about 6.4 GB, and with 20000 about 25 GB. Our full dataset is 225 GB (50 sub-folders, ~175,000 files in total). What would be a possible way to transfer the full dataset? We have attached screenshots of the GetSFTP processor configuration (screenshot-from-2019-01-09-12-59-04.png, screenshot-from-2019-01-09-12-59-49.png).
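A back-of-the-envelope check on those figures (a minimal Python sketch; the numbers come straight from the post above) suggests each run is moving roughly one batch of files at an average of about 1.25 MB per file:

```python
# Figures reported above: Remote Poll Batch Size -> bytes that reached HDFS.
observed = {1000: 1.2e9, 5000: 6.4e9, 20000: 25e9}

for batch, pushed in observed.items():
    print(f"batch size {batch}: ~{pushed / batch / 1e6:.2f} MB per file")

# All three settings work out to ~1.25 MB per file, i.e. each run moves
# roughly one batch of files and then stops. At that average, the full
# 225 GB corresponds to about 225e9 / 1.25e6 = 180,000 files, close to
# the ~175,000 files in the dataset.
print(f"files implied by 225 GB at 1.25 MB each: {225e9 / 1.25e6:,.0f}")
```

That pattern would be consistent with the flow stopping after a single poll rather than continuing on schedule, which may help narrow down where the transfer stalls.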
01-09-2019
05:05 AM
I'm trying to load a large dataset of 225 GB (~175,000 files) from an SFTP server and copy it to HDFS. To implement this scenario we've used 2 processors:
1. GetSFTP (to get the files from the SFTP server). Configured properties: Search Recursively = true; Use Natural Ordering = true; Remote Poll Batch Size = 20000; Concurrent Tasks = 3.
2. PutHDFS (to push the data to HDFS). Configured properties: Concurrent Tasks = 3; Conflict Resolution Strategy = replace; Hadoop Configuration Resources; Directory.
After some time the copy stops and the size in HDFS no longer updates. I can't figure out what I'm doing wrong.
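As a sanity check on the source side, a minimal sketch (Python with paramiko; the host, credentials, and /data/source path are hypothetical placeholders) that recursively counts files and bytes over SFTP, so the 225 GB / ~175,000-file figures can be compared against what actually lands in HDFS:

```python
import stat
import paramiko

# Hypothetical connection details -- replace with the real SFTP endpoint.
transport = paramiko.Transport(("sftp.example.com", 22))
transport.connect(username="nifi", password="secret")
sftp = paramiko.SFTPClient.from_transport(transport)

def walk(path):
    """Recursively tally file count and total bytes under `path`."""
    files, size = 0, 0
    for entry in sftp.listdir_attr(path):
        full = path.rstrip("/") + "/" + entry.filename
        if stat.S_ISDIR(entry.st_mode):
            sub_files, sub_size = walk(full)
            files += sub_files
            size += sub_size
        else:
            files += 1
            size += entry.st_size
    return files, size

files, size = walk("/data/source")  # hypothetical remote root
print(f"{files} files, {size / 1e9:.1f} GB on the SFTP server")

sftp.close()
transport.close()
```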
Labels:
Apache NiFi