Created 08-18-2017 08:41 PM
Created 08-19-2017 08:38 PM
I don't know Python, but if you don't specifically need to use Python, you can use NiFi.
NiFi has many processors for this purpose (for example GetFTP, GetFile, GetHDFS, and PutHDFS).
You can get files from FTP, the local file system, or HDFS, and ingest them into HDFS.
I hope this helps.
Created 08-21-2017 07:57 AM
I'm not completely sure what you mean by 'incremental load format', but here are some hints:
Here's an implementation for you using urlopen and HdfsCli. To try it, first install HdfsCli with pip install hdfs.
from urllib.request import urlopen
from hdfs import InsecureClient  # You can also use KerberosClient or custom client

namenode_address = 'your namenode address'
webhdfs_port = 'your webhdfs port'  # default for Hadoop 2: 50070, Hadoop 3: 9870
user = 'your user name'
client = InsecureClient('http://' + namenode_address + ':' + webhdfs_port, user=user)

ftp_address = 'your ftp address'
hdfs_path = 'where you want to write'

with urlopen(ftp_address) as response:
    content = response.read()

# You can also use append=True
# Further reference: https://hdfscli.readthedocs.io/en/latest/api.html#hdfs.client.Client.write
with client.write(hdfs_path) as writer:
    writer.write(content)
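If by 'incremental load' you mean adding newly fetched data to a file that already exists in HDFS, here is a rough sketch of one way to do it. It is only a guess at what you need, not a definitive solution. It assumes client is an HdfsCli client like the one created above. The names copy_in_chunks, ftp_to_hdfs and CHUNK_SIZE are made up for this example. It streams the download in chunks instead of reading everything into memory, and switches to append=True only when the target file is already there, since appending needs an existing file.

```python
from urllib.request import urlopen

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary value chosen for illustration


def copy_in_chunks(source, sink, chunk_size=CHUNK_SIZE):
    """Copy bytes from source to sink in fixed-size chunks; return the byte count."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read means the source is exhausted
            break
        sink.write(chunk)
        total += len(chunk)
    return total


def ftp_to_hdfs(client, ftp_url, hdfs_path):
    """Stream ftp_url into hdfs_path, appending if the file already exists.

    client is assumed to be an HdfsCli client (e.g. InsecureClient).
    """
    # status(..., strict=False) returns None when the path does not exist
    exists = client.status(hdfs_path, strict=False) is not None
    with urlopen(ftp_url) as response:
        with client.write(hdfs_path, append=exists) as writer:
            return copy_in_chunks(response, writer)
```

You would call it as ftp_to_hdfs(client, ftp_address, hdfs_path). Note that running it twice on the same FTP file appends the same data twice, so you still need your own way of deciding which files or records are new.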