How to read FTP server files and load them into HDFS as an incremental load using Python
Labels: Apache Hadoop
Created 08-18-2017 08:41 PM
2 Replies
Contributor
Created 08-19-2017 08:38 PM
I don't know Python, but if you don't specifically need to use Python, you can use NiFi.
NiFi has many processors for this purpose: you can fetch files from FTP, the local file system, or HDFS, and ingest them into HDFS.
I hope this helps.
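For example (a sketch, assuming the standard NiFi processors): ListFTP -> FetchFTP -> PutHDFS. ListFTP keeps state about which files it has already seen, so only new files flow through on each run, which covers the incremental part of the question.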
Expert Contributor
Created 08-21-2017 07:57 AM
@swathi thukkaraju
I'm not completely sure what you mean by 'incremental load format', but here are some hints:
- To read files from an FTP server, you can simply use the built-in Python module urllib, more specifically urlopen or urlretrieve.
- To write to HDFS you can:
  - Use an external library, like HdfsCLI.
  - Use the HDFS shell and call it from Python with subprocess (see the sketch after this list).
  - Mount HDFS with the HDFS NFS Gateway and simply write with the normal write() method. Beware: with this solution you won't be able to append!
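A minimal sketch of the shell-via-subprocess option, combined with urlretrieve for the FTP download. The URL and paths below are placeholders for illustration:

from urllib.request import urlretrieve
import subprocess

# Placeholder FTP URL and paths; adjust for your environment
ftp_url = 'ftp://user:password@ftp.example.com/path/data.csv'
local_path = '/tmp/data.csv'
hdfs_path = '/data/landing/data.csv'

# Download the file from the FTP server to local disk
urlretrieve(ftp_url, local_path)

# Copy it into HDFS; -appendToFile appends on repeated runs,
# use -put instead for a plain one-off copy
subprocess.run(['hdfs', 'dfs', '-appendToFile', local_path, hdfs_path], check=True)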
Here's an implementation using urlopen and HdfsCLI. To try it, first install HdfsCLI with pip install hdfs.
from urllib.request import urlopen
from hdfs import InsecureClient  # You can also use KerberosClient or a custom client

namenode_address = 'your namenode address'
webhdfs_port = 'your webhdfs port'  # default for Hadoop 2: 50070, Hadoop 3: 9870
user = 'your user name'

client = InsecureClient('http://' + namenode_address + ':' + webhdfs_port, user=user)

ftp_address = 'your ftp address'
hdfs_path = 'where you want to write'

# Read the file from the FTP server
with urlopen(ftp_address) as response:
    content = response.read()

# You can also use append=True
# Further reference: https://hdfscli.readthedocs.io/en/latest/api.html#hdfs.client.Client.write
with client.write(hdfs_path) as writer:
    writer.write(content)
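To make repeated runs incremental with this approach (a sketch, based on the append option in the HdfsCLI docs linked above): let the first run create the file, then append on later runs.

# Append on subsequent runs; the target file must already exist in HDFS
with client.write(hdfs_path, append=True) as writer:
    writer.write(content)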
