Created 04-04-2017 11:25 AM
I want to read and write files to and from a remote HDFS. I program in PyCharm on my local machine and want to connect to a remote HDFS cluster (HDP 2.5). Is there any solution?
How can I configure the HDFS connection, and how can I refer to a file in HDFS?
Thanks a million,
Shanghoosh
Created 04-04-2017 11:33 AM
Please check this article: https://community.hortonworks.com/articles/26416/how-to-install-snakebite-in-hdp.html
You only need to pass the HDFS config files of your remote cluster to the Python client.
For example, in the Python code you can tell the client where your core-site.xml and hdfs-site.xml are located. They can be at any path where you have copied these files.
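The linked article covers the snakebite-specific setup. As a library-neutral illustration (not snakebite's own code), the copied config files can be read with the Python standard library to find the NameNode address. The helper names `read_hadoop_property` and `webhdfs_url` are hypothetical, and the sketch assumes the standard Hadoop `<configuration><property><name>/<value>` layout.

```python
import xml.etree.ElementTree as ET


def read_hadoop_property(xml_data, name):
    """Return the <value> of property `name` from a Hadoop *-site.xml document.

    `xml_data` may be str or bytes (e.g. the contents of a copied core-site.xml).
    Returns None if the property is not defined.
    """
    root = ET.fromstring(xml_data)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


def webhdfs_url(hdfs_site_xml):
    """Build a WebHDFS base URL from dfs.namenode.http-address in hdfs-site.xml.

    Hypothetical helper; assumes a non-HA cluster where the property is set
    to host:port (e.g. port 50070 on HDP 2.x).
    """
    address = read_hadoop_property(hdfs_site_xml, "dfs.namenode.http-address")
    if address is None:
        raise ValueError("dfs.namenode.http-address is not set")
    return "http://" + address
```

For example, `read_hadoop_property(open("core-site.xml", "rb").read(), "fs.defaultFS")` returns the `hdfs://host:port` RPC address that snakebite talks to, while `webhdfs_url(...)` gives the HTTP address that WebHDFS-based clients use. With NameNode HA, hdfs-site.xml uses per-nameservice keys such as `dfs.namenode.http-address.<nameservice>.<namenode>` instead, which this sketch does not handle.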
Created 04-05-2017 05:26 AM
Thank you for the excellent notes.
I had already solved the problem before reading them, as follows:
```python
from hdfs import InsecureClient

# WebHDFS endpoint of the NameNode (HDP 2.x default HTTP port is 50070);
# for a remote cluster, use the NameNode host instead of localhost
client = InsecureClient('http://localhost:50070')

# for reading a file
with client.read('/tmp/tweets_staging/tweets-082940117.json') as reader:
    features = reader.read()

# for writing a file
with client.write('/tmp/tweets_staging/1.json', overwrite=True) as writer:
    writer.write(features)
```
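The `reader.read()` call above loads the whole file into memory. For larger files, the `hdfs` package can also stream: as I understand its documentation, passing `chunk_size` to `client.read` yields the file as byte chunks, and `client.write` accepts a generator as `data`. A minimal sketch of a streaming copy under those assumptions; `copy_hdfs_file` is a hypothetical helper and the 64 KiB chunk size is an arbitrary choice.

```python
def copy_hdfs_file(client, src, dst, chunk_size=64 * 1024):
    """Copy `src` to `dst` on HDFS without holding the whole file in memory.

    `client` is expected to be an hdfs.InsecureClient (or compatible object);
    copy_hdfs_file itself is a hypothetical helper, not part of the library.
    """
    # With chunk_size set, the reader is an iterator of byte chunks
    with client.read(src, chunk_size=chunk_size) as reader:
        # Passing an iterable as `data` streams the upload chunk by chunk
        client.write(dst, data=reader, overwrite=True)
```

For example: `copy_hdfs_file(InsecureClient('http://localhost:50070'), '/tmp/tweets_staging/tweets-082940117.json', '/tmp/tweets_staging/1.json')`. The helper takes the client as a parameter so the same code works against any NameNode address.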
But I am going to try your solution too.
Regards,
Shanghoosh