Created 02-16-2017 01:05 PM
Hi,
I was able to load data remotely through the WebHDFS REST API, but it doesn't let me load a large volume of data remotely. Is there any possibility to load huge data sets remotely? Is there a Hadoop API for that?
Thank you
Created 02-16-2017 01:14 PM
What is the error you are hitting? Can you paste it?
Also, what is the size of the data you are loading into Hadoop?
Created 02-16-2017 01:31 PM
@Sagar Shimpi I am using the WebHDFS API through .NET and C# source code. When I upload a small amount of data, the remote load works fine, but when I tried an 850 MB CSV file it didn't work. I have tried several different volumes and the problem persists. I am sure about my code because it works with a small CSV file. Is there another Hadoop API to access a remote Hadoop cluster? Thanks
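For reference, a WebHDFS file creation is a two-step operation: the client first sends a PUT to the NameNode with no data and receives a 307 redirect to a DataNode, then sends the actual bytes to that DataNode. A common failure mode with large files is an HTTP client that auto-follows the redirect and buffers the whole request body in memory; whether that is what is happening in the .NET code here is only an assumption. A minimal Java sketch of the two-step upload with chunked streaming (the NameNode host/port, HDFS path, user, and local file name are placeholders):

```java
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHdfsUpload {
    public static void main(String[] args) throws IOException {
        // Placeholders: adjust the NameNode host/port, HDFS path, and user.
        String nameNode = "http://namenode.example.com:50070";
        String createUrl = nameNode + "/webhdfs/v1/user/hdfs/data.csv"
                + "?op=CREATE&user.name=hdfs&overwrite=true";

        // Step 1: PUT to the NameNode with no body; the 307 response's
        // Location header points at the DataNode that will take the data.
        HttpURLConnection nn = (HttpURLConnection) new URL(createUrl).openConnection();
        nn.setRequestMethod("PUT");
        nn.setInstanceFollowRedirects(false); // handle the redirect ourselves
        String dataNodeUrl = nn.getHeaderField("Location");
        nn.disconnect();

        // Step 2: stream the file to the DataNode URL in chunks, so an
        // 850 MB file is never buffered in memory all at once.
        HttpURLConnection dn = (HttpURLConnection) new URL(dataNodeUrl).openConnection();
        dn.setRequestMethod("PUT");
        dn.setDoOutput(true);
        dn.setChunkedStreamingMode(4 * 1024 * 1024); // 4 MB chunks
        try (OutputStream out = dn.getOutputStream();
             InputStream in = new FileInputStream("data.csv")) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) > 0) {
                out.write(buf, 0, n);
            }
        }
        System.out.println("DataNode response: " + dn.getResponseCode()); // expect 201
    }
}
```

With chunked streaming the upload size is bounded by HDFS capacity rather than client RAM, which may be worth checking against how the .NET HTTP client is configured.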
Created 02-16-2017 07:13 PM
You always have scalability problems with the REST API, and possibly timeout issues as well. Have you considered installing an HDFS client on the machine where you have the data, and using the native protocol?
Created 02-17-2017 09:51 AM
@Hellmar Becker You mean creating a cluster and ensuring the data is duplicated so that it is reachable by the remote Hadoop cluster, and then using the normal load operation?
Created 02-17-2017 09:57 AM
No, I mean have a single edge node that has only a Hadoop client installed (via Ambari or manually) and where the files to be uploaded are available. Then upload the files using the native HDFS protocol, which makes use of the distributed nature of Hadoop.
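From such an edge node the upload can be done with hdfs dfs -put, or programmatically through the Hadoop Java client. A minimal sketch of the latter, assuming a configured edge node (the NameNode address and both paths below are placeholders):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NativeHdfsUpload {
    public static void main(String[] args) throws Exception {
        // On a properly configured edge node, fs.defaultFS is usually
        // picked up from core-site.xml; set it explicitly only if needed.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

        try (FileSystem fs = FileSystem.get(conf)) {
            // copyFromLocalFile streams the file block by block over the
            // native HDFS protocol, writing directly to the DataNodes.
            fs.copyFromLocalFile(new Path("/local/data/data.csv"),
                                 new Path("/user/hdfs/data.csv"));
        }
    }
}
```

Because the client writes blocks to the DataNodes directly, there is no single REST endpoint acting as a bottleneck, which is what makes this approach scale to large files.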