Created 02-01-2017 01:30 PM
In production setups, are files loaded into HDFS from a particular machine?
If so, and that machine were also a data node, wouldn't it be identified as a co-located client, thus preventing data distribution across the cluster?
Or is the standard practice to load the files from the name node host?
Or what other practice is commonly used for loading files into HDFS?
Appreciate the insights.
Created 02-01-2017 02:24 PM
In production, you would have "edge nodes" where client programs are installed and talk to the cluster. But even if you put data on the local file system of a data node and then copy it into HDFS, that will not prevent data distribution. The file sits in the local file system (XFS, ext4), which is separate from HDFS, and once the client writes it, HDFS still replicates each block across the cluster; at most, the first replica of each block is placed on the local data node.
Standard practice is to use an edge node, not the name node.
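For illustration, here is a minimal sketch of what a client program on an edge node does. The NameNode host and file paths are placeholders, and in practice fs.defaultFS would come from core-site.xml on the classpath rather than being set in code:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PutFile {
    public static void main(String[] args) throws Exception {
        // Point the client at the cluster; normally this comes from
        // core-site.xml on the edge node's classpath.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020"); // hypothetical host

        FileSystem fs = FileSystem.get(conf);
        // Copy a local file (XFS/ext4) into HDFS; block placement and
        // replication are decided by the NameNode, not by this client.
        fs.copyFromLocalFile(new Path("/data/local/input.csv"),
                             new Path("/user/hdfs/input.csv"));
        fs.close();
    }
}
```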
Created 02-01-2017 06:14 PM
If moving files into HDFS from a data node will not prevent distribution, then when does the co-located client dynamic come into play?
Also, is the edge node that you mention a data node? If not, is it simply a machine with Hadoop software to facilitate interaction with HDFS?
Appreciate the feedback.
Created 02-03-2017 05:51 PM
is it simply a machine with Hadoop software to facilitate interaction with HDFS?
Yes.
Created 02-03-2017 02:52 PM
Hello, any response on whether the 'edge node' is a data node?
Appreciate the feedback.
Created 02-03-2017 05:54 PM
whether the 'edge node' is a data node?
No. If you want, you can put edge processes (client configs, client programs) on the same node as a data node, but that doesn't make the data node an edge node. Ideally this is not recommended, but if you have a very small cluster, then sure, no problem with that.
Created 02-03-2017 06:32 PM
So what is required for the edge node to connect to the cluster: Hadoop client software, core-site.xml, hdfs-site.xml, ... and what else?
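In other words, would something like this minimal sketch be enough, assuming a standard /etc/hadoop/conf layout (the paths are my assumption)?

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class EdgeNodeCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Normally these files are found on the classpath; adding them
        // explicitly here just to show which files matter.
        conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));

        // If fs.defaultFS resolves, the client can reach the NameNode.
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Connected to: " + fs.getUri());
        fs.close();
    }
}
```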
Appreciate the clarification.
Created 02-06-2017 06:17 PM
Can I have a response on what is required for the edge node to connect to the cluster, please?
Appreciate the feedback.
Created 02-01-2017 04:07 PM
You can also securely do this via a REST API over HTTP from any node (see the sketch after the links below):
1. WebHDFS: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_hdfs_admin_tools/content/ch12.html
2. HttpFS: if you plan on using WebHDFS in a High Availability cluster (active and standby NameNodes)
You can also implement Knox for a single, secure REST access point (with different port numbers) for: Ambari, WebHDFS, HCatalog, HBase, Oozie, Hive, YARN, Resource Manager, and Storm. http://hortonworks.com/apache/knox-gateway/
http://hortonworks.com/hadoop-tutorial/securing-hadoop-infrastructure-apache-knox/
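To make the WebHDFS flow concrete, here is a rough sketch of the documented two-step file create: PUT to the NameNode with no data, then follow the 307 redirect to a DataNode with the content. The host, port 50070 (the Hadoop 2.x default), target path, and user.name value are all assumptions:

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class WebHdfsCreate {
    public static void main(String[] args) throws Exception {
        // Hypothetical NameNode host:port, target path, and user name.
        URL createUrl = new URL("http://namenode-host:50070/webhdfs/v1"
                + "/user/hdfs/hello.txt?op=CREATE&user.name=hdfs");

        // Step 1: PUT to the NameNode with no data; it replies with a
        // 307 redirect whose Location header points at a DataNode.
        HttpURLConnection nn = (HttpURLConnection) createUrl.openConnection();
        nn.setRequestMethod("PUT");
        nn.setInstanceFollowRedirects(false); // capture the redirect ourselves
        String dataNodeUrl = nn.getHeaderField("Location");
        nn.disconnect();

        // Step 2: PUT the actual file content to the DataNode URL.
        HttpURLConnection dn =
                (HttpURLConnection) new URL(dataNodeUrl).openConnection();
        dn.setRequestMethod("PUT");
        dn.setDoOutput(true);
        try (OutputStream out = dn.getOutputStream()) {
            out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("Response: " + dn.getResponseCode()); // 201 on success
        dn.disconnect();
    }
}
```

Note that HttpFS exposes the same REST API without the redirect to a DataNode; the client talks only to the HttpFS server, which is why it works behind firewalls and with HA NameNodes.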
Created 02-03-2017 06:34 PM
Thanks Binu.