Scenario: A data source provides files daily, and these files need to be ingested into HDFS.

Condition: The data source will push the files somewhere. We can't go and pull the files from the data source.

Our solution is that the data source places the files on an edge node via SFTP, and then a process picks up these files and transfers them to HDFS using "hdfs put".

Is it good practice to store data on the edge node? Shouldn't the edge node be a thin layer rather than storage? If the pickup process is not running and data keeps arriving on the edge node, the disk may fill up, and other processes may be impacted (out of memory, etc.).

What is the current industry practice for moving files from data sources to HDFS?

-- Thanks David.
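For reference, the pickup process described above could look roughly like the sketch below. It is only an illustration under several assumptions: the function names, the 60-second "settled" threshold, and the directory arguments are all hypothetical, and it assumes the `hdfs` command-line client is installed on the edge node. It treats a file as fully uploaded once it has not been modified for a while, deletes the local copy only after the put succeeds, and exposes disk usage so the job can raise an alert before the landing disk fills up.

```python
import os
import shutil
import subprocess
import time


def disk_used_fraction(path):
    """Return the fraction (0.0 to 1.0) of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def settled_files(landing_dir, min_age_seconds=60, now=None):
    """List regular files not modified for `min_age_seconds`.

    Assumption: a file that has stopped changing is no longer being
    written by the SFTP upload. The threshold is a hypothetical default.
    """
    now = time.time() if now is None else now
    result = []
    for name in sorted(os.listdir(landing_dir)):
        path = os.path.join(landing_dir, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) >= min_age_seconds:
            result.append(path)
    return result


def hdfs_put(local_path, hdfs_dir):
    """Copy one file with `hdfs dfs -put -f`; raises CalledProcessError on failure."""
    subprocess.run(["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir], check=True)


def drain(landing_dir, hdfs_dir, put=hdfs_put, min_age_seconds=60, now=None):
    """Upload every settled file and remove the local copy only after a successful put."""
    moved = []
    for path in settled_files(landing_dir, min_age_seconds, now):
        put(path, hdfs_dir)   # an exception here leaves the local file in place
        os.remove(path)       # free edge-node disk space once the file is in HDFS
        moved.append(os.path.basename(path))
    return moved
```

A scheduler such as cron could call `drain(...)` every few minutes with the SFTP landing directory and the target HDFS directory, and check `disk_used_fraction(...)` against a threshold first, so that a stalled pickup job is noticed before the edge node runs out of space.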