Hello Team,
I have .dat files ranging from 2 GB to 500 GB on a SAS server. I need to fetch these files and load them into a Hive table. How can this be automated? I have considered a few options, but none of them transfers the files fast enough.
Flume and NiFi: These are meant for real-time data ingestion. I could use them, but that would be overkill. They are simply not meant for file transfer.
distcp: It copies between HDFS clusters. One of my servers does not have HDFS.
scp/WinSCP: These would be very slow over the network, and for files >500 GB I don't think anyone would recommend this approach.
Sqoop: I will not have access to the tables. If I manage to get access, will I be able to transfer the files by increasing the number of mappers?
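To put a rough number on "slow" for the network-bound options above, here is a back-of-envelope estimate in Python. It is only a sketch: the link speeds are assumed values (I have not measured the actual link), sizes are treated as decimal GB, and it ignores protocol overhead, encryption cost and disk throughput, so real transfers will take longer.

```python
def transfer_hours(size_gb: float, link_gbit_per_s: float) -> float:
    """Ideal time in hours to move size_gb (decimal GB) over a link.

    Assumes the link is fully saturated; real tools will be slower.
    """
    size_bits = size_gb * 1e9 * 8                   # GB -> bits
    seconds = size_bits / (link_gbit_per_s * 1e9)   # bits / (bits per second)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical link speeds in Gbit/s, chosen only for illustration.
    for size_gb in (2, 500):
        for link in (0.1, 1, 10):
            hours = transfer_hours(size_gb, link)
            print(f"{size_gb} GB over {link} Gbit/s: {hours:.2f} h")
```

Under these assumptions a single 500 GB file needs about 1.1 hours on a saturated 1 Gbit/s link, so any single-stream copy is bounded by that regardless of the tool.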
Any thoughts/suggestions are welcome and appreciated.