I am planning to use a stack built on Hadoop, Hive, and Impala for analysing big data. The setup is ready, and now I am trying to import data from a MySQL table. The table is more than 500 GB in size, and I am planning to use Sqoop as follows.
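The Sqoop command itself does not appear above, but for a table of this size the import would typically look something like the sketch below. The host, database, credentials, table, and column names are all placeholders, not values from the question:

```shell
# Hypothetical Sqoop import sketch -- replace connection details with
# real values. --split-by should name an indexed, evenly distributed
# column so the parallel mappers receive balanced slices of the table.
sqoop import \
  --connect jdbc:mysql://mysql-host:3306/mydb \
  --username myuser -P \
  --table big_table \
  --split-by id \
  --num-mappers 8 \
  --target-dir /user/hive/warehouse/big_table
```

Increasing `--num-mappers` raises parallelism, but each mapper opens its own connection to MySQL, so the source server and the network link become the practical limit.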
The main issue here is that I need to transfer more than 500 GB of data from a remote server to the machine that hosts the Hadoop installation. Is there a better method for doing this? Is it possible to compress the data somehow and reduce the transfer size?
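On the compression point: Sqoop can compress the output it writes with `--compress` and `--compression-codec`, as sketched below (same placeholder connection details as above). One caveat worth hedging: with a plain JDBC import the rows still cross the network uncompressed, so these flags mainly save HDFS storage rather than transfer bandwidth; `--direct` switches to the `mysqldump` fast path, which is usually faster for bulk export from MySQL.

```shell
# Sketch, assuming the same hypothetical host/db/table as before.
# --compress + SnappyCodec writes Snappy-compressed files on HDFS.
sqoop import \
  --connect jdbc:mysql://mysql-host:3306/mydb \
  --username myuser -P \
  --table big_table \
  --split-by id \
  --compress \
  --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
  --target-dir /user/hive/warehouse/big_table_snappy
```

If transfer bandwidth is the real bottleneck, an alternative is to dump the table on the MySQL host, compress the dump there (e.g. with gzip), copy the compressed file across, and load it into HDFS locally, rather than streaming the raw rows over JDBC.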