Support Questions
Find answers, ask questions, and share your expertise

Import very large amount of data to Hive

New Contributor

I am planning to use a stack of Hadoop, Hive and Impala for analysing big data. I have the setup ready, and now I am trying to import data from a MySQL table. The table size is more than 500 GB, and I am planning to use Sqoop as follows:


sqoop import --connect jdbc:mysql://remote_host_ip/database_name \
  --username user_name -P \
  --table table_name \
  --as-parquetfile --warehouse-dir /user/hive/warehouse -m 1


The main issue is that I need to transfer more than 500 GB of data from a remote server to the one with the Hadoop installation. Is there a better method for doing this? Is it possible to compress the data somehow and reduce the size?
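For reference, a variant of the command above that enables Sqoop's built-in compression and parallel transfer might look like this. This is a sketch only: the host, database, credentials, and the split column `id` are placeholders, and you would substitute your own values. `--compress` turns on output compression (gzip by default), `--compression-codec` selects a codec class, and `--split-by` with `-m` greater than 1 runs several parallel map tasks:

```shell
# Sketch only -- host, database, credentials, and the split column "id" are placeholders.
# --compress (-z) enables output compression; --compression-codec picks the codec.
# --split-by plus -m 8 splits the import into 8 parallel map tasks over JDBC.
sqoop import \
  --connect jdbc:mysql://remote_host_ip/database_name \
  --username user_name -P \
  --table table_name \
  --as-parquetfile \
  --compress \
  --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
  --split-by id \
  -m 8 \
  --warehouse-dir /user/hive/warehouse
```

Two caveats: Parquet output is typically already Snappy-compressed, so the main win here is parallelism rather than extra compression; and Sqoop compresses on the Hadoop side after fetching rows over JDBC, so the bytes crossing the network from MySQL are not reduced by these flags alone.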
