Support Questions
Find answers, ask questions, and share your expertise

Import very large amount of data to Hive

New Contributor

I am planning to use a stack of Hadoop, Hive and Impala for analysing big data. I have the setup ready, and now I am trying to import data from a MySQL table. The table size is more than 500 GB, and I am planning to use Sqoop as follows:


sqoop import --connect jdbc:mysql://remote_host_ip/database_name \
  --username user_name -P \
  --table table_name \
  --as-parquetfile --warehouse-dir /user/hive/warehouse -m 1


The main issue is that I need to transfer more than 500 GB of data from a remote server to the one with the Hadoop installation. Is there a better method for doing this? Is it possible to compress the data somehow and reduce the size?
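For reference, a variant of the command above that enables Sqoop's built-in compression and parallel transfer might look like this. This is a sketch only: the host, database, credentials, and the split column `id` are placeholders, and you would substitute your own values. `--compress` turns on output compression (gzip by default), `--compression-codec` selects a codec class, and `--split-by` with `-m` greater than 1 runs several parallel map tasks:

```shell
# Sketch only -- host, database, credentials, and the split column "id" are placeholders.
# --compress (-z) enables output compression; --compression-codec picks the codec.
# --split-by plus -m 8 splits the import into 8 parallel map tasks over JDBC.
sqoop import \
  --connect jdbc:mysql://remote_host_ip/database_name \
  --username user_name -P \
  --table table_name \
  --as-parquetfile \
  --compress \
  --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
  --split-by id \
  -m 8 \
  --warehouse-dir /user/hive/warehouse
```

Two caveats: Parquet output is typically already Snappy-compressed, so the main win here is parallelism rather than extra compression; and Sqoop compresses on the Hadoop side after fetching rows over JDBC, so the bytes crossing the network from MySQL are not reduced by these flags alone.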
