
Import a very large amount of data to Hive

New Contributor

I am planning to use a stack built on Hadoop, Hive, and Impala for analysing big data. I have the setup ready, and I am now trying to import data from a MySQL table. The table is more than 500 GB, and I plan to use Sqoop as follows:

sqoop import --connect jdbc:mysql://remote_host_ip/database_name \
  --username user_name -P \
  --table table_name \
  --hive-import \
  --compression-codec=snappy \
  --as-parquetfile --warehouse-dir=/user/hive/warehouse -m 1

The main issue is that I need to transfer more than 500 GB of data from a remote server to the server that hosts the Hadoop installation. Is there a better method for doing this? Is it possible to compress the data somehow and reduce its size?
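
For example, would a variant like the following, which keeps the Snappy/Parquet compression but splits the transfer across several parallel mappers, be a reasonable approach? (The --split-by column id and the mapper count of 8 are only placeholders for illustration, not values from my actual schema.)

# Same import as above, but split across parallel mappers instead of one.
# Assumptions for illustration only: the table has a numeric primary key
# named id, and the cluster can run 8 concurrent map tasks.
sqoop import --connect jdbc:mysql://remote_host_ip/database_name \
  --username user_name -P \
  --table table_name \
  --hive-import \
  --compression-codec=snappy \
  --as-parquetfile --warehouse-dir=/user/hive/warehouse \
  --split-by id \
  -m 8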
