I have imported some data (30 million rows) into the Quickstart VM (single node) using the following command:

    sqoop import -Dmapreduce.map.memory.mb=1024 \
      -Dmapreduce.map.java.opts=-Xmx7200m \
      -Dmapreduce.task.io.sort.mb=2400 \
      --connect jdbc:mysql://my.ip.address/perkdbdev \
      --username root -P \
      --hive-import --table table_name \
      --as-parquetfile \
      --warehouse-dir=/home/cloudera/hadoop \
      -m 10

The import completed successfully, but whenever I issue a statement with an aggregate function from the Hue query editor for Impala, I get:

    argument of type 'NoneType' is not iterable

Do I need to do anything after importing the data to avoid this error? I googled for some time, but the only information I found is that this is a Python error, and I am not sure how that relates. The query that triggers the error is:

    select count(id) from my_table_name;
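A sketch of one thing worth trying first, not a confirmed fix: Impala caches catalog metadata, so a table created through a Sqoop `--hive-import` is not visible to Impala until the cache is refreshed. Running the query directly in `impala-shell` also helps isolate whether the failure is in Impala itself or only in Hue's Python layer (which is where a `'NoneType' is not iterable` message would originate). `my_table_name` below stands in for the real table name.

```shell
# Tell Impala to reload metadata for the table created by the Hive import.
impala-shell -q "INVALIDATE METADATA my_table_name;"

# Re-run the aggregate outside Hue to see whether Impala itself can answer it.
impala-shell -q "SELECT COUNT(id) FROM my_table_name;"
```

If the query succeeds in `impala-shell` but still fails in Hue, the problem is on the Hue side rather than with the imported data.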
I am planning to use a stack of Hadoop, Hive and Impala for analysing big data. The setup is ready, and I am now trying to import data from a MySQL table. The table is larger than 500 GB, and I plan to use Sqoop as follows:

    sqoop import --connect jdbc:mysql://remote_host_ip/database_name \
      --username user_name -P \
      --table table_name \
      --hive-import \
      --compression-codec=snappy \
      --as-parquetfile \
      --warehouse-dir=/user/hive/warehouse \
      -m 1

The main issue is that I need to transfer more than 500 GB of data from a remote server to the machine that hosts the Hadoop installation. Is there a better method for doing this? Is it possible to compress the data somehow and reduce the transfer size?
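A sketch of one variant worth considering, under the assumption that the table has a numeric primary key (called `id` here) to split on; the host, database, user, and column names are placeholders. Sqoop pulls rows over JDBC, so the on-disk codec alone does not shrink the wire transfer, but MySQL Connector/J's `useCompression=true` connection property compresses the JDBC stream itself, and running several mappers parallelises the copy:

```shell
# Compress the JDBC transfer (useCompression=true), write Snappy-compressed
# Parquet on the HDFS side, and split the copy across 8 parallel mappers.
sqoop import \
  --connect "jdbc:mysql://remote_host_ip/database_name?useCompression=true" \
  --username user_name -P \
  --table table_name \
  --hive-import \
  --as-parquetfile \
  --compression-codec=snappy \
  --warehouse-dir=/user/hive/warehouse \
  --split-by id \
  -m 8
```

With `-m 1` a single mapper streams the whole 500 GB serially; raising the mapper count with a good `--split-by` column is usually the bigger win, while `useCompression` trades CPU on both ends for fewer bytes on the network.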