So I'm having trouble loading the files that Sqoop creates when you import from a DB into a Hive table.
It executes with no errors, but doesn't actually move the data into the table.
I have created tables from .csv files before, but not from the files that Sqoop creates (they look like part-m-00000).
When I need to access Sqoop data via Hive, I create an external table that points to the Sqoop directory on HDFS. You need to define your schema and the line/field terminators you used for the Sqoop job.
CREATE EXTERNAL TABLE page_view(
    viewTime INT,
    userid BIGINT,
    page_url STRING,
    referrer_url STRING,
    ip STRING COMMENT 'IP Address of the User',
    country STRING COMMENT 'country of origination')
COMMENT 'This is the staging page view table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054'
STORED AS TEXTFILE
LOCATION '<hdfs_location>';
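As an aside, '\054' is just the octal escape for a comma (Sqoop's default field delimiter), so the table above expects comma-delimited files. You can confirm the octal value in any shell:

```shell
# \054 in octal is decimal 44, the ASCII code for ','
printf '\054\n'
```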
You can also import directly into Hive by adding the --hive-import parameter to your Sqoop job.
sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES --hive-import
So if I run the LOAD DATA command from the command line, it works, but when I run the exact same command from Ambari, it doesn't do anything. Any ideas why that would happen?
What user are you using at the command line? What user are you using in Ambari? Is it possible the Ambari user doesn't have access to the files on HDFS? Do you see any error messages in the Ambari logs?
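One quick way to check is to compare the owner and permissions of the Sqoop output directory against the user Ambari runs as (the path here is hypothetical):

```shell
# Check who owns the Sqoop output files and what their permissions are
hdfs dfs -ls /user/josh/my_data_dir

# If the Ambari/Hive user can't read them, loosen the permissions
# (or chown them to the right user instead)
hdfs dfs -chmod -R 755 /user/josh/my_data_dir
```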
@Josh Persinger Since the Sqoop import runs as a MapReduce job, it creates part files and logs in the target directory.
Note: when the directory contains a non-data file (in HDFS), LOAD DATA fails to execute.
e.g. Sqoop has generated a directory called my_data_dir. Under my_data_dir there are two data files, /my_data_dir/part-m-00000 and /my_data_dir/part-m-00001. There is also a log file named /my_data_dir/_logs.
Delete the log file and then try the LOAD from the directory again; the command should work fine.
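Using the my_data_dir layout above (and borrowing the page_view table name from the earlier answer as a stand-in for your own table), that looks like:

```shell
# Remove the non-data entry from the Sqoop output directory
# (-r also handles the case where _logs is a directory)
hdfs dfs -rm -r /my_data_dir/_logs

# Then rerun the load from the Hive CLI
hive -e "LOAD DATA INPATH '/my_data_dir' INTO TABLE page_view;"
```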