I have a table CustomerQueries in SQL Server. The columns are
CUSTOMER, THRESHOLD, SQL_QUERY, OUTPUT_FROM_HIVE, HIVE_STATUS
The primary key is a composite key: CUSTOMER + THRESHOLD.
This may hold a couple of hundred records. OUTPUT_FROM_HIVE and HIVE_STATUS are to be updated later.
My requirement is to run the queries stored in SQL_QUERY against Hive and update the OUTPUT_FROM_HIVE and HIVE_STATUS columns in the above table in SQL Server.
The result set from Hive is only a few records (Max 10 records).
Could you please suggest any approach for this?
You can take advantage of sqoop export. Please check the Sqoop documentation; if more help is required, do let me know.
Thanks for your response.
I can sqoop-import the table into Hive. But I am not sure how to read the table, execute the queries stored in it, and store the output from the Hive queries. I also need to set this up in an Oozie workflow.
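For the driver piece (reading the stored queries and running each one in Hive), here is a minimal sketch, not a definitive implementation. It assumes `beeline` is on the PATH and that the SQL Server side is reached through some DB-API cursor (e.g. via pyodbc); the Hive JDBC URL is a placeholder, and only the table/column names come from the question:

```python
import subprocess

# Placeholder HiveServer2 URL - replace with your cluster's endpoint.
BEELINE_URL = "jdbc:hive2://hive-host:10000/default"

def build_beeline_cmd(query, url=BEELINE_URL):
    """Beeline invocation that runs one stored query and prints CSV to stdout."""
    return ["beeline", "-u", url, "--outputformat=csv2",
            "--silent=true", "-e", query]

def build_update_stmt():
    """Parameterized UPDATE for the SQL Server table ('?' is the pyodbc marker)."""
    return ("UPDATE CustomerQueries "
            "SET OUTPUT_FROM_HIVE = ?, HIVE_STATUS = ? "
            "WHERE CUSTOMER = ? AND THRESHOLD = ?")

def run_one(customer, threshold, query, cursor):
    """Run one Hive query and write the outcome back to SQL Server."""
    proc = subprocess.run(build_beeline_cmd(query),
                          capture_output=True, text=True)
    status = "SUCCESS" if proc.returncode == 0 else "FAILED"
    output = proc.stdout.strip() if proc.returncode == 0 else proc.stderr[-4000:]
    cursor.execute(build_update_stmt(), (output, status, customer, threshold))
    cursor.connection.commit()
```

An Oozie shell or ssh action could invoke a script that selects the pending rows (`SELECT CUSTOMER, THRESHOLD, SQL_QUERY FROM CustomerQueries WHERE HIVE_STATUS IS NULL`) and calls `run_one` per row; since each result set is small (max 10 records), storing the CSV text directly in OUTPUT_FROM_HIVE is feasible.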
If you need to export a complete Hive table, then use this command:
sqoop export --connect '<RDBMS JDBC connection>' --username '<RDBMS user>' --password '<RDBMS password>' --table '<RDBMS table>' --hcatalog-table '<Hive table>'
Fetch a data file from beeline by running your query, put it on HDFS, and then do a sqoop export from that file.
sqoop export --connect '<JDBC connection>' --username <your user> --password <your password> --table <your table name> --export-dir '<your HDFS data directory>' --input-fields-terminated-by '<your CSV delimiter>'
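The three steps above (dump to a local file, push to HDFS, sqoop export) can be sketched as a small wrapper, assuming `beeline`, `hdfs`, and `sqoop` are on the PATH; the URLs, file paths, and table names are all placeholders:

```python
import subprocess

def hdfs_put_cmd(local_file, hdfs_dir):
    """Copy the local data file to HDFS, overwriting any previous run."""
    return ["hdfs", "dfs", "-put", "-f", local_file, hdfs_dir]

def sqoop_export_cmd(jdbc_url, user, password, table, export_dir, sep=","):
    """Sqoop export of an HDFS directory into the RDBMS table."""
    return ["sqoop", "export",
            "--connect", jdbc_url,
            "--username", user,
            "--password", password,
            "--table", table,
            "--export-dir", export_dir,
            "--input-fields-terminated-by", sep]

def run_pipeline(query, hive_url, jdbc_url, user, password, table,
                 local_file="/tmp/hive_out.csv", hdfs_dir="/staging/hive_out"):
    """Dump query output from beeline to a CSV, push it to HDFS, export it."""
    with open(local_file, "w") as fh:
        # --showHeader=false keeps the CSV header out of the exported data.
        subprocess.run(["beeline", "-u", hive_url, "--outputformat=csv2",
                        "--silent=true", "--showHeader=false", "-e", query],
                       stdout=fh, check=True)
    subprocess.run(hdfs_put_cmd(local_file, hdfs_dir), check=True)
    subprocess.run(sqoop_export_cmd(jdbc_url, user, password,
                                    table, hdfs_dir), check=True)
```

Note that the CSV header is suppressed before the export; otherwise sqoop export would try to insert the header row as data.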
There must be a way to import this file into the SQL Server table as well.