Member since
12-09-2015
3
Posts
3
Kudos Received
1
Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
6015 | 04-11-2016 07:21 AM |
04-11-2016
07:21 AM
1 Kudo
@Eyad Garelnabi Found a solution for the join queries. I am using MySQL DB as source. I tried incremental with Sakila DB which comes with the MySQL installatoin. I thought I will share my experience that why lengthy response. I apologize for the lengthy answer. My first sqoop job was: sqoop job --create myssb1 -- import --connect jdbc:mysql://<hostname>:<port>/sakila --username admin --password admin --driver=com.mysql.jdbc.Driver --query "select * from sakila.address a inner join sakila.city c on a.city_id=c.city_id WHERE $CONDITIONS" --incremental lastmodified --check-column last_update --last-value 1900-01-01 --target-dir /user/cloudera/ssb7 --hive-import --hive-table test.sakila -m 1 --hive-drop-import-delims --map-column-java address=String It was a straight forward join query. select * from sakila.address a inner join sakila.city c on a.city_id=c.city_id WHERE $CONDITIONS But it gave exception: Imported Failed: Duplicate Column identifier specified: 'city_id' city_id is available in both city as well as address tables. I modified the job to: sqoop job --create myssb1 -- import --connect jdbc:mysql://<hostname>:<port>/sakila --username admin --password admin --driver=com.mysql.jdbc.Driver --query "select a.address_id as address_id, a.address as address, a.district as district, a.city_id as city_id, a.postal_code as postal_code, a.last_update as alast_update, c.city_id as cityid, c.city as city, c.country_id as country_id, c.last_update as clast_update from sakila.address a inner join sakila.city c on a.city_id=c.city_id WHERE $CONDITIONS" --incremental lastmodified --check-column alast_update --last-value 1900-01-01 --target-dir /user/cloudera/ssb7 --hive-import --hive-table test.sakila -m 1 --hive-drop-import-delims --map-column-java address=String I gave unique alias to individual column in select clause as mentioned in sqoop cook book Chapter 4. "use SQL projection to rename columns in the query so that each column in the output result set has a unique name." The SQL is select a.address_id as address_id, a.address as address, a.district as district, a.city_id as city_id, a.postal_code as postal_code, a.last_update as alast_update, c.city_id as cityid, c.city as city, c.country_id as country_id, c.last_update as clast_update from sakila.address a inner join sakila.city c on a.city_id=c.city_id WHERE $CONDITIONS I am referring alias name for check-column (alast_update) attribute as well in sqoop job. But it gave exception again: Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 'alast_update' in 'where clause' I was going through lot of online material to resolve this issue. Then I saw this answer in stackoverflow (Click here). Though the answer talks about MS SQL, the solution provided there worked for MySQL and PostreSQL as well. my new job is: sqoop job --create myssb1 -- import --connect jdbc:mysql://<hostname>:<port>/sakila --username admin --password admin --driver=com.mysql.jdbc.Driver --query "SELECT address_id, address, district, city_id, postal_code, alast_update, cityid, city, country_id, clast_update FROM(SELECT a.address_id as address_id, a.address as address, a.district as district, a.city_id as city_id, a.postal_code as postal_code, a.last_update as alast_update, c.city_id as cityid, c.city as city, c.country_id as country_id, c.last_update as clast_update FROM sakila.address a INNER JOIN sakila.city c ON a.city_id=c.city_id) as sub WHERE $CONDITIONS" --incremental lastmodified --check-column alast_update --last-value 1900-01-01 --target-dir /user/cloudera/ssb7 --hive-import --hive-table test.sakila -m 1 --hive-drop-import-delims --map-column-java address=String The SQL is SELECT address_id, address, district, city_id, postal_code, alast_update, cityid, city, country_id, clast_update
FROM (
SELECT a.address_id AS address_id, a.address AS address, a.district AS district, a.city_id AS city_id, a.postal_code AS postal_code, a.last_update AS alast_update, c.city_id AS cityid, c.city AS city, c.country_id AS country_id, c.last_update AS clast_update
FROM sakila.address a
INNER JOIN sakila.city c
ON a.city_id=c.city_id) AS sub
WHERE $CONDITIONS
The solution worked for us. @Eyad Garelnabi Please let me know your thoughts on this. Thanks again to @Eyad Garelnabi Suresh Babu
... View more
04-01-2016
04:41 AM
Hi Eyad Garelnabi, I apologize for the delay. Your solution works. Thank you. Actually we are trying Join queries. Till now we are not successful with incremental import. I will definitely post the solution once we crack join queries incremental import. Thanks once again, Suresh
... View more
03-21-2016
12:11 PM
2 Kudos
Please point me to documentation for the same. More helpful if I get examples as well
... View more
Labels:
- Labels:
-
Apache Sqoop