sqoop import \
-Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
--connection-manager org.apache.sqoop.manager.SQLServerManager \
--driver net.sourceforge.jtds.jdbc.Driver \
--connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
--username 'lrm0613' \
--password-alias sqlserver2.password \
--query 'select max(id) from NodeName a where $CONDITIONS' \
--split-by a.id \
--target-dir /user/lrm0613/queries_output/tt
Generates the following error:
20/06/10 16:37:51 ERROR tool.ImportTool: Import failed: We found column without column name. Please verify that you've entered all column names in your query if using free form query import (consider adding clause AS if you're using column transformation)
I've noticed that if I remove the max() function the job completes successfully. How can I make it work with the function?
Created on 06-10-2020 03:13 PM - edited 06-10-2020 03:14 PM
Hi Heri,
As mentioned in the error:
if using free form query import (consider adding clause AS if you're using column transformation)
Your max(id) aggregate function does not have an "AS" clause. Please change your query to the one below and try again:
select max(id) as max_id from NodeName a where $CONDITIONS
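As a quick sanity check, the free-form query can be tried outside of a full import job. A hedged sketch using sqoop eval, reusing the connection flags from the command in the question, with $CONDITIONS replaced by 1 = 1 (sqoop eval does not substitute it):

```shell
# Sketch only: sqoop eval runs a SQL statement and prints the result set,
# which makes it easy to confirm the column alias is picked up before
# launching a full import. Connection details are copied from the question.
sqoop eval \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
  --username 'lrm0613' \
  --password-alias sqlserver2.password \
  --query 'select max(id) as max_id from NodeName where 1 = 1'
```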
Cheers
Eric
Created 06-11-2020 05:12 AM
Thank you Eric,
I changed the form of the query:
--query 'select max(id) as maxid from NodeName where $CONDITIONS'
We made some progress, but for some reason sqoop is translating the query into the following form:
20/06/11 08:09:49 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(id), MAX(id) FROM (select max(id) as maxid from NodeName where (1 = 1) ) AS t1
20/06/11 08:09:49 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/lrm0613/.staging/job_1590805442005_23634
20/06/11 08:09:49 WARN security.UserGroupInformation: PriviledgedActionException as:lrm0613@JACKSONNATIONAL.COM (auth:KERBEROS) cause:java.io.IOException: net.sourceforge.jtds.jdbc.AQSQLException: Id 207, Level 16, State 1, Line 1
Invalid column name 'id'.
20/06/11 08:09:49 ERROR tool.ImportTool: Import failed: java.io.IOException: net.sourceforge.jtds.jdbc.AQSQLException: Id 207, Level 16, State 1, Line 1
Invalid column name 'id'.
Any ideas?
Created on 06-11-2020 02:45 PM - edited 06-11-2020 02:46 PM
Sorry, can you try below instead?
select max(id) as id from NodeName a where $CONDITIONS
BTW, do you really just want to import single MAX value into HDFS?
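The reason the alias has to be id here: with --split-by a.id, Sqoop wraps the free-form query in a bounding query of the form SELECT MIN(id), MAX(id) FROM (...) AS t1, as the BoundingValsQuery log line shows, so the subquery must expose a column literally named id. Since this query returns a single row anyway, a hedged alternative is to drop --split-by and force a single mapper, which skips the bounding query entirely:

```shell
# Sketch only: -m 1 runs the import with one mapper, so Sqoop never issues
# the MIN/MAX bounding query and no split column (or matching alias) is
# needed. All other flags are copied from the original command.
sqoop import \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
  --username 'lrm0613' \
  --password-alias sqlserver2.password \
  --query 'select max(id) as maxid from NodeName where $CONDITIONS' \
  -m 1 \
  --target-dir /user/lrm0613/queries_output/tt
```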
Created 06-12-2020 08:06 AM
Thank you Eric,
We basically want to grab the last value of an id, so we can use it as a reference to resume an incremental load into Hive.
Once we have the last "id" value in Hive, we can go back to our SQL Server and select all the rows which are greater than that id to pull the new/remaining data into Hive.
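The manual pattern described above could be sketched roughly like this, where LAST_ID is a hypothetical placeholder for the max(id) value read back from the previous import, and the target directory name is likewise made up for illustration:

```shell
# Sketch only, not the exact production job: LAST_ID stands in for the value
# produced by the earlier max(id) import; table and column names follow the
# thread. $CONDITIONS is escaped so the shell does not expand it.
LAST_ID=12345  # hypothetical: read from the output of the max(id) import

sqoop import \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
  --username 'lrm0613' \
  --password-alias sqlserver2.password \
  --query "select * from NodeName where id > ${LAST_ID} and \$CONDITIONS" \
  --split-by id \
  --target-dir /user/lrm0613/queries_output/incremental
```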
We were initially using --incremental lastmodified to do incremental loads, but some source tables don't have a datetime column, and sqoop prevents us from doing an incremental load without one:
Append mode for imports is not compatible with HCatalog.
Created 06-12-2020 05:27 PM