sqoop import \
-Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
--connection-manager org.apache.sqoop.manager.SQLServerManager \
--driver net.sourceforge.jtds.jdbc.Driver \
--connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
--username 'lrm0613' \
--password-alias sqlserver2.password \
--query 'select max(id) from NodeName a where $CONDITIONS' \
--split-by a.id \
--target-dir /user/lrm0613/queries_output/tt
Generates the following error:
20/06/10 16:37:51 ERROR tool.ImportTool: Import failed: We found column without column name. Please verify that you've entered all column names in your query if using free form query import (consider adding clause AS if you're using column transformation)
I've noticed that if I remove the max() function the job completes successfully. How can I make it work with the function?
Created on 06-10-2020 03:13 PM - edited 06-10-2020 03:14 PM
Hi Heri,
As mentioned in the error:
if using free form query import (consider adding clause AS if you're using column transformation)
Your max(id) aggregate function does not have an "AS" clause. Please change your query to the one below and try again:
select max(id) as max_id from NodeName a where $CONDITIONS
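As a quick sanity check, the free-form query can be tried outside of a full import job. A hedged sketch using sqoop eval, reusing the connection flags from the command in the question, with $CONDITIONS replaced by 1 = 1 (sqoop eval does not substitute it):

```shell
# Sketch only: sqoop eval runs a SQL statement and prints the result set,
# which makes it easy to confirm the column alias is picked up before
# launching a full import. Connection details are copied from the question.
sqoop eval \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
  --username 'lrm0613' \
  --password-alias sqlserver2.password \
  --query 'select max(id) as max_id from NodeName where 1 = 1'
```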
Cheers
Eric
Created 06-11-2020 05:12 AM
Thank you Eric,
I changed the form of the query:
--query 'select max(id) as maxid from NodeName where $CONDITIONS'
We made some progress, but for some reason sqoop is translating the query into the following form:
20/06/11 08:09:49 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(id), MAX(id) FROM (select max(id) as maxid from NodeName where (1 = 1) ) AS t1
20/06/11 08:09:49 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/lrm0613/.staging/job_1590805442005_23634
20/06/11 08:09:49 WARN security.UserGroupInformation: PriviledgedActionException as:lrm0613@JACKSONNATIONAL.COM (auth:KERBEROS) cause:java.io.IOException: net.sourceforge.jtds.jdbc.AQSQLException: Id 207, Level 16, State 1, Line 1
Invalid column name 'id'.
20/06/11 08:09:49 ERROR tool.ImportTool: Import failed: java.io.IOException: net.sourceforge.jtds.jdbc.AQSQLException: Id 207, Level 16, State 1, Line 1
Invalid column name 'id'.
Any ideas?
Created on 06-11-2020 02:45 PM - edited 06-11-2020 02:46 PM
Sorry, can you try below instead?
select max(id) as id from NodeName a where $CONDITIONS
BTW, do you really just want to import single MAX value into HDFS?
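The reason the alias has to be id here: with --split-by a.id, Sqoop wraps the free-form query in a bounding query of the form SELECT MIN(id), MAX(id) FROM (...) AS t1, as the BoundingValsQuery log line shows, so the subquery must expose a column literally named id. Since this query returns a single row anyway, a hedged alternative is to drop --split-by and force a single mapper, which skips the bounding query entirely:

```shell
# Sketch only: -m 1 runs the import with one mapper, so Sqoop never issues
# the MIN/MAX bounding query and no split column (or matching alias) is
# needed. All other flags are copied from the original command.
sqoop import \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
  --username 'lrm0613' \
  --password-alias sqlserver2.password \
  --query 'select max(id) as maxid from NodeName where $CONDITIONS' \
  -m 1 \
  --target-dir /user/lrm0613/queries_output/tt
```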
Created 06-12-2020 08:06 AM
Thank you Eric,
We basically want to grab the last value of an id, so we can use it as a reference to resume an incremental load into Hive.
Once we have the last "id" value in Hive, we can go back to our SQL Server and select all the rows which are greater than that id to pull the new/remaining data into Hive.
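The manual pattern described above could be sketched roughly like this, where LAST_ID is a hypothetical placeholder for the max(id) value read back from the previous import, and the target directory name is likewise made up for illustration:

```shell
# Sketch only, not the exact production job: LAST_ID stands in for the value
# produced by the earlier max(id) import; table and column names follow the
# thread. $CONDITIONS is escaped so the shell does not expand it.
LAST_ID=12345  # hypothetical: read from the output of the max(id) import

sqoop import \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
  --username 'lrm0613' \
  --password-alias sqlserver2.password \
  --query "select * from NodeName where id > ${LAST_ID} and \$CONDITIONS" \
  --split-by id \
  --target-dir /user/lrm0613/queries_output/incremental
```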
We were initially using --incremental lastmodified to do incremental loads, but some source tables don't have a datetime column, and sqoop prevents us from doing an incremental load without one:
Append mode for imports is not compatible with HCatalog.
Created 06-12-2020 05:27 PM