Support Questions

max() function generating an error in sqoop

Contributor

sqoop import \
-Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
--connection-manager org.apache.sqoop.manager.SQLServerManager \
--driver net.sourceforge.jtds.jdbc.Driver \
--connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
--username 'lrm0613' \
--password-alias sqlserver2.password \
--query 'select max(id) from NodeName a where $CONDITIONS' \
--split-by a.id \
--target-dir /user/lrm0613/queries_output/tt

Generates the following error:

20/06/10 16:37:51 ERROR tool.ImportTool: Import failed: We found column without column name. Please verify that you've entered all column names in your query if using free form query import (consider adding clause AS if you're using column transformation)

I've noticed that if I remove the max() function, the job completes successfully. How can I make it work with the function?

1 ACCEPTED SOLUTION

Super Guru

Sorry, can you try the query below instead?

select max(id) as id from NodeName a where $CONDITIONS



BTW, do you really just want to import a single MAX value into HDFS?


5 REPLIES

Super Guru

Hi Heri,

As mentioned in the error:

if using free form query import (consider adding clause AS if you're using column transformation)



Your max(id) aggregate function does not have an "AS" clause. Please change your query to the one below and try again:

select max(id) as max_id from NodeName a where $CONDITIONS



Cheers
Eric

Contributor

Thank you Eric,

I changed the form of the query:

--query 'select max(id) as maxid from NodeName where $CONDITIONS'

We made some progress, but for some reason sqoop translates the query into the following form:

20/06/11 08:09:49 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(id), MAX(id) FROM (select max(id) as maxid from NodeName where (1 = 1) ) AS t1
20/06/11 08:09:49 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/lrm0613/.staging/job_1590805442005_23634
20/06/11 08:09:49 WARN security.UserGroupInformation: PriviledgedActionException as:lrm0613@JACKSONNATIONAL.COM (auth:KERBEROS) cause:java.io.IOException: net.sourceforge.jtds.jdbc.AQSQLException: Id 207, Level 16, State 1, Line 1
Invalid column name 'id'.
20/06/11 08:09:49 ERROR tool.ImportTool: Import failed: java.io.IOException: net.sourceforge.jtds.jdbc.AQSQLException: Id 207, Level 16, State 1, Line 1
Invalid column name 'id'.

Any ideas?

Super Guru

Sorry, can you try the query below instead?

select max(id) as id from NodeName a where $CONDITIONS
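For context, the log above shows why the alias name matters: with a free-form query, Sqoop wraps it in a bounding-values query built around the --split-by column, so the inner query must expose a column with that exact name. With the `as id` alias, the generated query resolves:

```sql
-- Shape of Sqoop's generated bounding query, taken from the log above;
-- it now resolves because the inner query exposes a column named "id".
SELECT MIN(id), MAX(id)
FROM (select max(id) as id from NodeName where (1 = 1)) AS t1
```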



BTW, do you really just want to import a single MAX value into HDFS?
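If the single value is all you need, one alternative worth considering (a sketch only, reusing the connection arguments from your import command) is sqoop eval, which runs the statement and prints the result set to the console instead of writing files to HDFS, so no --split-by, --target-dir, or $CONDITIONS token is required:

```shell
# Sketch only: same connection settings as the import command above.
# sqoop eval executes the query and prints the result set to stdout.
sqoop eval \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
  --username 'lrm0613' \
  --password-alias sqlserver2.password \
  --query 'select max(id) as id from NodeName'
```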

Contributor

Thank you Eric,

We basically want to grab the last value of the id so we can use it as a reference point to resume an incremental load into Hive.

Once we have the last "id" value in Hive, we can go back to our SQL Server and select all the rows whose id is greater than that value to pull the new/remaining data into Hive.

We were initially using --incremental lastmodified to do incremental loads, but some source tables don't have a datetime column, and sqoop prevents us from doing an incremental load without one:

Append mode for imports is not compatible with HCatalog.
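The pattern we're aiming for, as a rough sketch (the placeholder value and paths are illustrative, not from our actual job):

```shell
# Hypothetical sketch of the id-based resume pattern described above.
# LAST_ID would come from Hive (e.g. by querying SELECT MAX(id) via
# beeline); here it is a placeholder.
LAST_ID=12345

# Build the free-form query: only rows newer than what Hive already has.
# $CONDITIONS must stay literal for Sqoop, hence the escaped dollar sign.
QUERY="select * from NodeName where id > ${LAST_ID} and \$CONDITIONS"

# The query then feeds the same sqoop import command as before, e.g.:
#   sqoop import ... --query "$QUERY" --split-by id \
#     --target-dir /user/lrm0613/queries_output/incremental
echo "$QUERY"
```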

Super Guru

Hi Heri,

Glad that it helped and thanks for the info.

Cheers
Eric