Sqoop Throwing Invalid ColumnName for the Derived Column when used in split-by


Hello All,

 

I am trying to import data into HDFS using a free-form query. My data cannot be split on any of the available columns because of its redundancy, so I have used ROW_NUMBER() to give each record a unique value.

But when I use it in the query, I get an error in some situations, while in others it works. I know some sort of tweaking must be needed, and I request anyone to help me with this. A mimic of the scenario is given below.

 

Scenario:

Working Situation:

"select * from (select row_number() over (order by column1) as rn, column1, column2 from table1) base" --split-by rn

 

Failing Situation: (I don't want "rn" to be populated)

"select column1, column2 from (select row_number() over (order by column1) as rn, column1, column2 from table1) base" --split-by rn

 

P.S.: I don't want the "rn" column to be populated in the HDFS file, because I have a downstream consumption process that would throw an error. Any help would be appreciated.

Re: Sqoop Throwing Invalid ColumnName for the Derived Column when used in split-by

Invalid column name is a syntax error, probably raised by your DB engine. You have to be specific and paste the query and logs; otherwise it's very hard to help.
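
One possible explanation, offered as a guess since no logs were posted: as far as I understand, when --query is combined with --split-by, Sqoop computes the split boundaries by wrapping the whole query in something like SELECT MIN(rn), MAX(rn) FROM (<query>) AS t1. If the outer select list does not expose rn, the database cannot resolve that column. A minimal sketch of a workaround is to keep $CONDITIONS on the outer query (where rn from "base" is still visible) and to supply the boundaries explicitly with --boundary-query. The connection string, user name, target directory, and mapper count below are placeholders, and none of this has been run against the poster's database.

```shell
#!/bin/sh
# Sketch only: JDBC URL, user, and target dir are hypothetical placeholders.
JDBC_URL='jdbc:<vendor>://<host>/<db>'

# rn stays inside the derived table "base" and is not in the output columns.
# $CONDITIONS sits on the outer query, so the range predicate on rn that Sqoop
# substitutes for each mapper can still be resolved against "base".
QUERY='SELECT column1, column2 FROM (SELECT ROW_NUMBER() OVER (ORDER BY column1) AS rn, column1, column2 FROM table1) base WHERE $CONDITIONS'

# Explicit lower/upper bounds for rn (row numbers run from 1 to the row count),
# used instead of the default boundary query wrapped around QUERY.
BOUNDARY='SELECT 1, COUNT(*) FROM table1'

set -- import \
  --connect "$JDBC_URL" \
  --username myuser -P \
  --query "$QUERY" \
  --split-by rn \
  --boundary-query "$BOUNDARY" \
  --target-dir /user/me/table1_import \
  --num-mappers 4

# Dry run: print the command. To execute, replace this line with: sqoop "$@"
printf 'sqoop'; printf ' %s' "$@"; printf '\n'
```

One caveat: each mapper re-runs the ROW_NUMBER() subquery, so if column1 has ties the numbering may differ between mappers, which can duplicate or drop rows. Ordering by a combination of columns that is unique, where one exists, avoids this.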