Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

How do I split columns on table that has varchar datatypes only?

avatar
Contributor
 
1 ACCEPTED SOLUTION

avatar
@Christian Lunesa

If you are talking about a Sqoop Import, that's the only tag with your question :), it is always highly recommended to use an Integral column as split-by. But since you have only string/varchar columns in your data source, you can try following options based on the data that you have.

  1. add surrogate int PK and use it also as a split or
  2. split your data manually using a custom query with WHERE clause and run sqoop few times with num-mappers=1, or
  3. apply some deterministic Integer non-aggregation function to you varchar column, for example cast(substr(...) as int) as split-column.

Let know if you need any other help!

View solution in original post

4 REPLIES 4

avatar
@Christian Lunesa

If you are talking about a Sqoop Import, that's the only tag with your question :), it is always highly recommended to use an Integral column as split-by. But since you have only string/varchar columns in your data source, you can try following options based on the data that you have.

  1. add surrogate int PK and use it also as a split or
  2. split your data manually using a custom query with WHERE clause and run sqoop few times with num-mappers=1, or
  3. apply some deterministic Integer non-aggregation function to you varchar column, for example cast(substr(...) as int) as split-column.

Let know if you need any other help!

avatar
@Christian Lunesa

If the answer helped solving your query, please mark the answer as accepted 🙂

avatar
Contributor

thank you for answering my questions sir, do you mean I have to alter the table in mysql when adding surrogate int PK?

avatar
Explorer

I want to use the third option on the split-by but i am not seeing example of how to pass in that value. I have tried with single quote 'cast(id as unsigned)' and it throws error saying column not found on the sqoop run.