Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

How do I split columns on table that has varchar datatypes only?

avatar
New Member
 
1 ACCEPTED SOLUTION

avatar
@Christian Lunesa

If you are talking about a Sqoop Import, that's the only tag with your question :), it is always highly recommended to use an Integral column as split-by. But since you have only string/varchar columns in your data source, you can try following options based on the data that you have.

  1. add surrogate int PK and use it also as a split or
  2. split your data manually using a custom query with WHERE clause and run sqoop few times with num-mappers=1, or
  3. apply some deterministic Integer non-aggregation function to you varchar column, for example cast(substr(...) as int) as split-column.

Let know if you need any other help!

View solution in original post

4 REPLIES 4

avatar
@Christian Lunesa

If you are talking about a Sqoop Import, that's the only tag with your question :), it is always highly recommended to use an Integral column as split-by. But since you have only string/varchar columns in your data source, you can try following options based on the data that you have.

  1. add surrogate int PK and use it also as a split or
  2. split your data manually using a custom query with WHERE clause and run sqoop few times with num-mappers=1, or
  3. apply some deterministic Integer non-aggregation function to you varchar column, for example cast(substr(...) as int) as split-column.

Let know if you need any other help!

avatar
@Christian Lunesa

If the answer helped solving your query, please mark the answer as accepted 🙂

avatar
New Member

thank you for answering my questions sir, do you mean I have to alter the table in mysql when adding surrogate int PK?

avatar
Visitor

I want to use the third option on the split-by but i am not seeing example of how to pass in that value. I have tried with single quote 'cast(id as unsigned)' and it throws error saying column not found on the sqoop run.