- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Sqoop Import: "-Dorg.apache.sqoop.splitter.allow_text_splitter=true"
- Labels:
-
Apache Sqoop
Created ‎10-17-2016 10:20 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I am trying to import oracle table to HDFS directory, but getting the error "Generating splits for a textual index column allowed only in case of "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" property passed as a parameter"
I fixed the import issue by giving "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" in sqoop import. But why do we need to set this property? I imported other tables without setting this property. When should we set this property?
Created ‎10-17-2016 11:34 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The property "Dorg.apache.sqoop.splitter.allow_text_splitter=true" is required when you are using --split-by is used on a column which is of text type. There is difference in the TextSplitter class of Sqoop jars in HDP 2.4 and HDP 2.5 because of sqoop command fails without the required argument in HDP 2.5.
Created ‎10-17-2016 11:34 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The property "Dorg.apache.sqoop.splitter.allow_text_splitter=true" is required when you are using --split-by is used on a column which is of text type. There is difference in the TextSplitter class of Sqoop jars in HDP 2.4 and HDP 2.5 because of sqoop command fails without the required argument in HDP 2.5.
Created ‎10-17-2016 12:40 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
This JIRA has some info about why you need to set the property: https://issues.apache.org/jira/browse/SQOOP-2910
Created ‎10-27-2016 04:02 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
can you tell me exactly how you added the -Dorg.apache.sqoop.splitter.allow_text_splitter=true to your import statement?
I'm trying to create a sqoop job with this import statement but it keeps failing:
16/10/27 17:53:25 ERROR tool.BaseSqoopTool: Error parsing arguments for import: 16/10/27 17:53:25 ERROR tool.BaseSqoopTool: Unrecognized argument: -Dorg.apache.sqoop.splitter.allow_text_splitter=true
my shell script to create the sqoop job looks like this:
#!/bin/bash sqoop job \ --create <tablename> \ -- \ import "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" \ --connect "jdbc:<URL>"
If I remove the -Dorg line all goes file but offcourse the execution of the job failes thats why I need to pass this allow_text_splitter as a parameter.
I tried with double quotes, without, but no luck unfortunately
Created ‎10-27-2016 04:35 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Try giving it just after sqoop job. Eg: Sqoop job "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" -- import ...
Created ‎10-27-2016 04:52 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
worked! many thanks, you saved my day.
Created ‎03-06-2020 01:28 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Can you include that in a line through code?
sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--connect jdbc:mysql://localhost/sqoop \
--username root \
-passqord 0range \
--split-by id \
--columns id,name \
--table customer \
--target-dir /user/cloudera/ingest/raw/customers \
--fields-terminated-by "," \
--hive-import \
--create-hive-table \
--hive-table sqoop_workspace.customers
Created ‎03-06-2020 06:06 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Since this thread was marked 'Solved' back in 2016, you would have a better chance of receiving a relevant response by posting a new question. This will also provide the opportunity to provide details specific to your environment that could aid other members in providing a more tailored answer to your issue.
Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
