Sqoop import from Oracle Exadata produces duplicate rows

Hi, I am using Sqoop to import tables from Oracle Exadata, and the imported data contains complete duplicate rows (every column identical). At present we remove them with a DISTINCT query and load the result into another target table. Please suggest how to fix this.
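Before or after running the DISTINCT clean-up, it can help to measure how many complete-row duplicates an import produced. A minimal Python sketch, assuming the table's part files have been copied to local disk (for example with `hdfs dfs -get`) and that each record sits on a single line, which is what `--hive-drop-import-delims` aims for by stripping embedded newlines. The function names and file paths are hypothetical. It keeps every distinct line in memory, so it suits a trial import of a smaller table or a subset, not the very large tables described below.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Iterator


def count_full_row_duplicates(lines: Iterable[str]) -> tuple[int, int]:
    """Return (total_rows, surplus_copies) for one-record-per-line text.

    Two rows are complete duplicates when the whole line (every field and
    delimiter) is identical. surplus_copies is the number of rows that a
    SELECT DISTINCT would drop.
    """
    # Strip only the record terminator so a final line without "\n" still
    # matches an otherwise identical line that has one.
    counts = Counter(line.rstrip("\n") for line in lines)
    total = sum(counts.values())
    return total, total - len(counts)


def _lines_from(paths: list[str]) -> Iterator[str]:
    """Yield the lines of every part file in turn (hypothetical paths)."""
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            yield from fh


def count_in_part_files(paths: list[str]) -> tuple[int, int]:
    """Count across all part files together, since a duplicated row may
    have been written by a different mapper than its original copy."""
    return count_full_row_duplicates(_lines_from(paths))
```

For example, `count_in_part_files(["part-m-00000", "part-m-00001"])` returns the total row count and how many of those rows are surplus copies.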

Background on the Oracle tables:

The Oracle tables used for the Sqoop import have no simple primary key: they are SCD Type 2 tables whose primary keys are composite keys, which do not suit the --split-by option. The tables are also very large (around 100 GB).
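For background on why the split column matters: when `-m` is greater than 1, Sqoop divides the value range of one column (the primary key, or the column named with `--split-by`) among the mappers, and each mapper reads its own range with a separate query. Below is a simplified Python sketch of that idea for an integer column. It is modelled loosely on Sqoop's range splitting rather than taken from its source, and the function name is hypothetical.

```python
from __future__ import annotations


def integer_split_clauses(column: str, min_val: int, max_val: int,
                          num_mappers: int) -> list[str]:
    """Divide [min_val, max_val] into contiguous WHERE clauses, one per split.

    Every range is half-open (lo <= col < hi) except the last, which is
    closed so that max_val itself is included. The ranges do not overlap,
    so a row with a non-NULL integer value in `column` matches exactly one
    clause, assuming that value does not change while the mappers run.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be >= 1")
    if min_val > max_val:
        raise ValueError("min_val must be <= max_val")
    span = max_val - min_val + 1
    n = min(num_mappers, span)  # never more splits than distinct values
    bounds = [min_val + (span * i) // n for i in range(n)] + [max_val + 1]
    clauses = []
    for i in range(n):
        lo, hi = bounds[i], bounds[i + 1]
        if i < n - 1:
            clauses.append(f"{column} >= {lo} AND {column} < {hi}")
        else:
            clauses.append(f"{column} >= {lo} AND {column} <= {max_val}")
    return clauses
```

For instance, `integer_split_clauses("ID", 1, 10, 3)` gives three non-overlapping ranges covering 1 to 10. A composite key offers no single column to range over in this way, which is the limitation noted above.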

Command used for the Sqoop import from Oracle Exadata:

sqoop import --connect %s@//%s:%s/%s --username %s --password %s --table %s.%s --fields-terminated-by '%s' --hive-drop-import-delims --hive-import --hive-overwrite --hive-table %s.%s --null-string '\\\N' --null-non-string '\\\N' -m %s --fetch-size=2500
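The `%s` placeholders suggest the command is filled in with Python %-formatting. A minimal sketch of building the same import as an argument list instead, assuming Python 3 and the standard `subprocess` module. Every function and parameter name here is hypothetical, the first placeholder is assumed to hold a JDBC prefix such as `jdbc:oracle:thin:`, and the options are the ones in the command above, written in the forms given in the Sqoop user guide. Because no shell is involved, the null marker is a raw string holding the same characters that `'\\N'` delivers inside shell single quotes.

```python
from __future__ import annotations

import subprocess

# Backslash, backslash, N: the value the Sqoop guide passes as '\\N'
# in single quotes; a raw string gives the same value without a shell.
HIVE_NULL = r"\\N"


def build_sqoop_import_args(jdbc_prefix: str, host: str, port: int,
                            service: str, user: str, password: str,
                            schema: str, table: str, delimiter: str,
                            hive_db: str, hive_table: str,
                            mappers: int, fetch_size: int = 2500) -> list[str]:
    """Return argv for the import above (all parameter names hypothetical)."""
    return [
        "sqoop", "import",
        "--connect", f"{jdbc_prefix}@//{host}:{port}/{service}",
        "--username", user,
        "--password", password,
        "--table", f"{schema}.{table}",
        "--fields-terminated-by", delimiter,
        "--hive-drop-import-delims",
        "--hive-import", "--hive-overwrite",
        "--hive-table", f"{hive_db}.{hive_table}",
        "--null-string", HIVE_NULL,
        "--null-non-string", HIVE_NULL,
        "-m", str(mappers),
        f"--fetch-size={fetch_size}",
    ]


def run_import(args: list[str]) -> int:
    """Run Sqoop directly (no shell) and return its exit code."""
    return subprocess.run(args, check=False).returncode
```

Passing a list rather than a formatted string means no argument is re-parsed by a shell, so delimiters and backslashes reach Sqoop exactly as written.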