Created 06-08-2018 05:22 AM
Hi All ,
I am using sqoop (1.4.6 Hadoop 2.6.0) import tool to fetch data from oracle db. I dont have any issues till the database been upgraded from 11g to 12c last week.
Sample sqoop script
sqoop import --connect jdbc:oracle:thin@ORACLEDB --query " select * from table_A where id>20002 and $CONDITIONS"
--split-by id -m 20 --hive-database hivedb --hive-table tab --as-parquetfile --hive-overwrite --hive-import --hive-compute-stats --delete-target-dir
[id is unique key field , every time i will drop the table tab separately and create it with sqoop ]
Problem i am facing : Number of records available in db and number of records sqooped is not matching. always i am seeing some records getting missed.,Sqoop import dropping records while importing huge tables from oracle 12c with columns greater than 700.
I reprocess the missing records like "select * from table_A where id = missing_id " it is working fine.
Options Explored :
1. Ran with single mapper by removing --split-by and change to -m 1
2. Instead of --query i replaced with --table --where
3. Tried both 1 & 2 togeather
4. Instead of creating as hive table i tried storing as hdfs file
5. Changed the file format
All the above options didnt help.
6. when i sqooped only with random 200 columns with different datatypes -- IT WORKED 🙂 but when i repeat with full table it doesn't.
7. Copied the table back to 11g and ran the sqoop process it completed with out any record miss
Is there any issue with Sqoop to oracle driver /connector ? do we need to upgrade oracle client in hadoop server ?
is anyone face this issue ? any suggestions to overcome this .
Could you please help me out , i appreciate your help .
Thanks
Created 06-08-2018 07:41 PM
Hey @Guru Karthik!
Do you have any error/warn messages on Sqoop Logs?
Or a bunch of connection reset? And could you try to add the --verbose to your sqoop command?
And from Oracle's side is there any config/parameter that may need to adjust to sent all this data to jdbc(sqoop)?
BTW: Just asking, but are you using the last ojdbc7.jar?
Hope this helps! 🙂
Created 06-13-2018 07:26 PM
Thanks for responding and sorry for late response.
As you mentioned i ran in Verbose and including ojdbc7.jar using -libjars explicitly. I didnt see any error or connection reset and able to see ojdbc7.jar is added to the session.But Still i am having the record miss issue.
I ran with 4 mapper , i observed records missing in all the split.
Records in oracle 37217 42445 40739 38403
Records sqooped 37183 42407 40706 38387
Created 06-13-2018 08:51 PM
Hey @Guru Karthik!
Maybe you can try to put this ojdbc7.jar directly into the Sqoop lib directory (usually /usr/hdp//sqoop/lib and restart sqoop).
BTW, is there any uncommon datatype in this oracle table (blob, clob, binary or xml).
And are you using any extra parameter (E.g. --hive-drop-import-delims)?
One last thing, I made a research here and found this:
25.8.4.3. Consistent Read: All Mappers Read From The Same Point In Time -Doraoop.import.consistent.read={true|false} When set to false (by default) each mapper runs a select query. This will return potentially inconsistent data if there are a lot of DML operations on the table at the time of import. Set to true to ensure all mappers read from the same point in time. The System Change Number (SCN) is passed down to all mappers, which use the Oracle Flashback Query to query the table as at that SCN. NoteValues true | false are case sensitive.<br>By default the SCN is taken from V$database. You can specify the SCN in the following command-Doraoop.import.consistent.read.scn=12345https://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html#_how_the_standard_oracle_manager_works_for_i...
This may be useful for you 🙂
Hope this helps!
Created 10-21-2022 09:47 AM
@karthik8789 Was the above issue fixed? If yes , what was the action taken to solve the issue?