Hi,

I have a 7-node cluster on RHCP7.1. I have a Sqoop job that imports data from an Oracle table into Hive. It is a single table, and I am importing one partition that has 40 billion rows in it. The job has 10 mappers and is split on an ID column that is an integer. There is an index on that ID column in Oracle.

Now the problem is that the mappers whose ranges contain more than 2 billion rows complete successfully, but the row count in Oracle for those ranges is much higher than what Sqoop imported. The table below will give you an idea:

Mapper   Sqoop        Oracle
m0       1709027700   1709027700
m1       340656511    340656511
m2       2147483000   3431813617
m3       2147483000   4649556868
m4       2147483000   4567876345
m5       2147483000   8156384917
m6       2147483000   7844967352
m7       2147483000   4153074965
m8       2147483000   2650539503
m9       1454645905   1454645905

What could it be? I read somewhere that this could be a cluster configuration issue. Someone else suggested this could be a limitation of the driver. Any pointers, anyone?
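For reference, here is a quick sanity check on the numbers above, as a minimal Python sketch. The only thing it adds is a comparison against the signed 32-bit integer maximum (2^31 - 1 = 2147483647), which the capped value 2147483000 sits just under. That is an observation about the figures, not a confirmed diagnosis; the names `counts`, `INT32_MAX` and `summarize` are made up for this illustration.

```python
# Per-mapper row counts copied from the post: (rows imported by Sqoop, rows in Oracle).
counts = {
    "m0": (1709027700, 1709027700),
    "m1": (340656511, 340656511),
    "m2": (2147483000, 3431813617),
    "m3": (2147483000, 4649556868),
    "m4": (2147483000, 4567876345),
    "m5": (2147483000, 8156384917),
    "m6": (2147483000, 7844967352),
    "m7": (2147483000, 4153074965),
    "m8": (2147483000, 2650539503),
    "m9": (1454645905, 1454645905),
}

# Largest value a signed 32-bit integer can hold (used here only as a reference point).
INT32_MAX = 2**31 - 1


def summarize(counts):
    """Return (mapper, missing_rows, oracle_exceeds_int32) for each mapper."""
    rows = []
    for mapper, (sqoop_rows, oracle_rows) in counts.items():
        rows.append((mapper, oracle_rows - sqoop_rows, oracle_rows > INT32_MAX))
    return rows


for mapper, missing, over in summarize(counts):
    print(f"{mapper}: missing={missing:>13,}  oracle_exceeds_int32={over}")

total_sqoop = sum(s for s, _ in counts.values())
total_oracle = sum(o for _, o in counts.values())
print(f"total imported: {total_sqoop:,} of {total_oracle:,}")
```

With these figures, every mapper whose Oracle range holds more rows than `INT32_MAX` stops at exactly 2147483000, and every mapper below that limit matches Oracle exactly.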