Sqoop job limiting number of rows

New Contributor

Hi,

I have a 7-node cluster on RHCP7.1. I have a Sqoop job that imports data from an Oracle table into Hive. It is a single table, and I am importing one partition that has 40 billion rows in it. The job uses 10 mappers and is split by an ID column that is an integer. There is an index on that ID column in Oracle.
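
To illustrate what the split on the ID column does: with an integer split column, the ID range from MIN(ID) to MAX(ID) is divided into 10 intervals of equal width, one per mapper, so the number of rows each mapper actually pulls depends on how the IDs are distributed within the partition. Here is a rough sketch with hypothetical boundary values (only an illustration of the splitting idea, not the actual Sqoop internals):

// Rough sketch of integer split-by behaviour (hypothetical MIN/MAX values,
// not Sqoop's actual code): the ID range is cut into num-mappers intervals
// of equal width, and each mapper scans one interval.
public class SplitBySketch {
    public static void main(String[] args) {
        long minId = 1L;                  // hypothetical MIN(ID)
        long maxId = 40_000_000_000L;     // hypothetical MAX(ID)
        int numMappers = 10;

        long width = (maxId - minId) / numMappers;
        for (int m = 0; m < numMappers; m++) {
            long lo = minId + (long) m * width;
            boolean last = (m == numMappers - 1);
            long hi = last ? maxId : lo + width;
            // Each mapper would run something like:
            //   SELECT ... FROM tbl WHERE ID >= lo AND ID <  hi   (intermediate splits)
            //   SELECT ... FROM tbl WHERE ID >= lo AND ID <= hi   (final split)
            System.out.printf("m%d: ID >= %d AND ID %s %d%n", m, lo, last ? "<=" : "<", hi);
        }
    }
}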

 

Now the problem: the mappers that cover more than 2 billion rows complete successfully, but the row counts Sqoop reports for them are capped at roughly 2.1 billion, while the actual row counts in Oracle are much higher. The table below will give you an idea.

 

Mapper    Sqoop row count    Oracle row count
m0        1,709,027,700      1,709,027,700
m1          340,656,511        340,656,511
m2        2,147,483,000      3,431,813,617
m3        2,147,483,000      4,649,556,868
m4        2,147,483,000      4,567,876,345
m5        2,147,483,000      8,156,384,917
m6        2,147,483,000      7,844,967,352
m7        2,147,483,000      4,153,074,965
m8        2,147,483,000      2,650,539,503
m9        1,454,645,905      1,454,645,905

 

What could it be? I read somewhere that this could be a cluster configuration issue. Someone else suggested this could be a limitation of the driver. Any pointers, anyone?

2 Replies

New Contributor

As a workaround, we are now running smaller batches, which complete successfully. I would, however, like to get to the bottom of this, as we have much larger tables to ingest.

New Contributor

The ResultSet.getRow method (which a driver is not required to implement) returns an int, which, being a signed 32-bit value, has a maximum of 2^31 - 1.

So, when getRow is implemented, it is NOT possible for it to return a value larger than Integer.MAX_VALUE.
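
To put a number on that limit: Integer.MAX_VALUE = 2^31 - 1 = 2,147,483,647, which is right around the ~2,147,483,000 counts reported for mappers m2 through m8 above. A minimal, generic Java demo (not Sqoop's or the driver's actual code) of how an int counter behaves once a row position goes past that point, compared with a long:

// Generic demo: a signed 32-bit int cannot represent row positions above
// Integer.MAX_VALUE; one more increment wraps it around to a negative value,
// while a long keeps counting correctly.
public class IntRowCounterDemo {
    public static void main(String[] args) {
        int intCount = Integer.MAX_VALUE;     // 2,147,483,647: the ceiling for int
        long longCount = Integer.MAX_VALUE;

        intCount++;      // overflows and wraps to -2,147,483,648
        longCount++;     // 2,147,483,648: still correct in a long

        System.out.println("int counter after one more row:  " + intCount);
        System.out.println("long counter after one more row: " + longCount);
    }
}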

This may be related to:

https://community.oracle.com/tech/developers/discussion/4001667/resultset-row-limits-with-ojdbc7-jar
