New Contributor
Posts: 2
Registered: ‎02-01-2018

Sqoop job limiting number of rows

Hi

 

I have a 7-node cluster on RHCP7.1. I have a Sqoop job that imports data from an Oracle table into Hive. It is a single table, and I am importing one partition that has 40 billion rows in it. The job runs with 10 mappers and is split by an ID column of type Integer. There is an index on that ID column in Oracle.
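For context on how the work gets divided: with an integer split-by column, Sqoop's default splitter just divides the ID range evenly between the mappers, so a skewed ID distribution gives very uneven per-mapper row counts. A minimal sketch of that boundary calculation (the ID range and mapper count below are illustrative, not from the actual job):

```python
# Rough sketch of Sqoop's integer split-by logic: carve [min_id, max_id]
# into equal-width ranges, one per mapper, regardless of how many rows
# actually fall inside each range.
def split_boundaries(min_id, max_id, num_mappers):
    step = (max_id - min_id) / num_mappers
    bounds = [round(min_id + i * step) for i in range(num_mappers)]
    bounds.append(max_id)
    # Mapper i effectively gets: WHERE id >= bounds[i] AND id < bounds[i+1]
    # (the last mapper's range is inclusive of max_id).
    return list(zip(bounds[:-1], bounds[1:]))

# Hypothetical example: 10 mappers over IDs 0..1,000,000.
for lo, hi in split_boundaries(0, 1_000_000, 10):
    print(lo, hi)
```

This is why some mappers in the job end up with a few hundred million rows and others with several billion: the ranges are equal-width, but the rows are not spread evenly across the IDs.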

 

Now the problem is: the mappers whose splits contain more than 2 billion rows complete successfully, but they import far fewer rows than Oracle actually holds in those ranges. The table below will give you an idea.

 

Mapper   Sqoop row count   Oracle row count
m0       1,709,027,700     1,709,027,700
m1         340,656,511       340,656,511
m2       2,147,483,000     3,431,813,617
m3       2,147,483,000     4,649,556,868
m4       2,147,483,000     4,567,876,345
m5       2,147,483,000     8,156,384,917
m6       2,147,483,000     7,844,967,352
m7       2,147,483,000     4,153,074,965
m8       2,147,483,000     2,650,539,503
m9       1,454,645,905     1,454,645,905
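One thing that jumps out of these numbers: every truncated mapper stalls at exactly the same Sqoop count, 2,147,483,000, which sits just under 2^31 - 1 (2,147,483,647), the maximum value of a 32-bit signed integer. That looks more like a signed-int row counter or fetch limit somewhere in the driver path than a cluster configuration issue, though I can't say for sure where the cap lives. A quick sanity check over the posted figures:

```python
# Row counts from the table above: (sqoop_rows, oracle_rows) per mapper.
counts = {
    "m0": (1_709_027_700, 1_709_027_700),
    "m1": (340_656_511, 340_656_511),
    "m2": (2_147_483_000, 3_431_813_617),
    "m3": (2_147_483_000, 4_649_556_868),
    "m4": (2_147_483_000, 4_567_876_345),
    "m5": (2_147_483_000, 8_156_384_917),
    "m6": (2_147_483_000, 7_844_967_352),
    "m7": (2_147_483_000, 4_153_074_965),
    "m8": (2_147_483_000, 2_650_539_503),
    "m9": (1_454_645_905, 1_454_645_905),
}

INT32_MAX = 2**31 - 1  # 2,147,483,647

# The mappers that lost rows are exactly the ones whose Oracle count
# exceeds the 32-bit signed limit; their Sqoop count stops just below it.
capped = [m for m, (s, o) in counts.items() if o > INT32_MAX]
for m in capped:
    s, o = counts[m]
    print(m, "short by", o - s, "rows; stalled at", s, "<", INT32_MAX)

missing = sum(o - s for s, o in counts.values())
print("total rows missing:", missing)
```

Seven of the ten mappers hit the cap, and the shortfall adds up to roughly 20.4 billion rows, about half the partition.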

 

What could it be? I read somewhere that this could be a cluster configuration issue. Someone else suggested this could be a limitation of the driver. Any pointers, anyone?

New Contributor
Posts: 2
Registered: ‎02-01-2018

Re: Sqoop job limiting number of rows

As a workaround, we are now running smaller batches, which are completing successfully. I would, however, like to get to the bottom of this, as we have much larger tables still to ingest.
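For anyone hitting the same wall: the batching amounts to running one import per ID sub-range, sized so that no single mapper ever approaches the ~2 billion row mark. A rough sketch of how the ranges can be carved up (the batch width and ID values here are placeholders, not our real numbers):

```python
# Carve a large ID range into batches narrow enough that, spread over
# 10 mappers, no mapper's split approaches 2^31 rows. Each (lo, hi)
# pair then becomes a WHERE clause on a separate sqoop import job.
def batch_ranges(min_id, max_id, batch_width):
    lo = min_id
    while lo <= max_id:
        hi = min(lo + batch_width - 1, max_id)
        yield lo, hi
        lo = hi + 1

# Hypothetical example: IDs 1..100,000 in widths of 30,000.
for lo, hi in batch_ranges(1, 100_000, 30_000):
    print(f"ID >= {lo} AND ID <= {hi}")
```

Each range can be passed to a separate job (for example via a `--where` clause), so the per-job row count stays well below the point where the counts start getting truncated.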
