Support Questions

vjain · ‎12-03-2015

I am attempting to use Sqoop to import a HANA table of size 180 TB (compressed, 800TB on disk) into a Hive table. When I pass LIMIT in the query argument, the number of rows I get is 4 times the amount passed as LIMIT: LIMIT 250 fetched 1000 rows, and they are not duplicates.

Another issue I am facing is with fetch-size. When I pass a fetch size, the process errors out with the message "Search Limit exceeded".

nsabharwal · ‎12-04-2015

@Vedant Jain

Sqoop uses 4 mappers by default, and each mapper runs the query with its own LIMIT, which would explain 4 x 250 = 1000 rows. Try running with the option -m 1, or any other number, to see if it makes a difference.

Copying the following lines from this, as they make sense:

Using the "top x" or "limit x" clauses do not make much sense with Sqoop as it can return different values on each query execution (there is no "order by"). Also in addition the clause will very likely confuse split generation, ending with not that easily deterministic outputs. Having said that I would recommend you to use only 1 mapper (-m 1 or --num-mappers 1) in case that you need to import predefined number of rows
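To illustrate why the row count scales with the number of mappers, here is a toy model in Python. It is not Sqoop itself: the helper names `split_ranges` and `run_import` and the 10,000-row table are made up for the example, and it only assumes that each mapper runs the same query, LIMIT included, over its own non-overlapping slice of the split column.

```python
# Toy model of a parallel import, NOT Sqoop code. Assumption: every mapper
# runs the same query over its own key range, so LIMIT applies per mapper.

def split_ranges(min_id, max_id, num_mappers):
    """Divide [min_id, max_id] into contiguous, non-overlapping ranges."""
    total = max_id - min_id + 1
    step, extra = divmod(total, num_mappers)
    ranges, lo = [], min_id
    for i in range(num_mappers):
        # The first `extra` ranges take one additional key each.
        hi = lo + step + (1 if i < extra else 0) - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges

def run_import(table_ids, limit, num_mappers):
    """Collect the rows each hypothetical mapper would return."""
    rows = []
    for lo, hi in split_ranges(min(table_ids), max(table_ids), num_mappers):
        in_split = [i for i in table_ids if lo <= i <= hi]
        rows.extend(in_split[:limit])  # LIMIT is applied inside each split
    return rows

table_ids = list(range(1, 10001))  # hypothetical table with ids 1..10000
four = run_import(table_ids, limit=250, num_mappers=4)
one = run_import(table_ids, limit=250, num_mappers=1)

print(len(four), len(set(four)))  # 1000 1000 -> 4 x LIMIT, no duplicates
print(len(one))                   # 250 -> a single mapper honours the LIMIT
```

With four splits the model returns 1000 distinct rows, matching what was reported; with one split it returns exactly 250, which is why -m 1 is the suggested setting when a fixed number of rows is needed.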

vjain · ‎12-04-2015

Yes, that solved the problem. Thanks!

Cloudera Community

SQOOP HANA to HIVE ORC