Sqoop jobs are failing while loading more data

Hi,

We are doing some stress testing: how fast can data be ingested when running 5 imports in parallel? We have many clients, so during the POC we would like to check how much load we can put on the system during parallel data ingestion.

When I was importing data from a single SQL Server instance, it worked fine. Now that I am trying to import data from 5 SQL Server databases on the same SQL Server instance using 5 different Sqoop import commands, none of the jobs finishes. All of them show an import failure message.
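One way to script the parallel test is to generate one `sqoop import` command per database and give each its own `--target-dir` and `--hive-table`, so the five jobs never write to the same HDFS output directory. This is a minimal sketch; the host, database, user, and table names are hypothetical placeholders, and `build_import_cmd` is an illustrative helper, not part of Sqoop.

```python
import shlex


def build_import_cmd(host, database, table, user, password_file=None,
                     target_root="/user/cloudera"):
    # Hypothetical helper: builds one Sqoop 1 import command per source database.
    connect = f"jdbc:sqlserver://{host}:1433;databaseName={database}"
    # Unique per database, so parallel jobs never share an output path or Hive table.
    name = f"{database}_{table}"
    cmd = [
        "sqoop", "import",
        "--connect", connect,
        "--username", user,
        "--table", table,
        "--target-dir", f"{target_root}/{name}",
        "--hive-import",
        "--hive-table", name,
    ]
    if password_file:
        # Avoids an interactive password prompt for each of the parallel jobs.
        cmd += ["--password-file", password_file]
    return cmd


if __name__ == "__main__":
    for db in ["db1", "db2", "db3", "db4", "db5"]:  # hypothetical database names
        # shlex.join quotes the JDBC URL; an unquoted ';' would end the shell command.
        print(shlex.join(build_import_cmd("sqlhost", db, "orders", "etl_user")))
```

The quoting matters when pasting these into a shell: the `;` inside the SQL Server JDBC URL must be quoted.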

We have an Oracle VM VirtualBox setup with the following resources:

1. RAM: 24 GB

2. Disk: 256 GB

3. Processors assigned: 6 (Hyper-Threading enabled)
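To judge whether this VM can hold five concurrent imports, a rough memory check helps. Each Sqoop import runs as a map-only MapReduce job that needs one ApplicationMaster container plus one container per mapper (Sqoop uses 4 mappers by default). The 8192 MB given to YARN and the 1024 MB container sizes below are assumptions for illustration. On the VM, the real values come from `yarn.nodemanager.resource.memory-mb`, `mapreduce.map.memory.mb`, and `yarn.app.mapreduce.am.resource.mb`.

```python
def per_job_memory_mb(mappers=4, map_mb=1024, am_mb=1024):
    # One MapReduce ApplicationMaster plus one container per mapper.
    # Sqoop imports are map-only, so no reducer containers are counted.
    return am_mb + mappers * map_mb


def containers_needed(jobs, mappers=4):
    return jobs * (1 + mappers)


def max_concurrent_jobs(yarn_memory_mb, mappers=4, map_mb=1024, am_mb=1024):
    # Rough estimate: ignores rounding up to yarn.scheduler.minimum-allocation-mb
    # and ignores vcores (the default resource calculator only considers memory).
    return yarn_memory_mb // per_job_memory_mb(mappers, map_mb, am_mb)


if __name__ == "__main__":
    yarn_mb = 8192  # ASSUMED memory available to YARN on the VM
    print("containers for 5 jobs:", containers_needed(5))
    print("memory for 5 jobs (MB):", 5 * per_job_memory_mb())
    print("jobs that fit with 4 mappers:", max_concurrent_jobs(yarn_mb))
    print("jobs that fit with 1 mapper:", max_concurrent_jobs(yarn_mb, mappers=1))
```

Under these assumed numbers, five jobs need about 25,600 MB, so roughly only one job's worth of containers fits at a time and the others wait (which can look like "0% map"). Lowering `-m` to 1 per import would, under the same assumptions, let four run side by side.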

What could be the potential reasons for the data ingestion failures?

Do I need to assign more resources to the VM?

Moving to a real physical cluster is not an option for us yet. The Cloudera forum is mostly unresponsive, which is why I have been posting questions here.

Re: Sqoop jobs are failing while loading more data

@Nirvana India

I think there are a couple of things you need to do.

1. Check the VM memory allocation (see attached doc).

2. You also have to check the Sqoop memory tuning (see link).
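For the Sqoop memory tuning, Sqoop 1 accepts Hadoop generic options, and `-D property=value` pairs must come right after `sqoop import`, before tool-specific arguments such as `--connect`. A minimal sketch that builds those options; the container sizes are assumptions, and the 0.8 heap-to-container ratio is a common rule of thumb rather than a Sqoop requirement.

```python
def memory_opts(map_mb=1024, am_mb=1024, heap_ratio=0.8):
    # Keep each JVM heap below its container size so YARN does not kill
    # the container for exceeding its memory limit.
    map_heap = int(map_mb * heap_ratio)
    am_heap = int(am_mb * heap_ratio)
    props = {
        "mapreduce.map.memory.mb": map_mb,
        "mapreduce.map.java.opts": f"-Xmx{map_heap}m",
        "yarn.app.mapreduce.am.resource.mb": am_mb,
        "yarn.app.mapreduce.am.command-opts": f"-Xmx{am_heap}m",
    }
    args = []
    for key, value in props.items():
        args += ["-D", f"{key}={value}"]
    return args


def with_memory_opts(tool_args, **kwargs):
    # Generic -D options must precede tool-specific arguments like --connect.
    return ["sqoop", "import"] + memory_opts(**kwargs) + list(tool_args)


if __name__ == "__main__":
    print(with_memory_opts(["--table", "orders"], map_mb=2048))
```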

Re: Sqoop jobs are failing while loading more data

@Nirvana India

1) You don't have YARN queues set up, and that's why your 4 other jobs are waiting for the 1st job to finish.

Please look into the logs and provide more details.
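If the goal is to let the five imports run side by side instead of queuing behind each other, one option is to split the Capacity Scheduler's root queue into separate queues and submit each import to its own queue with `-D mapreduce.job.queuename=<queue>`. This assumes the ResourceManager really uses the Capacity Scheduler (check `yarn.resourcemanager.scheduler.class`). The sketch below generates `capacity-scheduler.xml` properties; the queue names are hypothetical, and an even split is just one possible policy.

```python
def capacity_properties(queues, max_capacity=100):
    # Child queue capacities under root must add up to 100 (percent).
    if not queues:
        raise ValueError("need at least one queue")
    base = 100 // len(queues)
    shares = [base] * len(queues)
    shares[-1] += 100 - base * len(queues)  # give any remainder to the last queue
    props = {"yarn.scheduler.capacity.root.queues": ",".join(queues)}
    for name, share in zip(queues, shares):
        prefix = f"yarn.scheduler.capacity.root.{name}"
        props[f"{prefix}.capacity"] = str(share)
        props[f"{prefix}.maximum-capacity"] = str(max_capacity)
    return props


def to_xml(props):
    # Render as <property> entries for capacity-scheduler.xml.
    lines = ["<configuration>"]
    for key, value in props.items():
        lines.append(f"  <property><name>{key}</name><value>{value}</value></property>")
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_xml(capacity_properties([f"import{i}" for i in range(1, 6)])))
```

Jobs that do not name a queue go to `default`, so keep a `default` queue in the list if anything else relies on it. After editing the file, `yarn rmadmin -refreshQueues` reloads the queue configuration. Separate queues only divide the existing capacity; they do not help if the VM lacks memory for the jobs' containers.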

Re: Sqoop jobs are failing while loading more data

YARN is set up. When I restarted YARN, all the jobs failed. 4 of them were showing 0% map, 0% reduce. Do you want the Sqoop log or the YARN log?

Re: Sqoop jobs are failing while loading more data

@Nirvana India I meant the Capacity Scheduler. Please post the errors from the Sqoop or job log; I just need to see the error message.

Re: Sqoop jobs are failing while loading more data

@Neeraj Sabharwal

I am not finding any errors in /var/log/sqoop2/sqoop.log.
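`sqoop import` is a Sqoop 1 client command, so its errors most likely show up in the console output and in the MapReduce task logs rather than in `/var/log/sqoop2/sqoop.log`, which belongs to the Sqoop 2 server. The task logs can be fetched with `yarn logs -applicationId <id>` (this typically requires YARN log aggregation to be enabled). A minimal sketch that pulls application IDs and error lines out of saved console output; the sample output below is illustrative, not taken from this thread.

```python
import re

APP_ID = re.compile(r"\bapplication_\d+_\d+\b")

# Illustrative console output (assumed format, not from the original post).
SAMPLE = """\
16/01/20 10:00:01 INFO impl.YarnClientImpl: Submitted application application_1453280000000_0003
16/01/20 10:00:02 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1453280000000_0003/
16/01/20 10:05:00 ERROR tool.ImportTool: Encountered IOException running import job
"""


def application_ids(console_text):
    # Unique application IDs in order of first appearance.
    seen = []
    for app_id in APP_ID.findall(console_text):
        if app_id not in seen:
            seen.append(app_id)
    return seen


def yarn_log_commands(console_text):
    return [["yarn", "logs", "-applicationId", a] for a in application_ids(console_text)]


def error_lines(text):
    # Lines that usually carry the actual failure reason.
    return [line for line in text.splitlines() if "ERROR" in line or "Exception" in line]


if __name__ == "__main__":
    print(yarn_log_commands(SAMPLE))
    print(error_lines(SAMPLE))
```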

Re: Sqoop jobs are failing while loading more data

@Neeraj Sabharwal

Now, when I try to run sqoop import --hive-import --hive-table (with the other parameters) again, it throws: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://quickstart.cloudera:8020/user/cloudera/tableName already exists.

When I try to access the table specified in the --hive-table switch in the Hive database, there is no such table. I think the job was only half done.
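The FileAlreadyExistsException suggests that a previous, failed run left its output directory behind. Without `--target-dir`, Sqoop writes to a directory named after the table under the user's HDFS home directory (or under `--warehouse-dir` if given), which matches `/user/cloudera/tableName` in the error. A sketch that derives that path and builds the HDFS commands to inspect and remove it; the user and table names are placeholders. Sqoop 1.4.6 and later also offer `--delete-target-dir` on import, but check that your installed version supports it.

```python
import posixpath


def default_import_dir(user, table, warehouse_dir=None):
    # Without --target-dir, Sqoop writes to <warehouse-dir>/<table>;
    # the parent defaults to the user's HDFS home, /user/<user>.
    parent = warehouse_dir or f"/user/{user}"
    return posixpath.join(parent, table)


def cleanup_commands(user, table, warehouse_dir=None):
    path = default_import_dir(user, table, warehouse_dir)
    # List first to confirm what will be removed, then delete recursively.
    return [
        ["hdfs", "dfs", "-ls", path],
        ["hdfs", "dfs", "-rm", "-r", path],
    ]


if __name__ == "__main__":
    for cmd in cleanup_commands("cloudera", "tableName"):
        print(" ".join(cmd))
```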

Re: Sqoop jobs are failing while loading more data

Earlier I had also imported data from SQL Server in parallel. YARN is also set up, and FIFO is the default Capacity Scheduler behavior. I think the issue is something else.

Re: Sqoop jobs are failing while loading more data

@Nirvana India Is this a CDH cluster? To be honest, I have heard of YARN issues in CDH, and I don't know how CDH is set up.

hdfs://quickstart.cloudera:8020/user/cloudera/tableName already exists

You can drop the table if you find it.

Re: Sqoop jobs are failing while loading more data

@Neeraj Sabharwal

I don't find any directory like hdfs://quickstart.cloudera:8020/user/cloudera/tableName on Linux.

I am new to the Cloudera QuickStart VM and to Linux too.
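The path in the error is an HDFS URI, not a local Linux path, so `ls` on the VM's local file system will not show it. It has to be listed through the HDFS client, for example `hdfs dfs -ls /user/cloudera` (the full `hdfs://...` URI is also accepted). A small sketch that splits such a URI into the NameNode address and the HDFS path using only Python's standard `urllib.parse`; the helper names are illustrative.

```python
from urllib.parse import urlparse


def split_hdfs_uri(uri):
    # Returns (namenode host:port, path inside HDFS).
    parts = urlparse(uri)
    if parts.scheme != "hdfs":
        raise ValueError(f"not an HDFS URI: {uri}")
    return parts.netloc, parts.path


def ls_command(uri):
    _, path = split_hdfs_uri(uri)
    return ["hdfs", "dfs", "-ls", path]


if __name__ == "__main__":
    uri = "hdfs://quickstart.cloudera:8020/user/cloudera/tableName"
    print(split_hdfs_uri(uri))
    print(" ".join(ls_command(uri)))
```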