We are doing some stress testing: how fast can data be ingested when running 5 parallel imports? We have many clients, so during the POC we would like to check how much load we can put on the system during parallel data ingestion.
While I was importing data from a single SQL Server instance, it worked fine. Now I am trying to import data from 5 SQL Server databases on the same SQL Server instance using 5 different sqoop import commands, and none of the jobs finishes. All of them show the import failure message.
We have an Oracle VM VirtualBox setup with the following resources:
1. RAM : 24 GB
2. Disk : 256 GB
3. Processors assigned : 6 (Hyper-Threading enabled)
What could be the potential reasons for the data ingestion failure?
Do I need to assign more resources to the VM?
Going to a real physical cluster is not an option for us yet. The Cloudera forum is mostly unresponsive, which is why I have been posting questions here.
YARN is set up. When I restarted YARN, all jobs failed. 4 of them were showing 0% map, 0% reduce. Do you want the Sqoop log or the YARN log?
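For reference, this is roughly how the YARN logs for the stuck jobs can be pulled. It is only a sketch: the `run`, `list_apps` and `fetch_logs` helpers and the `DRY_RUN` switch are made-up names for illustration, and `yarn logs` only returns output if log aggregation is enabled and the application has finished.

```shell
#!/usr/bin/env bash
# Sketch: collect YARN logs for stuck or failed imports.
# DRY_RUN defaults to 1, so commands are only printed.
# Set DRY_RUN=0 on the cluster to actually run them.

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# List every application YARN knows about, including failed and killed ones.
list_apps() {
  run yarn application -list -appStates ALL
}

# Fetch the aggregated container logs for one application.
# $1 is an application ID taken from the list_apps output.
fetch_logs() {
  run yarn logs -applicationId "$1"
}
```

Calling `list_apps` first gives the application IDs of the 5 imports; each ID can then be passed to `fetch_logs`.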
Now when I try to run the sqoop import --hive-import --hive-table command with the other parameters again, it throws: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://quickstart.cloudera:8020/user/cloudera/tableName already exists.
When I try to access the table specified in the --hive-table switch in the Hive database, there is no such table. I think the task was only half done.
Earlier I had imported data from SQL Server at the same time. YARN is also set up, and FIFO is the default Capacity Scheduler behavior. I think the issue is something else.
@Nirvana India Is this a CDH cluster? I am going to be honest: I have heard of YARN issues in CDH, and I don't know how CDH is set up.
hdfs://quickstart.cloudera:8020/user/cloudera/tableName already exists
You can drop the table if you find it.
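Note that the error is about an HDFS directory, not a Hive table. With --hive-import, Sqoop first writes the data to a directory in HDFS and only then loads it into Hive, so a job that dies in between can leave the directory behind without ever creating the table. A minimal sketch of clearing it before the re-run follows; the `run` and `remove_staging_dir` helpers and the `DRY_RUN` switch are names I made up for illustration, and the path is the one from your error message.

```shell
#!/usr/bin/env bash
# Sketch: clear the leftover Sqoop output directory before re-running the import.
# DRY_RUN defaults to 1, so commands are only printed.
# Set DRY_RUN=0 on the cluster to actually run them.

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Remove the directory a failed import left behind.
# $1 is the HDFS path reported in the FileAlreadyExistsException.
remove_staging_dir() {
  run hdfs dfs -rm -r "$1"
}

remove_staging_dir /user/cloudera/tableName
```

Alternatively, adding --delete-target-dir to the sqoop import command should make Sqoop remove an existing target directory itself before it starts, assuming your Sqoop version supports that option.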