
Sqoop import from oracle to hdfs: No more data to read from socket error

New Contributor

I'm trying to import data from Oracle to HDFS using Sqoop. Oracle version: 10.2.0.2. The table has no constraints. When I specify the number of mappers (-m) and the --split-by parameter, the import fails with the error: No more data to read from socket. If I set -m 1 (a single mapper), the import runs but takes too long. Sqoop command:

sqoop import --connect jdbc:oracle:thin:@host:port:SID --username uname --password pwd --table abc.market_price --target-dir /ert/etldev/etl/market_price -m 4 --split-by MNTH_YR

Please help me.



Mentor

@Irene Mathew

When you pass the mappers (-m) and --split-by parameters, Sqoop expects a value for each. If you want to reduce the import time, increase the number of mappers.

examples

New Contributor

@Geoffrey Shelton Okot: I agree. Whenever I set the number of mappers to more than 1, I get the error: No more data to read from socket. Can you suggest a solution?

Mentor

@Irene Mathew

The optimal number of mappers depends on many variables: you need to take into account your database type, the hardware that is used for your database server, and the impact on other requests that your database needs to serve. There is no optimal number of mappers that works for all scenarios. Instead, you’re encouraged to experiment to find the optimal degree of parallelism for your environment and use case. It’s a good idea to start with a small number of mappers and slowly ramp up, rather than to start with a large number of mappers and work your way down.
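
One way to follow the ramp-up advice above is to script the runs. The sketch below only prints the Sqoop commands it would run for 1, 2, and 4 mappers, reusing the placeholder connection details from the original question. The function name build_sqoop_cmd and the per-run target directory suffix are illustrative assumptions, not part of Sqoop.

```shell
#!/bin/sh
# Hypothetical helper: builds (does not run) a Sqoop import command
# for a given mapper count. Connection values are the placeholders
# from the question above; the _m<N> directory suffix is an assumption
# so that each run writes to its own target directory.
build_sqoop_cmd() {
    mappers="$1"
    printf '%s\n' "sqoop import --connect jdbc:oracle:thin:@host:port:SID --username uname --password pwd --table abc.market_price --target-dir /ert/etldev/etl/market_price_m${mappers} -m ${mappers} --split-by MNTH_YR"
}

# Start small and ramp up; review each command before running it.
for m in 1 2 4; do
    build_sqoop_cmd "$m"
done
```

To compare runs, each printed command could be executed under time and the elapsed times noted before moving to the next mapper count.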

The examples link in my previous post includes a snippet to calculate the number of cores, and hence the number of mappers, which could bring your import time down.

Here is the link again

New Contributor

Yes, thanks for the link. It helps with performance tuning. However, my question is not about finding the optimal number of mappers. If I set -m to 2, 3, or 4, the import should work instead of throwing an SQLException, but it does not.

Mentor

@Irene Mathew

Yes, precisely. That doc gives you the code to run. The output of

dcli -C 'cat `find /var/ -name yarn-site.xml|grep NODEMAN | sort -k1 -n|head -1`|grep -A 1 "yarn.nodemanager.resource.cpu-vcores"|grep -v name'|awk -F'</?value>' 'NF>1{print $2}' | awk '{a=a+$1} END {print a}'

should be the value for your mappers. Try that out and let me know.
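
To see what the last two awk stages of that pipeline do, here is a minimal sketch that runs them on made-up input. The sample <value> lines stand in for what dcli would return from each node. The numbers 8 and 16 and the function name sum_vcores are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical wrapper around the two awk stages from the command above.
sum_vcores() {
    # Stage 1: split each line on <value> or </value> and print the text between them.
    # Stage 2: add up the extracted numbers and print the total.
    awk -F'</?value>' 'NF>1{print $2}' | awk '{a=a+$1} END {print a}'
}

# Made-up sample: two nodes reporting 8 and 16 vcores.
printf '    <value>8</value>\n    <value>16</value>\n' | sum_vcores
```

With this sample input the script prints 24, the sum of the per-node vcore values.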

Mentor

@Irene Mathew

Any updates on this posting?
