YARN Distributed Shell

Expert Contributor

We are trying to run a Sqoop command through the YARN distributed shell. All of the commands are in a shell script. The Sqoop client is not installed on the NameNode, but it is installed on all of the data nodes.

While running the script, we get the error "ExecScript.sh: line 5: sqoop: command not found".

P.S. The sqoop command runs fine when executed directly on an individual data node.

Am I missing anything here?

Super Guru

@Rajib Mandal

You must run the script on a node where the Sqoop client is deployed, because the Sqoop client is your entry point. Running it from the NameNode would not be good practice even if the client were installed there, and neither is installing clients on data nodes. Dedicate one or two machines as edge nodes, install all of the clients you need there, and use those machines to submit jobs. When you run clients on data nodes, the job client and the data node processes compete for the same resources and can affect each other. You need isolation between data nodes and client nodes.
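To illustrate the point that the client has to exist wherever the script actually runs, here is a minimal sketch of a guard that could sit at the top of a script such as ExecScript.sh. The function name require_cmd is hypothetical and is not part of Sqoop or YARN. It only checks whether a command resolves on the current node and, if it does not, prints the node name and PATH so the mismatch is easier to locate.

```shell
#!/bin/sh
# Hypothetical helper: report whether a client command resolves on this node.
# Returns 0 if the command is found, 1 if not. On failure it writes the node
# name and the effective PATH to stderr.
require_cmd() {
    cmd_name=$1
    if command -v "$cmd_name" >/dev/null 2>&1; then
        return 0
    fi
    echo "$cmd_name not found on $(uname -n); PATH=$PATH" >&2
    return 1
}

# Intended use before the first sqoop call in the script:
#   require_cmd sqoop || exit 127
```

If the guard fails on a node where sqoop works interactively, the PATH it prints is the one the script really sees, which may differ from the PATH of a login shell on the same machine.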

If any of the responses to your question addressed the problem don't forget to vote and accept the answer. If you fix the issue on your own, don't forget to post the answer to your own question. A moderator will review it and accept it.

Expert Contributor

@Constantin Stanca

Thanks for your suggestions. Is it possible to execute a Sqoop command using the YARN distributed shell?

Super Guru

@Rajib Mandal

I would say no. Here are the facts as I know them:

1. Sqoop is an application that depends on MapReduce.

2. The YARN distributed shell is an example of a non-MapReduce application built on top of YARN. It is a simple mechanism for running shell commands and scripts in containers on multiple nodes in a Hadoop cluster. Administrators typically use one of several existing distributed shell implementations to manage a cluster of machines, and this application demonstrates how such a utility can be implemented on top of YARN.
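For reference, a distributed shell submission usually looks roughly like the command printed by the sketch below. The helper name dshell_cmd and the jar path are placeholders, since the location of the distributedshell jar depends on the distribution. The helper only prints the command and does not submit anything.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a distributed shell submission command.
#   $1 = path to the distributedshell jar (location is distribution-specific)
#   $2 = local shell script to ship to the containers
#   $3 = number of containers to launch
dshell_cmd() {
    printf 'yarn jar %s org.apache.hadoop.yarn.applications.distributedshell.Client -jar %s -shell_script %s -num_containers %s\n' \
        "$1" "$1" "$2" "$3"
}

# Example with a placeholder jar path; inspect the output before running it.
dshell_cmd /path/to/hadoop-yarn-applications-distributedshell.jar ExecScript.sh 2
```

Each container runs the shipped script with the environment of the node it lands on, so every command the script calls has to resolve there.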

I expect Sqoop to work from the command line; by design, I don't expect it to run from the YARN distributed shell. Sqoop is set up to use YARN by default, and it allocates containers for the tasks of its MapReduce job. The distributed shell does not understand MapReduce and can't dictate which containers are used to complete a MapReduce job.

What were you using the distributed shell for before you tried to use it for Sqoop?

Expert Contributor

@Constantin Stanca: Thanks for the clarification. We receive a list of shell scripts from an upstream application. These scripts contain shell commands as well as Sqoop commands, and they need to run on multiple nodes in a Hadoop cluster.

Super Guru

@Rajib Mandal

I get it, but Sqoop jobs are usually kicked off by a scheduler. As I said, Sqoop already takes advantage of YARN containers and depends on MapReduce. The YARN distributed shell is not the appropriate way to handle this type of Sqoop job. Again, the YARN distributed shell is an example of a non-MapReduce application built on top of YARN.
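As a rough sketch of the scheduler-driven alternative, assuming the upstream scripts are dropped into one directory on an edge node where the Sqoop client is installed, a small runner like the one below could be called from whatever scheduler is in use. The function name run_scripts and the directory path are hypothetical. Each Sqoop command inside the scripts would still launch its own MapReduce job on the cluster.

```shell
#!/bin/sh
# Hypothetical runner: execute every *.sh file in a directory, one at a time,
# on the node where the runner is invoked. Prints one status line per script
# and returns non-zero if any script failed.
run_scripts() {
    script_dir=$1
    failures=0
    for script in "$script_dir"/*.sh; do
        [ -f "$script" ] || continue   # the directory had no *.sh files
        if sh "$script"; then
            echo "OK   $script"
        else
            echo "FAIL $script"
            failures=$((failures + 1))
        fi
    done
    [ "$failures" -eq 0 ]
}

# Intended use from a scheduler entry (the directory is a placeholder):
#   run_scripts /path/to/upstream/scripts
```

Running the scripts sequentially keeps the sketch simple; the parallelism comes from the map tasks that each Sqoop job starts on YARN, not from the runner.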
