<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: sftp transfer to hdfs in spark as opposed to using a command in a script in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/sftp-transfer-to-hdfs-in-spark-as-opposed-to-using-a-command/m-p/50906#M48618</link>
    <description>Disclosure: I have never done this myself.&lt;BR /&gt;&lt;BR /&gt;I read the README of that project. If that is what you want to do, it would be one way to do it. The note at the bottom spells out the restriction, though, and it matches what I was thinking: it says the transfer is not executed as a Spark job, yet a SparkContext is created and used, so one must exist. That means that although the code runs in a driver (or executor), it effectively works only in local mode and will only run on the node you launch it from. For me that removes any benefit of using Spark for this piece of the workflow; Flume or some other ingestion tool would be a better fit.&lt;BR /&gt;&lt;BR /&gt;But yes, you could use this project, or write your own Java or Scala app that reads from SFTP and writes to HDFS.&lt;BR /&gt;&lt;BR /&gt;From the project's README: "SFTP files are fetched and written using jsch. It is not executed as spark job. It might have issues in cluster"</description>
    <pubDate>Tue, 14 Feb 2017 21:15:32 GMT</pubDate>
    <dc:creator>mbigelow</dc:creator>
    <dc:date>2017-02-14T21:15:32Z</dc:date>
  </channel>
</rss>