<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Apache spark read in a file from hdfs as one large string in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-spark-read-in-a-file-from-hdfs-as-one-large-string/m-p/179317#M64795</link>
    <description>&lt;P&gt;
	Hi,&lt;/P&gt;&lt;P&gt;
	You can do it by creating a simple connection to HDFS with the HDFS client.&lt;/P&gt;&lt;P&gt;
	For example, in Java you can do the following:&lt;/P&gt;&lt;PRE&gt;import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Load the cluster configuration; absolute paths must be added as Path
// objects (the String overload resolves against the classpath instead)
Configuration confFS = new Configuration();
confFS.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
confFS.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
FileSystem dfs2 = FileSystem.newInstance(confFS);

Path pt = new Path("/your/file/to/read");

// Open the file and read it line by line
BufferedReader br = new BufferedReader(new InputStreamReader(dfs2.open(pt)));
String myLine;
while ((myLine = br.readLine()) != null) {
	System.out.println(myLine);
}
br.close();
dfs2.close();
&lt;/PRE&gt;&lt;P&gt;This code creates a single connection to HDFS and reads the file defined in the variable pt.&lt;/P&gt;</description>
    <pubDate>Fri, 14 Jul 2017 16:57:03 GMT</pubDate>
    <dc:creator>msumbul1</dc:creator>
    <dc:date>2017-07-14T16:57:03Z</dc:date>
    <item>
      <title>Apache spark read in a file from hdfs as one large string</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-spark-read-in-a-file-from-hdfs-as-one-large-string/m-p/179316#M64794</link>
      <description>&lt;P&gt;I would like to read a large JSON file from HDFS as a single string and then apply some string manipulations.&lt;/P&gt;&lt;P&gt;I don't want it transformed into an RDD, which is what happens with sc.textFile.&lt;/P&gt;&lt;P&gt;Is there a way I can do that using Spark and Scala?&lt;/P&gt;&lt;P&gt;Or do I need to read the file another way, preferably without having to deal with the Hive configuration files?&lt;/P&gt;&lt;P&gt;Thank you&lt;/P&gt;</description>
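For the Spark side of the question, `wholeTextFiles` reads each file as a single (path, content) record instead of splitting it into one record per line the way `sc.textFile` does. A minimal Java sketch of that approach; the local master setting and the HDFS path are placeholder assumptions, not from the thread:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class WholeFileExample {
    public static void main(String[] args) {
        // Placeholder config: a real job would usually get master/app name
        // from spark-submit rather than hard-coding them
        SparkConf conf = new SparkConf().setAppName("whole-file").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // wholeTextFiles yields (path, fileContent) pairs, so each file
        // arrives as one string; "/your/file.json" is a placeholder path
        String content = sc.wholeTextFiles("hdfs:///your/file.json")
                           .values()
                           .first();

        System.out.println(content.length());
        sc.close();
    }
}
```

Note that `wholeTextFiles` loads each file entirely into one executor's memory, so it suits files that fit comfortably in a single JVM heap.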
      <pubDate>Thu, 13 Jul 2017 04:10:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-spark-read-in-a-file-from-hdfs-as-one-large-string/m-p/179316#M64794</guid>
      <dc:creator>Former Member</dc:creator>
      <dc:date>2017-07-13T04:10:45Z</dc:date>
    </item>
    <item>
      <title>Re: Apache spark read in a file from hdfs as one large string</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-spark-read-in-a-file-from-hdfs-as-one-large-string/m-p/179317#M64795</link>
      <description>&lt;P&gt;
	Hi,&lt;/P&gt;&lt;P&gt;
	You can do it by creating a simple connection to HDFS with the HDFS client.&lt;/P&gt;&lt;P&gt;
	For example, in Java you can do the following:&lt;/P&gt;&lt;PRE&gt;import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Load the cluster configuration; absolute paths must be added as Path
// objects (the String overload resolves against the classpath instead)
Configuration confFS = new Configuration();
confFS.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
confFS.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
FileSystem dfs2 = FileSystem.newInstance(confFS);

Path pt = new Path("/your/file/to/read");

// Open the file and read it line by line
BufferedReader br = new BufferedReader(new InputStreamReader(dfs2.open(pt)));
String myLine;
while ((myLine = br.readLine()) != null) {
	System.out.println(myLine);
}
br.close();
dfs2.close();
&lt;/PRE&gt;&lt;P&gt;This code creates a single connection to HDFS and reads the file defined in the variable pt.&lt;/P&gt;</description>
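Since the original question asks for one large string rather than printed lines, the same reader loop can append into a StringBuilder. A standalone sketch of that accumulation pattern, shown against an in-memory reader so it runs anywhere; with HDFS, the reader would wrap `dfs2.open(pt)` as in the answer above:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ReadAllExample {
    // Collect every line from the reader into a single string, restoring
    // the newline separators that readLine() strips off
    static String readAll(BufferedReader br) throws IOException {
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for: new BufferedReader(new InputStreamReader(dfs2.open(pt)))
        BufferedReader br = new BufferedReader(new StringReader("line one\nline two"));
        System.out.print(readAll(br));
    }
}
```

This keeps the whole file in driver-side memory, so like any single-string read it is only appropriate when the file fits in the JVM heap.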
      <pubDate>Fri, 14 Jul 2017 16:57:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-spark-read-in-a-file-from-hdfs-as-one-large-string/m-p/179317#M64795</guid>
      <dc:creator>msumbul1</dc:creator>
      <dc:date>2017-07-14T16:57:03Z</dc:date>
    </item>
    <item>
      <title>Re: Apache spark read in a file from hdfs as one large string</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-spark-read-in-a-file-from-hdfs-as-one-large-string/m-p/179318#M64796</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I have the same problem. I read a large XML file (~1 GB) and then do some calculations. Have you found a solution?&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;</description>
      <pubDate>Thu, 05 Apr 2018 19:00:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-spark-read-in-a-file-from-hdfs-as-one-large-string/m-p/179318#M64796</guid>
      <dc:creator>yidhir_moudoub</dc:creator>
      <dc:date>2018-04-05T19:00:20Z</dc:date>
    </item>
  </channel>
</rss>