Created 07-12-2017 09:10 PM
I would like to read a large JSON file from HDFS as a string and then apply some string manipulations,
rather than have it transformed into an RDD, which is what happens with sc.textFile....
Is there a way I can do that using Spark and Scala?
Or do I need to read the file another way, preferably without having to look at the Hive configuration files?
Thank you
Created 07-14-2017 09:57 AM
Hi,
You can do it by creating a simple connection to HDFS with the HDFS client.
For example, in Java you can do the following:
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Load the cluster configuration and open a connection to HDFS.
// Note: addResource needs the Path overload for filesystem paths;
// the String overload would look up the classpath instead.
Configuration confFS = new Configuration();
confFS.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
confFS.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
FileSystem dfs2 = FileSystem.newInstance(confFS);

// Open the file and print it line by line
Path pt = new Path("/your/file/to/read");
BufferedReader br = new BufferedReader(new InputStreamReader(dfs2.open(pt)));
String myLine;
while ((myLine = br.readLine()) != null) {
    System.out.println(myLine);
}
br.close();
dfs2.close();
This code creates a single connection to HDFS and reads, line by line, the file defined in the variable pt.
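Since the question asked for Scala, here is a minimal sketch of the same approach translated to Scala, reading the whole file into a single String on the driver. It assumes the Hadoop client jars are on the classpath and that the file fits in driver memory; the file path is a placeholder.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import scala.io.Source

// Load the cluster configuration and open a connection to HDFS
val conf = new Configuration()
conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"))
conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"))
val fs = FileSystem.newInstance(conf)

// Read the entire file into one String, closing the stream afterwards
val in = fs.open(new Path("/your/file/to/read"))  // placeholder path
val content = try Source.fromInputStream(in).mkString finally in.close()
fs.close()

// content now holds the whole file as a String; apply string manipulations here

If the file is too large to hold as one String, SparkContext.wholeTextFiles (which yields (path, content) pairs per file) or a streaming parser may be a better fit.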
Created 04-05-2018 12:00 PM
Hello,
I have the same problem. I read a large XML file (~1 GB) and then do some calculations. Have you found a solution?
Regards,