Iterate over ADLS files using spark?
Labels: Apache Spark
Master Guru
Created 08-20-2018 06:44 PM
There are many ways to iterate over HDFS files using Spark. Is there any way to iterate over files in ADLS?
Here is my code:
```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

val path = "adl://mylake.azuredatalakestore.net/"
val conf = new Configuration()
val fs = FileSystem.get(conf)
val p = new Path(path)
val ls = fs.listStatus(p)
ls.foreach( x => {
  val f = x.getPath.toString
  println(f)
  val content = spark.read.option("delimiter","|").csv(f)
  content.show(1)
} )
```
and I get the following error:
java.lang.IllegalArgumentException: Wrong FS: adl://mylake.azuredatalakestore.net/, expected: hdfs://sparky-m1.klqj4twfp4tehiuq3c3entk04g.jx.internal.cloudapp.net:8020
It expects hdfs, but the URI scheme for ADLS is adl. Any ideas?
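The error is consistent with how `FileSystem.get(conf)` behaves: it returns the filesystem named by `fs.defaultFS` (HDFS on this cluster), and that filesystem rejects any path with a different scheme. A minimal sketch of a common alternative, assuming the ADLS connector and its credentials are already configured on the cluster as in the question, is to resolve the filesystem from the path itself with `Path.getFileSystem(conf)` (equivalently, `FileSystem.get(new java.net.URI(path), conf)`). `listChildren` is a hypothetical helper name, and the commented lines assume a spark-shell session where `spark` is defined.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical helper: lists the direct children of `dir` using the
// FileSystem that matches the URI scheme of `dir` (adl://, hdfs://, file://),
// instead of the default filesystem returned by FileSystem.get(conf).
def listChildren(dir: String, conf: Configuration): Seq[String] = {
  val p = new Path(dir)
  val fs = p.getFileSystem(conf) // scheme-aware lookup
  fs.listStatus(p).toSeq.map(_.getPath.toString)
}

// Usage in spark-shell (assumption): reuse Spark's Hadoop configuration so
// any ADLS settings passed to Spark are picked up.
// val files = listChildren("adl://mylake.azuredatalakestore.net/",
//                          spark.sparkContext.hadoopConfiguration)
// files.foreach { f =>
//   spark.read.option("delimiter", "|").csv(f).show(1)
// }
```

The same helper works for `hdfs://` and `file://` paths, since the scheme of the argument, not the cluster default, decides which filesystem implementation is used.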
1 ACCEPTED SOLUTION
Master Guru
Created 08-20-2018 08:26 PM
I found a solution:
```scala
import scala.sys.process._
val lsResult = Seq("hadoop","fs","-ls","adl://mylake.azuredatalakestore.net/").!!
```
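`!!` runs the command and returns its standard output as a single `String` (it throws if the command exits with a non-zero status), so `lsResult` still has to be turned into a list of paths before Spark can read them. A minimal sketch, assuming the usual `hadoop fs -ls` layout where each entry line starts with the permission string and ends with the full path; `pathsFromLs` is a hypothetical helper name, and paths containing spaces would break this simple split.

```scala
// Hypothetical helper: pulls file paths out of the text printed by
// `hadoop fs -ls`. Assumes one entry per line, permission string first and
// full path last; the "Found N items" header and directory entries
// (permissions starting with 'd') are skipped.
def pathsFromLs(lsOutput: String): Seq[String] = {
  lsOutput
    .split("\n")
    .map(_.trim)
    .filter(_.startsWith("-"))  // regular files only
    .map(_.split("\\s+").last)  // last column is the path
    .toSeq
}

// Usage in spark-shell (assumes `spark` is defined and the `hadoop` CLI
// is on the driver's PATH):
// import scala.sys.process._
// val lsResult = Seq("hadoop", "fs", "-ls", "adl://mylake.azuredatalakestore.net/").!!
// pathsFromLs(lsResult).foreach { f =>
//   spark.read.option("delimiter", "|").csv(f).show(1)
// }
```

Parsing CLI output is brittle; listing through the Hadoop `FileSystem` API with a scheme-aware filesystem avoids the text parsing altogether.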
2 REPLIES
Explorer
Created 08-26-2018 07:03 AM
I am facing a similar issue. Could you post the complete code? For example, which function did you pass lsResult to?
