Iterate over ADLS files using spark? (Archives of Support Questions)

sunile_manjee — Tue, 21 Aug 2018 01:44:04 GMT

There are many ways to iterate over HDFS files using Spark. Is there any way to iterate over files in ADLS?

Here is my code:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

val path = "adl://mylake.azuredatalakestore.net/"
val conf = new Configuration()
// FileSystem.get(conf) returns the default filesystem (fs.defaultFS), not one chosen from `path`
val fs = FileSystem.get(conf)
val p = new Path(path)
val ls = fs.listStatus(p)

// For each entry: print its path and show the first row of the pipe-delimited file
ls.foreach( x => {
  val f = x.getPath.toString
  println(f)
  val content = spark.read.option("delimiter","|").csv(f)
  content.show(1)
} )

and I get the following error:

java.lang.IllegalArgumentException: Wrong FS: adl://mylake.azuredatalakestore.net/, expected: hdfs://sparky-m1.klqj4twfp4tehiuq3c3entk04g.jx.internal.cloudapp.net:8020

It expects hdfs, but the prefix for ADLS is adl. Any ideas?
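The "Wrong FS" error is consistent with how FileSystem.get(conf) behaves: it returns the filesystem named by fs.defaultFS (HDFS on this cluster), and listStatus then rejects the adl:// path. A minimal sketch of an alternative, assuming the ADLS connector (hadoop-azure-datalake) and its credentials are already configured on the cluster, is to resolve the FileSystem from the path itself with Path.getFileSystem. The helper name listFiles is hypothetical.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical helper: list the regular files directly under `dir`.
// The FileSystem is resolved from the path's own scheme (adl://, hdfs://, file://, ...)
// rather than from fs.defaultFS, so an adl:// path goes to the ADLS connector,
// assuming that connector and its credentials are available through `conf`.
def listFiles(dir: String, conf: Configuration): Seq[String] = {
  val p = new Path(dir)
  val fs = p.getFileSystem(conf) // same as FileSystem.get(p.toUri, conf)
  fs.listStatus(p).toSeq
    .filter(_.isFile)            // skip sub-directories
    .map(_.getPath.toString)
}
```

In spark-shell it could be called as listFiles("adl://mylake.azuredatalakestore.net/", spark.sparkContext.hadoopConfiguration).foreach { f => spark.read.option("delimiter", "|").csv(f).show(1) }. Passing spark.sparkContext.hadoopConfiguration instead of a fresh Configuration is an assumption, meant to pick up any spark.hadoop.* settings that carry the ADLS credentials.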

Re: Iterate over ADLS files using spark?

sunile_manjee — Tue, 21 Aug 2018 03:26:23 GMT

I found a solution:

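import scala.sys.process._

// Run the Hadoop CLI, which picks the filesystem from the adl:// scheme of the path;
// .!! waits for the command and returns its standard output as a String
val lsResult = Seq("hadoop","fs","-ls","adl://mylake.azuredatalakestore.net/").!!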
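lsResult is only the text that hadoop fs -ls prints, so it still has to be parsed before the files can be read. A minimal sketch, assuming the usual eight-column -ls layout (permissions, replication, owner, group, size, date, time, path); the helper name filesFromLs is hypothetical:

```scala
// Hypothetical helper: extract regular-file paths from `hadoop fs -ls` output.
// Assumes eight whitespace-separated columns per entry:
//   permissions replication owner group size date time path
// The header line ("Found N items") has fewer columns and is dropped.
def filesFromLs(lsOutput: String): Seq[String] =
  lsOutput.split("\n").toSeq
    .map(_.trim.split("\\s+", 8)) // limit 8 keeps any spaces inside the path
    .filter(cols => cols.length == 8 && cols(0).startsWith("-")) // "-" = file, "d" = directory
    .map(cols => cols(7))
```

The paths could then go through the same loop as in the question: filesFromLs(lsResult).foreach { f => spark.read.option("delimiter", "|").csv(f).show(1) }. Note that .!! throws an exception if the command exits with a non-zero status, and this parsing depends on the -ls output layout staying the same, so resolving the FileSystem from the adl:// path is the sturdier option where it works.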

Re: Iterate over ADLS files using spark?

chourasiasakshi — Sun, 26 Aug 2018 14:03:13 GMT

I am facing a similar issue. Could you post the complete code? For example, which function did you pass lsResult to?