Support Questions
Find answers, ask questions, and share your expertise

Iterate over ADLS files using spark?

Super Guru

There are many ways to iterate over HDFS files using Spark. Is there any way to iterate over files in ADLS?

Here is my code:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
 
val path = "adl://mylake.azuredatalakestore.net/"
val conf = new Configuration()
val fs = FileSystem.get(conf)
val p = new Path(path)
val ls = fs.listStatus(p)
 
ls.foreach { x =>
  val f = x.getPath.toString
  println(f)
  val content = spark.read.option("delimiter", "|").csv(f)
  content.show(1)
}


and I get the following error:

java.lang.IllegalArgumentException: Wrong FS: adl://mylake.azuredatalakestore.net/, expected: hdfs://sparky-m1.klqj4twfp4tehiuq3c3entk04g.jx.internal.cloudapp.net:8020

It expects hdfs:// but the prefix for ADLS is adl://. Any ideas?
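(Editor's note, not part of the original thread: the exception occurs because FileSystem.get(conf) returns the cluster's default filesystem, which is HDFS here. One common fix is to resolve the filesystem from the path itself via Path#getFileSystem, so it binds to the adl:// scheme. A minimal sketch, assuming the ADLS connector and credentials are already configured on the cluster and spark is the usual spark-shell session:)

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.conf.Configuration

val path = "adl://mylake.azuredatalakestore.net/"
val conf = new Configuration()
val p = new Path(path)

// Resolve the filesystem from the path's own scheme (adl://),
// instead of FileSystem.get(conf), which returns the default FS (hdfs://).
val fs = p.getFileSystem(conf)

fs.listStatus(p).foreach { status =>
  val f = status.getPath.toString
  println(f)
  val content = spark.read.option("delimiter", "|").csv(f)
  content.show(1)
}
```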

1 ACCEPTED SOLUTION

Super Guru

I found a solution:

import scala.sys.process._

val lsResult = Seq("hadoop", "fs", "-ls", "adl://mylake.azuredatalakestore.net/").!!


2 REPLIES

Super Guru

I found a solution:

import scala.sys.process._

val lsResult = Seq("hadoop", "fs", "-ls", "adl://mylake.azuredatalakestore.net/").!!

I am facing a similar issue. Is it possible for you to post the complete code? For example, to which function did you pass lsResult?
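(Editor's note: `.!!` returns the command's stdout as a single String, so lsResult is not passed to a Spark function directly; it has to be parsed into paths first. A rough sketch; the column layout of `hadoop fs -ls` output, and the sample listing below, are assumptions:)

```scala
// Extract the path (last whitespace-separated field) from each
// `hadoop fs -ls` output line, skipping the "Found N items" header.
def extractPaths(lsOutput: String): Seq[String] =
  lsOutput.split("\n")
    .map(_.trim)
    .filter(_.nonEmpty)
    .filterNot(_.startsWith("Found"))
    .map(_.split("\\s+").last)
    .toSeq

// On the cluster, the raw listing would come from the shell-out shown above:
//   import scala.sys.process._
//   val lsResult = Seq("hadoop", "fs", "-ls", "adl://mylake.azuredatalakestore.net/").!!
// Here a hardcoded sample stands in for lsResult so the parsing is self-contained.
val sample =
  """Found 2 items
    |-rw-r--r--   1 user group       1024 2018-01-01 12:00 adl://mylake.azuredatalakestore.net/a.csv
    |drwxr-xr-x   - user group          0 2018-01-01 12:00 adl://mylake.azuredatalakestore.net/dir""".stripMargin

val paths = extractPaths(sample)
paths.foreach(println)
// Each resulting path can then be fed to spark.read.csv(...) as in the question.
```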
