
Iterate over ADLS files using spark?


Super Guru

There are many ways to iterate over HDFS files using Spark. Is there any way to iterate over files in ADLS?

Here is my code:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
 
val path = "adl://mylake.azuredatalakestore.net/"
val conf = new Configuration()
val fs = FileSystem.get(conf)
val p = new Path(path)
val ls = fs.listStatus(p)
 
ls.foreach { x =>
  val f = x.getPath.toString
  println(f)
  val content = spark.read.option("delimiter", "|").csv(f)
  content.show(1)
}


and I get the following error:

java.lang.IllegalArgumentException: Wrong FS: adl://mylake.azuredatalakestore.net/, expected: hdfs://sparky-m1.klqj4twfp4tehiuq3c3entk04g.jx.internal.cloudapp.net:8020

It expects hdfs://, but the prefix for ADLS is adl://. Any ideas?
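
For context, FileSystem.get(conf) returns the filesystem named by fs.defaultFS, which on this cluster is HDFS, so the adl:// path is rejected before anything is listed. A minimal sketch of resolving the filesystem from the path's own URI instead, assuming the ADLS connector and credentials are already configured on the cluster:

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val path = "adl://mylake.azuredatalakestore.net/"
val conf = new Configuration()

// Bind the FileSystem to the path's own URI (adl://) instead of the
// cluster default (hdfs://), so listStatus goes to ADLS.
val fs = FileSystem.get(new URI(path), conf)

val ls = fs.listStatus(new Path(path))
ls.foreach(s => println(s.getPath.toString))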

Accepted Solution

Re: Iterate over ADLS files using spark?

Super Guru

I found a solution:

import scala.sys.process._

val lsResult = Seq("hadoop", "fs", "-ls", "adl://mylake.azuredatalakestore.net/").!!
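
Note that !! (from scala.sys.process) runs the command and returns its stdout as a single String, throwing an exception on a non-zero exit code, so lsResult holds the raw listing text rather than a collection of paths.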

Re: Iterate over ADLS files using spark?

New Contributor

I am facing a similar issue. Is it possible for you to post the complete code? For example, to which function did you pass lsResult?
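
For what it's worth, lsResult above is just the raw stdout of hadoop fs -ls, so it has to be parsed before anything can be handed to spark.read. A rough sketch of one way to do that, assuming the usual ls output layout where the path is the last whitespace-separated field on each line:

import scala.sys.process._

val lsResult = Seq("hadoop", "fs", "-ls", "adl://mylake.azuredatalakestore.net/").!!

// Skip the "Found N items" header and keep only lines carrying an adl:// path;
// the path is the last whitespace-separated field of each listing line.
val paths = lsResult.split("\n")
  .map(_.trim)
  .filter(_.contains("adl://"))
  .map(_.split("\\s+").last)

// Feed each path to the same read as in the original loop.
paths.foreach { f =>
  println(f)
  val content = spark.read.option("delimiter", "|").csv(f)
  content.show(1)
}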
