Support Questions
Find answers, ask questions, and share your expertise

Iterate over ADLS files using spark?

Super Guru

There are many ways to iterate over HDFS files using Spark. Is there any way to iterate over files in ADLS?

Here is my code:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
 
val path = "adl://mylake.azuredatalakestore.net/"
val conf = new Configuration()
val fs = FileSystem.get(conf)
val p = new Path(path)
val ls = fs.listStatus(p)
 
ls.foreach { x =>
  val f = x.getPath.toString
  println(f)
  val content = spark.read.option("delimiter", "|").csv(f)
  content.show(1)
}


and I get the following error:

java.lang.IllegalArgumentException: Wrong FS: adl://mylake.azuredatalakestore.net/, expected: hdfs://sparky-m1.klqj4twfp4tehiuq3c3entk04g.jx.internal.cloudapp.net:8020

It expects hdfs:// but the prefix for ADLS is adl://. Any ideas?
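(Editor's note, not part of the original thread: the exception occurs because FileSystem.get(conf) returns the cluster's default filesystem, which is HDFS here. One common fix is to resolve the filesystem from the path itself via Path#getFileSystem, so it binds to the adl:// scheme. A minimal sketch, assuming the ADLS connector and credentials are already configured on the cluster and spark is the usual spark-shell session:)

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.conf.Configuration

val path = "adl://mylake.azuredatalakestore.net/"
val conf = new Configuration()
val p = new Path(path)

// Resolve the filesystem from the path's own scheme (adl://),
// instead of FileSystem.get(conf), which returns the default FS (hdfs://).
val fs = p.getFileSystem(conf)

fs.listStatus(p).foreach { status =>
  val f = status.getPath.toString
  println(f)
  val content = spark.read.option("delimiter", "|").csv(f)
  content.show(1)
}
```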

1 ACCEPTED SOLUTION

Super Guru

I found a solution:

import scala.sys.process._

val lsResult = Seq("hadoop", "fs", "-ls", "adl://mylake.azuredatalakestore.net/").!!


2 REPLIES

Super Guru

I found a solution:

import scala.sys.process._

val lsResult = Seq("hadoop", "fs", "-ls", "adl://mylake.azuredatalakestore.net/").!!

I am facing a similar issue. Is it possible for you to post the complete code? For example, to which function did you pass lsResult?
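(Editor's note: `.!!` returns the command's stdout as a single String, so lsResult is not passed to a Spark function directly; it has to be parsed into paths first. A rough sketch; the column layout of `hadoop fs -ls` output, and the sample listing below, are assumptions:)

```scala
// Extract the path (last whitespace-separated field) from each
// `hadoop fs -ls` output line, skipping the "Found N items" header.
def extractPaths(lsOutput: String): Seq[String] =
  lsOutput.split("\n")
    .map(_.trim)
    .filter(_.nonEmpty)
    .filterNot(_.startsWith("Found"))
    .map(_.split("\\s+").last)
    .toSeq

// On the cluster, the raw listing would come from the shell-out shown above:
//   import scala.sys.process._
//   val lsResult = Seq("hadoop", "fs", "-ls", "adl://mylake.azuredatalakestore.net/").!!
// Here a hardcoded sample stands in for lsResult so the parsing is self-contained.
val sample =
  """Found 2 items
    |-rw-r--r--   1 user group       1024 2018-01-01 12:00 adl://mylake.azuredatalakestore.net/a.csv
    |drwxr-xr-x   - user group          0 2018-01-01 12:00 adl://mylake.azuredatalakestore.net/dir""".stripMargin

val paths = extractPaths(sample)
paths.foreach(println)
// Each resulting path can then be fed to spark.read.csv(...) as in the question.
```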
