Created 08-20-2018 06:44 PM
There are many ways to iterate over HDFS files using Spark. Is there any way to iterate over files in ADLS?
Here is my code:
import org.apache.hadoop.fs.Path
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

val path = "adl://mylake.azuredatalakestore.net/"
val conf = new Configuration()
val fs = FileSystem.get(conf)
val p = new Path(path)
val ls = fs.listStatus(p)
ls.foreach(x => {
  val f = x.getPath.toString
  println(f)
  val content = spark.read.option("delimiter", "|").csv(f)
  content.show(1)
})
and I get the following error:
java.lang.IllegalArgumentException: Wrong FS: adl://mylake.azuredatalakestore.net/, expected: hdfs://sparky-m1.klqj4twfp4tehiuq3c3entk04g.jx.internal.cloudapp.net:8020
It expects hdfs, but the prefix for ADLS is adl. Any ideas?
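For what it's worth, the "Wrong FS" error comes from FileSystem.get(conf) returning the cluster's default filesystem (HDFS) rather than one bound to the adl:// scheme. A minimal sketch of resolving the FileSystem from the path's own URI instead, assuming the ADLS connector and credentials are already configured on the cluster:

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val path = "adl://mylake.azuredatalakestore.net/"
val conf = new Configuration()
// Resolve the filesystem from the path's URI, not the cluster default
val fs = FileSystem.get(new URI(path), conf)
fs.listStatus(new Path(path)).foreach(x => println(x.getPath.toString))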
Created 08-20-2018 08:26 PM
I found a solution:
import scala.sys.process._
val lsResult = Seq("hadoop", "fs", "-ls", "adl://mylake.azuredatalakestore.net/").!!
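The listing comes back as plain text, so it has to be parsed before the paths can be handed to spark.read. A rough sketch of one way to do that, assuming the default hadoop fs -ls output format (file entries start with a permission string and end with the full path) and that spark is the usual spark-shell session:

// Keep only file entries (lines starting with "-") and take the last field, the path
val paths = lsResult.split("\n").filter(_.startsWith("-")).map(_.split("\\s+").last)
paths.foreach { f =>
  spark.read.option("delimiter", "|").csv(f).show(1)
}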
Created 08-26-2018 07:03 AM
I am facing a similar issue. Is it possible for you to post the complete code? For example, which function did you pass lsResult to?