Created 08-20-2018 06:44 PM
There are many ways to iterate over HDFS files using Spark. Is there any way to iterate over files in ADLS?
Here is my code:
import org.apache.hadoop.fs.Path
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
 
val path = "adl://mylake.azuredatalakestore.net/"
val conf = new Configuration()
val fs = FileSystem.get(conf)
val p = new Path(path)
val ls = fs.listStatus(p)
 
ls.foreach( x => {
  val f = x.getPath.toString
  println(f)
  val content = spark.read.option("delimiter","|").csv(f)
  content.show(1)
})
and I get the following error:
java.lang.IllegalArgumentException: Wrong FS: adl://mylake.azuredatalakestore.net/, expected: hdfs://sparky-m1.klqj4twfp4tehiuq3c3entk04g.jx.internal.cloudapp.net:8020
It expects hdfs, but the prefix for ADLS is adl. Any ideas?
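In case it helps: the error suggests that FileSystem.get(conf) binds to the cluster's default filesystem (HDFS) rather than ADLS. A minimal sketch of a likely fix, assuming the standard Hadoop FileSystem API, is to resolve the filesystem from the path's own URI instead:
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val path = "adl://mylake.azuredatalakestore.net/"
val conf = new Configuration()
// Resolve the filesystem from the path's URI instead of the cluster default
val fs = FileSystem.get(new URI(path), conf)
// Equivalent alternative: val fs = new Path(path).getFileSystem(conf)
val ls = fs.listStatus(new Path(path))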
Created 08-20-2018 08:26 PM
I found a solution:
import scala.sys.process._ 
val lsResult = Seq("hadoop","fs","-ls","adl://mylake.azuredatalakestore.net/").!!
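Note that lsResult is just the raw stdout of hadoop fs -ls captured as a single string. A sketch of how it could be parsed into paths, assuming the standard -ls output format where the path is the last whitespace-separated field of each row:
// Split the captured output into lines, skip the "Found N items"
// header, and keep the last column of each row (the file path)
val paths = lsResult
  .split("\n")
  .filter(_.contains("adl://"))
  .map(_.split("\\s+").last)

paths.foreach(println)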
		Created 08-26-2018 07:03 AM
I am facing a similar issue. Is it possible for you to post the complete code? For example, to which function did you pass lsResult?