Created 01-05-2018 03:49 AM
My scenario: I need to check each file name and, depending on whether it contains a specific word, decide which files to pick up for processing.
E.g., in my directory I have two files:
file1: sample1.txt_processed
file2: sample2.txt
I need to check the file names for the "_processed" keyword and pick only the files whose names do not contain "_processed".
Can anyone help me with this scenario?
Created 01-05-2018 05:23 AM
This is possible with a combination of shell commands and Spark.
hadoop fs -ls '/filedirectory/*.txt'
The glob has to match the whole file name, so the command above lists only files ending in ".txt" (sample2.txt) and skips sample1.txt_processed. Then pass the resulting paths to Spark and process the files as you need.
Alternatively, you can run the same listing from within Spark (Scala) using scala.sys.process:
import scala.sys.process._
val lsResult = Seq("hadoop", "fs", "-ls", "hdfs:///filedirectory/*.txt").!!
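`.!!` returns the whole listing as one String in the usual `hadoop fs -ls` layout: a "Found N items" header, then one line per entry with the path in the last column. A minimal sketch for turning that String into a list of paths and dropping any whose file name contains "_processed", assuming paths contain no spaces. The helper names `parseLsPaths` and `unprocessed` and the sample listing are hypothetical:

```scala
// Hypothetical helper: extract the path column from `hadoop fs -ls` output.
// Assumes the path is the last whitespace-separated field (no spaces in paths).
def parseLsPaths(lsOutput: String): List[String] =
  lsOutput.split("\n").toList
    .map(_.trim)
    .filter(line => line.nonEmpty && !line.startsWith("Found "))
    .map(_.split("\\s+").last)

// Keep only paths whose file name (last path component) lacks "_processed".
def unprocessed(paths: List[String]): List[String] =
  paths.filterNot(p => p.substring(p.lastIndexOf('/') + 1).contains("_processed"))

// Illustrative listing; sizes, owners and timestamps are made up.
val sample =
  """Found 2 items
    |-rw-r--r--   3 user hdfs  1024 2018-01-05 03:49 /filedirectory/sample1.txt_processed
    |-rw-r--r--   3 user hdfs  2048 2018-01-05 03:49 /filedirectory/sample2.txt""".stripMargin

val toProcess = unprocessed(parseLsPaths(sample))
```

With real output you would call `unprocessed(parseLsPaths(lsResult))`. In Spark 2.x the resulting paths could then be read with, for example, `spark.read.textFile(paths: _*)`. The extra name filter is defensive: it also works when the listing was not narrowed by a glob.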
Hope it helps!
Created 01-05-2018 05:26 AM
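This would also work for files on the local filesystem (java.io.File does not read HDFS):
import java.io.File
// List regular files (not subdirectories) in a local directory
def getListOfFiles(dir: String): List[File] = new File(dir).listFiles.filter(_.isFile).toList
val files = getListOfFiles("/tmp").filterNot(_.getName.contains("_processed"))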
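On the local filesystem, a glob like the one passed to `hadoop fs -ls` can also be applied with `java.nio.file.Files.newDirectoryStream`, which accepts a glob pattern and matches it against each file name. A minimal sketch; the helper name `listMatching` is hypothetical:

```scala
import java.nio.file.{Files, Path}
import scala.collection.JavaConverters._

// Hypothetical helper: list regular files in `dir` whose names match `glob`.
// The glob is matched against the whole file name, so "*.txt" skips
// "sample1.txt_processed" but keeps "sample2.txt".
def listMatching(dir: Path, glob: String): List[Path] = {
  val stream = Files.newDirectoryStream(dir, glob)
  // Materialize the results before closing the stream
  try stream.asScala.filter(p => Files.isRegularFile(p)).toList
  finally stream.close()
}
```

For example, `listMatching(java.nio.file.Paths.get("/tmp"), "*.txt")`. The stream is closed in `finally` because `DirectoryStream` holds an open directory handle.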