Spark Streaming - Read all text files

Hi All-


We are using Spark Streaming's textFileStream method to process incoming sales data (Itemcode, Qty, Amount, SaleDate, etc.) in batches, e.g. every 10 seconds.


val ssc = new StreamingContext(sparkConf, Seconds(10))

val lines = ssc.textFileStream(args(0))   // args(0): source directory


 //further processing


Let us say the source directory already has files such as text1.txt and text2.txt.


When a new file, text3.txt, arrives in the source location, only text3.txt is processed and its results are written to the console.


Is there a way to stream all files from the source directory, i.e. get the merged result of text1.txt, text2.txt and text3.txt?






Re: Spark Streaming - Read all text files

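Hi, you need to use fileStream instead of textFileStream, and pass false for its newFilesOnly argument so that files already in the directory are read as well.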

StreamingContext scc = new StreamingContext(conf, new Duration(10000));



Example usage:


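// ClassTags for the key, value and input format types (needed when calling fileStream from Java)
ClassTag<LongWritable> k = ClassTag$.MODULE$.apply(LongWritable.class);
ClassTag<Text> v = ClassTag$.MODULE$.apply(Text.class);
//ClassTag<InputFormat<LongWritable,Text>> t = (ClassTag<InputFormat<LongWritable,Text>>)(Object)ClassTag$.MODULE$.apply(InputFormat.class);
// TextInputFormat must be the new-API class, org.apache.hadoop.mapreduce.lib.input.TextInputFormat
ClassTag<TextInputFormat> t = ClassTag$.MODULE$.apply(TextInputFormat.class);

// f is a path filter (a scala.Function1 over Path) that you define yourself; it is not shown here.
// The third argument (false) is newFilesOnly, so files already in the directory are picked up too.
InputDStream<Tuple2<LongWritable,Text>> ans = scc.fileStream("hutashan/folder", f, false, k, v, t);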
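
Since the question is in Scala, here is a rough Scala sketch of the same idea. It is not tested against your setup. The object name ReadAllTextFiles and the helper isSalesFile are made up for illustration, and the ".txt" filter is only an assumption about your file names. One caveat: as far as I know, even with newFilesOnly = false Spark still ignores files whose modification time is older than its remember window (spark.streaming.fileStream.minRememberDuration, 60s by default), so very old files may still be skipped unless you raise that setting.

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ReadAllTextFiles {
  // Hypothetical filter: accept visible .txt files only (assumption about file naming).
  def isSalesFile(name: String): Boolean =
    name.endsWith(".txt") && !name.startsWith(".")

  def main(args: Array[String]): Unit = {
    // The master URL is expected to come from spark-submit.
    val sparkConf = new SparkConf().setAppName("ReadAllTextFiles")
    val ssc = new StreamingContext(sparkConf, Seconds(10))

    // newFilesOnly = false asks Spark to also consider files already in the directory.
    val records = ssc.fileStream[LongWritable, Text, TextInputFormat](
      args(0), // source directory
      (path: Path) => isSalesFile(path.getName),
      newFilesOnly = false
    )

    // Keys are byte offsets within each file; values are the text lines.
    // Convert to String right away because Hadoop reuses the Text objects.
    val lines = records.map { case (_, text) => text.toString }

    // further processing goes here; print() just shows a few lines per batch
    lines.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

After the map, lines is a DStream[String] just like the one returned by textFileStream, so the rest of the existing processing should not need to change.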