Spark Streaming - Read all text files

New Contributor

Hi All-

 

We are using Spark Streaming with the saveAsTextFiles method to process incoming sales data (Itemcode, Qty, Amount, SaleDate, etc.) every 10 seconds.

 

val ssc = new StreamingContext(sparkConf, Seconds(10))   // 10-second batch interval
val lines = ssc.textFileStream(args(0))                  // args(0): source directory

// further processing
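
The write side is just saveAsTextFiles on the resulting stream, roughly like this (the output prefix below is only a placeholder):

// illustrative write step: one set of text files per 10-second batch
lines.saveAsTextFiles("output/sales")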

 

Let us say the source directory already has files such as text1.txt and text2.txt.

 

When a new file, text3.txt, arrives in the source location, text3.txt is processed and the results are written to the console.

 

Is there a way to stream all files from the source directory, i.e. merge the results of text1.txt, text2.txt, and text3.txt?

 

thanks

 

 


Re: Spark Streaming - Read all text files

New Contributor

Hi, you need to use fileStream instead of textFileStream.

An example of how to use it (conf is an existing SparkConf; the ClassTags and the path filter are needed because this calls the Scala StreamingContext from Java):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.StreamingContext;
import org.apache.spark.streaming.dstream.InputDStream;
import scala.Tuple2;
import scala.reflect.ClassTag;
import scala.reflect.ClassTag$;
import scala.runtime.AbstractFunction1;

StreamingContext scc = new StreamingContext(conf, new Duration(10000));   // 10-second batches

// ClassTags for the key, value and input format types (required by the Scala API)
ClassTag<LongWritable> k = ClassTag$.MODULE$.apply(LongWritable.class);
ClassTag<Text> v = ClassTag$.MODULE$.apply(Text.class);
ClassTag<TextInputFormat> t = ClassTag$.MODULE$.apply(TextInputFormat.class);

// Path filter that accepts every file
AbstractFunction1<Path, Object> f = new AbstractFunction1<Path, Object>() {
    public Object apply(Path path) { return Boolean.TRUE; }
};

// newFilesOnly = false, so files already present in the directory are processed as well
InputDStream<Tuple2<LongWritable, Text>> ans = scc.fileStream("hutashan/folder", f, false, k, v, t);
ans.print();
scc.start();
scc.awaitTermination();
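
Since the original snippet is in Scala, here is a rough equivalent sketch with the Scala API (it reuses sparkConf and args(0) from the question; LongWritable, Text and TextInputFormat are the standard Hadoop text-input types). The important part is newFilesOnly = false, which tells Spark to also pick up files that are already in the directory:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sparkConf, Seconds(10))

// Read every file in the directory: existing ones (text1.txt, text2.txt) and new arrivals (text3.txt)
val lines = ssc
  .fileStream[LongWritable, Text, TextInputFormat](
    args(0),                  // source directory
    (path: Path) => true,     // accept every file
    newFilesOnly = false)
  .map { case (_, text) => text.toString }

lines.print()
ssc.start()
ssc.awaitTermination()

With newFilesOnly set to false, the files already sitting in the source directory should be processed in the first batch, so their results are merged with those of files that arrive later.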