
Text file formatting using Spark


Using Scala 2.10 and Spark 1.6.

I'm trying to transform Input_file_001.txt into output.txt as shown below.

Input_file_001.txt:

Dept 0100 Batch Load Errors for 8/16/2016 4:45:56 AM 

Case 1111111111
Rectype: ABCD Key:UMUM_REF_ID=A12345678,UMSV_SEQ_NO=1UMSV ERROR  :UNITS_ALLOW must be > or = UNITS_PAID 

Case 2222222222
Rectype: ABCD Key:UMUM_REF_ID=B87654321,UMSV_SEQ_NO=2UMSV ERROR  :UNITS_ALLOW must be > or = UNITS_PAID 
NTNB ERROR  :Invalid Value                          NTNB_MCTR_SUBJ=AMOD

Case 3333333333
Rectype: WXYZ Key:UMUM_REF_ID=U19817250,UMSV_SEQ_NO=2UMSV ERROR  :UNITS_ALLOW must be > or = UNITS_PAID 

Expected output file:

output.txt

file_name~case~Rectype~key~Error
Input_file_001.txt~1111111111~ABCD~UMUM_REF_ID=A12345678,UMSV_SEQ_NO=1~UMSV ERROR  :UNITS_ALLOW must be > or = UNITS_PAID
Input_file_001.txt~2222222222~ABCD~UMUM_REF_ID=B87654321,UMSV_SEQ_NO=2~UMSV ERROR  :UNITS_ALLOW must be > or = UNITS_PAID,NTNB ERROR  :Invalid Value                          NTNB_MCTR_SUBJ=AMOD
Input_file_001.txt~3333333333~WXYZ~UMUM_REF_ID=U19817250,UMSV_SEQ_NO=2~UMSV ERROR  :UNITS_ALLOW must be > or = UNITS_PAID

I was trying to achieve it like this:

val source = sc.textFile("Input_file_001.txt")
// Trim first so lines of only whitespace are dropped, then remove the "Dept" header line
val fileread = source.map(_.trim).filter(_.nonEmpty).filterNot(_.startsWith("Dept"))

The code above gives me the cleaned lines (an Array[String] after collect), but I'm not able to take it forward from there. Any help is appreciated.
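One way to take the cleaned lines forward is to group them into per-Case blocks and fold each block into a single delimited row. Below is a minimal local sketch using plain Scala collections (no Spark dependency), so the grouping logic can be tested on its own. The object name `FormatErrors`, the helper `formatRecords`, and the assumption that every error starts with a four-letter code followed by " ERROR" are my own, not from the original post:

```scala
object FormatErrors {
  // Splits a "Rectype:" line into (rectype, key, first error), assuming the
  // error text always begins with a 4-letter code followed by " ERROR".
  // The non-greedy (.*?) keeps the key as short as possible, so it stops
  // right where the fused "...SEQ_NO=1UMSV ERROR..." error text begins.
  private val RectypeLine =
    """Rectype: (\S+) Key:(.*?)([A-Z]{4} ERROR\s*:.*)""".r

  def formatRecords(fileName: String, lines: Seq[String]): Seq[String] = {
    // Group the cleaned lines into blocks, each starting with a "Case" line.
    val blocks = lines
      .foldLeft(List.empty[List[String]]) {
        case (acc, line) if line.startsWith("Case") => List(line) :: acc
        case (block :: rest, line)                  => (line :: block) :: rest
        case (Nil, _)                               => Nil // skip lines before the first Case
      }
      .map(_.reverse)
      .reverse

    blocks.flatMap {
      case caseLine :: detailLines =>
        val caseNo = caseLine.stripPrefix("Case").trim
        detailLines match {
          // The first detail line carries rectype, key and the first error;
          // any following lines (e.g. NTNB errors) are extra errors.
          case RectypeLine(rectype, key, firstError) :: moreErrors =>
            val errors = (firstError :: moreErrors).mkString(",")
            Some(s"$fileName~$caseNo~$rectype~$key~$errors")
          case _ => None
        }
      case Nil => None
    }
  }
}
```

Because `sc.textFile` does not carry the source file name, one could instead read with `sc.wholeTextFiles("Input_file_001.txt")` (available in Spark 1.6, returns `(path, content)` pairs), split each file's content into lines, apply the same trim/filter cleanup, call `formatRecords` per file, and finally prepend the header row before `saveAsTextFile`.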
