I have a business requirement: a file arrives daily, I process it, and I append the output to the already processed file. I also add an id column whose values keep incrementing by 1, starting from max(id) of the last generated output. The problem shown below appears from day 2 onward. I do not see it when I run with only 10 to 20 records.
Input (day 1):

    col1,col2
    aa,bb
    cc,dd
Output (day 1):

    id,col1,col2
    1,aa,bb
    2,cc,dd
Day 2 (append today's processed input to yesterday's output):
Spark currently generates the output below. I expected the two new rows to get ids 3 and 4:

    id,col1,col2
    1,aa,bb
    2,cc,dd
    1,ee,ff
    3,gg,hh
I'm running a straightforward max query:

    Step 1: val maxVal = select max(id) from output
    Step 2: select row_number() over (order by 1) + maxVal, col1, col2 from todayProcessedData
    Step 3: append the step 2 data to yesterday's output and store the result (day 3 data will be appended to this result).
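To make the three steps concrete, here is a minimal, runnable sketch of them in Scala. It is an illustration under assumptions, not the exact job: the object name `AppendWithIds`, the helper `appendWithIds`, and the in-memory sample DataFrames are hypothetical stand-ins for the real files. One deliberate substitution: the window orders by `col1, col2` instead of the constant `1`, so the numbering is reproducible between runs.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object AppendWithIds {
  // Hypothetical helper (name and signature are illustrative, not from the real job).
  def appendWithIds(spark: SparkSession, output: DataFrame, today: DataFrame): DataFrame = {
    output.createOrReplaceTempView("output")
    today.createOrReplaceTempView("todayProcessedData")

    // Step 1: pull max(id) to the driver as a plain Long (0 if there is no output yet).
    val maxVal: Long = spark
      .sql("select cast(coalesce(max(id), 0) as bigint) from output")
      .first()
      .getLong(0)

    // Step 2: number today's rows and offset them by maxVal.
    // Assumption: ordering by the data columns, not the constant 1 from the steps above.
    val withIds = spark.sql(
      s"""select cast(row_number() over (order by col1, col2) + $maxVal as bigint) as id,
         |       col1, col2
         |from todayProcessedData""".stripMargin)

    // Step 3: append today's numbered rows to yesterday's output.
    output.select("id", "col1", "col2").union(withIds)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("append-with-ids").master("local[*]").getOrCreate()
    import spark.implicits._

    // In-memory stand-ins for yesterday's stored output and today's processed input.
    val output = Seq((1L, "aa", "bb"), (2L, "cc", "dd")).toDF("id", "col1", "col2")
    val today = Seq(("ee", "ff"), ("gg", "hh")).toDF("col1", "col2")

    appendWithIds(spark, output, today).orderBy("id").show()
    spark.stop()
  }
}
```

With the sample rows above, the two new rows should come out with ids 3 and 4. The sketch does not cover reading from and writing back to the stored output file, which the real job does in step 3.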
Please help me. Why is Spark SQL behaving like this? It happens even on my local machine.