Max(column) getting different value every time in spark sql

Hi,

My use case: I receive a file daily, process it, and append the output to the already processed file. I also add an id column that increments by 1, starting from max(id) of the last generated output.
I am facing the problem shown below from day 2 onward.
// The issue does not occur when I run with only 10 to 20 records.

day 1

i/p
col1, col2
aa,bb
cc,dd

o/p
id,col1,col2
1,aa,bb
2,cc,dd

day 2 (append today's processed input to yesterday's data)

i/p
col1, col2
ee,ff
gg,hh

expected o/p
id,col1,col2
1,aa,bb
2,cc,dd
3,ee,ff
4,gg,hh

current Spark output is generated like below
id,col1,col2
1,aa,bb
2,cc,dd
1,ee,ff
3,gg,hh

I'm running a straightforward max query:
step 1: val maxVal = select max(id) from output
step 2: select row_number() over (order by 1) + maxVal, col1, col2 from todayProcessedData
step 3: append the step 2 data to yesterday's output and store the result (day 3 data will be appended to this result).
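
For reference, the three steps above could be written with the DataFrame API roughly as follows. This is only a sketch under assumptions, not the code I am actually running: the object name, the in-memory sample DataFrames, and the choice to order the window by col1 and col2 are all made up for illustration. Two things it does differently from the steps above: max(id) is collected to the driver as a plain Long before it is used, and the window orders by real columns, because ordering by a constant does not define a deterministic row order.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{coalesce, col, lit, max, row_number}

// Hypothetical driver object; names and sample data are assumptions for illustration.
object AppendWithIds {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("append-with-ids")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Stand-ins for yesterday's stored output and today's processed input.
    val output = Seq((1L, "aa", "bb"), (2L, "cc", "dd")).toDF("id", "col1", "col2")
    val todayProcessedData = Seq(("ee", "ff"), ("gg", "hh")).toDF("col1", "col2")

    // Step 1: evaluate max(id) once and keep it as a plain Long on the driver.
    // coalesce falls back to 0 when the output is still empty (day 1).
    val maxVal: Long =
      output.agg(coalesce(max(col("id")), lit(0L))).first().getLong(0)

    // Step 2: number today's rows with an explicit ordering and offset by maxVal.
    // A window without partitionBy moves all rows to a single partition.
    val w = Window.orderBy(col("col1"), col("col2"))
    val todayWithIds = todayProcessedData.select(
      (row_number().over(w) + lit(maxVal)).as("id"),
      col("col1"),
      col("col2")
    )

    // Step 3: append today's rows to yesterday's output.
    val combined = output.unionByName(todayWithIds)
    combined.orderBy("id").show()

    spark.stop()
  }
}
```

With the sample rows above, the two new rows should get ids 3 and 4. In a real job the result would be written to storage instead of shown. Because DataFrames are evaluated lazily, it may be safer to write the combined result to a new location rather than over the same path that output is being read from.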

Please help me. Why is Spark SQL behaving like this? It happens even on my local machine.

Thanks
Suresh Selvaraj
