How to create a sequence number for a block of records


I want to generate a sequence number for each block of records (for example, A, B, C form one block). The input size is around 50 GB.

Input:

Adata1
Bdata2
Cdata3
AData4
Bdata5
Cdata6
Cdata7

Output:

1Adata1
1Bdata2
1Cdata3
2AData4
2Bdata5
2Cdata6
2Cdata7
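On a single machine, the expected transformation is just one sequential pass: keep a counter, increment it whenever a record starts a new block, and prefix every record with the current counter. The sketch below assumes that a block starts at every record whose first character is "A" (which matches the sample, where the second block is A, B, C, C); the function name `assign_block_numbers` and the `block_start` parameter are hypothetical.

```python
def assign_block_numbers(records, block_start="A"):
    """Prefix each record with the number of the block it belongs to.

    Assumption: a new block begins at every record that starts with block_start.
    Records that appear before the first block start get block number 0.
    """
    block = 0
    out = []
    for rec in records:
        if rec.startswith(block_start):
            block += 1  # a new block begins here
        out.append(f"{block}{rec}")
    return out


records = ["Adata1", "Bdata2", "Cdata3", "AData4", "Bdata5", "Cdata6", "Cdata7"]
for line in assign_block_numbers(records):
    print(line)
```

This is only correct when one process sees every record in file order, which is exactly what breaks once the 50 GB input is split into multiple parts.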

I'm not able to produce the output above with parallel processing. Because the input is split into multiple part files, the code working on one part does not know how many blocks appeared in the earlier parts, so the numbering comes out wrong.

Is there a way to generate this key? Whenever an A record appears, the key has to increment.
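One common way around the split problem is a two-pass approach: first count how many block starts ("A" records) each split contains, then turn those counts into per-split offsets with an exclusive prefix sum, and finally number each split independently starting from its offset. Records at the top of a split that come before its first "A" automatically get the previous split's last block number. The sketch below simulates splits as plain Python lists; it assumes the splits are kept in the original file order (for example, ordered by split index) and that a block starts at every record beginning with "A". All function names are hypothetical. In Spark, each pass could plausibly be run per partition (for example with `RDD.mapPartitionsWithIndex`), but that mapping is an assumption and is not shown here.

```python
from itertools import accumulate


def count_block_starts(partition, block_start="A"):
    # Pass 1: how many blocks begin inside this split (only a small int per split).
    return sum(1 for rec in partition if rec.startswith(block_start))


def number_partition(partition, offset, block_start="A"):
    # Pass 2: continue numbering from the number of blocks seen in earlier splits.
    block = offset
    out = []
    for rec in partition:
        if rec.startswith(block_start):
            block += 1
        out.append(f"{block}{rec}")
    return out


def number_partitions(partitions, block_start="A"):
    # Assumption: partitions are listed in original file order.
    counts = [count_block_starts(p, block_start) for p in partitions]
    # Exclusive prefix sum: offset of split i = blocks started in splits 0..i-1.
    offsets = [0] + list(accumulate(counts))[:-1]
    return [number_partition(p, off, block_start) for p, off in zip(partitions, offsets)]


# A block (A, B, C) that straddles split boundaries is still numbered correctly.
splits = [["Adata1", "Bdata2"], ["Cdata3", "AData4", "Bdata5"], ["Cdata6", "Cdata7"]]
for part in number_partitions(splits):
    print(part)
```

The cost is one extra scan of the input to get the counts, but only one integer per split has to be collected in between, and the numbering pass itself stays fully parallel.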
