Recently I was engaged in a use case where CDE processing needed to be triggered as soon as data landed on s3. In AWS, the s3 trigger is implemented with a Lambda function: as files land in s3, the Lambda function fires and calls CDE to process them. At trigger time, the Lambda event includes the names and locations of the files the trigger was executed on, and those locations are passed on to the CDE job so it knows exactly which files to pick up and process.
Create a Lambda function on an s3 bucket (Code provided above)
Trigger on put/post
Load a file or files on s3 (any file)
AWS Lambda is triggered by this event and calls CDE. The call to CDE includes the locations and names of all the files the trigger was executed on
CDE launches, processes the files, and ends gracefully
It's quite simple.
Create a CDE Job with the following settings (a sketch of what the application does follows the list):
Name: Any Name. I called it testjob
Spark Application: Jar file provided above
Main Class: com.cloudera.examples.SimpleCDERun
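The actual application is the jar provided above; for illustration only, here is a rough PySpark sketch of the same idea, reading whatever file locations are passed in as program arguments. This is not the provided SimpleCDERun code, just an equivalent outline:

```python
# Illustration only: a rough PySpark outline of what the job does.
# The article's actual job is the provided jar with main class
# com.cloudera.examples.SimpleCDERun; the names here are assumptions.
import sys

from pyspark.sql import SparkSession


def main():
    spark = SparkSession.builder.appName("testjob").getOrCreate()

    # The Lambda trigger passes the s3 location(s) of the new file(s)
    # as program arguments (visible later in the job's stdout logs).
    paths = sys.argv[1:]
    print(f"Received file locations from Lambda: {paths}")

    for path in paths:
        # Process each file; here we simply read it and print a row count.
        df = spark.read.option("header", "true").csv(path)
        print(f"{path}: {df.count()} rows")

    spark.stop()


if __name__ == "__main__":
    main()
```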
Lambda
Create an AWS Lambda function that triggers on put/post for s3. The Lambda function code is simple: it calls CDE for each file posted to s3. The full function is provided in the artifacts section above.
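For orientation, here is a minimal sketch of what such a handler looks like, assuming a Python runtime. The CDE jobs API URL, request payload shape, job name, and token handling are placeholders and assumptions; use the function from the artifacts section and your own virtual cluster details for the real thing.

```python
# Minimal sketch of the Lambda handler (Python runtime assumed).
# The CDE jobs API URL, payload shape, and token are placeholders --
# consult your CDE virtual cluster and the provided Lambda function
# in the artifacts section for the real values.
import json
import urllib.parse
import urllib.request

CDE_JOBS_API = "https://<your-virtual-cluster>/dex/api/v1"  # placeholder
CDE_JOB_NAME = "testjob"
CDE_TOKEN = "<access token>"  # in practice, obtain and refresh this securely


def lambda_handler(event, context):
    # Collect the bucket/key of every object the trigger fired on.
    files = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        files.append(f"s3a://{bucket}/{key}")

    # Pass the file locations to the CDE job as Spark arguments.
    # The body shape below is an assumption; check the CDE jobs API docs.
    payload = json.dumps({"overrides": {"spark": {"args": files}}}).encode("utf-8")
    req = urllib.request.Request(
        url=f"{CDE_JOBS_API}/jobs/{CDE_JOB_NAME}/run",
        data=payload,
        headers={
            "Authorization": f"Bearer {CDE_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        run = json.loads(resp.read().decode("utf-8"))

    # The response contains the job run ID (14 in the example below).
    print(f"Triggered CDE job '{CDE_JOB_NAME}' for {files}: {run}")
    return run
```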
The following are the s3 properties:
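If you prefer to wire up the same put/post trigger from code rather than the console, a boto3 sketch looks like this; the bucket name and Lambda ARN are placeholders, and the Lambda must already allow invocation from s3:

```python
# Sketch: attach the put/post trigger to the bucket with boto3.
# Bucket name and Lambda ARN are placeholders; the Lambda also needs
# a resource-based policy allowing s3 to invoke it.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_notification_configuration(
    Bucket="my-landing-bucket",  # placeholder
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:111111111111:function:trigger-cde",  # placeholder
                "Events": ["s3:ObjectCreated:Put", "s3:ObjectCreated:Post"],
            }
        ]
    },
)
```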
Trigger CDE
Upload a file to s3 and Lambda will trigger the CDE job. For example, I uploaded a file named test.csv to s3; once the upload completed, Lambda called CDE to execute the job on that file.
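The upload itself can be done from the console, the AWS CLI, or a couple of lines of boto3 (the bucket name here is a placeholder):

```python
# Upload a sample file to fire the trigger (bucket name is a placeholder).
import boto3

boto3.client("s3").upload_file("test.csv", "my-landing-bucket", "test.csv")
```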
Lambda Log
The first arrow shows the file name (test.csv). The second arrow shows the CDE Job Run ID returned by the call, which in this case is 14.
In CDE, Job Run ID: 14
In CDE, the stdout logs show that the job received the location and name of the file that Lambda was triggered on.
As I said in my last post, CDE is making things super simple. Enjoy.