
2020-09-11_13-51-24.jpg

 

 

I recently worked on a use case in which CDE (Cloudera Data Engineering) processing had to be triggered as soon as data landed on S3. On AWS, the S3 trigger is a Lambda function: as files land in S3, the function fires and calls CDE to process them. The event handed to the Lambda function includes the names and locations of the files that fired the trigger. Those locations and names are passed on to CDE so the job can pick up and process the files.
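To make the "names and locations" part concrete, here is a minimal Python sketch of pulling the bucket and object key out of an S3 event notification. This is not the Lambda code from the artifacts; the function name `s3_uris_from_event`, the sample bucket name, and the choice of the `s3a://` scheme are my assumptions for illustration. The `Records[].s3.bucket.name` and `Records[].s3.object.key` fields follow the standard S3 event notification layout.

```python
from urllib.parse import unquote_plus


def s3_uris_from_event(event):
    """Return one URI for every object named in an S3 event notification."""
    uris = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (a space becomes '+'), so decode them.
        key = unquote_plus(record["s3"]["object"]["key"])
        # Assumption: the Spark job reads through the s3a connector.
        uris.append(f"s3a://{bucket}/{key}")
    return uris


# Hypothetical event, trimmed to the fields used above.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "my-landing-bucket"},
                "object": {"key": "incoming/test.csv"},
            }
        }
    ]
}

print(s3_uris_from_event(sample_event))
```

An upload normally produces one event per object, so the list usually holds a single entry, but looping over `Records` keeps the sketch safe if several arrive together.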

Prerequisites to run this demo

  • AWS account
  • S3 bucket
  • Some knowledge of Lambda
  • CDP and CDE

Artifacts

Processing Steps

  1. Create a CDE job (Jar provided above)
  2. Create a Lambda function on an S3 bucket (code provided above)
    1. Trigger on put/post
  3. Upload one or more files to S3 (any file)
  4. The upload event triggers AWS Lambda, which calls CDE. The call to CDE includes the locations and names of all files the trigger fired on
  5. CDE launches, processes the files, and ends gracefully

It's quite simple.  

Create a CDE Job

  • Name: Any Name. I called it testjob
  • Spark Application: Jar file provided above
  • Main Class: com.cloudera.examples.SimpleCDERun 

2020-09-11_14-16-36.jpg

 

Lambda

Create an AWS Lambda function that triggers on put/post events for the S3 bucket. The Lambda function code is simple: it calls CDE for each file posted to S3. The function is provided in the Artifacts section above.
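For readers who want a feel for the shape of such a function, here is a rough Python sketch. It is not the artifact referenced above. I am assuming that the CDE Jobs API accepts `POST <jobs-api-url>/jobs/<job-name>/run` with a bearer token and that job arguments can be passed under `overrides.spark.args`; please check both against the API documentation for your CDE virtual cluster. The environment variable names (`CDE_JOBS_API_URL`, `CDE_JOB_NAME`, `CDE_TOKEN`) and the helper names are hypothetical.

```python
import json
import os
import urllib.request
from urllib.parse import unquote_plus


def file_uris(event):
    """Collect the location and name of every file in the S3 event."""
    return [
        f"s3a://{r['s3']['bucket']['name']}/{unquote_plus(r['s3']['object']['key'])}"
        for r in event.get("Records", [])
    ]


def build_run_request(jobs_api_url, job_name, token, args):
    """Build (but do not send) the HTTP request that starts one CDE job run."""
    # Assumed payload shape: arguments handed to the Spark application.
    body = {"overrides": {"spark": {"args": args}}}
    return urllib.request.Request(
        url=f"{jobs_api_url.rstrip('/')}/jobs/{job_name}/run",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def lambda_handler(event, context):
    """Entry point: log the uploaded files, then ask CDE to run the job on them."""
    uris = file_uris(event)
    print("Files in event:", uris)
    request = build_run_request(
        os.environ["CDE_JOBS_API_URL"],             # hypothetical variable name
        os.environ.get("CDE_JOB_NAME", "testjob"),  # job name used in this article
        os.environ["CDE_TOKEN"],                    # hypothetical: a valid CDE access token
        uris,
    )
    with urllib.request.urlopen(request) as response:
        run = json.loads(response.read())
    print("CDE job run response:", run)
    return run
```

Keeping the request construction in its own function means it can be checked locally without touching the network. A real deployment would also need to obtain and refresh the access token, which this sketch leaves out.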

 

The following are the S3 properties:

2020-09-11_14-20-19.jpg

Trigger CDE

Upload a file to S3, and Lambda will trigger the CDE job. For example, I uploaded a file named test.csv to S3. Once the file was uploaded, Lambda called CDE to execute a job on that file.

 

Lambda Log

The first arrow shows the file name (test.csv). The second arrow shows the CDE job run ID returned by the call, which in this case is 14.

2020-09-11_14-30-57.jpg

 

In CDE, Job Run ID: 14

2020-09-11_14-32-52.jpg

 

In CDE, the stdout logs show that the job received the location and name of the file that triggered the Lambda function.
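On the job side, the file locations simply arrive as application arguments. The job in this demo is the Jar with main class com.cloudera.examples.SimpleCDERun, whose source is not shown here, so the following Python sketch is only a hypothetical stand-in for what a job might do with those arguments. The names `describe` and `main` are mine.

```python
import sys
from urllib.parse import urlparse


def describe(uri):
    """Split an object URI into its bucket and key for logging."""
    parsed = urlparse(uri)
    return f"bucket={parsed.netloc} key={parsed.path.lstrip('/')}"


def main(argv):
    """Echo every file location passed in; real processing would go here."""
    for uri in argv:
        print(f"Received file: {uri} ({describe(uri)})")
    return len(argv)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Anything printed this way ends up in the job run's stdout log, which is where the file location appears in the screenshot above.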

2020-09-11_14-33-17.jpg

 

As I said in my last post, CDE is making things super simple.  Enjoy.
