Using S3 to invoke AWS Lambda
Setting up AWS Lambda Function
Introduction
Serverless computing has become very popular in recent years. It allows teams and organizations to develop an application without worrying about servers; the official AWS documentation describes it as "Run Code Without Thinking About Servers". There are many use cases in which serverless computing can be beneficial over the traditional approach, and done the right way it can also help small teams save money on infrastructure.
In this article, I am going to provide a basic tutorial on how to set up a Lambda function on AWS and add Python dependencies as well. To understand the basics of Lambda functions, please refer to this link.
Use case
In this article, we are going to work with a simple use case: execute a Lambda function when a file is uploaded to S3, process the file using Python pandas, and then store the result in another S3 bucket.
This is a fairly easy scenario, but the same concept can be applied to perform various actions like pushing data to a database, sending alerts, etc.
Prerequisite
You should have an AWS account and know python basics to follow this tutorial.
In addition to that, for our use case, we need to create a role with the following policies:
- AmazonS3ReadOnlyAccess
- AWSLambdaBasicExecutionRole
- S3 Put policy (this is not readily available; follow this YouTube video to create it). This is required to save the processed file to S3; if you just want to read files, you can skip this step. A minimal sketch of such a policy follows this list.
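For illustration, a put-only policy could look roughly like the sketch below. The bucket name dest-bucket is a placeholder for the destination bucket used later in this article; create the policy and attach it to the role along with the two managed policies above.

cat > s3-put-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::dest-bucket/*"
    }
  ]
}
EOF
# Create the customer-managed policy; attach it to the Lambda role afterwards.
aws iam create-policy --policy-name s3-put-policy --policy-document file://s3-put-policy.json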
Developing a lambda function
There are 2 ways to develop a lambda function:
- Using the AWS console/editor
- Using SAM (Serverless Application Model) to develop it on a local machine
In this article we are going to focus on the first approach, which is easy but not an ideal option for more complex scenarios. The second approach allows more flexibility, as we can develop and test the Lambda function locally, and SAM takes care of dependencies as well as deployment.
Steps to create a lambda function for our use case
- Create a lambda function which processes a file.
- Add dependencies (a layer, in the context of a Lambda function)
- Process data
- Create a trigger to invoke the lambda function when a file is uploaded to S3.
Step 1 - Create a lambda function which processes a file:
After logging in to AWS, search for lambda in the top search bar of the AWS console. This will redirect you to the Lambda dashboard, which contains various stats related to AWS Lambda.
From the left menu, click on the Functions link to see all the lambda functions associated with the account. On the right-hand side, there is a Create function button. Click on that button.
We have a few options here to get started:
- Author from Scratch
- Use blueprint
- Use Container
- Browse Serverless App Repository
We are going to create a lambda function from scratch. Here are the required fields; fill in the values as mentioned below:
Function Name = My-Test-Function
Runtime = Python 3.8
Architecture = x86_64
Permissions = Use the role created from the Prerequisite section
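If you prefer the AWS CLI over the console, the equivalent function can be created with a command roughly like the one below. The role ARN is a placeholder, the handler name assumes the code lives in lambda_function.py, and the --architectures flag requires a reasonably recent CLI version; in the console flow you simply fill in the form instead.

# Package the handler file and create the function (placeholder role ARN).
zip function.zip lambda_function.py
aws lambda create-function \
    --function-name My-Test-Function \
    --runtime python3.8 \
    --architectures x86_64 \
    --handler lambda_function.lambda_handler \
    --role arn:aws:iam::123456789012:role/my-lambda-role \
    --zip-file fileb://function.zip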
Step 2 - Add dependencies:
There are multiple ways of doing this, but I am going to provide the easiest way here. For the sake of understanding, I am providing details on both approaches. Let’s call them the manual process and the automated process.
- Manual Process: This is a bit lengthy. The first thing to note for this process is the architecture we selected in Step 1, x86_64, and the second is the runtime, Python 3.8 in our case. Here are the steps that need to be executed to include pandas (a rough command sketch follows this section):
  - Navigate to the pandas download page.
  - Click on the Download Files option present on the left side of the page.
  - Download pandas-1.3.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.5 MB). Notice the architecture x86_64 and cp38, which represents the runtime.
  - Create a directory named python.
  - Extract the whl file and copy its contents into the python directory created in the previous step.
  - Compress the python directory to a zip and you are all set.
- Automated process: This is an easier process. It performs identical steps, but in an automated way. For this you need Docker, as the script I am going to use relies on Docker. If you do not have Docker, then use these steps to download and set it up.
  - The script we are going to use can be downloaded from this location.
  - The article associated with this script can be found here.
  - Create a requirements.txt file and list all the required packages there:
    pandas==1.3.4
  - Create a shell script get_layer_packages.sh with the content provided in the above link (a rough sketch of it also follows this section). Modify the shell script based on the desired Python version.
  - Run the following commands (make sure to start Docker before this):
    chmod +x get_layer_packages.sh
    ./get_layer_packages.sh
  - A new python folder will be generated in the same directory.
  - Create a zip file of that directory and you are all set.
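For illustration, the manual route boils down to roughly the following commands, assuming the pandas wheel has already been downloaded into the working directory. Note that pandas also depends on numpy, so in practice the matching numpy wheel has to be extracted into the same python directory before zipping.

# A wheel is just a zip archive, so its contents can be extracted directly.
mkdir python
unzip pandas-1.3.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl -d python/
# Repeat the unzip step for the matching numpy wheel, then package the layer.
zip -r pandas-layer.zip python/

And get_layer_packages.sh from the linked gist looks roughly like the sketch below, adapted here for Python 3.8; the exact Docker image name and pip flags in the gist may differ slightly.

#!/bin/bash
export PKG_DIR="python"

rm -rf ${PKG_DIR} && mkdir -p ${PKG_DIR}

# Install the packages listed in requirements.txt into python/, using a Docker
# image that mirrors the Lambda Python 3.8 build environment.
docker run --rm -v $(pwd):/foo -w /foo lambci/lambda:build-python3.8 \
    pip install -r requirements.txt -t ${PKG_DIR}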
Once we have the zip ready, it’s time to create a layer; every dependency should be included as a layer. An important thing to note here is that there is a size limit of 250 MB, and one lambda function cannot have more than 5 layers at a time.
Here are the steps to create a layer:
- Click on Layers in the left menu
- Provide the following details
Name = pandas-layer
description = pandas,numpy
[X] upload a zip file
[X] x86_64
compatible runtimes = python 3.8
- Click on the Create button.
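If you prefer the CLI, the same layer can be published with a command roughly like this, assuming the zip created in Step 2 is named pandas-layer.zip:

aws lambda publish-layer-version \
    --layer-name pandas-layer \
    --description "pandas,numpy" \
    --zip-file fileb://pandas-layer.zip \
    --compatible-runtimes python3.8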
Now we have our layer ready, so we are going to include it in the lambda function. Here are the steps to include a layer in the lambda function:
- Navigate to the created lambda function.
- Select Code (the default option when we navigate to a lambda function) and scroll to the bottom of the page; you will see the Layers section.
- Click on the Add a layer button present on the right side of the screen.
- Select Custom layers to add the layer we just created.
- Under the Custom layers section, choose the layer pandas-layer and select the version (a new layer will have only one version).
- Click on the Add button.
- Verify the layer was added from the UI or from the Layers section.
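Equivalently, the layer can be attached from the CLI. The layer version ARN below is a placeholder; copy the real ARN from the output of the layer creation step.

# Placeholder region, account ID and version in the layer ARN.
aws lambda update-function-configuration \
    --function-name My-Test-Function \
    --layers arn:aws:lambda:us-east-1:123456789012:layer:pandas-layer:1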
Step 3 - Process data
In this section, we write the actual script in the editor based on the use case. In our case, we can keep it really simple by just filtering a few records and then storing the file back in S3.
import json
import pandas as pd
import boto3
import io

def lambda_handler(event, context):
    client = boto3.client('s3')

    # Read the uploaded file from the bucket and key mentioned in the S3 event
    source_bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    obj = client.get_object(Bucket=source_bucket, Key=key)
    data = obj['Body'].read()

    # Load the .xls file into a DataFrame (xlrd must also be available in the layer)
    df = pd.read_excel(io.BytesIO(data), engine='xlrd')

    # Keep only the records where 'col' equals 1
    df = df[df['col'] == 1].copy()

    # Write the filtered data as a CSV to the destination bucket
    with io.StringIO() as csv_buffer:
        df.to_csv(csv_buffer, index=False)
        response = client.put_object(
            Bucket='dest-bucket', Key="Processed_file.csv", Body=csv_buffer.getvalue()
        )
    return response
The code here is fairly easy to understand. We are reading an Excel file and filtering col for the value 1. Then we are saving the result to the desired location in another S3 bucket.
Note: Do not use the same bucket for triggering the lambda and saving the file (or use proper filters while creating the trigger); otherwise the lambda function will be triggered in an infinite loop.
The interesting part here is the event. In our case, we are going to trigger the lambda function on the S3 PUT object event. So, in order to test the above function, we can simulate the trigger. From the UI, navigate to the Test section, provide a name for the event, and under the template select S3 Put, as shown in the screenshot.
Now, in the Event JSON, alter the values as shown below:
"s3": {
"s3SchemaVersion": "1.0",
"configurationId": "testConfigRule",
"bucket": {
"name": "Input-Bucket",
"ownerIdentity": {
"principalId": "EXAMPLE"
},
"arn": "arn:aws:s3:::example-bucket"
},
"object": {
"key": "test.xls",
"size": 1024,
"eTag": "0123456789abcdef0123456789abcdef",
"sequencer": "0A1B2C3D4E5F678901"
}
Now we have our lambda function and mock trigger event ready. Upload the file test.xls to the above-mentioned bucket through the UI or the AWS CLI. Once that is done, in the lambda Test section, click on the Test button with the event we just created. You should be able to see the output file in the bucket specified in the code.
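If you prefer the AWS CLI for this test, the upload and the invocation look roughly like the following. The bucket name is assumed to be lowercase here (real S3 bucket names must be), event.json is the Event JSON from above wrapped in its Records array, and the --cli-binary-format flag is needed for AWS CLI v2.

# Upload the test file to the input bucket.
aws s3 cp test.xls s3://input-bucket/

# Invoke the function directly with the mock event instead of the console Test button.
aws lambda invoke \
    --function-name My-Test-Function \
    --payload file://event.json \
    --cli-binary-format raw-in-base64-out \
    out.json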
Step 4 - Create Trigger
Now it’s time to create an actual trigger to invoke the lambda function. On the UI, you can find the + Add trigger button; click on it and you will see the screen shown below.
Insert the desired information in the trigger.
- Select a trigger -> S3 in our case
- Bucket -> The bucket where the file upload will happen
- Prefix -> If you want some specific file prefix to trigger the event.
- Suffix -> If you want some specific file type to trigger the event.
Once this is done, you are all set. Just upload a file to S3 and you will see the lambda function triggered in the Monitor section of the UI.
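The same trigger can also be wired up without the console through the S3 notification API; a rough sketch follows. The console automatically adds the permission that lets S3 invoke the function, so the CLI route needs an explicit add-permission call first (account ID, region and bucket names below are placeholders).

# Allow the input bucket to invoke the function (the console does this for you).
aws lambda add-permission \
    --function-name My-Test-Function \
    --statement-id s3-invoke \
    --action lambda:InvokeFunction \
    --principal s3.amazonaws.com \
    --source-arn arn:aws:s3:::input-bucket

# Point the bucket's PUT notifications (filtered to .xls files) at the function.
cat > notification.json <<'EOF'
{
  "LambdaFunctionConfigurations": [
    {
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:My-Test-Function",
      "Events": ["s3:ObjectCreated:Put"],
      "Filter": {
        "Key": {
          "FilterRules": [
            { "Name": "suffix", "Value": ".xls" }
          ]
        }
      }
    }
  ]
}
EOF
aws s3api put-bucket-notification-configuration \
    --bucket input-bucket \
    --notification-configuration file://notification.json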
Next Steps
The next step is to use SAM to set up the same program on a local machine and then deploy it to AWS. This will be covered in the next blog in the AWS Lambda series.
References:
- https://us-east-1.console.aws.amazon.com/lambda/home?region=us-east-1#/begin
- https://medium.com/@qtangs/creating-new-aws-lambda-layer-for-python-pandas-library-348b126e9f3e
- https://gist.github.com/qtangs/69e0db74313e8b97708b88f9a7db9bfb
- https://stackoverflow.com/questions/34749806/using-moviepy-scipy-and-numpy-in-amazon-lambda
- https://www.youtube.com/watch?v=vXiZO1c5Sk0