Building an Image Processing Pipeline on AWS
Creating AWS resources can incur charges if your configuration is not within the free-tier limits. Always check AWS pricing before creating any AWS resources.
In this tutorial, you’ll learn how to build a robust file processing pipeline using AWS services: S3, Lambda, SQS, and Step Functions. Specifically, we’ll create a pipeline that processes images uploaded to an S3 bucket by resizing them and applying various filters. The pipeline will also handle failure scenarios by retrying failed tasks via SQS.
By the end of this tutorial, you will have built a scalable pipeline that can be easily adapted to other use cases, such as document processing or video transcoding.
Getting Started
What is a Pipeline?
A pipeline refers to a series of automated processes where data flows from one task to another. Each task in the pipeline performs a specific function and passes the results to the next task. Pipelines are used to handle repetitive data processing tasks efficiently, ensuring scalability, reliability, and automation.
In this tutorial, we will create an image processing pipeline that:
- Accepts images uploaded to S3.
- Resizes those images into different sizes.
- Applies various filters (like grayscale or sepia).
- Retries processing in case of failure using SQS.
What AWS Services Will We Be Using?
Here are the main AWS services we’ll be using:
- S3 (Simple Storage Service): A scalable object storage service. We’ll use it to store and trigger image uploads.
- Lambda: A serverless compute service that runs your code without the need to provision or manage servers. Lambda will resize images and apply filters.
- SQS (Simple Queue Service): A message queuing service for reliable message delivery. We’ll use SQS to manage retries if image processing tasks fail.
- Step Functions: A serverless orchestration service that lets you coordinate multiple AWS services into workflows. It manages the sequence of tasks in the image processing pipeline.
Steps
We’ll follow these steps to build the pipeline:
- Create an S3 Bucket to store and trigger the pipeline when an image is uploaded.
- Create a Lambda Function to resize images.
- Create a Lambda Function to apply filters to images.
- Set up SQS Queues to handle retries in case of failures.
- Configure IAM Roles and Permissions for secure access between services.
- Define a Step Functions Workflow to orchestrate the pipeline.
- Set up S3 Events to trigger the pipeline when new images are uploaded.
- Configure a Dead-Letter Queue (DLQ) to capture tasks that fail after retries.
- Monitor and Test the Pipeline to ensure everything works as expected.
Step 1: Create an S3 Bucket
We’ll first create an S3 bucket to store the images and trigger the pipeline when new images are uploaded.
Instructions:
Step 1a: Create the S3 Bucket
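A minimal sketch of the command using the AWS CLI (the `--create-bucket-configuration` flag is only needed for regions other than `us-east-1`):

```bash
# Create the bucket that will store the uploaded images.
aws s3api create-bucket \
  --bucket image-processing-pipeline \
  --region your-region \
  --create-bucket-configuration LocationConstraint=your-region
```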
Replace `image-processing-pipeline` with a unique bucket name and `your-region` with your AWS region (e.g., `us-east-1`).
Step 1b: Configure Public Access (Optional)
If you need to disable the block public access settings:
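A sketch of the command, assuming you really do want all four block-public-access settings turned off (leave them enabled unless you have a specific reason not to):

```bash
# Disable the four "Block Public Access" settings on the bucket (optional).
aws s3api put-public-access-block \
  --bucket image-processing-pipeline \
  --public-access-block-configuration \
      BlockPublicAcls=false,IgnorePublicAcls=false,BlockPublicPolicy=false,RestrictPublicBuckets=false
```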
Your S3 bucket is now ready. The event notification that actually triggers the pipeline on each upload is configured in Step 7.
Step 2: Create a Lambda Function for Resizing Images
We’ll create a Lambda function that resizes images to multiple dimensions, such as 100x100, 500x500, etc.
Instructions:
Step 2a: Create the Lambda Function

- Create a new directory for your Lambda function.
- Initialize a new Node.js project.
- Install the required dependencies.
- Create the Lambda function code: create a file named `index.js` with the handler code.
- Zip the deployment package.

The commands and a sketch of the handler are shown below.
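A sketch of those commands. The directory name, the `resized/` output prefix, and the use of the `sharp` library are assumptions made for this tutorial; any image library will do, but note that `sharp` ships native binaries, so install them for the Linux runtime that Lambda uses.

```bash
mkdir resize-image-function && cd resize-image-function

# Initialize the project and install dependencies.
npm init -y
npm install sharp @aws-sdk/client-s3

# After creating index.js (below), zip the deployment package.
zip -r function.zip index.js node_modules package.json
```

And a sketch of `index.js`, which downloads the uploaded image, resizes it to a couple of target sizes, and writes the results back to the same bucket:

```javascript
// index.js (sketch) - resize the image referenced by the input event.
// The event shape ({ bucket, key }) matches what the Step Functions
// workflow in Step 6 passes in; the sizes and output prefix are placeholders.
const { S3Client, GetObjectCommand, PutObjectCommand } = require("@aws-sdk/client-s3");
const sharp = require("sharp");

const s3 = new S3Client({});
const SIZES = [100, 500];

exports.handler = async (event) => {
  const { bucket, key } = event;

  // Download the original image.
  const original = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
  const body = Buffer.from(await original.Body.transformToByteArray());
  const fileName = key.split("/").pop();

  // Resize to each target size and upload the result.
  for (const size of SIZES) {
    const resized = await sharp(body).resize(size, size, { fit: "inside" }).toBuffer();
    await s3.send(new PutObjectCommand({
      Bucket: bucket,
      Key: `resized/${size}x${size}/${fileName}`,
      Body: resized,
    }));
  }

  return { bucket, key, sizes: SIZES };
};
```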
Step 2b: Create an IAM Role for Lambda
- Create a trust policy JSON file named `trust-policy.json`.
- Create the IAM role.
- Attach policies to the role.

The policy document and the commands are shown below.
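A sketch of the trust policy that lets Lambda assume the role:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "lambda.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

Then create the role and attach policies. The role name `LambdaExecutionRole` matches what later steps reference; the managed policies here are deliberately broad to keep the tutorial simple, so scope them down for production use:

```bash
# Create the execution role for the Lambda functions.
aws iam create-role \
  --role-name LambdaExecutionRole \
  --assume-role-policy-document file://trust-policy.json

# Attach policies for logging, S3 access, and SQS access.
aws iam attach-role-policy --role-name LambdaExecutionRole \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
aws iam attach-role-policy --role-name LambdaExecutionRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
aws iam attach-role-policy --role-name LambdaExecutionRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonSQSFullAccess
```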
Step 2c: Deploy the Lambda Function
- Deploy the Lambda function using the AWS CLI, replacing `your-account-id` with your AWS account ID (see the command below).
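A sketch of the deployment command. The function name `ResizeImageFunction` is the one referenced in later steps; the runtime, timeout, and memory values are reasonable defaults rather than requirements:

```bash
aws lambda create-function \
  --function-name ResizeImageFunction \
  --runtime nodejs18.x \
  --handler index.handler \
  --zip-file fileb://function.zip \
  --role arn:aws:iam::your-account-id:role/LambdaExecutionRole \
  --timeout 30 \
  --memory-size 512
```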
Step 3: Create a Lambda Function for Applying Filters
Next, we’ll create a second Lambda function that applies various filters (like grayscale, sepia, etc.) to the images. This will be the next step in the image processing pipeline.
Instructions:
Step 3a: Create the Lambda Function

- Create a new directory for your Lambda function.
- Initialize a new Node.js project.
- Install the required dependencies.
- Create the Lambda function code: create a file named `index.js` with the handler code.
- Zip the deployment package.

The commands and a sketch of the handler are shown below.
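The setup commands mirror Step 2a; only the directory name differs:

```bash
mkdir apply-filters-function && cd apply-filters-function
npm init -y
npm install sharp @aws-sdk/client-s3
# After creating index.js (below), zip the deployment package.
zip -r function.zip index.js node_modules package.json
```

And a sketch of `index.js`. The grayscale and sepia implementations via `sharp`, and the `filtered/` output prefix, are assumptions:

```javascript
// index.js (sketch) - apply filters to the image referenced by the input event.
const { S3Client, GetObjectCommand, PutObjectCommand } = require("@aws-sdk/client-s3");
const sharp = require("sharp");

const s3 = new S3Client({});

exports.handler = async (event) => {
  const { bucket, key } = event;

  const original = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
  const body = Buffer.from(await original.Body.transformToByteArray());
  const fileName = key.split("/").pop();

  // Grayscale variant.
  const grayscale = await sharp(body).grayscale().toBuffer();
  // Approximate a sepia tone by tinting a grayscale copy.
  const sepia = await sharp(body).grayscale().tint({ r: 112, g: 66, b: 20 }).toBuffer();

  await s3.send(new PutObjectCommand({ Bucket: bucket, Key: `filtered/grayscale/${fileName}`, Body: grayscale }));
  await s3.send(new PutObjectCommand({ Bucket: bucket, Key: `filtered/sepia/${fileName}`, Body: sepia }));

  return { bucket, key, filters: ["grayscale", "sepia"] };
};
```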
Step 3b: Deploy the Lambda Function
- Deploy the Lambda function using the AWS CLI (see the command below).
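The command mirrors Step 2c, with `ApplyFiltersFunction` as the function name used by later steps:

```bash
aws lambda create-function \
  --function-name ApplyFiltersFunction \
  --runtime nodejs18.x \
  --handler index.handler \
  --zip-file fileb://function.zip \
  --role arn:aws:iam::your-account-id:role/LambdaExecutionRole \
  --timeout 30 \
  --memory-size 512
```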
Step 4: Set Up SQS for Failure Retries
Next, we’ll set up an SQS queue that will handle retries for tasks that fail during processing.
Instructions:
Step 4a: Create the SQS Queue

- Create the queue using the AWS CLI (see the command below).
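A sketch of the command; the queue name is a placeholder. The visibility timeout should be at least as long as the Lambda timeout so messages are not redelivered while still being processed:

```bash
aws sqs create-queue \
  --queue-name image-processing-retry-queue \
  --attributes VisibilityTimeout=60
```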
Step 5: Configure IAM Roles and Permissions
To ensure secure access between services, you’ll need to configure IAM roles.
Why IAM Roles Are Important:
IAM roles define the permissions and security policies that allow AWS services (like Lambda, Step Functions, and S3) to interact with each other securely. Ensuring these roles are configured properly is critical for a functioning pipeline.
Steps:
Step 5a: Create a Step Functions Execution Role

- Create a trust policy JSON file named `step-functions-trust-policy.json`.
- Create the IAM role.
- Attach policies to the role.

The policy document and the commands are shown below.
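A sketch of the trust policy, which lets Step Functions assume the role:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "states.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

Then create the role and attach policies. The role name `StepFunctionsExecutionRole` is a placeholder; the managed policies let the state machine invoke the Lambda functions and send messages to SQS:

```bash
aws iam create-role \
  --role-name StepFunctionsExecutionRole \
  --assume-role-policy-document file://step-functions-trust-policy.json

aws iam attach-role-policy --role-name StepFunctionsExecutionRole \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaRole
aws iam attach-role-policy --role-name StepFunctionsExecutionRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonSQSFullAccess
```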
Step 6: Define a Step Functions Workflow
The Step Functions workflow orchestrates the tasks in your image processing pipeline: resizing images, applying filters, and handling retries.
Instructions:
Step 6a: Create the State Machine Definition

- Create a JSON file named `state-machine-definition.json` containing the state machine definition in the Amazon States Language (the AWS CLI expects a JSON definition). A placeholder definition is shown below.
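A sketch of the definition. The state names, ARNs, queue URL, and retry settings are placeholders; each task retries on any error and, if it still fails, a Catch forwards the input (plus the error) to the SQS retry queue from Step 4:

```json
{
  "Comment": "Image processing pipeline: resize, apply filters, retry via SQS on failure",
  "StartAt": "ResizeImage",
  "States": {
    "ResizeImage": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:your-region:your-account-id:function:ResizeImageFunction",
      "Retry": [
        { "ErrorEquals": ["States.ALL"], "IntervalSeconds": 5, "MaxAttempts": 2, "BackoffRate": 2.0 }
      ],
      "Catch": [
        { "ErrorEquals": ["States.ALL"], "ResultPath": "$.error", "Next": "SendToRetryQueue" }
      ],
      "Next": "ApplyFilters"
    },
    "ApplyFilters": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:your-region:your-account-id:function:ApplyFiltersFunction",
      "Retry": [
        { "ErrorEquals": ["States.ALL"], "IntervalSeconds": 5, "MaxAttempts": 2, "BackoffRate": 2.0 }
      ],
      "Catch": [
        { "ErrorEquals": ["States.ALL"], "ResultPath": "$.error", "Next": "SendToRetryQueue" }
      ],
      "End": true
    },
    "SendToRetryQueue": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sqs:sendMessage",
      "Parameters": {
        "QueueUrl": "https://sqs.your-region.amazonaws.com/your-account-id/image-processing-retry-queue",
        "MessageBody.$": "$"
      },
      "End": true
    }
  }
}
```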
Step 6b: Create the State Machine
- Create the state machine using the AWS CLI (see the command below).
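A sketch of the command; the state machine name is a placeholder, and the role ARN is the one created in Step 5:

```bash
aws stepfunctions create-state-machine \
  --name ImageProcessingPipeline \
  --definition file://state-machine-definition.json \
  --role-arn arn:aws:iam::your-account-id:role/StepFunctionsExecutionRole
```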
Step 7: Set Up S3 Events to Trigger the Pipeline
To trigger the pipeline automatically when new images are uploaded, configure an S3 event to invoke the Step Functions workflow.
Instructions:
Step 7a: Create a Lambda Function to Trigger Step Functions
Since S3 doesn't invoke Step Functions directly, we'll create a Lambda function to start the Step Functions execution.
- Create a new directory for the Lambda function.
- Initialize a Node.js project.
- Install the required dependencies.
- Create the Lambda function code: create a file named `index.js` with the handler code.
- Zip the deployment package.
- Create the Lambda function.

The commands and a sketch of the handler are shown below.
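A sketch of the setup and deployment commands. The directory name, the function name `TriggerStepFunctions`, and the `STATE_MACHINE_ARN` environment variable are placeholders:

```bash
mkdir trigger-step-functions && cd trigger-step-functions
npm init -y
npm install @aws-sdk/client-sfn

# After creating index.js (below), package and create the function.
zip -r function.zip index.js node_modules package.json

aws lambda create-function \
  --function-name TriggerStepFunctions \
  --runtime nodejs18.x \
  --handler index.handler \
  --zip-file fileb://function.zip \
  --role arn:aws:iam::your-account-id:role/LambdaExecutionRole \
  --environment "Variables={STATE_MACHINE_ARN=arn:aws:states:your-region:your-account-id:stateMachine:ImageProcessingPipeline}"
```

And a sketch of `index.js`, which starts one execution per uploaded object:

```javascript
// index.js (sketch) - invoked by S3 event notifications; starts a Step
// Functions execution for each uploaded object.
const { SFNClient, StartExecutionCommand } = require("@aws-sdk/client-sfn");

const sfn = new SFNClient({});

exports.handler = async (event) => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    // S3 URL-encodes object keys in event notifications.
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));

    await sfn.send(new StartExecutionCommand({
      stateMachineArn: process.env.STATE_MACHINE_ARN,
      input: JSON.stringify({ bucket, key }),
    }));
  }
};
```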
Step 7b: Add Permissions for Step Functions Invocation
- Attach the `AWSStepFunctionsFullAccess` policy to the `LambdaExecutionRole` if not already done (see the command below).
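A sketch of the command:

```bash
aws iam attach-role-policy \
  --role-name LambdaExecutionRole \
  --policy-arn arn:aws:iam::aws:policy/AWSStepFunctionsFullAccess
```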
Step 7c: Configure S3 Event Notification

- Create an event notification configuration JSON file named `notification.json`.
- Update the S3 bucket notification configuration.

The configuration file and the commands are shown below.
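A sketch of `notification.json`, wired to the trigger function from Step 7a (the configuration ID is arbitrary):

```json
{
  "LambdaFunctionConfigurations": [
    {
      "Id": "StartImagePipeline",
      "LambdaFunctionArn": "arn:aws:lambda:your-region:your-account-id:function:TriggerStepFunctions",
      "Events": ["s3:ObjectCreated:*"]
    }
  ]
}
```

S3 also needs permission to invoke the function, so grant that before applying the notification configuration:

```bash
# Allow S3 to invoke the trigger function.
aws lambda add-permission \
  --function-name TriggerStepFunctions \
  --statement-id s3-invoke \
  --action lambda:InvokeFunction \
  --principal s3.amazonaws.com \
  --source-arn arn:aws:s3:::image-processing-pipeline

# Attach the notification configuration to the bucket.
aws s3api put-bucket-notification-configuration \
  --bucket image-processing-pipeline \
  --notification-configuration file://notification.json
```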
Step 8: Configure a Dead-Letter Queue (DLQ)
For tasks that fail even after retries, you can configure a Dead-Letter Queue (DLQ) to capture the failed messages for further inspection.
Instructions:
Step 8a: Create the DLQ

- Create the DLQ using the AWS CLI (see the command below).
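A sketch of the commands; the queue name is a placeholder. The second command looks up the queue ARN, which Step 8b needs:

```bash
aws sqs create-queue --queue-name image-processing-dlq

aws sqs get-queue-attributes \
  --queue-url https://sqs.your-region.amazonaws.com/your-account-id/image-processing-dlq \
  --attribute-names QueueArn
```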
Step 8b: Configure Lambda Functions to Use the DLQ
- Update the `ResizeImageFunction` configuration.
- Update the `ApplyFiltersFunction` configuration.

Both commands are shown below.
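A sketch of both commands, pointing each function at the DLQ from Step 8a. Note that a Lambda dead-letter config only applies to asynchronous invocations; failures inside the Step Functions workflow are handled by its Retry and Catch states:

```bash
aws lambda update-function-configuration \
  --function-name ResizeImageFunction \
  --dead-letter-config TargetArn=arn:aws:sqs:your-region:your-account-id:image-processing-dlq

aws lambda update-function-configuration \
  --function-name ApplyFiltersFunction \
  --dead-letter-config TargetArn=arn:aws:sqs:your-region:your-account-id:image-processing-dlq
```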
Step 9: Monitor and Test the Pipeline
Now it’s time to test the pipeline and ensure everything works as expected.
Instructions:
Step 9a: Upload an Image to S3

- Upload an image to your S3 bucket (see the command below).
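For example (the local file name is a placeholder):

```bash
aws s3 cp ./test-image.jpg s3://image-processing-pipeline/test-image.jpg
```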
Step 9b: Monitor the Step Functions Execution
- List recent executions.
- Get the execution history.

Both commands are shown below.
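A sketch of both commands; use an execution ARN from the first command's output in the second:

```bash
aws stepfunctions list-executions \
  --state-machine-arn arn:aws:states:your-region:your-account-id:stateMachine:ImageProcessingPipeline

aws stepfunctions get-execution-history \
  --execution-arn <execution-arn>
```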
Step 9c: Check Processed Images in S3

- List objects in the resized images folder.
- List objects in the filtered images folder.

Both commands are shown below.
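Assuming the `resized/` and `filtered/` prefixes from the handler sketches in Steps 2 and 3:

```bash
aws s3 ls s3://image-processing-pipeline/resized/ --recursive
aws s3 ls s3://image-processing-pipeline/filtered/ --recursive
```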
Step 9d: Simulate a Failure

- Upload a non-image file.
- Check the SQS retry queue.
- Check the DLQ for failed messages.

The commands are shown below.
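A sketch of the commands; the queue URLs are placeholders:

```bash
# Upload something the image functions cannot process.
aws s3 cp ./not-an-image.txt s3://image-processing-pipeline/not-an-image.txt

# Check the retry queue for messages sent by the workflow's Catch state.
aws sqs receive-message \
  --queue-url https://sqs.your-region.amazonaws.com/your-account-id/image-processing-retry-queue

# Check the DLQ for messages that exhausted their retries.
aws sqs receive-message \
  --queue-url https://sqs.your-region.amazonaws.com/your-account-id/image-processing-dlq
```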
Step 9e: Monitor Logs with CloudWatch

- Describe the log groups.
- Tail the Lambda function logs.

The commands are shown below.
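For example (`aws logs tail` requires AWS CLI v2):

```bash
# List the log groups created for the pipeline's Lambda functions.
aws logs describe-log-groups --log-group-name-prefix /aws/lambda/

# Follow the resize function's logs in real time.
aws logs tail /aws/lambda/ResizeImageFunction --follow
```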
You can use these logs to monitor Lambda performance, check for errors, and troubleshoot any issues.
Conclusion
Congratulations! You’ve built a fully functional image processing pipeline using AWS services: S3, Lambda, SQS, and Step Functions. This pipeline resizes and applies filters to uploaded images, handles failure scenarios, and automatically retries using SQS.
Customizing the Pipeline
You can easily modify this pipeline for other use cases:
- Document Processing: Convert or extract data from documents like PDFs or Word files.
- Video Processing: Encode videos into different formats or resolutions.
- Data Processing: Transform and analyze large datasets like logs or sensor data.
Feel free to expand the pipeline to suit your specific needs. Happy building!