ARC 326 - Create a Serverless Image Processing Pipeline
Prerequisites
In order to complete this lab, you will need the following:
- An AWS account and an IAM user with admin access.
- Sign into the AWS Console using your IAM credentials.
- An SSH client. macOS: Terminal; Windows: PuTTY
- An EC2 Key Pair in your account. For more information on generating a key pair, click here
Resources Created
This workshop will create resources in the us-east-1 (N. Virginia) region in a custom VPC. It is strongly suggested that you use a non-production account to create the resources. This lab will create a new VPC, one EC2 instance, and an Amazon Elasticsearch Service cluster. This lab can be run concurrently in the same account, but be aware that you may run into account limits.

A note on Extra Credit
Oftentimes labs can be point-and-click without knowing why you are doing things. The extra credit sections offer ideas for extending the solution, but intentionally don't give step-by-step instructions. Supporting documentation is included, but the solutions are the challenge. These challenges don't fit in an hour-long workshop, but offer a chance to do a deep dive into the platform.
Prepare the environment
Lab Settings
Stack Name:
arc326l8836
- Click the link below to create the CloudFormation stack that prepares your environment for this workshop.
- Select your Key Pair from the Key Name dropdown list.
- On the following page, click 'I acknowledge that AWS CloudFormation might create IAM resources with custom names.'
- Click 'Create Stack' to create the stack.
- The stack details page will open. Keep this page open to watch the progress of the stack creation.
Capture CloudFormation Output
After the CloudFormation script completes, copy and paste the results here:
- Environment Data:
- Bastion Host IP:
54.156.172.190
- Cognito Identity Pool:
us-east-1:8afea265-732a-4448-a1c7-8119d739a1ce
- Cognito User Pool Id:
us-east-1_877wP1DvS
- Cognito User Pool Client Id:
4l0cj2d2hn8703n78be4h1gh7p
- ElasticSearch Endpoint:
search-arc326l8836search-ywozvbuwaxk7mnhzvdq67wntxu.us-east-1.es.amazonaws.com
Begin the Lab
Deploy the Lambda Function
The Lambda function will be deployed from the bastion host. To find the IP address of the bastion host, go to the Stack Details page and expand the 'Outputs' section. The bastion host is listed under 'BastionHost' and the IP is to the right of it. For more information on how to SSH into an instance, click here.
ssh -i keyfile.pem ec2-user@54.156.172.190
Once you have successfully SSH'd into the host:
- Install the node dependencies:
cd process-image/lambda-function
npm install
cd ..
- Package the template:
This will package the Lambda function and copy the Lambda code to S3. The result is a modified CloudFormation script that can be deployed.
aws cloudformation package --template-file process-image.yaml --s3-bucket arc326l8836-deploy \
  --s3-prefix lambda-deploy --output-template-file process-image.packaged.yaml
- Deploy the packaged template:
This will deploy the packaged script from the previous step. It will take a minute or two to complete. Monitor the progress here.
aws cloudformation deploy --stack-name arc326l8836-lambdas --template-file process-image.packaged.yaml \
  --parameter-overrides ESDomainEndpoint=search-arc326l8836search-ywozvbuwaxk7mnhzvdq67wntxu.us-east-1.es.amazonaws.com \
  ParentStack=arc326l8836 --capabilities CAPABILITY_NAMED_IAM --region us-east-1
- Click on arc326l8836-lambdas.
- Once the stack is in the state 'CREATE_COMPLETE', look at the outputs and capture the value of the ApiGateway key (an SDK alternative is sketched below):
- Api Gateway Id:
tketg11hsh
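If you prefer not to use the console, a sketch of reading the same output with the AWS SDK for JavaScript (the stack name matches the deploy command above):
// Sketch: read the ApiGateway stack output programmatically.
var AWS = require('aws-sdk');
var cfn = new AWS.CloudFormation({ region: 'us-east-1' });

cfn.describeStacks({ StackName: 'arc326l8836-lambdas' }, function(err, data) {
  if (err) return console.error(err);
  data.Stacks[0].Outputs.forEach(function(output) {
    if (output.OutputKey === 'ApiGateway') {
      console.log('Api Gateway Id:', output.OutputValue);
    }
  });
});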
Optional: Explore Lambda Function Code (expand for details)
The entry point to this code is the exports.handler function. S3 posts a JSON document in the event parameter, from which the function extracts the S3 bucket name and object key. The image is resized to thumbnail size, and a document describing it is created and posted to Elasticsearch. It is important to note that the postDocumentToES function includes the SigV4 signature that Amazon Elasticsearch Service uses to authenticate the request.
var Promise = require("bluebird");
var AWSXRay = require('aws-xray-sdk');
var AWS = AWSXRay.captureAWS(require('aws-sdk'));
var fs = require('fs');
var elasticsearch = require('./elasticsearch.js')
var crypto = require('crypto');
'use strict';
AWS.config.credentials.refresh(); // note: lowercase 'config'; AWS.Config is the constructor
const s3 = new AWS.S3({ region: 'us-east-1' });
const rekognition = new AWS.Rekognition({ apiVersion: '2016-06-27', region: 'us-east-1' });
const randomstring = require("randomstring");
var gm = require('gm').subClass({
imageMagick: true
});
AWS.config.credentials = new AWS.EnvironmentCredentials('AWS');
var metadata = Promise.promisify(require('im-metadata'));
Promise.promisifyAll(gm.prototype);
/**
* This event fires when an object gets added to s3.
*/
exports.handler = (event, context, callback) => {
// extract the bucket and key from the json document
const bucket = event.Records[0].s3.bucket.name;
const key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));
var hashKey = crypto.createHash('md5').update(key).digest('hex');
const params = {
Bucket: bucket,
Key: key
};
// Use the temp file to store the file and thumbnail
var basefilename = randomstring.generate(10);
var filename = '/tmp/' + basefilename + '.jpg';
var thumbFilename = '/tmp/' + basefilename + '_thumb.jpg';
var elasticDocument = {
filename: filename
};
var throwError = false;
if (throwError) {
callback("This is a contrived error. Set throwError = false to fix;")
return;
}
var subsegment = null;
// Retrieve the image from S3, write the image to the temp folder, and extract the metadata
s3.getObject(params).promise()
.then(function(data) {
subsegment = createSubsegment('write image');
fs.writeFileSync(filename, data.Body);
subsegment = closeSubsegment(subsegment);
subsegment = createSubsegment('read metadata');
return metadata(filename, { exif: true })
})
// The imageData will contain the EXIF metadata tags, and populate them into the elasticsearch json document
.then(function(imageData) {
subsegment = closeSubsegment(subsegment);
elasticDocument.url = event.Records[0].s3.object.key;
elasticDocument.thumbnail = event.Records[0].s3.object.key.replace('upload/', 'thumbnail/');
elasticDocument.size = imageData.size;
elasticDocument.format = imageData.format;
elasticDocument.colorspace = imageData.colorspace;
elasticDocument.height = imageData.height;
elasticDocument.width = imageData.width;
elasticDocument.orientation = imageData.orientation;
for (var k in imageData.exif) elasticDocument[k] = imageData.exif[k];
// Use Rekognition to detect scene attributes on the image
var params = {
Image: {
S3Object: {
Bucket: bucket,
Name: key
}
},
MaxLabels: 123,
MinConfidence: 70
};
return rekognition.detectLabels(params).promise();
})
// Add the labels to the elasticsearch document
.then(function(rekognitionData) {
elasticDocument.labels = [];
for (var i = 0; i < rekognitionData.Labels.length; i++) {
elasticDocument.labels.push(rekognitionData.Labels[i].Name);
}
// Post the document to Elasticsearch
return elasticsearch.postDocumentToES(hashKey, elasticDocument);
})
// Create the thumbnail
.then(function(esData) {
console.log(esData);
subsegment = createSubsegment('create thumbnail');
return gm(filename).thumbAsync(80, 80, thumbFilename, 95);
})
// Copy the thumbnail up to s3
.then(function(gmData) {
subsegment = closeSubsegment(subsegment);
var photoKey = key.replace('upload/', 'thumbnail/');
var file = fs.readFileSync(thumbFilename);
return s3.upload({
Bucket: bucket,
Key: photoKey,
Body: file,
ACL: 'public-read'
}).promise();
})
// return
.then(function(s3Upload) {
// If this was processed off of the DLQ, this would be the logical step to DeleteMessage and pass in the message handle.
callback(null, "success");
})
.catch(function(err) {
subsegment = closeSubsegment(subsegment);
callback(err);
});
}
function createSubsegment(name) {
var currentSegment = AWSXRay.getSegment();
if (currentSegment != undefined) {
return currentSegment.addNewSubsegment(name)
} else {
return null;
}
}
function closeSubsegment(segment) {
if (segment != null) {
segment.close()
}
return null;
}
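The postDocumentToES helper lives in elasticsearch.js, which is not shown here. A minimal sketch of what it might look like, assuming the SDK's built-in SigV4 signer and the ELASTICSEARCH_ENDPOINT environment variable set by the template (the index/type names are illustrative):
// elasticsearch.js (sketch): sign an HTTP request with SigV4 and post it
// to the Amazon Elasticsearch Service domain.
var AWS = require('aws-sdk');

exports.postDocumentToES = function(id, doc) {
  return new Promise(function(resolve, reject) {
    var endpoint = new AWS.Endpoint(process.env.ELASTICSEARCH_ENDPOINT);
    var request = new AWS.HttpRequest(endpoint, 'us-east-1');
    request.method = 'POST';
    request.path = '/images/image/' + id; // index/type/id -- illustrative names
    request.body = JSON.stringify(doc);
    request.headers['Host'] = endpoint.host;
    request.headers['Content-Type'] = 'application/json';

    // Sign with the function's short-lived role credentials.
    var signer = new AWS.Signers.V4(request, 'es');
    signer.addAuthorization(AWS.config.credentials, new Date());

    var client = new AWS.HttpClient();
    client.handleRequest(request, null, function(response) {
      var body = '';
      response.on('data', function(chunk) { body += chunk; });
      response.on('end', function() { resolve(body); });
    }, reject);
  });
};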
Optional: Explore CloudFormation Template
This CloudFormation template creates the Lambda functions, API Gateway endpoints, and S3 bucket the application uses.
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: The template creates the roles, buckets, and elasticsearch domain for
the rekognition to elasticsearch demo
Parameters:
ESDomainEndpoint:
Type: String
Description: The endpoint of the Elasticsearch domain.
ParentStack:
Type: String
Description: The name of the parent stack for this template
Resources:
S3Bucket:
Type: AWS::S3::Bucket
Properties:
AccessControl: PublicRead
BucketName: !Join ['', [!Ref 'ParentStack', '-workshop']]
CorsConfiguration:
CorsRules:
- AllowedHeaders:
- '*'
AllowedMethods:
- GET
- POST
- PUT
AllowedOrigins:
- '*'
ExposedHeaders:
- ETag
MaxAge: 3000
WebsiteConfiguration:
IndexDocument: index.html
ErrorDocument: error.html
S3BucketPolicy:
Type: AWS::S3::BucketPolicy
Properties:
Bucket: !Ref 'S3Bucket'
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Principal: '*'
Action: s3:GetObject
Resource: !Join ['', ['arn:aws:s3:::', !Ref S3Bucket, /*]]
DLQ:
Type: "AWS::SQS::Queue"
Properties:
DelaySeconds: 10
QueueName: !Sub ${ParentStack}-process-image-dlq
VisibilityTimeout: 60
LambdaFunction:
Type: AWS::Serverless::Function
Properties:
Handler: index.handler
Runtime: nodejs6.10
MemorySize: 512
Timeout: 60
FunctionName: !Sub ${ParentStack}-process-image
Role: !Sub 'arn:aws:iam::${AWS::AccountId}:role/${ParentStack}-LambdaServiceRole'
CodeUri: lambda-function
Tracing: Active
DeadLetterQueue:
Type: SQS
TargetArn: !GetAtt 'DLQ.Arn'
Environment:
Variables:
ELASTICSEARCH_ENDPOINT: !Ref 'ESDomainEndpoint'
Events:
PhotoUpload:
Type: S3
Properties:
Bucket: !Ref 'S3Bucket'
Events: s3:ObjectCreated:*
Filter:
S3Key:
Rules:
- Name: prefix
Value: upload/
PingFunction:
Type: AWS::Serverless::Function
Properties:
Handler: ping.handler
Runtime: nodejs6.10
MemorySize: 1536
Timeout: 3
FunctionName: !Sub ${ParentStack}-LambdaServiceRole-ping
Role: !Sub 'arn:aws:iam::${AWS::AccountId}:role/${ParentStack}-LambdaServiceRole'
CodeUri: lambda-function
Tracing: Active
Environment:
Variables:
ELASTICSEARCH_ENDPOINT: !Ref 'ESDomainEndpoint'
Events:
GetResource:
Type: Api
Properties:
Path: /ping
Method: get
RestApiId: !Ref ImageApi
SearchImages:
Type: AWS::Serverless::Function
Properties:
Handler: search.handler
Runtime: nodejs6.10
MemorySize: 128
Timeout: 3
FunctionName: !Sub ${ParentStack}-LambdaServiceRole-search
Role: !Sub 'arn:aws:iam::${AWS::AccountId}:role/${ParentStack}-LambdaServiceRole'
CodeUri: lambda-function
Tracing: Active
Environment:
Variables:
ELASTICSEARCH_ENDPOINT: !Ref 'ESDomainEndpoint'
Events:
GetResource:
Type: Api
Properties:
Path: /images
Method: get
RestApiId: !Ref ImageApi
LoadTester:
Type: AWS::Serverless::Function
Properties:
Handler: load.handler
Runtime: nodejs6.10
MemorySize: 1536
Timeout: 240
FunctionName: !Sub ${ParentStack}-LoadTester
Role: !Sub 'arn:aws:iam::${AWS::AccountId}:role/${ParentStack}-LambdaServiceRole'
CodeUri: lambda-function
CognitoApiPolicy:
Type: AWS::IAM::ManagedPolicy
Properties:
ManagedPolicyName: !Sub ${ParentStack}-lambdas-CognitoApiPolicy
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Deny
Action:
- 'execute-api:Invoke'
Resource:
- !Sub arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:${ImageApi}/prod/GET/ping
- Effect: Allow
Action:
- 'execute-api:Invoke'
Resource:
- !Sub arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:${ImageApi}/prod/GET/images
- Effect: Allow
Action:
- 'lambda:InvokeFunction'
Resource:
- !GetAtt 'LoadTester.Arn'
Description: elasticsearch and rekognition lambda role
Roles:
- !Sub ${ParentStack}-CognitoUnauthRole
ImageApi:
Type: AWS::Serverless::Api
Properties:
StageName: prod
DefinitionBody:
swagger: "2.0"
securityDefinitions:
sigv4:
type: "apiKey"
name: "Authorization"
in: "header"
x-amazon-apigateway-authtype: "awsSigv4"
info:
version: "1.0"
title: !Ref 'AWS::StackName'
paths:
/images:
get:
security:
- sigv4: []
parameters:
- name: search
in: query
description: The string to search for
required: true
schema:
type: string
responses: {}
x-amazon-apigateway-integration:
uri: !Sub arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${SearchImages.Arn}/invocations
passthroughBehavior: "when_no_match"
httpMethod: "POST"
type: "aws_proxy"
options:
consumes:
- application/json
produces:
- application/json
responses:
'200':
description: 200 response
schema:
$ref: "#/definitions/Empty"
headers:
Access-Control-Allow-Origin:
type: string
Access-Control-Allow-Methods:
type: string
Access-Control-Allow-Headers:
type: string
x-amazon-apigateway-integration:
responses:
default:
statusCode: 200
responseParameters:
method.response.header.Access-Control-Allow-Methods: "'DELETE,GET,HEAD,OPTIONS,PATCH,POST,PUT'"
method.response.header.Access-Control-Allow-Headers: "'Content-Type,Authorization,X-Amz-Date,X-Api-Key,X-Amz-Security-Token'"
method.response.header.Access-Control-Allow-Origin: "'*'"
passthroughBehavior: when_no_match
requestTemplates:
application/json: "{\"statusCode\": 200}"
type: mock
/ping:
get:
responses: {}
security:
- sigv4: []
x-amazon-apigateway-integration:
uri: !Sub arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${PingFunction.Arn}/invocations
passthroughBehavior: "when_no_match"
httpMethod: "POST"
type: "aws_proxy"
options:
consumes:
- application/json
produces:
- application/json
responses:
'200':
description: 200 response
schema:
$ref: "#/definitions/Empty"
headers:
Access-Control-Allow-Origin:
type: string
Access-Control-Allow-Methods:
type: string
Access-Control-Allow-Headers:
type: string
x-amazon-apigateway-integration:
responses:
default:
statusCode: 200
responseParameters:
method.response.header.Access-Control-Allow-Methods: "'DELETE,GET,HEAD,OPTIONS,PATCH,POST,PUT'"
method.response.header.Access-Control-Allow-Headers: "'Content-Type,Authorization,X-Amz-Date,X-Api-Key,X-Amz-Security-Token'"
method.response.header.Access-Control-Allow-Origin: "'*'"
passthroughBehavior: when_no_match
requestTemplates:
application/json: "{\"statusCode\": 200}"
type: mock
Outputs:
ApiGateway:
Description: The arn of the api gateway
Value: !Ref 'ImageApi'
Test the Lambda function
We'll test the Lambda function through the console.
- Download this file: Image1
- Open the S3 Bucket Console
- Upload a sample image - IMG_0200.jpg
- Open the AWS Lambda console
- Click the monitoring tab
- Click on View logs in CloudWatch. You should see a successful record. If you check the thumbnail folder in S3, you should see IMG_0200.jpg there.
- Return to the AWS Lambda console
- Configure the 'Select a Test Event' for the Lambda Function by clicking the dropdown and clicking 'Configure Test Events'.
You can test the Lambda function in the console by passing in the JSON event that triggers it. Configure this test by clicking 'Configure Test' in the upper right of the screen. Provide a test name and paste the snippet below as the test data.
{ "Records": [ { "eventVersion": "2.0", "eventTime": "1970-01-01T00:00:00.000Z", "requestParameters": { "sourceIPAddress": "127.0.0.1" }, "s3": { "configurationId": "testConfigRule", "object": { "eTag": "0123456789abcdef0123456789abcdef", "sequencer": "0A1B2C3D4E5F678901", "key": "upload/IMG_0200.jpg", "size": 1024 }, "bucket": { "arn": "arn:aws:s3:::arc326l8836-workshop", "name": "arc326l8836-workshop", "ownerIdentity": { "principalId": "EXAMPLE" } }, "s3SchemaVersion": "1.0" }, "responseElements": { "x-amz-id-2": "EXAMPLE123/5678abcdefghijklambdaisawesome/mnopqrstuvwxyzABCDEFGH", "x-amz-request-id": "EXAMPLE123456789" }, "awsRegion": "us-east-1", "eventName": "ObjectCreated:Put", "userIdentity": { "principalId": "EXAMPLE" }, "eventSource": "aws:s3" } ] }
Configure Kibana
Now we can configure Kibana to view the images:
- Open Kibana
- Deselect 'Index contains time based events'
- Add the pattern 'images*'
- Click 'Create'
- Filter for 'Thumbnail' and click 'Edit' (the pencil icon).
- Format: Url
- Type: Image
- Url Template:
http://arc326l8836-workshop.s3-website-us-east-1.amazonaws.com/{{value}}
- Label Template: <Empty String>
- Click Update Field
- Click 'Discover' on the left menu
Deploy the Application Website Function
The application is a single-page application (SPA) built on the Vue framework. It will be hosted from S3 using the S3 static website hosting feature. Additional Info
The bucket has already been configured for website hosting, but only for the IP address associated with this machine. Since this demo is not available to the public, an IP change on your machine will render the site inaccessible.
Build and deploy the application
- Return to your bastion host to continue deploying the website.
- Configure the sample application:
Note: the sed commands below replace the tokens embedded in the code with your configuration settings.
cd ~/website
sed -i -e 's/COGNITO_ITENTITY_POOL/us-east-1:8afea265-732a-4448-a1c7-8119d739a1ce/g' src/components/conf.json
sed -i -e 's/USER_POOL_ID/us-east-1_877wP1DvS/g' src/components/conf.json
sed -i -e 's/COGNITO_CLIENT_ID/4l0cj2d2hn8703n78be4h1gh7p/g' src/components/conf.json
sed -i -e 's/STACK_NAME/arc326l8836/g' src/components/conf.json
sed -i -e 's/APIGATEWAY_ID/tketg11hsh/g' src/components/conf.json
How do you know to do this?
This is the configuration of your application. For the purposes of the workshop, the JSON file gets the configuration values directly. An automated build process would pull down the correct file for the environment and build the code.
- Install Node.js dependencies:
npm install
- Build the website:
npm run build
- Deploy the static website to S3:
aws s3 sync dist/ s3://arc326l8836-workshop
- Open the sample web application
Test the Application Website Function
Initially, there is no requirement to sign in, as the system is open to unauthenticated users.
The web application has two components, the image uploader and the image search screen. Since the index is empty, the search will not yield any results. Let's load some sample images.
Load Sample Images
The same file can be uploaded multiple times; if it is, it will appear multiple times in the search index.
- Save these files to disk: Lighthouse 1, Lighthouse 2, Fire truck
- Upload three files
- Below the upload, search for 'Lighthouse' in the search box. If it doesn't appear right away, wait a few seconds.

Well-Architected Review
For the rest of this workshop, we will assess the architecture against the AWS Well-Architected Framework. While there isn't enough time to do a complete review, we will highlight specific areas that are relevant to serverless architectures. For more details, review the Well-Architected Framework whitepaper and the Serverless Lens.
General Serverless Principles
The Well-Architected Framework identifies a set of general design principles to facilitate good design in the cloud for serverless applications:
- Speedy, simple, singular: Functions are concise, short, and single-purpose, and their environment may live up to their request lifecycle. Transactions are efficiently cost-aware, and thus faster executions are preferred.
- Think concurrent requests, not total requests: Serverless applications take advantage of the concurrency model, and tradeoffs at the design level are evaluated based on concurrency.
- Share nothing: Function runtime environment and underlying infrastructure are short-lived, therefore local resources such as temporary storage are not guaranteed. State can be manipulated within a state machine execution lifecycle, and persistent storage is preferred for highly durable requirements.
- Assume no hardware affinity: Underlying infrastructure may change. Leverage code or dependencies that are hardware-agnostic as CPU flags, for example, may not be available consistently.
- Orchestrate your application with state machines, not functions: Chaining Lambda executions within the code to orchestrate the workflow of your application results in a monolithic and tightly coupled application. Instead, use a state machine to orchestrate transactions and communication flows.
- Use events to trigger transactions: Events such as writing a new Amazon S3 object or an update to a database allow for transaction execution in response to business functionalities. This asynchronous event behavior is often consumer agnostic and drives just-in-time processing to ensure lean service design.
- Design for failures and duplicates: Operations triggered from requests/events must be idempotent as failures can occur and a given request/event can be delivered more than once. Include appropriate retries for downstream calls.
Security
The security of any application is a primary concern. Security assessments cover many dimensions:
- Authentication: Who is using the system?
- Authorization: Are the users permitted to perform this action?
- Encryption at Rest: Is the data at rest safe?
- Encryption in Transit: Is the data in transit safe?
- Intrusion Detection
- Data Loss Prevention
[SEC3] How are you limiting automated access to AWS resources? (e.g. applications, scripts, and/or third party tools)
Controlling access to AWS resources consists of the authentication and authorization mechanisms in the application.
Authentication
Determining the actor depends on the application that is accessing a resource. For mobile or web applications, Amazon Cognito offers an authentication system that can scale to millions of users. Cognito allows you to integrate with SAML2 identity providers or OAuth providers such as Facebook and Amazon, or to create your own custom authentication method.
Once a user is authenticated, Cognito generates a set of short-lived IAM credentials that are stored in the web page or inside the mobile application.
For simplicity, this lab uses an unauthenticated user to create the IAM credentials. Unauthenticated users do not go through an authentication step, but still receive a set of short-lived IAM credentials.
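A sketch of what that looks like in the browser, using the identity pool id captured earlier (assuming the AWS SDK for JavaScript is loaded):
// Obtain short-lived credentials for an unauthenticated Cognito identity.
var AWS = require('aws-sdk');
AWS.config.region = 'us-east-1';
AWS.config.credentials = new AWS.CognitoIdentityCredentials({
  IdentityPoolId: 'us-east-1:8afea265-732a-4448-a1c7-8119d739a1ce'
});

// get() resolves the identity and fetches temporary IAM credentials.
AWS.config.credentials.get(function(err) {
  if (err) return console.error(err);
  console.log('Access key:', AWS.config.credentials.accessKeyId);
});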
For any custom API endpoints, this lab uses Amazon API Gateway to expose and secure the endpoints.
With Amazon API Gateway, you aren't required to use Cognito for authentication; you can use any of the authentication methods listed below.
- AWS: Each call is signed with a SigV4 signature and passes the AWS access key. The credentials can be generated using Cognito or through the AWS API. This method provides access to non-API Gateway AWS resources using the same credentials.
- API Key: Generate an API key to authenticate API calls.
- Custom Authorizer: Write custom code to authenticate the user using headers, cookies, etc. A minimal sketch follows this list.
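A minimal sketch of a custom (Lambda) authorizer, assuming a TOKEN authorizer and a hypothetical token check (real code would validate a JWT or session):
// Custom authorizer sketch: return an IAM policy for the incoming request.
exports.handler = function(event, context, callback) {
  var token = event.authorizationToken; // value of the configured header
  if (token === 'allow') {              // hypothetical check
    callback(null, {
      principalId: 'user',
      policyDocument: {
        Version: '2012-10-17',
        Statement: [{
          Action: 'execute-api:Invoke',
          Effect: 'Allow',
          Resource: event.methodArn
        }]
      }
    });
  } else {
    callback('Unauthorized');           // API Gateway returns a 401
  }
};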
Authorization
Once you have a set of credentials, you have access to any AWS resource that the underlying role's policy allows. That may be an API Gateway endpoint, but it can be any AWS service. It is common to write directly to S3 or SQS queues in addition to calling API Gateway endpoints.
Example IAM Policy
The following policy allows GET and POST access to the pets endpoint for any IAM user, role, or group that has the policy attached to it.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Stmt1507229653000",
"Effect": "Allow",
"Action": [
"execute-api:Invoke"
],
"Resource": [
"arn:aws:execute-api:us-east-1:a111111111111:h38ks93e/Prod/POST/pets",
"arn:aws:execute-api:us-east-1:a111111111111:h38ks93e/Prod/GET/pets"
]
}
]
}
Lambda Credentials
Lambda functions run under an IAM role assigned to the function. When the function executes, it is given a set of short-lived IAM credentials that allow it to authenticate to other AWS resources such as S3, DynamoDB, or API Gateway endpoints. Additional Info
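Inside the function there is nothing to configure: Lambda injects the role's credentials as environment variables, and the SDK reads them automatically. A small illustration:
// No keys appear in code or config; the SDK uses the execution role.
var AWS = require('aws-sdk');
var s3 = new AWS.S3();  // picks up the role's short-lived credentials

exports.handler = function(event, context, callback) {
  // The raw values are visible as environment variables, rotated by the service:
  console.log('Key id:', process.env.AWS_ACCESS_KEY_ID);
  console.log('Session token present:', !!process.env.AWS_SESSION_TOKEN);
  callback(null, 'ok');
};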
Lab Instructions
In this lab, we will demonstrate the authorization capabilities of IAM.
- Open your sample application page.
- Click the 'Test API Endpoint' button. An error will be thrown because the role assigned to this page does not have access to that API gateway endpoint.
- Go to the AWS IAM Console for the unauthenticated Cognito role.
- Click on the policy 'arc326l8836-lambdas-CognitoApiPolicy' and click 'Edit policy'.
- The policy editor will open; click the 'JSON' tab.
- Change Deny to Allow. Click Save.
- Refresh the sample application page
- Click the 'Test API Endpoint' button. You should see a count of the number of images in the Amazon Elasticsearch Service cluster. If it doesn't work right away, wait 15 seconds, refresh and try again.
- Extra Credit: Create an authenticated Cognito user with a login page
Supporting Documentation
- What is Cognito?
- Authentication Flow
- Cognito SDK
- Open the web application, click 'Sign In'
- Depending on your browser, you may see a warning that you are using an insecure page. You can ignore this warning: all of the Cognito API calls are secured; only the static content is not. In the Additional Ideas section, you can learn about using CloudFront to cache the content on the CDN and provide HTTPS support.
- You will have to register a user with a real email address. Do this by clicking 'Register' on the login screen.
- You will receive a confirmation email from Cognito to ensure the validity of the email. Enter the code in the confirmation textbox that appears after you press 'Register'.
- Log Into the web application.
- Do the integrations to API Gateway work? Why is everything broken? Which Role are we assuming?
Hint...
Attach the IAM policies associated to the unauthenticated role to the authenticated role. Detach the policies from the unauthenticated role. Now unauthenticated users will be denied access!
[SEC11] How are you encrypting and protecting your data in transit?
- Using the browser developer tools, verify all API calls are over HTTPS.
Reliability
[REL 7] How does your system withstand component failures?
While the serverless architecture is very reliable, much of the reliability must come from the application itself. Resilient applications will continue to operate in a degraded capacity in the case of a component failure. In the following example, we will intentionally break the connection between Lambda and the Amazon Elasticsearch cluster.
Lab Instructions
- Go to your SSH session
- Break your lambda function:
cd ~/process-image
sudo chmod +x break-lambda.sh
./break-lambda.sh
What's this doing???
The shell script uses sed to change the code so it always throws an exception.
sed -i -e 's/throwError = false/throwError = true/g' lambda-function/index.js
The lambda code will execute:
var throwError = true;
if (throwError) {
  callback({error: "This is a contrived error. Set throwError = false to fix;"})
  return;
}
- Package the broken function:
aws cloudformation package --template-file process-image.yaml --s3-bucket arc326l8836-deploy \
  --s3-prefix lambda-deploy --output-template-file process-image.packaged.yaml
- Deploy the broken function:
aws cloudformation deploy --stack-name arc326l8836-lambdas --template-file process-image.packaged.yaml \
  --parameter-overrides ESDomainEndpoint=search-arc326l8836search-ywozvbuwaxk7mnhzvdq67wntxu.us-east-1.es.amazonaws.com \
  ParentStack=arc326l8836 --capabilities CAPABILITY_NAMED_IAM --region us-east-1
This may take a minute or two to complete.
- Right-click and 'Save Link As' Image 488 to download a random sample image.
- Upload some images using the sample web application; the upload will appear successful.
- Check the lambda logs. It may take a couple of minutes for the errors to make it into the logs.
- Lambda will try twice, so you will see two entries in the log, then send the message to the Dead Letter Queue (DLQ) that is configured.
- What happened? Check the SQS Console for the Dead Letter Queue.
- Search for
arc326l8836-process-image-dlq
in the search window. You should see messages appear in the queue after Lambda's retries. It usually takes about 2-5 minutes for the queue items to appear.
Additional Info on dead letter queues
- Go back to your SSH window
- Fix the lambda function:
sudo chmod +x fix-lambda.sh
./fix-lambda.sh
What's this doing now???
The shell script uses sed to disable the code that always throws an exception.
sed -i -e 's/throwError = true/throwError = false/g' lambda-function/index.js
The lambda code will execute:
var throwError = false;
if (throwError) {
  callback({error: "This is a contrived error. Set throwError = false to fix;"})
  return;
}
- Package the fixed function:
aws cloudformation package --template-file process-image.yaml --s3-bucket arc326l8836-deploy \
  --s3-prefix lambda-deploy --output-template-file process-image.packaged.yaml
- Deploy the fixed function:
aws cloudformation deploy --stack-name arc326l8836-lambdas --template-file process-image.packaged.yaml \
  --parameter-overrides ESDomainEndpoint=search-arc326l8836search-ywozvbuwaxk7mnhzvdq67wntxu.us-east-1.es.amazonaws.com \
  ParentStack=arc326l8836 --capabilities CAPABILITY_NAMED_IAM --region us-east-1
- Extra Credit: reprocess the DLQ using a Lambda function triggered by scheduled CloudWatch Events.
Supporting Documentation
- Create a schedule triggered lambda function
- Sending and receiving SQS messages in Javascript.
- The Lambda function should call the ReceiveMessage API. Keep polling until the queue is empty or your Lambda function times out.
- You don't need to create a new process-image function; you can call the existing one. If you call it asynchronously, make sure you pass in the message handle. If you call it synchronously, consider the impacts on Lambda concurrency, cost, and scalability. A sketch of such a reprocessor follows.
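A sketch of the reprocessor, assuming a scheduled trigger and a hypothetical QUEUE_URL environment variable pointing at the DLQ (for Lambda DLQs, the message body is the original S3 event, so it can be passed straight back to process-image):
// DLQ reprocessor sketch: drain the dead letter queue and re-invoke process-image.
var AWS = require('aws-sdk');
var sqs = new AWS.SQS({ region: 'us-east-1' });
var lambda = new AWS.Lambda({ region: 'us-east-1' });

exports.handler = function(event, context, callback) {
  function poll() {
    sqs.receiveMessage({
      QueueUrl: process.env.QUEUE_URL,   // hypothetical env var with the DLQ URL
      MaxNumberOfMessages: 10,
      WaitTimeSeconds: 5
    }, function(err, data) {
      if (err) return callback(err);
      if (!data.Messages || data.Messages.length === 0) {
        return callback(null, 'queue drained');
      }
      var remaining = data.Messages.length;
      data.Messages.forEach(function(message) {
        // Synchronous invoke keeps the flow simple; see the concurrency note above.
        lambda.invoke({
          FunctionName: 'arc326l8836-process-image',
          InvocationType: 'RequestResponse',
          Payload: message.Body          // the original S3 event document
        }, function(invokeErr) {
          if (!invokeErr) {
            // Only delete the message once it has been reprocessed successfully.
            sqs.deleteMessage({
              QueueUrl: process.env.QUEUE_URL,
              ReceiptHandle: message.ReceiptHandle
            }, function() {
              if (--remaining === 0) poll();
            });
          } else if (--remaining === 0) {
            poll();
          }
        });
      });
    });
  }
  poll();
};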
The result of this failure is not a total system failure. Users can still log into the application and upload their photos, but there will be a backlog of images that require processing. Once the problem is resolved, Lambda will scale up to process the backlog of images.
A second dimension to reliability is throttling requests. One consideration is the elasticity of the systems downstream of the Lambda functions. Even though the Lambda functions can scale rapidly, the downstream dependencies may not. In this case, you are depending on S3, Rekognition, and the Elasticsearch cluster.
There are two ways to handle the throttling:
- Throttle the input to the system. In our example, this is handled by the API Gateway search function.
- Queue the work prior to the constraint. In our example, if the Elasticsearch cluster returns an error, the message is placed on the dead letter queue for later processing.
Extra Credit: Throttle the S3 update lambda function [Optional]
- Review Securing Serverless Architectures
- Create an API Gateway endpoint that exposes the process-images Lambda function.
- Create a new Lambda function that processes the S3 create event. The Lambda function calls the API Gateway endpoint that triggers process-images. If the system returns an HTTP code of 429, the call has been throttled. Return an error and the DLQ should handle it; a sketch of this proxy appears below.
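A sketch of that proxy function, with a hypothetical API id and resource path (the original event is forwarded as the request body; failing the invocation on a 429 lets the retry/DLQ machinery take over):
// Throttle-aware proxy sketch: forward the S3 event to an API Gateway
// endpoint fronting process-image and surface throttling as an error.
var https = require('https');

exports.handler = function(event, context, callback) {
  var body = JSON.stringify(event);
  var request = https.request({
    hostname: 'abc123defg.execute-api.us-east-1.amazonaws.com', // hypothetical API id
    path: '/prod/process-image',                                // hypothetical resource
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Content-Length': Buffer.byteLength(body)
    }
  }, function(response) {
    if (response.statusCode === 429) {
      callback(new Error('Throttled by API Gateway')); // fail -> retry -> DLQ
    } else {
      callback(null, 'forwarded: ' + response.statusCode);
    }
  });
  request.on('error', callback);
  request.write(body);
  request.end();
};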
[REL3] How does your system adapt to changes in demand?
A scalable system can provide elasticity to add and remove resources automatically so that they closely match the current demand at any given point in time.
Best practices:
- Automated Scaling - Compute scaling is handled by Lambda. What about downstream systems?
- Load Test
Lab Instructions
We have supplied a load tester using AWS Lambda, of course. Because they can scale quickly and run concurrently, Lambda functions are an excellent tool for running load tests. In this case, the Lambda function will randomly pull an image from a set of sample images and upload it to Amazon S3. This will trigger the image processing workflow.
Through testing, each load test lambda function can support up to 10 uploads per second on a consistent basis. This will fluctuate depending on the load test being run.
- Open the web application
- Find the section 'Load Test'
- Run a load test with various parameters. The default limit is 1,000 concurrent Lambda functions, so you can test this by running more than 125 requests per second: the typical process-image invocation takes about 8 seconds to complete, so roughly 8 × 125 = 1,000 functions are running at any given time (see the sketch after this list). More information on concurrent execution.
- View the results in the monitoring section.
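The arithmetic behind that estimate is Little's law: concurrent executions ≈ request rate × average duration.
// Little's law: concurrency = arrival rate (req/s) * average duration (s).
function estimateConcurrency(requestsPerSecond, avgDurationSeconds) {
  return requestsPerSecond * avgDurationSeconds;
}
console.log(estimateConcurrency(125, 8)); // 1000 -- the default account limit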
Performance Efficiency
[PERF 7] How do you monitor your resources post-launch to ensure they are performing as expected?
Lab Instructions
- Extra Credit: Make a cloudwatch dashboard for your load test
- Don't put the Lambda functions in your VPC unless you need to. It will increase your cold start time, and service limits for ENIs and other network constraints will impact your ability to scale.
- Reduce the size of your code deployable
- Take advantage of /tmp space
- Reuse database connections and static initialization of cacheable data
- For more information, review the lambda best practices
Cost Optimization
[COST 2] Have you sized your resources to meet your cost targets?
Lambda is a very cost-effective solution, but at scale, costs can be a consideration. By optimizing the code to use only the CPU and memory required, you can have a significant impact on the overall cost to serve. A back-of-the-envelope cost model is sketched after these steps.
- Using the CloudWatch metrics, enter the average processing time for the Lambda function (in ms):
- Enter the amount of memory reserved for the function
- If you are processing 200 million images, the cost to serve will be: $13369.13
- Now, go to the Lambda Console and change the memory to 1024 MB in the Basic Settings section.
- Test the lambda function again to get the time to process the image
- Replace the values above and compare which configuration is more cost-effective.
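For reference, a rough model of the calculation above, assuming 2017-era Lambda pricing ($0.00001667 per GB-second and $0.20 per million requests; the lab's $13,369.13 figure corresponds to an average duration just under 8 seconds at 512 MB):
// Back-of-the-envelope Lambda cost model -- pricing constants are assumptions,
// check current rates before relying on the result.
function lambdaCost(invocations, avgMs, memoryMb) {
  var billedSeconds = Math.ceil(avgMs / 100) / 10;  // billed in 100 ms increments
  var gbSeconds = invocations * billedSeconds * (memoryMb / 1024);
  var computeCost = gbSeconds * 0.00001667;         // per GB-second
  var requestCost = (invocations / 1e6) * 0.20;     // per million requests
  return computeCost + requestCost;
}
console.log('$' + lambdaCost(200e6, 8000, 512).toFixed(2)); // ≈ $13376.00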
Operational Excellence
[OPS 2] How are you doing configuration management for your workload?
In this lab, we used the Serverless Application Model (SAM), which is an extension of CloudFormation. It allows you to check your infrastructure code into a version control system and use automation to deploy it. The lab has you enter the commands by hand to give a sense of what is happening behind the scenes. However, in a typical development scenario, you would commit the CloudFormation code into your version control system, and the continuous integration / continuous delivery system would deploy the changes and run validation tests.
Extra Credit: CI/CD Pipeline
- Create a CodeCommit project. CodeCommit getting started.
- Create an AWS CodePipeline job to deploy this lab. CodePipeline Documentation
- Create a CodePipeline job that uses a CloudFormation deployment action.
[OPS 4] How do you monitor your workload to ensure it is operating as expected?
CloudWatch can notify you when metrics reach certain thresholds. In this case, it makes sense to create an alarm on the process-image function's errors; a scripted equivalent is sketched after the steps below.
Lab Instructions
- Open Cloud Watch
- Click Alarms on the left side menu
- Click 'Create Alarm'
- In 'Browse Metrics' type 'arc326l8836-process-image' and press enter
- Select the metric with the FunctionName of arc326l8836-process-image and a Metric Name of Errors
- Click Next in the lower right
- Name the alarm Process Image Errors
- Set the threshold to Whenever errors is >=10 errors for 1 consecutive period.
- Under 'Additional Settings': Treat Missing Data as 'Good'.
- Under Actions, choose Send Notification To and click New List
- Provide a name for the topic: arc326l8836-errors
- Provide your email address in the email list. You will need to respond to the verification email for it to work.
- Click Create Alarm
- Extra Credit: Use the section above to break the lambda function, and run a load test
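Alarms can also be created programmatically. A sketch with the Node.js SDK (the SNS topic ARN is a placeholder for the topic created above):
// Create the same Errors alarm with the SDK instead of the console.
var AWS = require('aws-sdk');
var cloudwatch = new AWS.CloudWatch({ region: 'us-east-1' });

cloudwatch.putMetricAlarm({
  AlarmName: 'Process Image Errors',
  Namespace: 'AWS/Lambda',
  MetricName: 'Errors',
  Dimensions: [{ Name: 'FunctionName', Value: 'arc326l8836-process-image' }],
  Statistic: 'Sum',
  Period: 300,
  EvaluationPeriods: 1,
  Threshold: 10,
  ComparisonOperator: 'GreaterThanOrEqualToThreshold',
  TreatMissingData: 'notBreaching',  // the console's 'Good' option
  AlarmActions: ['arn:aws:sns:us-east-1:ACCOUNT_ID:arc326l8836-errors'] // placeholder ARN
}, function(err) {
  if (err) console.error(err); else console.log('alarm created');
});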
Application Performance Monitoring
AWS X-Ray gives teams visibility into performance and service dependencies and is fully integrated with AWS Lambda.
Click here for more information on X-Ray and Lambda
- Go to the Lambda Console and under the 'Debugging and error handling' section, verify the 'Enable Active Tracing' is selected.
- Visit the X-ray console. View the service map to view the traces in the application.
- Adjust the memory allocated to the lambda function and review the impact on the X-Ray traces.
- How could you optimize the cost and performance of the function? Which segments are affected by the Lambda function's memory allocation and which are not?
Other Considerations
- Limits Monitoring. For those with Business or Enterprise Support, look at the AWS Limit Monitor.
- Infrastructure as Code. AWS Serverless Application Model. This workshop used the AWS Serverless Application Model to deploy all the changes to the web application. The file process-image.yaml defines the Lambda functions, API Gateway endpoints, SQS queue, and S3 buckets. Pay particular attention to the security definitions in the ImageApi section: it includes the specification of the SigV4 authentication in the securityDefinitions entry. The full template is reproduced in the 'Optional: Explore CloudFormation Template' section above.
Additional Ideas
Now that you have been introduced to the serverless image processing pipeline, think of the following enhancements:
- Host the website on CloudFront to cache the static content at the edge. Documentation
- Host the website on a custom domain name. Documentation
- Sentiment Analysis: Capture the emotion of any faces in the images.
- Face Recognition: Create a graph using DynamoDB and D3 showing images containing the same faces.
- Coordinate image processing with Step Functions rather than a single lambda function.
- Detect if the same image is uploaded multiple times (Hint: find any images that are already in the index by S3 Etag).
- Remove an image from the index if it was deleted from the S3 bucket. Don't forget to delete the thumbnail too.
- Filter using moderation labels with Rekognition
- Detect Text using Rekognition
- Use your imagination!!!
Show us something cool!
If you have extra time, team up with people near you and create an interesting extension to the workshop.
Don't be shy, we'd love to see your creations.
Cleanup
In your SSH window, paste the following commands:
aws s3 rm --recursive s3://arc326l8836-workshop
aws s3 rm --recursive s3://arc326l8836-deploy
aws cloudformation delete-stack --stack-name arc326l8836-lambdas --region us-east-1
Once those resources have been deleted, complete the cleanup by removing the stack created at the beginning of the workshop: select arc326l8836 and choose Actions -> Delete Stack. Link to Console