AWS DeepLens: Getting Hands-on (Part 1 of 2)


TechConnect recently acquired two AWS DeepLens devices to play around with. Announced at re:Invent 2017, the AWS DeepLens is a small, Intel Atom powered, deep-learning focused device with an embedded high-definition video camera. The DeepLens runs AWS Greengrass, which lets it handle local events on the device without having to send large amounts of data to the cloud for processing. This can substantially help businesses reduce costs, the transfer of sensitive information, and response latency for local events.
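
To give a feel for what that means in practice, here is a minimal sketch of how a Greengrass Lambda on the device might publish only a small inference summary to an AWS IoT topic instead of streaming raw frames. The topic name and the face_count value are placeholders of mine, not part of the DeepLens samples.

import json
import greengrasssdk

# The Greengrass SDK runs locally on the DeepLens, so this publish never
# ships the HD video frames themselves -- just a tiny JSON payload.
client = greengrasssdk.client('iot-data')

def publish_summary(face_count, iot_topic='deeplens/inference'):
    # face_count would come from the on-device model's output for the current frame
    payload = json.dumps({'faces_detected': face_count})
    client.publish(topic=iot_topic, payload=payload)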

Zero to Hero (pick a sample project)

What I think makes the DeepLens special is how easy it is to get started using computer vision models to process visual surroundings on the device itself. TechConnect has strong capabilities in Machine Learning, but I myself haven’t had much of a chance to play around with Deep Learning frameworks like MXNet or TensorFlow. Thankfully, AWS provides a collection of pre-trained models and sample projects to help anyone get started. But if you are already quite savvy with those frameworks, you can train and use your own models too.

Face Detection Lambda Function

At its core, an AWS DeepLens project consists of a trained model and an AWS Lambda function (written in Python). These are deployed to the device to run via AWS Greengrass, where the Lambda function continually processes each frame coming in from the video feed using the awscam module.

The function can access the model, which is downloaded to the device as an accessible artifact at a local path (the model_path in the code below). This location and others (for example, “/tmp”) have permissions granted to the function via the AWS Greengrass group associated with the DeepLens project. I chose the face detection sample project, which detects faces in a video frame captured from the DeepLens camera and draws a rectangle around each one.


from threading import Thread, Event
import os
import json
import numpy as np
import awscam
import cv2
import greengrasssdk

class LocalDisplay(Thread):
    def __init__(self, resolution):
        ...
    def run(self):
        ...
    def set_frame_data(self, frame):
        ...

def greengrass_infinite_infer_run():
    ...
    # Create a local display instance that will dump the image bytes
    # to a FIFO file so that the image can be rendered locally.
    local_display = LocalDisplay('480p')
    local_display.start()
    # The sample projects come with optimized artifacts,
    # hence only the artifact path is required.
    model_path = '/opt/awscam/artifacts/mxnet_deploy_ssd_FP16_FUSED.xml'
    ...
    # Load the model onto the device once, outside the frame loop
    model = awscam.Model(model_path, {'GPU': 1})
    while True:
        # Get a frame from the video stream
        ret, frame = awscam.getLastFrame()
        # Resize frame to the same size as the training set.
        frame_resize = cv2.resize(frame, (input_height, input_width))
        ...
        # Process the frame
        ...
        # Set the next frame in the local display stream.
        local_display.set_frame_data(frame)
        ...

greengrass_infinite_infer_run()
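
The skeleton above elides the inference step itself, so here is a rough single-frame version of what happens inside the loop. It is only a sketch based on the face-detection sample: the 'ssd' model type and the 300x300 input size come from that sample, and the 0.5 threshold is just an illustrative value, so double-check them against the project you actually deploy.

import awscam
import cv2

# Optimised face-detection model artifact deployed with the sample project
model_path = '/opt/awscam/artifacts/mxnet_deploy_ssd_FP16_FUSED.xml'
model = awscam.Model(model_path, {'GPU': 1})

# Grab the latest frame from the DeepLens camera
ret, frame = awscam.getLastFrame()

# The face-detection model is an SSD network; resize the frame to its input size
frame_resize = cv2.resize(frame, (300, 300))

# Run inference on the device and parse the raw output into a list of detections,
# each a dict with 'label', 'prob', 'xmin', 'ymin', 'xmax' and 'ymax'
parsed = model.parseResult('ssd', model.doInference(frame_resize))

# Keep only confident detections (0.5 is an illustrative threshold)
faces = [obj for obj in parsed['ssd'] if obj['prob'] > 0.5]
print('Detected {} face(s)'.format(len(faces)))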

Extending the original functionality: AWS DeepLens Zoom Enhance!

I decided to have a bit of fun and extend the original application’s functionality by cropping and enhancing a detected face. The DeepLens project’s video output is set to 480p, but the camera frames from the device are a much higher resolution than this! So, reusing the code from the original sample that drew a rectangle around each detected face, I was able to capture a face and display it on the big screen. The only difficult part was centring the captured face and adding padding, which brought back bad memories of how hard centring an image in CSS used to be!


from threading import Thread, Event
import os
import json
import time
import numpy as np
import awscam
import cv2
import greengrasssdk

class LocalDisplay(Thread):
    def __init__(self, resolution):
        ...
    def run(self):
        ...
    def set_frame_data(self, frame):
        # Get image dimensions
        image_height, image_width, image_channels = frame.shape

        # only shrink if the image is bigger than the required
        # resolution (self.resolution is (width, height))
        if self.resolution[1] < image_height or self.resolution[0] < image_width:
            # get scaling factor that fits the image inside the resolution
            scaling_factor = self.resolution[1] / float(image_height)
            if self.resolution[0] / float(image_width) < scaling_factor:
                scaling_factor = self.resolution[0] / float(image_width)

            # resize image, preserving its aspect ratio
            frame = cv2.resize(frame, None, fx=scaling_factor, fy=scaling_factor,
                               interpolation=cv2.INTER_AREA)

        # Get image dimensions and padding after scaling
        image_height, image_width, image_channels = frame.shape

        x_padding = self.resolution[0] - image_width
        y_padding = self.resolution[1] - image_height

        if x_padding <= 0:
            x_padding_left, x_padding_right = 0, 0
        else:
            x_padding_left = int(np.floor(x_padding / 2))
            x_padding_right = int(np.ceil(x_padding / 2))

        if y_padding <= 0:
            y_padding_top, y_padding_bottom = 0, 0
        else:
            y_padding_top = int(np.floor(y_padding / 2))
            y_padding_bottom = int(np.ceil(y_padding / 2))

        # Pad the resized image out to the display resolution so the
        # cropped face sits centred in the frame
        frame = cv2.copyMakeBorder(frame, y_padding_top, y_padding_bottom,
                                   x_padding_left, x_padding_right,
                                   cv2.BORDER_CONSTANT, value=(0, 0, 0))
        ...

def greengrass_infinite_infer_run():
    ...
    # Load the model onto the device once, outside the frame loop
    model = awscam.Model(model_path, {'GPU': 1})
    while True:
        # Get a frame from the video stream
        ret, frame = awscam.getLastFrame()
        # Resize frame to the same size as the training set.
        frame_resize = cv2.resize(frame, (input_height, input_width))
        ...
        # Process the frame
        ...
        # Set the non-cropped frame in the local display stream.
        local_display.set_frame_data(frame)
        
        # Get the detected faces and probabilities
        for obj in parsed_inference_results[model_type]:
            if obj['prob'] > detection_threshold:
                # Add bounding boxes to full resolution frame
                xmin = int(xscale * obj['xmin']) \
                       + int((obj['xmin'] - input_width / 2) + input_width / 2)
                ymin = int(yscale * obj['ymin'])
                xmax = int(xscale * obj['xmax']) \
                       + int((obj['xmax'] - input_width / 2) + input_width / 2)
                ymax = int(yscale * obj['ymax'])

                # Add face detection to iot topic payload
                cloud_output[output_map[obj['label']]] = obj['prob']

                # Zoom in on Face and hold it on screen for a few seconds
                crop_img = frame[ymin - 45:ymax + 45, xmin - 30:xmax + 30]
                local_display.set_frame_data(crop_img)
                time.sleep(5)

        # Send results to the cloud
        client.publish(topic=iot_topic, payload=json.dumps(cloud_output))

greengrass_infinite_infer_run()
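
If you want to sanity-check the centring and padding logic away from the device first, something like the following reproduces that resize-and-letterbox step with plain OpenCV on a laptop. The helper name and the test image filename are placeholders of mine, and (858, 480) is the 480p size the DeepLens samples use.

import numpy as np
import cv2

def centre_on_canvas(frame, resolution=(858, 480)):
    # resolution is (width, height); shrink the frame to fit inside it,
    # then pad evenly on each side so the image ends up centred
    target_width, target_height = resolution
    image_height, image_width = frame.shape[:2]

    scaling_factor = min(target_width / float(image_width),
                         target_height / float(image_height), 1.0)
    if scaling_factor < 1.0:
        frame = cv2.resize(frame, None, fx=scaling_factor, fy=scaling_factor,
                           interpolation=cv2.INTER_AREA)

    image_height, image_width = frame.shape[:2]
    x_padding = max(target_width - image_width, 0)
    y_padding = max(target_height - image_height, 0)

    return cv2.copyMakeBorder(frame,
                              int(np.floor(y_padding / 2.0)), int(np.ceil(y_padding / 2.0)),
                              int(np.floor(x_padding / 2.0)), int(np.ceil(x_padding / 2.0)),
                              cv2.BORDER_CONSTANT, value=(0, 0, 0))

# 'face_crop.jpg' is a placeholder test image
test_image = cv2.imread('face_crop.jpg')
cv2.imwrite('centred.jpg', centre_on_canvas(test_image))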