Mark Richards

System Outline

This is an extension of the “AWS Area Eligibility API” project that required a slightly modified architecture to implement. It performs the same basic function as the eligibility API, but it allows users to upload a file to the Lambda function. The file is processed, and a file containing the results is returned to the user. All of this is done without S3 or signed URLs, to minimize complexity and cost.

This uses the same static page as the Eligibility API, with the small modification that DNS is now handled by AWS Route 53 instead of the original Google Domains. The main differences are the removal of the AWS API Gateway and the reconfiguration of all the sub-API calls performed by the function.

The API call made to the Census Bureau now uses its batch processing service, where a CSV is supplied and matching coordinates and geographies are returned. Input files can be of arbitrary size, as the Lambda function partitions them and recombines the results after the API calls.
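The partition-and-recombine step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the 10,000-row cap reflects my understanding of the Census batch geocoder's per-file limit, and `send_batch` is a hypothetical callable standing in for the HTTP request.

```python
import csv
import io

# Assumption: the Census batch geocoder accepts at most 10,000 records per file.
MAX_ROWS_PER_BATCH = 10_000


def partition_rows(rows, size=MAX_ROWS_PER_BATCH):
    """Yield consecutive slices of `rows`, each at most `size` rows long."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def rows_to_csv(rows):
    """Serialize a list of rows to CSV text for one batch upload."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


def process_in_batches(rows, send_batch, size=MAX_ROWS_PER_BATCH):
    """Send each partition through `send_batch` and concatenate the results.

    `send_batch` is hypothetical: it takes CSV text for one partition and
    returns a list of result rows for that partition.
    """
    combined = []
    for chunk in partition_rows(rows, size):
        combined.extend(send_batch(rows_to_csv(chunk)))
    return combined
```

Because each partition is processed in order and the results are appended in the same order, the combined output lines up with the original input rows.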

The calls to DynamoDB now also operate in partitioned batches to stay under the 100-item/16 MB limits on a batch read.

The API call to Esri's public APIs for hosted feature layers is a new addition that is necessary for performing point-in-polygon checks of rural designation.
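One way to ask a hosted feature layer whether a point falls inside any of its polygons is a spatial "intersects" query against the layer's query endpoint. The sketch below only builds the request URL. The layer URL is a hypothetical placeholder (the real one is not given here), and the choice to request only a count is my assumption, not necessarily how the function does it.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; substitute the real hosted feature layer URL.
LAYER_URL = (
    "https://services.arcgis.com/<org-id>/arcgis/rest/services/"
    "<layer-name>/FeatureServer/0"
)


def build_intersect_query(lon, lat, layer_url=LAYER_URL):
    """Build a feature layer query URL testing one WGS84 point against the layer."""
    params = {
        "where": "1=1",                            # no attribute filter
        "geometry": f"{lon},{lat}",                # simple "x,y" point syntax
        "geometryType": "esriGeometryPoint",
        "inSR": "4326",                            # coordinates are lon/lat (WGS84)
        "spatialRel": "esriSpatialRelIntersects",
        "returnCountOnly": "true",                 # assumption: only a hit count is needed
        "f": "json",
    }
    return f"{layer_url}/query?{urlencode(params)}"
```

With these parameters, a count greater than zero in the JSON response would indicate that the point lies inside at least one polygon of the layer.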

All of these pieces get combined and a CSV file showing the results is returned to the user to complete the API call.
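Returning the CSV directly from the function, with no S3 object in between, might look something like this. It is a sketch under assumptions: the helper name, column names, and file name are illustrative, and the dictionary follows the response shape Lambda function URLs accept.

```python
import csv
import io


def csv_response(header, rows, filename="results.csv"):
    """Build a Lambda response dict that delivers `rows` as a CSV download."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return {
        "statusCode": 200,
        "headers": {
            "Content-Type": "text/csv",
            # Ask the browser to save the body as a file
            "Content-Disposition": f'attachment; filename="{filename}"',
        },
        "body": buffer.getvalue(),
        "isBase64Encoded": False,
    }
```

The handler would return this dictionary as its final step, for example `return csv_response(["address", "eligible"], result_rows)`.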

Try it here: Summer Food Batch Processing

Implementation Lessons

I had implemented this in a different code base with a different language and falsely assumed that translating it would be a simple task. The architectural changes proved to be fairly challenging to overcome.

POST Multipart

Parsing multipart form data is more complex than it needs to be. Parsing the form data in Python required multiple packages that had to be added as layers to the Lambda function.
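For comparison, a text upload such as a CSV can also be pulled out of a multipart body using only the standard library's email parser. This is an alternative sketch, not the packages used in this project. It assumes the event shape that Lambda function URLs deliver (lower-cased header names, and a body that may be base64-encoded), and `extract_form_parts` is a name made up for illustration.

```python
import base64
from email.parser import BytesParser
from email.policy import default


def extract_form_parts(event):
    """Return {form field name: raw bytes} from a multipart/form-data request."""
    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        raw = base64.b64decode(body)
    else:
        raw = body.encode("utf-8")

    # The boundary lives in the request's Content-Type header, so prepend that
    # header to make the body look like a complete MIME message.
    content_type = event["headers"]["content-type"]
    prefix = f"Content-Type: {content_type}\r\nMIME-Version: 1.0\r\n\r\n"
    message = BytesParser(policy=default).parsebytes(prefix.encode("utf-8") + raw)

    parts = {}
    for part in message.iter_parts():
        name = part.get_param("name", header="content-disposition")
        parts[name] = part.get_payload(decode=True)
    return parts
```

The email parser is built for mail messages, so this approach suits text uploads like CSV better than arbitrary binary files.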

Can't use API Gateway with long processing times

Once I moved past a test file with 5 addresses, the service started timing out. It turns out there is a 30-second time limit for responses when using API Gateway. Longer-running processes need to use Lambda function URLs instead.

Lambda timeout

Lambda functions have a timeout setting that needs to be extended for processes that will run longer than the default 3 seconds. Because of the multiple external API calls, I had to extend it several times, eventually to over 5 minutes.

Lambda Memory

By default, Lambda gets 128 MB of RAM. The function was failing when merging two data frames because it needed more than that. CloudWatch shows how much memory each invocation used, so after bumping the setting to 256 MB I could see that the function used 180 MB with a large file. I decided to keep it at 256 MB to leave a small buffer.
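Both the timeout and the memory size can be changed in the console, but they can also be set from code. The sketch below only builds and validates the arguments. The function name in the comment is hypothetical, and the bounds are Lambda's documented limits as I understand them (timeout up to 900 seconds, memory between 128 MB and 10,240 MB).

```python
LAMBDA_MAX_TIMEOUT_S = 900      # 15 minutes
LAMBDA_MIN_MEMORY_MB = 128
LAMBDA_MAX_MEMORY_MB = 10_240


def build_config_update(function_name, timeout_s, memory_mb):
    """Return keyword arguments for boto3's update_function_configuration."""
    if not 1 <= timeout_s <= LAMBDA_MAX_TIMEOUT_S:
        raise ValueError(f"timeout must be 1-{LAMBDA_MAX_TIMEOUT_S} seconds")
    if not LAMBDA_MIN_MEMORY_MB <= memory_mb <= LAMBDA_MAX_MEMORY_MB:
        raise ValueError(
            f"memory must be {LAMBDA_MIN_MEMORY_MB}-{LAMBDA_MAX_MEMORY_MB} MB"
        )
    return {
        "FunctionName": function_name,
        "Timeout": timeout_s,
        "MemorySize": memory_mb,
    }


# With AWS credentials configured, the result could be applied like this
# ("batch-eligibility" is a hypothetical function name):
#   import boto3
#   kwargs = build_config_update("batch-eligibility", 330, 256)
#   boto3.client("lambda").update_function_configuration(**kwargs)
```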

Boto3 DynamoDB batch_get_item

One headache I hadn't expected, and spent more time than I care to admit working out, was deduplicating a batch lookup of DynamoDB items. I had not considered that I would have duplicate keys, but they cause the lookup to fail, and in my case it did so silently. Initial tests worked because there were no location overlaps, but in QA testing the entire function would fail with no feedback.
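A minimal sketch of the fix, assuming the lookup values arrive as a plain list: drop repeated keys first, then split what remains into groups of at most 100 for each batch request. The attribute name `tract_id` and the table name in the comment are hypothetical.

```python
def unique_keys(values):
    """Drop repeated values while keeping first-seen order."""
    return list(dict.fromkeys(values))


def key_batches(values, key_name="tract_id", size=100):
    """Yield lists of key dicts, deduplicated and at most `size` per list.

    `key_name` is a hypothetical partition key attribute name.
    """
    deduped = unique_keys(values)
    for start in range(0, len(deduped), size):
        yield [{key_name: v} for v in deduped[start:start + size]]


# Each batch could then be sent with the boto3 resource API, for example
# ("eligibility-table" is a hypothetical table name):
#   import boto3
#   dynamodb = boto3.resource("dynamodb")
#   for batch in key_batches(values):
#       response = dynamodb.batch_get_item(
#           RequestItems={"eligibility-table": {"Keys": batch}}
#       )
```

Since several input rows can share one key, the returned items need to be joined back to the rows by key rather than by position.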

Numpy int64 and truncation

One of the steps in the process needs to take some numbers, convert them to strings, and concatenate them. This became awkward when it created concatenated strings such as NaNNaNNaN or NoneNaNNaN, so each row needed to be processed individually. That added a new problem: the numbers were given a trailing ".0", which then ended up in the strings. To solve this I used the math.trunc() function. It appeared fine at first but would later fail silently, leading to a considerable amount of time spent debugging. In the end I used numpy.array() and cast the result to the int type with .astype(int).
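The pitfall can be shown in plain Python. This is an illustration of the problem, not the numpy approach I ended up with: each value is checked for None or NaN before it is converted, so missing values become empty strings and floats lose their ".0". The helper names are made up for this sketch.

```python
import math


def id_part(value):
    """Render one numeric component as a digit string, or "" if it is missing."""
    if value is None:
        return ""
    if isinstance(value, float) and math.isnan(value):
        return ""
    # int() drops the fractional part, so 12.0 becomes "12" rather than "12.0"
    return str(int(value))


def build_id(*values):
    """Concatenate the components, skipping missing ones."""
    return "".join(id_part(v) for v in values)
```

For example, `build_id(1.0, 23.0, 456.0)` gives `"123456"`, whereas joining `str()` of the same values would give `"1.023.0456.0"`. Calling `math.trunc()` on a NaN raises a ValueError, which is why the missing-value check comes first.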

Retrospective

I spent multiple weeks on what I had initially assumed would be a simple conversion. I had created API calls in Lambda before. I had attached functions to API Gateway before. I had all the component pieces I needed and just had to put them together. Or so I thought.

Hours were spent in VS Code and Postman submitting POST requests to the endpoints and reviewing CloudWatch logs to understand what was happening. A lot of time went into switching between my local IDE and deploying new versions of the code in Lambda. This was a reminder that things may not always go as planned, but with time and focus you can find your way through.

In the future I might try the local development tools for Lambda to spend less time in the AWS console.
