# GitHub Repository Address Book Lambda

A weekly AWS Lambda function that retrieves all ONS Digital GitHub usernames and ONS-verified email addresses from the GitHub GraphQL API. This allows the Digital Landscape's Address Book page to translate between usernames and email addresses, making it easy to identify the owners of different repositories across the ONS.
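For orientation, the sketch below shows the kind of GraphQL query involved. It is illustrative only: the real handler in `src/lambda_function.py` authenticates as a GitHub App, whereas this standalone version assumes a plain token in a hypothetical `GITHUB_TOKEN` variable.

```python
# Illustrative sketch only: page through an organisation's members and
# their organisation-verified emails via the GitHub GraphQL API.
import os

import requests

QUERY = """
query ($org: String!, $cursor: String) {
  organization(login: $org) {
    membersWithRole(first: 100, after: $cursor) {
      pageInfo { hasNextPage endCursor }
      nodes {
        login
        databaseId
        organizationVerifiedDomainEmails(login: $org)
      }
    }
  }
}
"""

def fetch_members(org: str, token: str) -> list[dict]:
    """Return every member's login, account ID, and verified org emails."""
    members: list[dict] = []
    cursor = None
    while True:
        response = requests.post(
            "https://api.github.com/graphql",
            json={"query": QUERY, "variables": {"org": org, "cursor": cursor}},
            headers={"Authorization": f"Bearer {token}"},
            timeout=30,
        )
        response.raise_for_status()
        page = response.json()["data"]["organization"]["membersWithRole"]
        members.extend(page["nodes"])
        if not page["pageInfo"]["hasNextPage"]:
            break
        cursor = page["pageInfo"]["endCursor"]
    return members

if __name__ == "__main__":
    for member in fetch_members(os.environ["GITHUB_ORG"], os.environ["GITHUB_TOKEN"]):
        print(member["login"], member["databaseId"], member["organizationVerifiedDomainEmails"])
```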
## Table of Contents

- Prerequisites
- Makefile
- Documentation
- Development
- Running the Project
- Deployment
- Linting and Testing
  - GitHub Actions
  - Linters Used
  - Running Linting and Tests Locally
- FAQs and troubleshooting tips
  - Why do I get an S3 permissions error when writing outputs?
  - Why are some users missing email addresses in the outputs?
  - How do I run locally and see logs?
  - MkDocs won't serve or pages 404 locally.
  - Where are the outputs written?
## Prerequisites

- A Docker daemon (Colima is recommended)
- Terraform (for deployment)
- Python >3.12
- Make
## Makefile

This repository makes use of a Makefile to execute common commands. To view all available commands, run:

```bash
make all
```

## Documentation

This project uses MkDocs for documentation, which is located in the `docs` directory. To view the documentation locally:
1. Install MkDocs and its dependencies:

   ```bash
   make install-docs
   ```

2. Serve the documentation locally:

   ```bash
   make docs-serve
   ```

3. Open your web browser and navigate to `http://localhost:8000`.

Optional:

- Build the static site: `make docs-build`
- Deploy to GitHub Pages manually: `make docs-deploy` (CI also deploys on pushes to `main` via the Deploy MkDocs Action)
## Development

To work on this project, you need to:

1. Create a virtual environment and activate it.

   Create:

   ```bash
   python3 -m venv venv
   ```

   Activate:

   ```bash
   source venv/bin/activate
   ```

2. Install dependencies.

   Production dependencies only:

   ```bash
   make install
   ```

   All dependencies, including dev dependencies (used for linting and testing):

   ```bash
   make install-dev
   ```
## Running the Project

To run the project during development, we recommend running it outside of a container. To do this, you need to execute the `lambda_handler()` function directly.

1. Uncomment the following at the bottom of `lambda_function.py` (in the `./src/` folder):

   ```python
   ...
   # if __name__ == "__main__":
   #     try:
   #         lambda_handler(event={}, context=None)
   #     except Exception as e:
   #         print(f"Error running lambda_handler locally: {e}")
   ...
   ```

   Please Note: If you uncomment the above in `lambda_function.py`, make sure you re-comment the code before pushing back to GitHub.

2. Export the required environment variables (a sketch for validating them follows this list):

   ```bash
   export AWS_ACCESS_KEY_ID=<access_key_id>
   export AWS_SECRET_ACCESS_KEY=<secret_access_key>
   export AWS_REGION=eu-west-2
   export AWS_SECRET_NAME=<secret_name>
   export S3_BUCKET_NAME=<bucket_name>
   export GITHUB_ORG=<org>
   export GITHUB_APP_CLIENT_ID=<client_id>
   export GITHUB_APP_ID=<app_id>
   export GITHUB_APP_CLIENT_SECRET=<app_client_secret>
   ```

3. Run the script:

   ```bash
   poetry run python3 src/lambda_function.py
   ```
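If the script exits immediately, the usual cause is a missing variable. Here is a minimal sanity-check sketch, assuming the variable names above (the `Config` dataclass is hypothetical and not part of the project's code):

```python
# Hypothetical sketch: fail fast with a clear error if any of the
# required environment variables listed above is missing or empty.
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    github_org: str
    github_app_id: str
    github_app_client_id: str
    github_app_client_secret: str
    aws_region: str
    aws_secret_name: str
    s3_bucket_name: str

def load_config() -> Config:
    def require(name: str) -> str:
        value = os.environ.get(name)
        if not value:
            raise RuntimeError(f"Missing required environment variable: {name}")
        return value

    return Config(
        github_org=require("GITHUB_ORG"),
        github_app_id=require("GITHUB_APP_ID"),
        github_app_client_id=require("GITHUB_APP_CLIENT_ID"),
        github_app_client_secret=require("GITHUB_APP_CLIENT_SECRET"),
        aws_region=require("AWS_REGION"),
        aws_secret_name=require("AWS_SECRET_NAME"),
        s3_bucket_name=require("S3_BUCKET_NAME"),
    )

if __name__ == "__main__":
    print(load_config())
```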
To run the project in a container, a Docker daemon is required to containerise and execute the project. We recommend using Colima.

Before doing the following, make sure your daemon is running. If using Colima, run `colima status` to check this, or `colima start` to start it.

1. Containerise the project:

   ```bash
   docker build -t github-repository-address-book-synchroniser-script .
   ```

2. Check the image exists (optional):

   ```bash
   docker images
   ```

   Example output:

   ```text
   REPOSITORY                                            TAG       IMAGE ID       CREATED          SIZE
   github-repository-address-book-synchroniser-script   latest    b4a1e32ce51b   12 minutes ago   840MB
   ```

3. Run the image:

   ```bash
   docker run --platform linux/amd64 -p 9000:8080 \
     -e AWS_ACCESS_KEY_ID=<access_key_id> \
     -e AWS_SECRET_ACCESS_KEY=<secret_access_key> \
     -e AWS_REGION=<region> \
     -e AWS_SECRET_NAME=<secret_name> \
     -e GITHUB_ORG=<org> \
     -e GITHUB_APP_ID=<app_id> \
     -e GITHUB_APP_CLIENT_ID=<client_id> \
     -e S3_BUCKET_NAME=<bucket_name> \
     -e GITHUB_APP_CLIENT_SECRET=<app_client_secret> \
     github-repository-address-book-synchroniser-script
   ```

   When running the container, you are required to pass the following environment variables:

   | Variable | Description |
   | -------- | ----------- |
   | `GITHUB_ORG` | The organisation you would like to run the tool in. |
   | `GITHUB_APP_CLIENT_ID` | The client ID of the GitHub App the tool uses to authenticate with the GitHub API. |
   | `GITHUB_APP_ID` | Numeric ID of the GitHub App used for authentication. |
   | `GITHUB_APP_CLIENT_SECRET` | Client secret for the GitHub App OAuth authentication. |
   | `AWS_REGION` | The AWS region the Secrets Manager secret is in. |
   | `AWS_SECRET_NAME` | Name of the AWS Secrets Manager secret to retrieve. |
   | `S3_BUCKET_NAME` | The name of the S3 bucket the Lambda writes AddressBook JSON files to. |
   | `AWS_ACCESS_KEY_ID` | AWS access key ID for the configured IAM credentials. |
   | `AWS_SECRET_ACCESS_KEY` | AWS secret access key for the configured IAM credentials. |

   Once the container is running, a local endpoint is created at `localhost:9000/2015-03-31/functions/function/invocations`.

4. Check the container is running (optional):

   ```bash
   docker ps
   ```

   Example output:

   ```text
   CONTAINER ID   IMAGE                                                 COMMAND                  CREATED         STATUS         PORTS                                       NAMES
   ca890d30e24d   github-repository-address-book-synchroniser-script   "/lambda-entrypoint.…"   5 seconds ago   Up 4 seconds   0.0.0.0:9000->8080/tcp, :::9000->8080/tcp   recursing_bartik
   ```

5. Post to the endpoint (`localhost:9000/2015-03-31/functions/function/invocations`):

   ```bash
   curl "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'
   ```

   This will run the Lambda function and, once complete, return a success message. (A Python alternative is sketched after this list.)

6. After testing, stop the container:

   ```bash
   docker stop <container_id>
   ```
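If you prefer Python over curl, the invocation in step 5 can be scripted like this (a sketch, assuming the container from step 3 is running):

```python
# POST an empty event to the Lambda Runtime Interface Emulator exposed
# by the running container, then print the function's response.
import requests

ENDPOINT = "http://localhost:9000/2015-03-31/functions/function/invocations"

response = requests.post(ENDPOINT, json={}, timeout=900)  # a full run can take a while
response.raise_for_status()
print(response.json())
```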
When the Lambda runs successfully, it writes three JSON files into your configured S3 bucket under the `AddressBook/` prefix:

- `AddressBook/addressBookUsernameKey.json`: username -> list of verified org emails

  Example:

  ```json
  {
    "alice": ["alice@org.com", "alice2@org.com"],
    "bob": ["bob@org.com"]
  }
  ```

- `AddressBook/addressBookEmailKey.json`: email -> username

  Example:

  ```json
  {
    "alice@org.com": "alice",
    "bob@org.com": "bob"
  }
  ```

- `AddressBook/addressBookIDKey.json`: username -> GitHub account ID

  Example:

  ```json
  {
    "alice": 101,
    "bob": 202
  }
  ```

Note: The `AddressBook/` path is an S3 key prefix used to group these files in the bucket.
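To illustrate how a consumer such as the Digital Landscape might use these files, here is a hedged boto3 sketch (not the app's actual code; the bucket name comes from the same `S3_BUCKET_NAME` variable used above):

```python
# Hypothetical consumer: load the three lookup files from S3 and resolve
# usernames, emails, and GitHub account IDs against them.
import json
import os

import boto3

s3 = boto3.client("s3")
bucket = os.environ["S3_BUCKET_NAME"]

def load(key: str) -> dict:
    """Fetch one AddressBook JSON object from the bucket and parse it."""
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    return json.loads(body)

username_to_emails = load("AddressBook/addressBookUsernameKey.json")
email_to_username = load("AddressBook/addressBookEmailKey.json")
username_to_id = load("AddressBook/addressBookIDKey.json")

print(username_to_emails.get("alice"))       # ["alice@org.com", "alice2@org.com"]
print(email_to_username.get("bob@org.com"))  # "bob"
print(username_to_id.get("alice"))           # 101
```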
## Deployment

This repository is designed to be hosted on AWS Lambda using a container image as the Lambda's definition.

There are two parts to deployment:

- Updating the ECR image.
- Updating the Lambda.

Before following the instructions below, we assume that:

- An ECR repository exists on AWS that aligns with the Lambda's naming convention, `{env_name}-{lambda_name}` (these can be set within the `.tfvars` file; see `example_tfvars.txt`).
- The AWS account contains underlying infrastructure to deploy on top of. This infrastructure is defined within sdp-infrastructure on GitHub.
- An AWS IAM user has been set up with appropriate permissions.

Additionally, we recommend that you keep the container versioning in sync with GitHub releases. Internal documentation for this is available on Confluence (GitHub Releases and AWS ECR Versions). We follow Semantic Versioning (Learn More).
### Updating the ECR Image

When changes are made to the repository's source code, the code must be containerised and pushed to AWS for the Lambda to use.

The following instructions deploy to an ECR repository called `sdp-dev-address-book-synchroniser`. Please change this to `<env_name>-<lambda_name>` based on your AWS instance.

All of the commands (steps 2-5) are available for your environment within the AWS GUI. Navigate to ECR > {repository_name} > View push commands.

1. Export AWS credentials into the environment. This makes it easier to ensure you are using the correct credentials.

   ```bash
   export AWS_ACCESS_KEY_ID="<aws_access_key_id>"
   export AWS_SECRET_ACCESS_KEY="<aws_secret_access_key>"
   ```

2. Log in to AWS ECR:

   ```bash
   aws ecr get-login-password --region eu-west-2 | docker login --username AWS --password-stdin <aws_account_id>.dkr.ecr.eu-west-2.amazonaws.com
   ```

3. Ensuring you're at the root of the repository, build a Docker image of the project:

   ```bash
   docker build -t sdp-dev-address-book-synchroniser .
   ```

   Please Note: Change `sdp-dev-address-book-synchroniser` within the above command to `<env_name>-<lambda_name>`.

4. Tag the Docker image to push to AWS, using the correct versioning mentioned in the prerequisites:

   ```bash
   docker tag sdp-dev-address-book-synchroniser:latest <aws_account_id>.dkr.ecr.eu-west-2.amazonaws.com/sdp-dev-address-book-synchroniser:<semantic_version>
   ```

   Please Note: Change `sdp-dev-address-book-synchroniser` within the above command to `<env_name>-<lambda_name>`.

5. Push the image to ECR:

   ```bash
   docker push <aws_account_id>.dkr.ecr.eu-west-2.amazonaws.com/sdp-dev-address-book-synchroniser:<semantic_version>
   ```

Once pushed, you should be able to see your new image version within the ECR repository.
### Updating the Lambda

Once AWS ECR has the new container image, we need to update the Lambda's configuration to use it. To do this, use the repository's provided Terraform.

Within the `terraform` directory, there is a `service` subdirectory which contains the Terraform to set up the Lambda on AWS.

1. Change directory to the service Terraform:

   ```bash
   cd terraform/service
   ```

2. Fill out the appropriate environment variables file:

   - `env/dev/dev.tfvars` for sdp-dev.
   - `env/prod/prod.tfvars` for sdp-prod.

   These files can be created based on `example_tfvars.txt`.

   It is crucial that the completed `.tfvars` file does not get committed to GitHub.

3. Initialise the Terraform using the appropriate `.tfbackend` file for the environment (`env/dev/backend-dev.tfbackend` or `env/prod/backend-prod.tfbackend`):

   ```bash
   terraform init -backend-config=env/dev/backend-dev.tfbackend -reconfigure
   ```

   Please Note: This step requires an AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to be loaded into the environment if not already in place. This can be done using:

   ```bash
   export AWS_ACCESS_KEY_ID="<aws_access_key_id>"
   export AWS_SECRET_ACCESS_KEY="<aws_secret_access_key>"
   ```

4. Refresh the local state to ensure it is in sync with the backend, using the appropriate `.tfvars` file for the environment (`env/dev/dev.tfvars` or `env/prod/prod.tfvars`):

   ```bash
   terraform refresh -var-file=env/dev/dev.tfvars
   ```

5. Plan the changes, using the appropriate `.tfvars` file, e.g. for dev:

   ```bash
   terraform plan -var-file=env/dev/dev.tfvars
   ```

6. Apply the changes, using the appropriate `.tfvars` file, e.g. for dev:

   ```bash
   terraform apply -var-file=env/dev/dev.tfvars
   ```

Once applied successfully, the Lambda and EventBridge Schedule will be created.
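As an optional post-apply check, a boto3 sketch along these lines can confirm both resources exist. It assumes the Lambda follows the `<env_name>-<lambda_name>` naming convention and that the schedule lives in EventBridge Scheduler with a matching name prefix; adjust to your `.tfvars` values.

```python
# Optional post-apply verification: confirm the Lambda exists and list
# any EventBridge Scheduler schedules sharing its name prefix.
import boto3

FUNCTION_NAME = "sdp-dev-address-book-synchroniser"  # <env_name>-<lambda_name>

lambda_client = boto3.client("lambda", region_name="eu-west-2")
scheduler = boto3.client("scheduler", region_name="eu-west-2")

function = lambda_client.get_function(FunctionName=FUNCTION_NAME)
print("Lambda image:", function["Code"]["ImageUri"])

for schedule in scheduler.list_schedules(NamePrefix=FUNCTION_NAME)["Schedules"]:
    print("Schedule:", schedule["Name"], schedule["State"])
```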
To delete the service resources, run the following, making sure to use the correct `.tfbackend` and `.tfvars` files for your environment:

```bash
cd terraform/service
terraform init -backend-config=env/dev/backend-dev.tfbackend -reconfigure
terraform refresh -var-file=env/dev/dev.tfvars
terraform destroy -var-file=env/dev/dev.tfvars
```
## Linting and Testing

### GitHub Actions

This repository contains two GitHub Actions workflows that automatically lint and test code on pull request creation and on pushes to the main branch.
### Linters Used

This repository uses the following linting and formatting tools:

Python:

- Black: code formatter (used for formatting and format checks)
- Ruff: Python linter (also used for autofixes and import sorting)
- Mypy: static type checker (configured via `mypy.ini`)

Other languages:

- MegaLinter: runs a broad set of linters.

Configuration notes:

- Mypy reads settings from `mypy.ini`.
### Running Linting and Tests Locally

To lint and test locally, you need to:

1. Install dev dependencies:

   ```bash
   make install-dev
   ```

2. Run all the linters:

   ```bash
   make lint
   ```

3. Run all the formatting:

   ```bash
   make format
   ```

4. Run all the tests:

   ```bash
   make test
   ```

5. Run MegaLinter:

   ```bash
   make megalint
   ```

   Please Note: This requires a Docker daemon to be running, because MegaLinter runs from a Docker image. We recommend using Colima on macOS or Linux.
## FAQs and troubleshooting tips

### Why do I get an S3 permissions error when writing outputs?

Ensure the Lambda execution role has `s3:PutObject` permission on the target bucket and the `AddressBook/` prefix.
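One way to diagnose this is to ask IAM directly whether the role is allowed to write under the prefix. A sketch using the IAM policy simulator (role, account, and bucket names are placeholders):

```python
# Ask the IAM policy simulator whether the execution role may PutObject
# under the AddressBook/ prefix of the target bucket.
import boto3

iam = boto3.client("iam")

result = iam.simulate_principal_policy(
    PolicySourceArn="arn:aws:iam::<aws_account_id>:role/<lambda_execution_role>",
    ActionNames=["s3:PutObject"],
    ResourceArns=["arn:aws:s3:::<bucket_name>/AddressBook/*"],
)

for evaluation in result["EvaluationResults"]:
    print(evaluation["EvalActionName"], "->", evaluation["EvalDecision"])
```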
### Why are some users missing email addresses in the outputs?

Only verified organisation emails are included. Users without a verified org email will not appear in `addressBookUsernameKey.json` or `addressBookEmailKey.json`.

### How do I run locally and see logs?

Export the required environment variables (see above) and run the handler locally.

### MkDocs won't serve or pages 404 locally.

Run `make install-docs` first, then `make docs-serve`. Verify the `mkdocs.yml` nav matches the files under `docs/`.

### Where are the outputs written?

To your configured S3 bucket under the `AddressBook/` prefix, as three JSON files.

For more Q&A, see the dedicated FAQ.