@@ -23,6 +23,6 @@ variable "kms_deletion_window_in_days" {

variable "enable_backups" {
type = bool
description = "Enable AwS cloud backup"
description = "Enable AWS cloud backup"
default = false
}
@@ -33,5 +33,5 @@ module "ref-pointers-table" {

module "perftest-pointers-table" {
source = "../modules/pointers-table"
name_prefix = "nhsd-nrlf--perftest"
name_prefix = "nhsd-nrlf--perftest-baseline"
}
2 changes: 2 additions & 0 deletions terraform/infrastructure/README.md
@@ -116,6 +116,8 @@ replacing `{ENV_NAME}` with the environment name (e.g. `dev`, `qa`, `qa-sandbox`

To tear down the infrastructure, you need to use Terraform to destroy the resources in your Terraform workspace.

First, run `make build-artifacts`. Then assume the management role and run `make get-s3-perms ENV={ENV_NAME}` in the project root.

To tear down the infrastructure, do the following:

2 changes: 1 addition & 1 deletion terraform/infrastructure/etc/perftest.tfvars
@@ -1,7 +1,7 @@
account_name = "perftest"
aws_account_name = "test"

dynamodb_pointers_table_prefix = "nhsd-nrlf--perftest"
dynamodb_pointers_table_prefix = "nhsd-nrlf--perftest-baseline"

domain = "perftest.record-locator.national.nhs.uk"
public_domain = "perftest.api.service.nhs.uk"
50 changes: 42 additions & 8 deletions tests/performance/README.md
@@ -1,20 +1,44 @@
# Performance Testing

some high level context short
We have performance tests which give us a benchmark of how NRLF performs under load for consumers and producers.

## Run perf tests
## Run performance tests

### Prep the environment

Performance tests are generally conducted in the perftest environment. There is a selection of tables in perftest representing different pointer-volume scenarios, e.g. perftest-baseline vs perftest-1million (todo: update with real names!).

To reset this table to the expected state for perftests, restore the table from a backup.
#### Point perftest at a different pointers table

In the steps below, make sure the table name is the table your environment is pointing at. You might need to redeploy NRLF lambdas to point at the desired table.
We (will) have multiple tables representing different states of NRLF in the future, e.g. all patients receiving an IPS (International Patient Summary), or onboarding particular high-volume suppliers.

In order to run performance tests to get figures for these different states, we can point the perftest environment at one of these tables.

Currently, this requires tearing down the existing environment and restoring from scratch:

1. Follow the instructions in `terraform/infrastructure/README.md` to tear down the perf test environment.
- Do **not** tear down shared account-wide infrastructure
2. Update `perftest-pointers-table.name_prefix` in `terraform/account-wide-infrastructure/test/dynamodb__pointers-table.tf` to be the table name you want, minus "-pointers-table"
- e.g. to use the baseline table `nhsd-nrlf--perftest-baseline-pointers-table`, set `name_prefix = "nhsd-nrlf--perftest-baseline"`
3. Update `dynamodb_pointers_table_prefix` in `terraform/infrastructure/etc/perftest.tfvars` in the same way.
- e.g. to use the baseline table `dynamodb_pointers_table_prefix = "nhsd-nrlf--perftest-baseline"`
4. Commit changes to a branch & push
5. Run the [Deploy Account-wide infrastructure](https://github.com/NHSDigital/NRLF/actions/workflows/deploy-account-wide-infra.yml) workflow against your branch & `account-test`.
- If you get a Terraform failure like "tried to create table but it already exists", you will need to do some finagling:
1. Make sure there is a backup of your chosen table, or create one if not. In the AWS console: DynamoDB > Tables > your perftest table > Backups > Create backup > Create on-demand backup > leave all settings as defaults > Create backup. This might take up to an hour to complete.
2. Once backed up, delete your table. In the AWS console: DynamoDB > Tables > your perftest table > Actions > Delete table.
3. Rerun the Deploy Account-wide infrastructure action.
4. Terraform will create an empty table with the correct name & (most importantly!) read/write IAM policies.
5. Delete the empty table created by terraform and restore from the backup, specifying the same table name you've defined in code.
6. Run the [Persistent Environment Deploy](https://github.com/NHSDigital/NRLF/actions/workflows/persistent-environment.yml) workflow against your branch & `perftest` to restore the environment with lambdas pointed at your chosen table.
7. You can check this has been successful by checking the table name in the lambdas.
- In the AWS console: Lambda > functions > pick any perftest-1 lambda > Configuration > Environment variables > `TABLE_NAME` should be your desired pointer table e.g. `nhsd-nrlf--perftest-baseline-pointers-table`

If you've followed these steps, you will also need to [generate permissions](#generate-permissions) as the organisation permissions will have been lost when the environment was torn down.

### Prepare to run tests

#### Pull certs for env
#### Pull certs for perftest

```sh
assume management
make truststore-pull-all ENV=perftest
```
You will need to generate pointer permissions the first time performance tests are run in an environment e.g. if the perftest environment is destroyed & recreated.

```sh
make generate permissions # makes a bunch of json permission files
make generate permissions # makes a bunch of json permission files for test organisations
make build # will take all permissions & create nrlf_permissions.zip file

# apply this new permissions zip file to your environment
cd ./terraform/infrastructure
assume test # needed?
assume test
make init TF_WORKSPACE_NAME=perftest-1 ENV=perftest
tf apply
make ENV=perftest USE_SHARED_RESOURCES=true apply
```

#### Generate input files
```sh
make perftest-prepare PERFTEST_TABLE_NAME=perftest-baseline
make perftest-consumer ENV_TYPE=perftest PERFTEST_HOST=perftest-1.perftest.record-locator.national.nhs.uk
make perftest-producer ENV_TYPE=perftest PERFTEST_HOST=perftest-1.perftest.record-locator.national.nhs.uk
```

## Assumptions / Caveats

- Run performance tests in the perftest environment only\*
- Both producer & consumer tests are repeatable
- These tests work on the assumption that all NHS numbers in the test data are serial and lie within a fixed range, i.e. picking any number between NHS_NUMBER_MINIMUM & NHS_NUMBER_MAXIMUM will yield a patient with pointer(s).
- Configure scenarios in the `consumer/perftest.config.json` & `producer/perftest.config.json` files. This does not alter the number of stages per scenario; that is fixed in `perftest.js`.
- Consider running these tests multiple times to get figures for a warm environment. Unlike prod, perftest is not well-used, so you will get cold-start figures on your first run.

\*These performance tests are tightly coupled to the seed scripts that populate test data. This means these tests can only be run in an environment containing solely test data created by the seed data scripts. `perftest` is a dedicated environment to do this in, but in theory any environment could be populated with the seed data and used.
209 changes: 208 additions & 1 deletion tests/performance/producer/seed_nft_tables.py
@@ -1,10 +1,32 @@
import csv
from datetime import datetime, timedelta, timezone
from itertools import cycle
from math import gcd
from random import shuffle
from typing import Any, Iterator

import boto3
import fire

import numpy as np

from nrlf.core.constants import (
CATEGORY_ATTRIBUTES,
SNOMED_SYSTEM_URL,
TYPE_ATTRIBUTES,
TYPE_CATEGORIES,
)
from nrlf.core.dynamodb.model import DocumentPointer
from nrlf.core.logger import logger
from nrlf.tests.data import load_document_reference

dynamodb = boto3.client("dynamodb")
resource = boto3.resource("dynamodb")

logger.setLevel("ERROR")

DOC_REF_TEMPLATE = load_document_reference("NFT-template")

CHECKSUM_WEIGHTS = [i for i in range(10, 1, -1)]

@@ -66,3 +88,188 @@
"TRXT": 1,
}, # summary record currently has only one supplier
}

DEFAULT_COUNT_DISTRIBUTIONS = {"1": 91, "2": 8, "3": 1}


class TestNhsNumbersIterator:
    def __iter__(self):
        self.first9 = 900000000
        return self

    def __next__(self):
        if self.first9 > 999999999:
            raise StopIteration
        checksum = 10
        while checksum == 10:
            self.first9 += 1
            nhs_no_digits = list(map(int, str(self.first9)))
            checksum = (
                sum(
                    weight * digit
                    for weight, digit in zip(CHECKSUM_WEIGHTS, nhs_no_digits)
                )
                * -1
                % 11
            )
        nhs_no = str(self.first9) + str(checksum)
        return nhs_no


def _make_seed_pointer(
    type_code: str, custodian: str, nhs_number: str, counter: int
) -> DocumentPointer:
    """
    Populates the example pointer template with test data to create a valid NRL 3.0 pointer
    """
    doc_ref = DOC_REF_TEMPLATE
    doc_ref.id = f"{custodian}-{str(counter).zfill(12)}"  # deterministic to aid perftest script retrieval
    doc_ref.subject.identifier.value = nhs_number
    doc_ref.custodian.identifier.value = custodian
    doc_ref.author[0].identifier.value = "X26NFT"
    doc_ref.type.coding[0].code = type_code
    doc_ref.type.coding[0].display = TYPE_ATTRIBUTES.get(
        f"{SNOMED_SYSTEM_URL}|{type_code}"
    ).get("display")
    type_url = f"{SNOMED_SYSTEM_URL}|{type_code}"
    category = TYPE_CATEGORIES.get(type_url)
    doc_ref.category[0].coding[0].code = category.split("|")[-1]
    doc_ref.category[0].coding[0].display = CATEGORY_ATTRIBUTES.get(category).get(
        "display"
    )
    nft_pointer = DocumentPointer.from_document_reference(doc_ref, source="NFT-SEED")
    return nft_pointer


def _populate_seed_table(
    table_name: str,
    px_with_pointers: int,
    pointers_per_px: float = 1.0,
    type_dists: dict[str, int] = DEFAULT_TYPE_DISTRIBUTIONS,
    custodian_dists: dict[str, dict[str, int]] = DEFAULT_CUSTODIAN_DISTRIBUTIONS,
):
    """
    Seeds a table with example data for non-functional testing.
    """
    if pointers_per_px < 1.0:
        raise ValueError("Cannot populate table with patients with zero pointers")

    # set up iterators for pointer type, custodian and pointers-per-patient count
    type_iter = _set_up_cyclical_iterator(type_dists)
    custodian_iters = _set_up_custodian_iterators(custodian_dists)
    # count_iter = _set_up_cyclical_iterator(DEFAULT_COUNT_DISTRIBUTIONS)
    count_iter = _get_pointer_count_poisson_distributions(
        px_with_pointers, pointers_per_px
    )
    # count_iter = _get_pointer_count_negbinom_distributions(px_with_pointers, pointers_per_px)
    testnum_iter = iter(TestNhsNumbersIterator())

    px_counter = 0
    doc_ref_target = int(pointers_per_px * px_with_pointers)
    logger.log(
        f"Will upsert ~{doc_ref_target} test pointers for {px_with_pointers} patients."
    )
    doc_ref_counter = 0
    batch_counter = 0
    unprocessed_count = 0

    pointer_data: list[list[str]] = []

    start_time = datetime.now(tz=timezone.utc)

    batch_upsert_items: list[dict[str, Any]] = []
    while px_counter < px_with_pointers:
        pointers_for_px = int(next(count_iter))

        # DynamoDB batch_write_item accepts at most 25 items, so flush the
        # current batch before the next patient's pointers would overflow it
        if batch_upsert_items and batch_counter + pointers_for_px > 25:
            response = resource.batch_write_item(
                RequestItems={table_name: batch_upsert_items}
            )

            if response.get("UnprocessedItems"):
                unprocessed_count += len(
                    response["UnprocessedItems"].get(table_name, [])
                )

            batch_upsert_items = []
            batch_counter = 0

        new_px = next(testnum_iter)
        for _ in range(pointers_for_px):
            new_type = next(type_iter)
            new_custodian = next(custodian_iters[new_type])
            doc_ref_counter += 1
            batch_counter += 1

            pointer = _make_seed_pointer(
                new_type, new_custodian, new_px, doc_ref_counter
            )
            put_req = {"PutRequest": {"Item": pointer.model_dump()}}
            batch_upsert_items.append(put_req)
            pointer_data.append(
                [
                    pointer.id,
                    pointer.type,
                    pointer.custodian,
                    pointer.nhs_number,
                ]
            )
        px_counter += 1

        if px_counter % 1000 == 0:
            logger.log(".", end="", flush=True)
        if px_counter % 100000 == 0:
            logger.log(
                f" {px_counter} patients processed ({doc_ref_counter} pointers)."
            )

    # flush the final, partially-filled batch
    if batch_upsert_items:
        response = resource.batch_write_item(
            RequestItems={table_name: batch_upsert_items}
        )
        if response.get("UnprocessedItems"):
            unprocessed_count += len(
                response["UnprocessedItems"].get(table_name, [])
            )

    logger.log("Done.")

    end_time = datetime.now(tz=timezone.utc)
    logger.log(
        f"Created {doc_ref_counter} pointers in {(end_time - start_time).total_seconds()} seconds (unprocessed: {unprocessed_count})."
    )

    with open("./dist/seed-nft-pointers.csv", "w") as f:
        writer = csv.writer(f)
        writer.writerow(["pointer_id", "pointer_type", "custodian", "nhs_number"])
        writer.writerows(pointer_data)
    logger.log("Pointer data saved to ./dist/seed-nft-pointers.csv")


def _set_up_cyclical_iterator(dists: dict[str, int]) -> Iterator[str]:
    """
    Given a dict of values and their relative frequencies,
    returns an iterator that will cycle through the reduced and shuffled set of values.
    This should result in more live-like data than e.g. creating a bulk amount of each pointer type/custodian in series.
    It also means each batch will contain a representative sample of the distribution.
    """
    d = gcd(*dists.values())
    value_list: list[str] = []
    for entry in dists:
        value_list.extend([entry] * (dists[entry] // d))
    shuffle(value_list)
    return cycle(value_list)


def _get_pointer_count_poisson_distributions(
    num_of_patients: int, pointers_per_px: float
) -> Iterator[int]:
    p_count_distr = np.random.poisson(lam=pointers_per_px - 1, size=num_of_patients) + 1
    p_count_distr = np.clip(p_count_distr, a_min=1, a_max=4)
    return cycle(p_count_distr)


def _set_up_custodian_iterators(
    custodian_dists: dict[str, dict[str, int]],
) -> dict[str, Iterator[str]]:
    custodian_iters: dict[str, Iterator[str]] = {}
    for pointer_type in custodian_dists:
        custodian_iters[pointer_type] = _set_up_cyclical_iterator(
            custodian_dists[pointer_type]
        )
    return custodian_iters


if __name__ == "__main__":
    fire.Fire(_populate_seed_table)