
Improve scalability #2


Description

@manuelkasper

This issue currently has low priority, as the existing server capacity is probably sufficient to handle 10x user growth even without any optimizations. It is here primarily as something to keep in mind if we refactor the server anyway.

Currently, the HamAlert server spawns a number of child processes to spread the work of matching spots against triggers across multiple CPU cores. The main process distributes the spots to be processed to the child processes in a round-robin fashion via IPC, and gets back the results (matching users/triggers). It then performs rate-limiting and sends out the appropriate notifications.
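For reference, a minimal sketch of the current fan-out model; the file name `matcher.js` and the message shapes are assumptions for illustration, not the actual HamAlert code:

```ts
// roundRobinDispatch.ts — sketch of the parent side, assuming a hypothetical
// matcher.js child that answers { spotId, userIds } via IPC.
import { fork, ChildProcess } from 'node:child_process';
import { cpus } from 'node:os';

interface Spot { id: number; call: string; frequency: number; mode: string; }

const workers: ChildProcess[] = [];
for (let i = 0; i < cpus().length; i++) {
  const child = fork('./matcher.js'); // each child builds its own trigger index
  child.on('message', (result: { spotId: number; userIds: number[] }) => {
    // rate-limiting and notification happen here in the main process
    console.log(`spot ${result.spotId} matched ${result.userIds.length} users`);
  });
  workers.push(child);
}

let next = 0;
export function dispatchSpot(spot: Spot): void {
  // round-robin: each spot goes to exactly one matcher child
  workers[next].send({ type: 'spot', spot });
  next = (next + 1) % workers.length;
}
```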

The downside of this approach is that each matcher child process needs to build a large data structure on its own, consisting of a multi-dimensional hashmap and Roaring Bitmaps, based on the trigger definitions of all users read from MongoDB. Furthermore, for trigger changes made by users to take effect in a reasonable amount of time, these data structures need to be rebuilt periodically, currently once a minute. This causes a significant amount of baseline CPU load and memory usage. In theory the work would only need to be done once, rather than individually by each matcher process, as the resulting data structure is always the same.
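The per-child index might look roughly like the following; the dimensions (band, mode) and field names are guesses, and plain Sets stand in for the Roaring Bitmaps used in practice:

```ts
// buildTriggerIndex.ts — sketch of the periodic index rebuild, assuming a
// "triggers" collection with { userId, bands, modes } documents.
import { MongoClient } from 'mongodb';

type TriggerIndex = Map<string, Map<string, Set<number>>>; // band -> mode -> userIds

export async function buildTriggerIndex(client: MongoClient): Promise<TriggerIndex> {
  const index: TriggerIndex = new Map();
  const cursor = client.db('hamalert').collection('triggers').find();
  for await (const trigger of cursor) {
    for (const band of trigger.bands ?? ['*']) {
      let byMode = index.get(band);
      if (!byMode) index.set(band, (byMode = new Map()));
      for (const mode of trigger.modes ?? ['*']) {
        let users = byMode.get(mode);
        if (!users) byMode.set(mode, (users = new Set()));
        users.add(trigger.userId);
      }
    }
  }
  return index;
}

// Today, every matcher child effectively runs something like this once a minute:
// setInterval(() => buildTriggerIndex(client).then(i => (currentIndex = i)), 60_000);
```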

Unfortunately, with Node.js IPC, transferring the data structure from the parent process to the children (with required marshalling/unmarshalling as Roaring Bitmaps cannot be transferred as-is) doesn't save much time compared to simply rebuilding them from scratch in the child.
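To illustrate the marshalling overhead: the bitmaps have to be flattened into something structured-clone-friendly on the parent side and rebuilt on the child side. A sketch, assuming the roaring npm package and a simple band -> user-ID index:

```ts
// ipcTransfer.ts — sketch of marshalling a band -> userIds index over IPC,
// assuming RoaringBitmap32 from the `roaring` npm package.
import { RoaringBitmap32 } from 'roaring';

type Index = Map<string, RoaringBitmap32>;

// Parent: flatten bitmaps into plain arrays so process.send() can clone them.
export function marshal(index: Index): Record<string, number[]> {
  const out: Record<string, number[]> = {};
  for (const [band, users] of index) out[band] = users.toArray();
  return out;
}

// Child: rebuild the bitmaps from the arrays received via IPC.
export function unmarshal(flat: Record<string, number[]>): Index {
  const index: Index = new Map();
  for (const [band, users] of Object.entries(flat)) {
    index.set(band, new RoaringBitmap32(users));
  }
  return index;
}

// The cost of marshal() + send + unmarshal() ends up comparable to simply
// rebuilding the index from MongoDB in each child.
```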

Another parallel processing option in Node.js is worker threads. While these allow sharing data among threads, this is limited to raw SharedArrayBuffers and does not extend to more complex data structures like multi-dimensional hashmaps and Roaring Bitmaps.
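A small illustration of that limitation: a SharedArrayBuffer passed via workerData is genuinely shared between threads, whereas a Map (or any other complex object) is deep-copied by the structured clone algorithm:

```ts
// workerShare.ts — sketch showing what worker_threads can and cannot share.
import { Worker, isMainThread, workerData } from 'node:worker_threads';

if (isMainThread) {
  const shared = new SharedArrayBuffer(4);                 // shared memory: visible to both threads
  const counter = new Int32Array(shared);
  const index = new Map([['20m', new Set([1, 2, 3])]]);    // complex structure: will be copied

  new Worker(__filename, { workerData: { shared, index } });
  setTimeout(() => console.log('main sees counter =', counter[0]), 100); // prints 1
} else {
  const counter = new Int32Array(workerData.shared);
  Atomics.add(counter, 0, 1);             // visible to the main thread
  workerData.index.get('20m').add(99);    // mutates the worker's private copy only
}
```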

Implementing delta updates to the data structures, based on a change stream (old trigger, new trigger), would remove the need to periodically regenerate the entire data structures, thus reducing the CPU load but not the memory usage. It could be implemented with the help of MongoDB Change Streams (although those are only available in replica sets or sharded clusters, which we currently don't have – to be investigated). Alternatively, changes could be distributed to the matcher processes through PubSub, e.g. via Redis. But then one has to be careful to keep the data consistent, and perhaps still do a full refresh every now and then.
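A delta-update loop based on MongoDB Change Streams might look roughly like this (requires a replica set; the collection name and the applyTrigger/removeTrigger helpers are assumptions):

```ts
// deltaUpdates.ts — sketch of applying trigger changes incrementally instead
// of rebuilding the whole index once a minute.
import { MongoClient } from 'mongodb';

export async function watchTriggers(client: MongoClient): Promise<void> {
  const triggers = client.db('hamalert').collection('triggers');
  const stream = triggers.watch([], { fullDocument: 'updateLookup' });

  for await (const change of stream) {
    switch (change.operationType) {
      case 'insert':
      case 'replace':
      case 'update':
        removeTrigger(change.documentKey._id);  // drop the old version, if any
        applyTrigger(change.fullDocument);      // add the new version to the index
        break;
      case 'delete':
        removeTrigger(change.documentKey._id);
        break;
    }
  }
}

// Hypothetical helpers that would mutate the in-memory index; a periodic full
// rebuild could still run occasionally as a consistency safety net.
function applyTrigger(trigger: unknown): void { /* update hashmap/bitmaps */ }
function removeTrigger(id: unknown): void { /* remove the old entries */ }
```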

However, generally thinking about ways to horizontally scale the spot processing, there are basically two possible approaches:

  1. Shard by users: e.g. have 4 processes, with each process only handling 25% of the users (trigger definitions), but processing every incoming spot.
  2. Shard by spots: e.g. have 4 processes, with each process only handling 25% of the spots, but matching each spot against all users/triggers.

Option 1 is better if the number of users grows faster than the number of spots, and vice versa for Option 2. One could also envision sharding in both dimensions: for example, if each incoming spot were assigned a letter A, B, C or D in turn, and the set of all users were divided into four groups 1, 2, 3 and 4, one could run 16 matcher processes, one for each possible combination of spot shard and user shard (A1, A2, A3, A4, B1, B2, ...). This would be a generic solution; by tuning the number of user and spot shards, one could adapt to growth in either users or spots.
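The routing logic for such a combined scheme could be as simple as the following sketch (shard counts and the hashing choice are placeholders):

```ts
// shardRouting.ts — sketch of two-dimensional sharding: each matcher process
// owns one (spotShard, userShard) combination, e.g. 4 x 4 = 16 processes.
const SPOT_SHARDS = 4;  // "A", "B", "C", "D" in the example above
const USER_SHARDS = 4;  // user groups 1..4

let spotCounter = 0;

// Spots are assigned round-robin to a spot shard...
export function spotShard(): number {
  return spotCounter++ % SPOT_SHARDS;
}

// ...and users are assigned to a fixed user shard, e.g. by hashing their ID,
// so each user's triggers live in exactly SPOT_SHARDS of the matcher processes.
export function userShard(userId: number): number {
  return userId % USER_SHARDS;
}

// A spot in shard s must be sent to the USER_SHARDS processes covering (s, 0..3);
// process index = s * USER_SHARDS + u.
export function matcherProcesses(s: number): number[] {
  return Array.from({ length: USER_SHARDS }, (_, u) => s * USER_SHARDS + u);
}
```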

A challenge with sharding by spots is rate-limiting, which needs to be done per user. This would require the processes to share information on which spots have recently been sent to a given user, or the rate-limiting (and notification sending) would have to be done in separate processes sharded only by user.
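If the matchers are sharded by spots, one way to share per-user rate-limit state would be a central store such as Redis. A rough sketch, assuming the ioredis package and placeholder limits:

```ts
// sharedRateLimit.ts — sketch of a per-user notification rate limit shared
// across spot-sharded processes, assuming Redis via the ioredis package.
import Redis from 'ioredis';

const redis = new Redis(); // all matcher/notifier processes talk to the same instance

const WINDOW_SECONDS = 60;   // placeholder window
const MAX_PER_WINDOW = 30;   // placeholder limit

// Returns true if a notification may be sent to this user right now.
export async function allowNotification(userId: number): Promise<boolean> {
  const key = `ratelimit:${userId}`;
  const count = await redis.incr(key);        // atomic across all processes
  if (count === 1) {
    await redis.expire(key, WINDOW_SECONDS);  // start the window on the first hit
  }
  return count <= MAX_PER_WINDOW;
}
```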
