Idempotence

In a heavily asynchronous, distributed system, every component should be idempotent and handle concurrency correctly.

Idempotence

In a distributed system, requests can fail at any time or be duplicated (pub/sub on SNS/SQS uses "at least once" delivery). To insulate the system from the resulting discrepancies, we need mechanisms that ensure idempotence at an atomic level.

The examples below show what can go wrong without idempotent logic. In both, email click events increment a single resource:

Duplication

At-least-once delivery:

click event is created
click event job is triggered via pub/sub (even if this is not the current mechanism, it may be in the future)
three click event jobs trigger because the message is delivered more than once
metrics are updated by +3 instead of +1

Consistent Failure

email event is created
click event job triggered via pub/sub
one click event job triggers
click event job updates the metric properly, but the job does not exit cleanly
- why? the worker autoscaling group is scaled down, a one-off exception occurs, etc.
click event job is retried automatically by the MQ retry mechanism, potentially failing several more times
the metric is incremented multiple times

Without properly locked, idempotent processing logic, summary data can become inconsistent.
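Both failure modes above come down to the same event being applied more than once. One way to guard against that is to record each event ID as it is applied and skip any ID already seen. The following is a minimal in-memory sketch of that idea, not the production implementation: `ClickMetrics`, `record_click`, and the `evt-1` / `email-42` identifiers are hypothetical names, and the `Set` plus `Mutex` stand in for what would normally be a shared store with an atomic "insert if absent" operation.

```ruby
require "set"

# Hypothetical in-memory stand-in for a metrics store plus a record of
# which event IDs have already been applied.
class ClickMetrics
  attr_reader :clicks

  def initialize
    @clicks = Hash.new(0) # email_id => click count
    @processed = Set.new  # event IDs that have already been applied
    @lock = Mutex.new     # makes check-and-increment a single atomic step
  end

  # Applies a click event at most once. Returns true if the metric was
  # incremented, false if the event was a duplicate delivery or a retry.
  def record_click(event_id:, email_id:)
    @lock.synchronize do
      # Set#add? returns nil when the ID is already present.
      return false unless @processed.add?(event_id)

      @clicks[email_id] += 1
      true
    end
  end
end

metrics = ClickMetrics.new
# Simulate at-least-once delivery: the same event arrives three times.
3.times { metrics.record_click(event_id: "evt-1", email_id: "email-42") }
puts metrics.clicks["email-42"] # prints 1
```

The important property is that the "have we seen this event?" check and the increment happen as one atomic step. Checking first and incrementing later, without a lock or constraint, would reintroduce the race.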

Strategies

To handle concurrency safely, we use a number of strategies. Here are some:

Redis as a distributed mutex
- Given its single-threaded nature, Redis can be used as a distributed mutex
- Purchase::ComputeJob
- https://redis.io/topics/distlock
Amex API Request Header
- Amex requires all requests made within the same second to share a single API header. We use Redis to synchronize this request header across our fleet of servers.
- Provider::Anon::Client::Oauth
Database index with a uniqueness constraint
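- ActiveRecord's find_or_create logic is not atomic: it issues a SELECT and then an INSERT, so two concurrent callers can both miss the record and both insert it. A unique index makes the database reject the second insert.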
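The first strategy in the list, Redis as a distributed mutex, can be sketched along the lines of the single-instance pattern described at the distlock link: acquire with `SET key token NX PX ttl`, and release with a small Lua script that deletes the key only if the token still matches. This is an illustrative sketch, not the code in `Purchase::ComputeJob`. `RedisMutex`, `with_lock`, and the default TTL are hypothetical, and the `redis` argument is assumed to be a redis-rb style client that responds to `set(key, value, nx:, px:)` and `eval(script, keys:, argv:)`.

```ruby
require "securerandom"

# Minimal single-instance Redis lock (hypothetical class name).
class RedisMutex
  # Deletes the key only if we still own it, so a lock that expired and
  # was re-acquired by another worker is never released by mistake.
  RELEASE_SCRIPT = <<~LUA
    if redis.call("get", KEYS[1]) == ARGV[1] then
      return redis.call("del", KEYS[1])
    else
      return 0
    end
  LUA

  def initialize(redis, key, ttl_ms: 30_000)
    @redis = redis
    @key = key
    @ttl_ms = ttl_ms # assumed default; tune to the longest expected job
  end

  # Runs the block only if the lock was acquired. Returns true if the
  # block ran, false if another holder currently has the lock.
  def with_lock
    token = SecureRandom.uuid
    # SET key token NX PX ttl succeeds only when the key does not exist.
    return false unless @redis.set(@key, token, nx: true, px: @ttl_ms)

    begin
      yield
      true
    ensure
      @redis.eval(RELEASE_SCRIPT, keys: [@key], argv: [token])
    end
  end
end

# Usage, assuming the redis gem and a reachable server:
#   require "redis"
#   lock = RedisMutex.new(Redis.new, "lock:purchase:123")
#   lock.with_lock { recompute_purchase } # recompute_purchase is hypothetical
```

The TTL keeps a crashed worker from holding the lock forever, and the random token keeps a slow worker from releasing a lock it no longer owns. A lock like this narrows the window for concurrent processing, but it works best alongside idempotent job logic, not in place of it.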