Idempotence
All systems should be idempotent and properly handle concurrency in a heavily asynchronous, distributed system.
# Idempotence
In a distributed system, requests can fail at any time or be duplicated (pub/sub on SNS/SQS is "at least once delivery"). To insulate the system from the resulting discrepancies, we need mechanisms that ensure idempotence at an atomic level.
The examples below show what can go wrong without idempotent logic, using email click events that increment a single resource:
# Duplication
At-least-once delivery:
- click event is created
- click event job triggered via pub/sub (even if this is not the mechanism today, it may be in the future)
- three click event jobs trigger
- metrics are updated by +3 instead of by +1
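The duplication scenario above can be sketched in a few lines of plain Ruby. `NaiveClickMetrics` and the event ID `"click-123"` are hypothetical names used only for illustration; this is an in-memory stand-in, not the real job or model.

```ruby
# Hypothetical in-memory stand-in for a metrics record (not the real model).
class NaiveClickMetrics
  attr_reader :clicks

  def initialize
    @clicks = 0
  end

  # Not idempotent: every delivery increments, even for the same event.
  def handle(_event_id)
    @clicks += 1
  end
end

metrics = NaiveClickMetrics.new

# At-least-once delivery: the same click event arrives three times.
3.times { metrics.handle("click-123") }

puts metrics.clicks # three increments for a single click
```

Because the handler never looks at the event's identity, it cannot tell a redelivery from a new click.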
# Consistent Failure
- email event is created
- click event job triggered via pub/sub
- one click event job triggers
- click event job updates metric properly, but job does not exit cleanly
  - why? worker autoscaling group is scaled down, one-off exception, etc.
- click event gets retried automatically due to MQ retry mechanisms, potentially failing several more times
- results in metric incrementing multiple times
Without properly locked, idempotent processing logic, summary data can become inconsistent.
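As a rough illustration of the alternative, the sketch below records each event ID and increments the counter only the first time that ID is seen. It assumes every click event carries a stable unique ID; `IdempotentClickMetrics` and the IDs are hypothetical. In a real system the "seen" record and the counter would live in a shared store and be updated together atomically, rather than in process memory.

```ruby
require "set"

# Hypothetical in-memory sketch of idempotent click handling.
class IdempotentClickMetrics
  attr_reader :clicks

  def initialize
    @clicks = 0
    @seen = Set.new
    @lock = Mutex.new
  end

  # Returns true if the event was counted, false if it was a duplicate.
  def handle(event_id)
    @lock.synchronize do
      # Set#add? returns nil when the ID is already present.
      return false unless @seen.add?(event_id)

      @clicks += 1
      true
    end
  end
end

metrics = IdempotentClickMetrics.new

# Duplication: three deliveries of the same event.
3.times { metrics.handle("click-123") }

# Consistent failure: the update succeeds, but the job dies before
# acknowledging the message, so the queue redelivers the event.
begin
  metrics.handle("click-456")
  raise "worker terminated before acknowledging the message"
rescue RuntimeError
  metrics.handle("click-456") # the retry is a no-op
end

puts metrics.clicks
```

Both scenarios now count each click once, because repeating the same delivery has no further effect after the first successful update.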
# Strategies
To handle concurrency safely, we use a number of strategies. Here are some:
- Redis as a distributed mutex
  - Given its single-threaded nature, Redis can be used as a distributed mutex
  - Purchase::ComputeJob
  - https://redis.io/topics/distlock
- Amex API Request Header
  - Amex requires a single API header to be shared for all requests in the same second. We use Redis to synchronize this request header across our fleet of servers.
  - Provider::Anon::Client::Oauth
- Database index with a uniqueness constraint
  - ActiveRecord's find_or_create logic is not atomic
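The Redis mutex idea can be sketched as follows. This is a minimal single-instance lock under stated assumptions, not the code in `Purchase::ComputeJob`: `DistributedMutex`, `FakeStore`, and the key name are hypothetical. The lock expects a client that behaves like redis-rb's `set(key, value, nx: true, px: ttl_ms)` and `eval(script, keys:, argv:)`. The in-memory `FakeStore` exists only so the sketch runs without a Redis server; it ignores the TTL and the script text and simply mimics their effect.

```ruby
require "securerandom"

# Hypothetical lock helper built on SET NX PX plus a compare-and-delete script.
class DistributedMutex
  # Delete the key only if this holder still owns it.
  RELEASE_SCRIPT = <<~LUA
    if redis.call("get", KEYS[1]) == ARGV[1] then
      return redis.call("del", KEYS[1])
    else
      return 0
    end
  LUA

  def initialize(store, key, ttl_ms: 10_000)
    @store = store
    @key = key
    @ttl_ms = ttl_ms
  end

  # Runs the block only if the lock was acquired; returns false otherwise.
  def with_lock
    token = SecureRandom.hex(16)
    return false unless @store.set(@key, token, nx: true, px: @ttl_ms)

    begin
      yield
      true
    ensure
      @store.eval(RELEASE_SCRIPT, keys: [@key], argv: [token])
    end
  end
end

# In-memory stand-in for a Redis client (assumption: illustration only).
class FakeStore
  def initialize
    @data = {}
  end

  def set(key, value, nx: false, px: nil)
    return false if nx && @data.key?(key)

    @data[key] = value
    true
  end

  def eval(_script, keys:, argv:)
    return 0 unless @data[keys.first] == argv.first

    @data.delete(keys.first)
    1
  end
end

store = FakeStore.new
lock = DistributedMutex.new(store, "lock:purchase:42")
results = []

outer = lock.with_lock do
  # A second attempt while the lock is held is rejected.
  results << DistributedMutex.new(store, "lock:purchase:42").with_lock { results << :inner }
  results << :outer
end

# Once released, the lock can be taken again.
again = lock.with_lock { results << :second }
```

The random token matters: without it, a holder whose lock had already expired could delete a lock that now belongs to another process. The TTL bounds how long a crashed holder can block everyone else, which also means the protected work should finish well within it.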