
Idempotence

In a heavily asynchronous, distributed system, every component should be idempotent and handle concurrency correctly.

Idempotence

In a distributed system, requests can fail at any time or be duplicated (pub/sub on SNS/SQS provides "at-least-once delivery"). To insulate the system from the resulting discrepancies, we need mechanisms that ensure idempotence at an atomic level.

The following examples show what goes wrong without idempotent logic. Both use email click events that increment a single shared resource (a summary metric):

Duplication

At-least-once delivery:

  • click event is created
  • click event job is triggered via pub/sub (even if this is not the current mechanism, it may be in the future)
  • the message is delivered three times, so three click event jobs run
  • metrics are updated by +3 instead of by +1
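A minimal in-memory sketch of the duplication scenario above, assuming each click event carries a unique ID. `ClickMetrics`, `handle_click_naive`, and `handle_click_idempotent` are hypothetical names for illustration only. The naive handler counts every delivery; the idempotent one remembers which event IDs it has already applied, so redeliveries become no-ops.

```python
from dataclasses import dataclass, field


@dataclass
class ClickMetrics:
    """Hypothetical in-memory stand-in for the summary metric."""
    clicks: int = 0
    # IDs of click events that have already been counted.
    processed_event_ids: set[str] = field(default_factory=set)


def handle_click_naive(metrics: ClickMetrics, event_id: str) -> None:
    # Increments on every delivery, so duplicate messages inflate the count.
    metrics.clicks += 1


def handle_click_idempotent(metrics: ClickMetrics, event_id: str) -> None:
    # Skip events that were already applied: a redelivery changes nothing.
    if event_id in metrics.processed_event_ids:
        return
    metrics.processed_event_ids.add(event_id)
    metrics.clicks += 1


# At-least-once delivery: the same click event arrives three times.
deliveries = ["click-123", "click-123", "click-123"]

naive = ClickMetrics()
for event_id in deliveries:
    handle_click_naive(naive, event_id)

safe = ClickMetrics()
for event_id in deliveries:
    handle_click_idempotent(safe, event_id)

print(naive.clicks, safe.clicks)  # 3 1
```

This only illustrates the idea within one process. In a real system, the record of processed IDs would have to live in shared, durable storage and be written atomically with the metric update, otherwise a crash or a concurrent worker can still double count.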

Consistent Failure

  • click event is created
  • click event job triggered via pub/sub
  • one click event job triggers
  • the click event job updates the metric correctly, but the job does not exit cleanly
    • why? e.g., the worker autoscaling group scales down, a one-off exception occurs, etc.
  • the click event is retried automatically by the message queue's retry mechanism, potentially failing several more times
  • the metric is incremented multiple times

Without properly locked, idempotent processing logic, summary data can become inconsistent.
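"Properly locked" matters because a check-then-act sequence ("has this event been processed? if not, increment") races when two workers handle duplicate deliveries at the same time: both can see the event as unprocessed and both increment. The sketch below is a hypothetical single-process illustration (`LockedClickCounter` is not an existing component) that holds a `threading.Lock` around the check and the increment so the pair is atomic. Note that `threading.Lock` only coordinates threads within one process; across separate workers the same guarantee has to come from shared storage, such as a database unique constraint, a row lock, or a distributed lock.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedClickCounter:
    """Hypothetical counter guarding check-then-increment with a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._processed: set[str] = set()
        self.clicks = 0

    def apply(self, event_id: str) -> bool:
        # Without the lock, two threads could both pass the membership check
        # before either records the ID, and both would increment.
        with self._lock:
            if event_id in self._processed:
                return False
            self._processed.add(event_id)
            self.clicks += 1
            return True


counter = LockedClickCounter()
# Simulate five duplicate deliveries of the same event handled concurrently.
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(counter.apply, ["click-789"] * 5))

print(counter.clicks, results.count(True))  # 1 1
```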

Strategies

To keep processing idempotent and safe under concurrency, we use a number of strategies, including: