I am researching Atom feeds as a way of distributing event data as part of our organisation's internal REST APIs. I can control the feeds and ensure:
- there is a "head" feed containing time-ordered events, with an etag that changes whenever the feed changes (and a short cache lifetime).
- there are "archive" feeds containing older events, with a fixed etag (and a long cache lifetime).
- the events are timestamped and immutable, i.e. they happened and can't change.
The question is: what must the consumer remember so that it can synchronize with the latest data at any time, without processing any event twice?
- The last etag it processed?
- The timestamp of the last event it processed?
I suppose it needs both? The etag to ask the feed efficiently whether anything has changed (using HTTP If-None-Match) and, if so, the timestamp to apply only those events from the updated feed that haven't already been processed...
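To make the "both" idea concrete, here is a rough sketch of a single poll. It assumes timestamps are ISO 8601 UTC strings (so string order matches time order) and hides the HTTP call behind a hypothetical `fetch(etag)` callable that sends If-None-Match and returns `(status, new_etag, events)`. The names `Event`, `Checkpoint` and `poll` are made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Event:
    timestamp: str  # assumed ISO 8601 UTC, so string comparison follows time order
    payload: str


@dataclass
class Checkpoint:
    etag: Optional[str] = None            # etag of the last head feed we processed
    last_timestamp: Optional[str] = None  # timestamp of the last event we processed


# Hypothetical stand-in for "GET the head feed with If-None-Match: <etag>".
Fetch = Callable[[Optional[str]], Tuple[int, Optional[str], List[Event]]]


def poll(fetch: Fetch, cp: Checkpoint, handle: Callable[[Event], None]) -> int:
    """Poll once; return how many events were handed to `handle`."""
    status, new_etag, events = fetch(cp.etag)
    if status == 304:
        # Not Modified: the feed is unchanged, nothing to do.
        return 0
    processed = 0
    for ev in sorted(events, key=lambda e: e.timestamp):
        # Skip anything at or before the checkpoint timestamp.
        if cp.last_timestamp is not None and ev.timestamp <= cp.last_timestamp:
            continue
        handle(ev)
        cp.last_timestamp = ev.timestamp
        processed += 1
    # Only remember the new etag once the whole feed has been applied.
    cp.etag = new_etag
    return processed
```

Note that the `<=` comparison is only safe if no two events share a timestamp; with ties, an event that arrives later with the same timestamp as the checkpoint would be skipped.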
The question has nothing in particular to do with REST or the technology used to consume the feed. It would apply to anyone writing code to consume an Atom feed, in a feed reader for example.
UPDATE
Thinking about it, some of the events may have the same timestamp, as they get "detected" at the same time in batches. It could then be awkward for the consumer to rely on the timestamp of the last event successfully processed, in case it dies halfway through processing a batch with the same timestamp... This is why I hate timestamps!
In that case does the feed need to send an id with every event that the consumer has to remember instead? Wouldn't that id have to increment to eternity, and never ever be reset? What are the alternatives?
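One possible alternative, sketched below: every Atom entry already carries a required, permanently unique `<id>` element (RFC 4287), and it does not have to be a counter. If the feed is ordered, the consumer could remember the id of the last entry it processed and resume just after it. This is only a sketch under assumptions: entries are represented as `(id, timestamp)` tuples, oldest first, and `process_new`, `handle` and `save` are hypothetical names.

```python
from typing import Callable, List, Optional, Tuple

Entry = Tuple[str, str]  # (atom id, timestamp); ids are opaque, not sequential


def process_new(
    entries: List[Entry],
    last_id: Optional[str],
    handle: Callable[[Entry], None],
    save: Callable[[str], None],
) -> Optional[str]:
    """Process the entries that come after `last_id`, oldest first.

    `save` persists the checkpoint after each entry, so a crash in the
    middle of a same-timestamp batch resumes at the right entry.
    """
    ids = [entry_id for entry_id, _ in entries]
    if last_id is None:
        start = 0  # first run: everything is new
    elif last_id in ids:
        start = ids.index(last_id) + 1
    else:
        # Checkpoint is not on this page; assumption: the caller should
        # walk back through the archive feeds until it finds it.
        raise LookupError("checkpoint id not in this feed page")
    for entry in entries[start:]:
        handle(entry)
        save(entry[0])
        last_id = entry[0]
    return last_id
```

A crash between `handle` and `save` would still replay one entry, so to rule out double processing entirely the two would need to happen in one transaction, or the handler would need to be idempotent.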