Dsco provides several different API flavors for getting data. Some of the approaches are traditional and others are more cutting edge. Dsco strongly recommends that partners not simply fall back to the more traditional approaches but instead work with Dsco to choose the method that best matches the partner's use case.
For each object type - order, item inventory, item catalog, invoice, etc. - Dsco allows partners to get exactly one object by its unique key. For example, a specific order can be retrieved by its purchase order number, and the inventory for a single item can be retrieved by its SKU. These calls are synchronous, returning the object, if found, as the body of the response.
This API is not designed for high-volume use - it may be called at most 10 times per second. Please choose one of the other API approaches to detect change.
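As a sketch, a single-object lookup might look like the following. The base URL, path structure, and auth header here are assumptions for illustration only, not the documented Dsco endpoints; consult the Dsco API reference for the actual paths.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL; the real Dsco API host and version path will differ.
BASE_URL = "https://api.example.com/v1"

def build_order_url(po_number: str) -> str:
    """Build the single-object lookup URL for an order by purchase order number."""
    return f"{BASE_URL}/order/{urllib.parse.quote(po_number)}"

def get_order(po_number: str, token: str) -> dict:
    """Synchronously fetch one order; the object, if found, is the response body."""
    req = urllib.request.Request(
        build_order_url(po_number),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Remember the rate limit above: this call is for occasional single-object lookups, not for polling for change.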
Get Change as Webhook Batch
The principal Dsco partner use case is simply to find out when change occurs in an object type, say anytime the inventory changes on any item shared with the partner. In the past, many Dsco partners were reluctant to have Dsco push this change to them as it occurred via individual webhook HTTP calls. Partners reported that webhook adoption was difficult for two reasons. First, Dsco previously sent each change as a separate HTTP call to the partner's callback URL, which could put quite a burden on the partner's infrastructure. Second, Dsco did not retry very resiliently when the partner's infrastructure got backed up and rejected Dsco's call. So Dsco provided yet another API to find out which webhook events were rejected by the partner's gateway, letting the partner retrieve the missed change events. This was a pain.
This new approach solves these issues. Here's how it works...
- The partner will create a webhook based on some query params that detect when objects are created and updated, say when item inventory is updated, passing in the callback URL and the frequency at which change events are to be sent
- Dsco detects new and changed objects that match the query params and queues them up
- Every X amount of time, where X is the frequency the partner specified when creating the webhook, Dsco will send a single webhook event to the partner's callback URL
- The webhook event contains only a dedicated URL that the partner may do an HTTP GET on to get all the change events that have queued up since the last webhook in a single shot
- The partner then performs the HTTP GET on the dedicated URL received in the webhook event and processes the change events
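The steps above can be sketched as a minimal callback handler. The payload field name `batchUrl` is an assumption for illustration; the source does not specify the actual event schema.

```python
import json
import urllib.request

def extract_batch_url(webhook_body: bytes) -> str:
    """Pull the dedicated batch URL out of the webhook event payload.
    The field name "batchUrl" is hypothetical, not the documented schema."""
    event = json.loads(webhook_body)
    return event["batchUrl"]

def fetch_batch(batch_url: str) -> list:
    """A single GET retrieves every change event queued since the last webhook."""
    with urllib.request.urlopen(batch_url) as resp:
        return json.load(resp)

def handle_webhook(body: bytes) -> list:
    """Callback-URL handler: receive the tiny event, then pull the batch."""
    return fetch_batch(extract_batch_url(body))
```

Note that the handler itself does almost nothing; all the heavy lifting happens in the GET, at the partner's own pace.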
Why not just send the change events as the body of the webhook? Why add the complexity of asking the partner to turn around and retrieve the results? The answer is that, depending on the amount of change, the size of the objects, and the frequency at which webhooks are requested, there's no way the data would reliably fit in a single request. Retrieving the batched data with a single GET is simpler, safer, and more performant. It also puts the partner in the driver's seat, pulling the data in at its own pace, while still being pushed the fact that the next batch is ready to go.
The partner may also call a webhook API to update the state of the webhook so that, on the next interval, it sends not only the changes detected during that interval but *all* data, effectively allowing a true-up of all data. A partner could use this to true up periodically, say nightly, or to recover after a failure on the partner's side.
Also, a webhook may be created with a configuration parameter that says: don't send me changes each interval period, send me all data, such as all inventory data.
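A request body for such a webhook-update call might look like the sketch below. Both field names and values are illustrative assumptions, since the source does not document the schema.

```python
def build_webhook_update(webhook_id: str, send_all_data: bool) -> dict:
    """Body for a hypothetical webhook-update call. With send_all_data=True,
    the next interval sends *all* matching data (a true-up) instead of only
    the changes detected during that interval. Field names are illustrative."""
    return {
        "webhookId": webhook_id,
        "nextInterval": "all_data" if send_all_data else "changes_only",
    }
```

The same shape could back the creation-time configuration parameter that requests all data on every interval rather than as a one-time true-up.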
Get Change as Stream
While the Get Change as Webhook Batch is a great approach, it does have two drawbacks. The first is the latency of waiting for the batch to grow until it's time to send it. The second is that there's no easy way for a partner to go back in time after a failure and get all the change that occurred from a specific point forward. The stream approach has all the benefits of the webhook batch approach without these drawbacks, at the cost of getting comfortable with stream processing.
The partner needs to understand what a stream is and how to use one, and while that may sound daunting, the investment will in most cases be worth the time. Start by reading about the general computing concept of streams, popularized by Unix back in the day. Today, platforms like the Leo Platform and Apache Kafka help developers make use of stream processing.
Here's how it works...
- The partner calls the Create Stream API, passing in what type of changed objects should flow in the stream, say updated item inventory, and any additional query params to customize the data in the stream
- Dsco will then return the Stream JSON object as the response body of the Create Stream API call which will include the dedicated stream URL
- Dsco will immediately begin gathering all changed objects that match the query params and feeding them into the stream
- Dsco maintains a pointer, called a checkpoint, which marks how far into the stream the partner has read; initially that pointer is at the very beginning of the stream
- The partner will use sample code provided by Dsco in one of several languages to set up its attachment to the stream and will begin pulling from wherever the checkpoint is in the stream
- The partner will pull down records at its own pace and, as it pulls them down, will make tiny HTTP calls back to the dedicated stream URL to update the checkpoint, only doing so once the partner is sure it has successfully processed the records up to that point (Dsco's sample boilerplate stream code can hide this detail from the developer)
- The records will flow down to the partner just as fast as the partner can handle them and when there are no change records, the stream just hangs out, waiting for new change records
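The pull/process/checkpoint loop above can be sketched independently of any transport. Here `stream_read` and `checkpoint_write` are stand-ins for the HTTP calls against the dedicated stream URL, which Dsco's sample boilerplate would normally hide.

```python
def consume(stream_read, checkpoint_write, start, process):
    """Pull records from the checkpoint position, process them, and commit
    the checkpoint only after the records are successfully processed.

    stream_read(pos)      -> next slice of records at position pos (empty when caught up)
    checkpoint_write(pos) -> persists the new checkpoint (an HTTP call in practice)
    process(record)       -> the partner's business logic for one change record
    """
    pos = start
    while True:
        records = stream_read(pos)
        if not records:
            return pos  # caught up; a real consumer would wait for new records
        for record in records:
            process(record)
        pos += len(records)
        checkpoint_write(pos)  # safe: everything up to pos has been processed
```

Because the checkpoint is committed only after processing, a crash simply means resuming from the last committed position; no events are lost.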
Stream processing and reactive approaches are taking over the world of computing. Yes, this is something new to many and it sounds complicated, but the problem at hand, getting change from Dsco, is ideally suited to this design pattern. Here are a few reasons why...
- This approach provides the benefits of batch processing without the inherent delays: as soon as Dsco discovers a changed object, it flows down to the partner through the stream, and if the partner is backed up processing the stream, the changed object automatically queues up and waits
- The partner is completely in the driver's seat about how much data to get and at what pace; as long as the partner can keep up with the change over time, everything runs like a smooth machine
- Let's say the partner's streaming app that receives these change events crashes; no problem, all the events are sitting in the stream waiting to be pulled down and when the streaming app comes back up they can be handled
- Let's say the partner can't pull down the events fast enough one after the other; no problem, streams are designed to be processed in parallel and as long as the partner's infrastructure has the capacity, the objects can be pulled down and processed in parallel
- Let's say the partner's stream processing app was unaware that the partner's upstream back office system had been failing for the past hour and the last hour's events need to be reprocessed; no problem: the partner's stream processing app can send a call to the dedicated stream URL, resetting the checkpoint back in time one hour to just before the failure occurred, and the stream will then begin feeding events from that point forward, catching up as quickly as it can
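A checkpoint-reset request like the one in the last scenario might be built as follows. The action name and field names are purely hypothetical, since the source does not document the reset call's schema.

```python
import datetime

def build_checkpoint_reset(hours_back: float) -> dict:
    """Body for a hypothetical call to the dedicated stream URL that moves
    the checkpoint back in time. Field names are illustrative assumptions."""
    target = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(
        hours=hours_back
    )
    return {"action": "reset_checkpoint", "resetTo": target.isoformat()}
```

After such a reset, the consumer loop simply keeps pulling: the stream replays everything from the reset point forward, and normal checkpointing resumes.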