Apr 2, 2014

REST, ROA, and HATEOAS often lead to bad web service design

This is not to say that they are bad, but I find that all too frequently the resulting APIs are poorly designed because one thing is forgotten: RPC (Remote Procedure Call) is expensive. Now by RPC I do not mean custom messaging formats such as SOAP or XML-RPC; I mean calling a method on a remote server. Do not think that just because you are using HTTP with something like XML or JSON, calling GET /resource is significantly different from calling get_resource in a SOAP call. Frequent idempotence does not mean you're not actually doing RPC either, as good server-side method design often implies idempotence too; e.g., adding an object to a Set in Java twice will not result in the object being added twice. Every call to a remote server is a form of RPC. The most expensive part of RPC is creating a new connection; just how expensive depends on the protocol. This is why WebSockets, for instance, are much cheaper than repeated calls (there are other reasons and expenses too, like maintaining many connections).
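
To make the idempotence point concrete, here's a trivial bit of plain Java (the class name is mine):

    import java.util.HashSet;
    import java.util.Set;

    public class SetIdempotence {
        public static void main(String[] args) {
            Set<String> tags = new HashSet<>();
            tags.add("sale");
            tags.add("sale"); // the second add is a no-op; Set.add is effectively idempotent
            System.out.println(tags.size()); // prints 1
        }
    }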

I've worked with a few Resource Oriented Architecture (ROA) web services, and they each suffered from the same flawed design: an excessive number of RPC calls was required to do seemingly simple tasks. This is caused by the misguided belief that every single aggregate should be its own resource, that each component of the aggregate should also be its own resource, and that those should be the only access to the underlying aggregate. In one case, working with an ROA, we had to make about 5 RPC calls for every single product we wanted to create, and we were bulk creating. This problem was aggravated by the lack of an idempotent PUT for most resources.

The reality is, with a good API design we could have created all of the objects we needed with a single API call to a bulk interface. I'm talking about the RESTful equivalent of Java's Collection.addAll(objs). In fact, if you use addAll on a Set, repeated identical calls are idempotent: the same object will not be added twice. This would be really easy to write given a good ORM and a good interface, so that you could do a POST or PUT to /entities. This is a significant improvement over a design where you have to do a PUT or POST for every single item you want to create. DELETE may be the only place where I'd consider not doing a bulk request, and it can generally be completed asynchronously. You may of course consider limiting the number of entities acted on in a request; if you need to create 1000 entities, it might take 10 requests doing 100 at a time, but this is still better for both the client and the server than doing 1000 requests.
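
As a rough sketch of what such a bulk interface might look like, here's a JAX-RS resource; Entity and EntityDao are hypothetical stand-ins for your model class and ORM layer, and the batch cap of 100 is just the example figure from above:

    import java.util.List;
    import javax.ws.rs.Consumes;
    import javax.ws.rs.POST;
    import javax.ws.rs.Path;
    import javax.ws.rs.Produces;
    import javax.ws.rs.core.Response;

    // Hypothetical bulk resource: the RESTful analogue of Collection.addAll(objs).
    @Path("/entities")
    public class EntitiesResource {

        private final EntityDao dao = new EntityDao(); // hypothetical DAO/ORM wrapper

        @POST
        @Consumes("application/json")
        @Produces("application/json")
        public Response createAll(List<Entity> entities) {
            // Cap per-request work: 1000 entities arrive as 10 requests of 100,
            // which is still far cheaper than 1000 individual requests.
            if (entities.size() > 100) {
                return Response.status(Response.Status.REQUEST_ENTITY_TOO_LARGE).build();
            }
            dao.saveAll(entities); // one connection, one transaction, many rows
            return Response.ok(entities).build();
        }
    }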

The choice between PUT and POST depends on whether you believe that a GET must return the exact same view that was PUT, meaning that a PUT would delete resources not included (for a single aggregate that's probably true), or whether the behavior should be equivalent to addAll, or to replacing the reference to the collection with a new one. Remember, PUT must be idempotent; this only means that subsequent calls using the exact same arguments should result in the exact same outcome. You may want to consider using a different URI for manipulating your entity collections in these ways.
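
A toy illustration of the difference, in plain Java (again, the names are mine):

    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    public class CollectionSemantics {
        public static void main(String[] args) {
            Set<String> current = new HashSet<>(Arrays.asList("a", "b"));
            List<String> payload = Arrays.asList("b", "c");

            // PUT-as-full-replacement: a GET afterwards returns exactly the
            // payload; "a" is gone because it wasn't included.
            Set<String> replaced = new HashSet<>(payload);

            // addAll semantics: existing members survive, duplicates are not
            // re-added, and repeating the call changes nothing.
            current.addAll(payload);
            current.addAll(payload);

            System.out.println(replaced); // [b, c]
            System.out.println(current);  // [a, b, c] (iteration order may vary)
        }
    }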

Another problem we encountered with one web service was that it had sub-resources that had to exist prior to creating the resource we actually needed to create, akin to tags. Not having an idempotent PUT for that resource meant we were doing create-on-exception-update. Given the simplicity of this resource, it would have been even better for the API to accept the final object representation of that sub-resource, instead of requiring its id, and to do a lookup by name, or a create-or-update, under the hood. Doing this is more difficult logic-wise, and impossible if there's no natural key (because you can't look it up).
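
Under the hood, that create-or-update could be as simple as the following sketch; Tag and TagDao are hypothetical, and the whole approach hinges on the name being a natural key to look up by:

    // Hypothetical server-side create-or-update for a simple sub-resource,
    // so clients can send the tag inline instead of creating it first and
    // passing its id.
    public class TagService {

        private final TagDao tagDao = new TagDao(); // hypothetical DAO

        public Tag findOrCreate(String name) {
            Tag existing = tagDao.findByName(name); // lookup by natural key
            return existing != null ? existing : tagDao.save(new Tag(name));
        }
    }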

You're probably asking yourself: but how do I handle errors for these things? Well, the way I see it, you have three options. One, requests are transactions: wrap your database code in a transaction so the request either succeeds or fails as a whole, return a 200 on success, and ensure HATEOAS, with links to any new resources in the response. Two, you could allow partial success and return the successful objects. Three, you could return a custom message envelope payload; this isn't very RESTful, because it's a protocol on top of HTTP (it's more like SOAP).
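
Option one might look something like this sketch (JAX-RS again, with the same hypothetical Entity and EntityDao; the failure status code is a judgment call):

    import java.util.ArrayList;
    import java.util.List;
    import javax.ws.rs.Consumes;
    import javax.ws.rs.POST;
    import javax.ws.rs.Path;
    import javax.ws.rs.Produces;
    import javax.ws.rs.core.Response;

    @Path("/entities")
    public class TransactionalEntitiesResource {

        private final EntityDao dao = new EntityDao(); // hypothetical

        @POST
        @Consumes("application/json")
        @Produces("application/json")
        public Response createAll(List<Entity> entities) {
            try {
                dao.saveAllInTransaction(entities); // hypothetical: all-or-nothing
            } catch (RuntimeException e) {
                // The whole batch is rejected, so the client can safely retry it.
                return Response.status(Response.Status.CONFLICT).build();
            }
            // HATEOAS: hand back links to each newly created resource.
            List<String> links = new ArrayList<>();
            for (Entity created : entities) {
                links.add("/entities/" + created.getId()); // hypothetical getId()
            }
            return Response.ok(links).build();
        }
    }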

I'm currently working on designing a new REST web service, and I've decided that no page load, or "single conceptual action", should take more than 6 API requests. This number is not arbitrary; it's the median number of concurrent connections per host name that consumer web browsers allow. Even that number is too many, but I felt that I needed to allot more than one request because some completely different actions may need to occur on a single page load.

Keep on with Resource Oriented REST with HATEOAS; just try to think of how to minimize the number of calls by designing less granular resources.

2 comments:

  1. Typos: s/indem/idem/g

    Data::HAL author here. I wish you would go into more concrete detail so we can compare notes. Talk to me in Freenode #rest if you like.

    I'm not sure whether you are in control of the software running on the server; the body of the article reads as if you are hampered by one bad implementation of an ROA. In contrast, the heading reads as if all ROA designs necessarily turn out bad, and RPC ones don't -- which I don't see as true.

    > The choice between PUT and POST depends on whether you believe that a GET must return the exact same view that was PUT

    It says nowhere that GET must return the same entity. PATCH is appropriate for mangling certain fields or parts of an entity; POST is appropriate for creating a new one. Some people also use POST for mangling, which comes at the cost of a new media type (whether explicit or implicit), which I find is lame.

    Related to this, have a look at draft-snell-http-prefer §4; it gives the client more control over the returned entity for write operations.

    > partial success

    Not a good idea because it very much complicates programming on the consuming side. HTTP doesn't have facilities for partial success anyway. So if the bulk transaction fails, reject the request altogether with the appropriate 4xx client status code. That way the user agent can safely retry the whole thing, or perhaps try again with a smaller window of bulk inserts/updates.

    > Keep on with Resource Oriented REST with HATEOAS; just try to think of how to minimize the number of calls by designing less granular resources.

    I agree that how clients and user agents use the interface must inform the evolution of design decisions, but removing granularity isn't the solution. One should design and offer both granular and aggregate resources appropriate to the use cases.

  2. Thank you for the spelling correction; I should know better. Apparently I have its spelling stuck in my brain wrong.

    Yes, the title is meant to be provocative.

    I've seen 3 ROA designs turn out bad, and none good. That of course doesn't mean they can't turn out good, simply that I believe the theory as taught encourages granular resources, which is bad web service design. (Again, both RESTful HTTP and custom message envelopes are a form of RPC.) 2 of them were the same software with competing APIs... I just have a much higher success rate with older SOAP and other "Custom Message Envelope" APIs.

    No, I suppose nothing says they must, but it's encouraged to use the same representation for multiple operations on the same resource.

    PATCH is only variably available; not all REST libraries support it -- for example, AngularJS doesn't.

    I probably wouldn't recommend partial success either. I thought I had made it clear that I'd go with a full transaction for the add; if not, I certainly don't recommend it. It's just an option of sorts, and not very RESTful.

    I'm not necessarily referring to removing granularity, rather that adding bulk interfaces is the better thought, and a well-designed bulk interface can be used granularly. Just because you are POSTing to /entities, for example, doesn't mean you have to send more than one entity at a time if you don't want to.

    The problem I've seen, though, and this is even visible in my own earlier naivety (this blog is meant to help people learn from my mistakes as well), is that most REST/ROA design discussions encourage granularity to exclusivity. Even the thesis says to use HATEOAS URIs unless 80% of calls need the full body of the resource. The problem with that statement is: how can you know what the 80% is unless you've measured, and how can you have measured unless you've already built it? When are you actually going to bother to go back and fix it once you've measured? If you're shipping a product that's not SaaS, how are you going to measure at all? If your API is consumed by external clients, how do you know what their needs are ahead of time? You don't. However, since a bulk operation can be used granularly, it can kill both birds with one stone if designed that way in the beginning.

