Accessing RESTful information efficiently

Alright, so I appreciate the idea of RESTful Web Services, but I’ve got a small dilemma I’d appreciate some opinions on.

In the RESTful Web Services book, by Leonard Richardson and Sam Ruby, there’s emphasis on making the programmable web look like the human web, by following an architecture oriented to having addressable resources rather than oriented to remote procedure calls. Throughout the book, the RPC style (or REST-RPC, when mixed with some RESTful characteristics) is clearly downplayed. In some cases, though, it’s unclear to me how far this advice extends. Humans and computers are of course very different in the nature of tasks they perform, and how well they perform them. To illustrate the point clearly, let me propose a short example.

Let’s imagine the following scenario: we are building a web site with information on a large set of modern books. In this system, we want to follow RESTful principles strictly: each book is addressable at http://example.com/book/<id>, and we can get a list of book URIs by accessing http://example.com/book/list?filter=<words>.

Now, we want to allow people to easily become aware of the newest edition of a given book. To do that, we again follow RESTful characteristics and add a, let’s say, new-editions field to the data which composes a book resource. This field contains a list of URIs of books which are more recent editions of the given book. So far so good. Looks like a nice design.
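
To make the design concrete, here is a minimal sketch of what such a book resource might look like. Only the new-editions field and the http://example.com/book/<id> layout come from the description above; the other field names, the ids, and the title are made up for illustration.

```python
import json

BASE = "http://example.com/book/"

def book_resource(book_id, title, newer_edition_ids):
    """Build the representation served at http://example.com/book/<id>."""
    return {
        "uri": BASE + str(book_id),   # assumed field, for illustration
        "title": title,               # assumed field, for illustration
        # URIs of books that are more recent editions of this one.
        "new-editions": [BASE + str(i) for i in newer_edition_ids],
    }

first = book_resource(17, "Example Book, 1st edition", [42])
print(json.dumps(first, indent=2))
```

A client that holds the URI of book 17 can follow the URIs in new-editions without knowing anything else about how the site is organized.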

Now, we want to implement a feature which allows people to access the list of all recent editions of books in their home library, given that they know the URIs for the books because a client program stored the URIs locally in their machines. How would we go about implementing this? We certainly wouldn’t want to do 200 queries to learn about updates for 200 books which a given person has, since that’s unnecessarily heavy on the client computer, on the network, and on the server. It’s also hard to encode the resource scope (as defined in the book) in the URI, since the amount of data to define the scope (the 200 books in our case) can be arbitrarily large. This actually feels like a perfect fit for an RPC: “Hey, server, here are 200 URIs in my envelope.. let me know what are the updated books and their URIs.” I can imagine some workarounds for this, like saving a temporary list of books with PUT, and then doing the query on that temporary list’s URI, but this feels like a considerably more complex design just for the sake of purity.
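
As a rough sketch of the RPC-style call I have in mind, the server side could be as small as the function below. The in-memory CATALOG and the function name are hypothetical stand-ins for whatever storage and endpoint a real service would use.

```python
# Hypothetical in-memory catalog: book URI -> URIs of its newer editions.
CATALOG = {
    "http://example.com/book/1": ["http://example.com/book/7"],
    "http://example.com/book/2": [],
    "http://example.com/book/3": [
        "http://example.com/book/8",
        "http://example.com/book/9",
    ],
}

def find_new_editions(book_uris):
    """Return {uri: [newer edition URIs]} only for books that have any."""
    updates = {}
    for uri in book_uris:
        newer = CATALOG.get(uri, [])
        if newer:
            updates[uri] = newer
    return updates

# The "envelope" of URIs the client stored locally.
library = ["http://example.com/book/1", "http://example.com/book/2"]
updates = find_new_editions(library)
print(updates)
```

One request carries the whole list, and the answer mentions only the books that actually have something new.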

When I read examples of RESTful interfaces, I usually see examples about how a Google Search API can be RESTful, for instance. Of course, Google Search is actually meant to be operated by humans, with a simple search string. But computers, unlike humans, can thankfully handle a large volume of data for us, and let us know about the interesting details only. It feels a bit like once the volume of data and the complexity of operations on that data go up, the ability for someone to do a proper RESTful design goes down, and an RPC-style interface becomes an interesting option again.

I would be happy to learn about a nice RESTful approach to solve this kind of problem, though.


28 Responses to Accessing RESTful information efficiently

  1. Adenilson Cavalcanti says:

    Gustavo

    Google calendar protocol uses the concept of ‘batch request’, where you can perform several operations by POST-ing an event feed with an entry for each operation that you want to perform.

    Concerning your example of updates of 200 books, it solves this for calendar events by executing a query by updated, where all changes (e.g. in your example, books that have new editions) will be retrieved.

    Not sure if this solves your problem…
    :-)

    Best regards

    Adenilson

  2. Sean Gillies says:

    If you make the list of latest editions of your collected books a resource of its own (say, /collection/latest), where all the URIs are known to this resource, it’s only one request to get all the latest editions. A bit like a news feed, yeah? This is a common RESTful design pattern.

  3. Adenilson,

    This is a very interesting pattern I didn’t think of. Thanks for bringing this up! Unfortunately, though, this doesn’t solve the main dilemma, since as the name implies (batch request), this is actually a remote procedure call which diverges from the principles of truly RESTful architectures.

    Sean,

    Please note that the URIs are in the client side, not in the server side. Your suggestion is the same I mentioned in the blog post I believe:

    I can imagine some workarounds for this, like saving a temporary list of books with PUT, and then doing the query on that temporary list’s URI, but this feels like a considerably more complex design just for the sake of purity.

    This suggestion will, unfortunately, change a read-only service into a read-write one, and make things considerably more complex (to implement, to use, to maintain, …). If that’s the best RESTful pattern in these cases, then there’s good value in the RPC style which is being overlooked in some discussions.

  4. Sean Gillies says:

    Gustavo, a read-only service doesn’t have any social value (in my opinion). I’ve pointed to a few read-write examples (LibraryThing, Zotero, OpenLibrary) at http://sgillies.net/blog/916/collections-queries-and-rest/.

  5. Sean, do you mean Google Search has no social value? :-)

    In your blog post you basically ignore my use case entirely and then wander on to say how wonderful RESTful architectures are in general. You’ve basically replaced the simple query I was looking for with user accounts, authentication, writable user collections, and so on.

    Again, if that’s the best RESTful pattern in these cases, then there’s good value in the RPC style which is being overlooked in some discussions, and it should come as no surprise that people are implementing REST-RPC interfaces, as described in the book.

  6. Sean Gillies says:

    I suggested a way to make things more efficient by keeping more app state, including user-generated state, on your server. That’s all. Well, maybe I went off on too much of a “Web 2.0” rant. If you want to keep your server side light, then yes, I agree, you’ll want to use RPC and HTTP POST.

  7. I don’t see what’s the problem with the 200 queries. They will be cached, proxied, load-balanced, and otherwise made efficient by the architecture of the web. If later the client adds one book to his library, and he makes an «update the list of editions» query, only one book update will be transferred; all the remaining 200 will result in a «not changed» and save effort. Under your «200 in a single query» model, anything that happens to any book requires sending the data about the 200 again.

    In any case, if you really want to treat the list of books as a first-class object, then the solution is (as people pointed above) to make it a resource. IMO in a real REST application the list would be stored in the server anyway, because hypermedia is the engine of application state — storing the list locally sounds too much like distributing application state, like cookies (and thus breaking the whole system of the consistency of the web — bookmarking, back-button, caching, et cetera.) If you want to store a lot of data locally, you’re not making a web program, you’re coercing HTTP into the transport protocol of your non-web program.
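
A minimal sketch of the «not changed» behaviour mentioned in this comment, using an entity tag the way an HTTP conditional GET would. The BOOKS store, the function names, and the hash-based tag are assumptions for illustration, not any particular server's implementation.

```python
import hashlib
import json

# Hypothetical resource store: path -> current representation.
BOOKS = {"/book/1": {"title": "Example Book", "new-editions": []}}

def etag_for(resource):
    """Derive an entity tag from the serialized representation."""
    body = json.dumps(resource, sort_keys=True).encode("utf-8")
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def conditional_get(path, if_none_match=None):
    """Return (status, etag, body); body is None when nothing changed."""
    resource = BOOKS[path]
    etag = etag_for(resource)
    if if_none_match == etag:
        return 304, etag, None   # Not Modified: no body is sent
    return 200, etag, resource

status, etag, body = conditional_get("/book/1")
status_again, _, body_again = conditional_get("/book/1", if_none_match=etag)

# A new edition appears, so the stored tag no longer matches.
BOOKS["/book/1"]["new-editions"].append("/book/2")
status_changed, _, _ = conditional_get("/book/1", if_none_match=etag)
print(status, status_again, status_changed)
```

Under this scheme only the books whose representation changed cost a full response; the rest are answered with an empty 304, by the origin server or by any cache in between.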

  8. By the way, you’ll only understand what’s the appeal of REST when you grok hypermedia as the engine of application state.

    Contrary to much popular opinion, REST is not just having URLs for things, is not about CRUDing over HTTP, and is definitely not about «making the programmable web more human-friendly» (it’s the opposite, if anything; as people have pointed out, guessable URI schemes are kind of un-RESTful, because they encourage humans to break HATEOAS). «RESTful Web Services» is a good start, but why don’t you try the source? (Also the source’s blog).

  9. In particular, any reader of «RESTful Web Services» should be required to read these two posts & discussions:

    http://roy.gbiv.com/untangled/2008/on-software-architecture

    http://intertwingly.net/blog/2008/03/23/Connecting

    (Hey look, it’s me there asking noob questions… I forgot how much I struggled with this, it feels so natural now 8)

  10. Leonardo,

    Thank you very much for that view. The fact that there’s caching and whatnot should certainly be taken into consideration in such a problem, and you’re right, it’s probably indeed a good choice to just do the 200 requests blindly in many situations.

    But again, I think this isn’t a global answer. Downloading 200 small resources from a well connected remote web site serially can take a long time to process, which certainly restricts the kind of user interaction this will provide. This becomes more obvious as you increase the number of items, and consider that it’s easy to optimize requests for various items at once in most databases.

    I’m also a bit surprised by this statement:

    If you want to store a lot of data locally, you’re not making a web program, you’re coercing HTTP into the transport protocol of your non-web program.

    This is a very limited view of what the web means. By definition, the web is only the web because there are links between resources. For service A to link to service B, it has to know about what’s in service B, and thus store information locally. This happens all the time, in many different ways. If you want the silliest possible example, look at your browser’s bookmarks.

    Also, on:

    IMO in a real REST application the list would be stored in the server anyway

    Again, you’re limiting a lot what is described in these papers you point to.

    I’ve read the source also. :-) I’ll check out these links you provide too, thank you!

  11. > By definition, the web is only the web because there are links between resources.

    Yes, of course. And what is a «list of resources» if not a resource? When you make this object a) a first-class object and b) stored outside the web, you’re making a non-web application (or, if you want, an application that talks to the web, but still is not a web application). That’s why my firefox bookmarks are not a web application, but del.icio.us is.

  12. So the conclusion is that a browser is a non-web application. I don’t think I want to argue about this one. :-)

    del.icio.us has hundreds of thousands of local (in their servers) links to remote resources. This is good enough for the point I’m making.

  13. As for your example of web service A talking to web service B: if both web services use REST, due to HATEOAS, web service A won’t store the «list of links» at all. Instead, it will just store web service B’s URL, and starting from it web service B will provide all needed resources/states for A. This is what Roy calls «late binding of application alternatives». If I store, say, a list of links to Amazon products, then I have to upgrade my application if Amazon changes. That is un-RESTful. If instead the list of links is stored in Amazon, I can move to another terminal and use the application in the same way. If I store a list of links to each of a bank’s operations, I have local application state I need to babysit, but if I let the operations use hypertext as the engine, I can work anywhere.

    It’s really just a choice of how to design applications. You’re focusing on a particular way of doing things and complaining that REST does not do it. Try to focus instead on the fact that REST is an *architectural style* (not an architecture, nor *the* architecture), and think of what REST architectures buy you. For example, think of the upcoming Google Chrome OS: can you see why HATEOAS is interesting for it? Consider caches, proxies, browser implementors, web server implementors, load-balancers, your back button. What happens to each of these guys when you break HATEOAS?

  14. You’re purposefully changing the definition of local/remote to suit your argument. First you say a bookmark is local because it is in my machine, then it is local because it’s on del.icio.us machine. Which one is local? If I make a del.icio.us link to my computer, then del.icio.us is local and my computer is remote?

    I think you’re misunderstanding what «application state» means in the REST thesis (not an uncommon misunderstanding). What part of the save-a-bookmark flow I am in at the moment is application state. What links are stored in del.icio.us are *resource* state. HATEOAS is a constraint on application state (indeed it would be hard to apply it to resource state!) If this sounds too abstract, think of your original question as it relates to del.icio.us. Suppose they want to store a cache of HTML <title>s for all the URLs they store. How do you think they would query a web service, say Amazon, for such a list of titles? Picture your list of thousands of links in a single RPC request, and compare that to the REST approach of using the constrained HTTP verbs. Why would anyone *not* want to use the RPC request? Again, think of the interplay of agents in the web: caches, concurrent servers, concurrent clients, et cetera.

  15. > So the conclusion is that a browser is a non-web application. I don’t think I want to argue about this one. :-)

    A browser’s bookmarks certainly are not a web application. Cookies are also not a web application. Javascript and Java Applets are frequently used to create non-web applications, as is Flash. My Firefox plugin that makes English words hover over Japanese text is not a web application. The part of the browser that does HTTP GETs and POSTs is, however, a web application. Browsers, like Emacs, are an application environment; both web and non-web applications can live in them. That’s why Google is calling Chrome an “OS” now.

    All of these non-web browser apps could theoretically be made into web applications to create a web-based web browser. Del.icio.us is web bookmarks, rikai.com is web rikaichan. But making a pure-REST browser probably isn’t interesting. RESTful architectures are not the silver bullet.

    One way to understand this is to think: what part of your browser would benefit if a super-fast-smart cache was added between you and the Internet? That part is the web (i.e. REST) part.

  16. As I said, I don’t want to argue about this. You’re diving into terminology and creating a particular notion of what The Web means that doesn’t have to be shared by others, and that doesn’t contribute to the points I’m raising in this post.

  17. You’re purposefully changing the definition of local/remote to suit your argument.

    Not at all. Local is relative. But again, we’re diving into terminology here, and that’s not useful for the point I was trying to understand.

    My understanding of the status quo is that if I want to query information about 200 resources at once in a responsive manner from a read-only system, none of that helps, and RPC is your friend.

  18. One very pragmatic tool is the URI template (on its way to being standardised, but probably not without significant change unfortunately). I see them as analogous to HTML forms using the GET method, with the added ability to populate path segments as well as query parameters.

    I’ve built a site discovery, resource metadata and “instant client API” system that wraps URI Templates in JSON and points to them with link headers (much nearer standardisation) – there’s an overview at http://positiveincline.com/?p=440. I’m on the lookout for anyone interested in collaborating on either implementing it for their open source Rails application or in porting it to Django (some of the core pieces exist in Python already).
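
To show the basic idea of a URI template, here is a deliberately simplified sketch that handles only plain {name} placeholders and percent-encodes each value. It is not the system described in this comment and not a full implementation of the URI Template proposal; the function name and the example templates are assumptions.

```python
import re
from urllib.parse import quote

def expand(template, values):
    """Replace each {name} in the template with its percent-encoded value."""
    def substitute(match):
        return quote(str(values[match.group(1)]), safe="")
    return re.sub(r"\{(\w+)\}", substitute, template)

book_uri = expand("http://example.com/book/{id}", {"id": 42})
search_uri = expand(
    "http://example.com/book/list?filter={words}", {"words": "rest web"}
)
print(book_uri)
print(search_uri)
```

As with an HTML form using GET, the server publishes the template and the client only fills in the blanks, here both in a path segment and in a query parameter.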

  19. Actually, why can we not treat this as a finder with OR (just like Google search terms)? Seems like a straightforward solution to me. I mean, 200 URIs separated by an OR or equivalent clause is a lot of data, but not really that much. Or is it?

  20. Mike,

    It’s an interesting idea, but it’s not clear to me how that would help with the issue above, perhaps due to my lack of understanding of what URI templates will bring in addition to the templating itself.

    Dhananjay,

    It is a possibility, but one of the issues with this approach is that there are arbitrary and not well-established limits to how long a URI can be. There’s even an error code (414) which is used when the URI is too long.

    I guess a possibility would be to ensure clients implement proper batching of the requests, on a trial and error basis. Not so great, but could work.

    Update: hmm.. on a second thought, I fail to see how that would be an advantage over usual RPC over HTTP. The arbitrary way in which URIs would be built would kill most of the benefits of REST-styled architectures.
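
A minimal sketch of the client-side batching mentioned in this comment: pack as many book URIs as fit into each query URI and start a new one when the length limit would be exceeded. The endpoint, the book parameter name, and the 2000-character limit are assumptions; real limits vary between servers and proxies.

```python
from urllib.parse import urlencode

ENDPOINT = "http://example.com/book/updates"  # hypothetical endpoint
MAX_URI_LENGTH = 2000  # assumed conservative limit, not a standard value

def build(uris):
    """Encode a batch of book URIs as repeated 'book' query parameters."""
    return ENDPOINT + "?" + urlencode([("book", u) for u in uris])

def batch_query_uris(book_uris, max_length=MAX_URI_LENGTH):
    """Yield query URIs, each carrying as many book URIs as will fit."""
    batch = []
    for uri in book_uris:
        candidate = batch + [uri]
        if batch and len(build(candidate)) > max_length:
            yield build(batch)   # current batch is full, emit it
            batch = [uri]
        else:
            batch = candidate
    if batch:
        yield build(batch)

library = [f"http://example.com/book/{i}" for i in range(200)]
queries = list(batch_query_uris(library))
print(len(queries), "requests instead of", len(library))
```

This cuts the number of requests considerably, but each query URI is an arbitrary combination of books, so intermediaries are unlikely to ever see the same URI twice.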

  21. Gustavo,

    It is not unacceptable to have optional scoping parameters being passed as a part of the URL (e.g. for pagination). If that’s kosher for ReST would it be OK to pass the parameters as a part of POST instead of GET?

    Now maybe that breaks the URI notion of ReST. But what if I remodeled the functionality differently (I’m just thinking aloud). What if I PUT a “Query” resource and the response is a List of Books feed ? Is that acceptable ? Maybe and maybe not. Let us go with the latter. Now, I can PUT a Query resource which has a list of older URIs – the response returns me a QueryID. Note that the Query is not a typical query but is a collection of URIs. Could I now do a GET on a BookSet resource using that QueryID ? Perhaps. I could even extend it further by allowing to append more URIs by doing a POST using the earlier obtained QueryID. (Note that the QueryID could interchangeably be also referred to as a BookSetID – just depends upon which way one wants to view it). Now one gets good batching as well.

    That seems to work for me.

    Cheers.

  22. I still get confused by put and post :) Meant a POST instead of PUT in the comment above.
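
The flow proposed in comment 21, with the POST correction from comment 22, can be sketched roughly as follows. Everything here is hypothetical: the /bookset paths, the function names, and the in-memory stores stand in for a real service.

```python
import itertools

BASE = "http://example.com"
# Hypothetical catalog: book URI -> URIs of its newer editions.
NEW_EDITIONS = {BASE + "/book/1": [BASE + "/book/7"]}
BOOK_SETS = {}             # set id -> list of book URIs
_ids = itertools.count(1)  # source of new set ids

def _set_id(set_uri):
    """Extract the numeric id from a /bookset/<id> URI."""
    return int(set_uri.rsplit("/", 1)[1])

def post_book_set(book_uris):
    """POST /bookset: store the list and return the new resource's URI."""
    set_id = next(_ids)
    BOOK_SETS[set_id] = list(book_uris)
    return f"{BASE}/bookset/{set_id}"

def append_to_book_set(set_uri, more_uris):
    """POST /bookset/<id>: append more book URIs to an existing set."""
    BOOK_SETS[_set_id(set_uri)].extend(more_uris)

def get_book_set_updates(set_uri):
    """GET on the set: newer editions for the books it contains."""
    uris = BOOK_SETS[_set_id(set_uri)]
    return {u: NEW_EDITIONS[u] for u in uris if NEW_EDITIONS.get(u)}

set_uri = post_book_set([BASE + "/book/2"])
append_to_book_set(set_uri, [BASE + "/book/1"])
updates = get_book_set_updates(set_uri)
print(set_uri, updates)
```

The list becomes an addressable resource, at the cost of the server keeping per-client state that the original read-only service did not need.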

  23. If that’s kosher for ReST would it be OK to pass the parameters as a part of POST instead of GET?

    You certainly can, but that’s what’s referred to as the RPC-style, because you’re doing a remote call to the final web server and expecting an answer to your specific query which has to be computed. All of the RESTful properties are gone in this case, because there’s no layering, caching, etc.

    Please check out the book or some of the resources Leonardo Boiko suggested above for a more in-depth description of the reasoning behind the REST-style.

  24. You certainly can, but that’s what’s referred to as the RPC-style, because you’re doing a remote call to the final web server and expecting an answer to your specific query which has to be computed.

    Well, RPC style primarily differentiates itself by a profusion of verbs – that’s not what we are seeing here. If variable optional arguments are RPC style then any pagination implementation that is otherwise ReSTful should be treated as RPC style. So I would not jump so far as to say it is RPC style. However I would admit that that was not the most satisfactory solution (which is why I went ahead with offering two more refinements on the same).

    All of the RESTful properties are gone in this case, because there’s no layering, caching, etc.

    If you review the final solution I suggested you will find that all the ReSTful properties are still available. You are responding to only the intermediate solution I discuss in a conversational style.

    And yes, I did review the book and the resources referred to, many times over ;) In fact while not on the same specific issue, I did offer a whole bunch of opinions on ReST here http://blog.dhananjaynene.com/2009/06/musings-on-rest/ and here : http://blog.dhananjaynene.com/2009/07/presentation-rest-explained/

  25. Sorry, I guess I misunderstood what your actual suggestion was.

    What you describe is the same I mentioned in the blog post I guess:

    I can imagine some workarounds for this, like saving a temporary list of books with PUT, and then doing the query on that temporary list’s URI

    I probably wasn’t clear, since that was brought up by Leonardo too.

  26. Gustavo, you’re right – on too quick a reading I recognized the search requirement (and search does generalize quite nicely) but in my haste overlooked the problem that there are hard limits to what can be coded in a URI.

    To be honest, I think a big POST is fine in your example, since at the time your first request is made the server has no concept at all of the local “resource”. Afterwards though, there’s nothing then preventing the server from maintaining a version of it as an identifiable resource on the web that could support further interaction.

  27. John Morrissey says:

    I’ve implemented Ian Bicking’s multipart MIME idea (http://blog.ianbicking.org/restful-transactions.html) in a couple of cases.

    It breaks some RESTful principles (proxying, most notably), but each MIME part amounts to a separate HTTP request that can be replayed into the same handlers that would be triggered by that request when submitted “normally.”

    It’s also nice since you don’t need to keep a cross-request transaction open when you want to make multiple updates transactionally.
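
A rough sketch of the idea in this comment, using Python's standard email package to pack several requests into one multipart body and replay each part through the same handler. The part format (a bare "METHOD path" line), the function names, and the handler are simplifications assumed for illustration; this is not the format from the linked post.

```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def build_batch(requests):
    """Pack (method, path) pairs into one multipart body, one part each."""
    batch = MIMEMultipart("mixed")
    for method, path in requests:
        batch.attach(MIMEText(f"{method} {path}", "plain"))
    return batch.as_string()

def replay(batch_text, handler):
    """Feed each part to the handler a standalone request would reach."""
    message = message_from_string(batch_text)
    results = []
    for part in message.get_payload():
        method, path = part.get_payload().split()
        results.append(handler(method, path))
    return results

def handle(method, path):
    # Stand-in for the real per-request handler.
    return f"handled {method} {path}"

batch_text = build_batch([("GET", "/book/1"), ("GET", "/book/2")])
results = replay(batch_text, handle)
print(results)
```

Because every part goes through the ordinary handler, the batch endpoint adds no new business logic; it only changes how the requests arrive.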

  28. Steve Alexander says:

    What’s the point of using a REST architectural style?

    Roy Fielding writes:
    “REST is an architectural style that, when followed, allows components to carry out their functions in a way that maximizes the most important architectural properties of a multi-organizational, network-based information system. In particular, it maximizes the growth of identified information within that system, which increases the utility of the system as a whole.”

    http://roy.gbiv.com/untangled/2008/on-software-architecture

    Here’s how I apply this to your example. You already decided that you don’t want to make the list of 200 books into a piece of “identified information”. This list exists only on the client. So, there is no benefit in applying the REST architectural style to this part of the overall “information system about books” problem.

    It’s great that a book is individually addressable for CRUD operations (that is, each book has become “identified information”).

    But, unless the list of 200 books that the client is interested in becomes a piece of identified information, this part of the system is just part of the client, not part of the web. So, if the regular REST API of the server is insufficient for the client, because it is too inefficient for your overall design, it makes sense to have a specific operation between client and server to support this feature of the client. Nothing has been lost by having the operation fall outside of the REST architectural style, because the information that is the “subject” of the interaction is not identified information on the web.
