Do not understand stateful vs stateless servlets

I don't understand what stateful and stateless servlets are about. As far as I can tell, the documentation at Web Applications in Racket describes a lot of functions that can be used to interact with the web server, but doesn't provide any conceptual model for the server itself, or how all these functions interact. What is a servlet, for example? What does it mean for it to be stateful or stateless? How do they interact? How are these continuations kept for when they are to be used again? And so on.
Is there other documentation somewhere?


I'm a big advocate of the Racket web server, but I want to start by acknowledging that there is a real learning curve. While the reference documentation is very thorough, it can also be genuinely confusing, in part, I think, because there is a lot of history that can sometimes occlude the currently-recommended ways of doing things. I'm interested in working on documentation improvements, though my time is constrained; you can see some of my thoughts in more detail at web-server/servlet/web seems to inhibit stateless operation · Issue #126 · racket/web-server · GitHub (where you can also see that even a co-author of papers about the web server can get confused!). Hearing more about where the gaps are from someone encountering the material for the first time would be very helpful in thinking about improvements.

More Documentation

In case you haven't seen them, some other pieces of documentation about the web server are:

  • Web Server: HTTP Server is the reference manual, roughly, for the more "back end" parts of the web server. (One problem is that it is not always clear what goes in which document. This one also contains an especially large amount of documentation for deprecated features.)
  • Continue: Web Applications in Racket is a general Racket tutorial, but uses as its running example a simple application with the web server.
  • More: Systems Programming with Racket is, as its name suggests, really a tutorial about Racket's features for systems programming, but it walks through the implementation of a simplified version of the Racket web server, so it can be useful if you want to know how things work "under the hood".

What is a servlet?

In simplest terms, a servlet is a function from a request value to a response value. (IIUC, the terminology came from Java, which may have been a more useful reference point c. 2001–2006 than it is today.)

Broadly speaking, the servlet function implements your application logic, roughly corresponding to the functionality you might otherwise implement in a pile of CGI scripts, PHP files, a Python Flask application, etc. I'm trying to very loosely distinguish this from both what the Python and Ruby communities often call "middleware", and from the kind of logic you might otherwise need to implement in your Apache configuration file: those are handled by other parts of the web server, which I'll address later.

You can write a servlet without using continuations at all. You might use the URL-Based Dispatch library (web-server/dispatch) for a match-like way to parse and construct URL paths, or you can more explicitly use functions to access a request's URI, method, headers, query bindings, and more. You can use any techniques you may be familiar with from other kinds of web programming.
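To make that concrete, here is a minimal sketch of a servlet that uses no continuations. The handler names (home-page, greet, not-found), the URL layout, and the port are made up for this example; dispatch-rules comes from web-server/dispatch and serve/servlet from web-server/servlet-env.

```racket
#lang racket/base
(require web-server/servlet      ; request/response structs, response/xexpr
         web-server/dispatch     ; dispatch-rules
         web-server/servlet-env) ; serve/servlet

;; Each handler is an ordinary function from a request to a response.
(define (home-page req)
  (response/xexpr
   `(html (body (h1 "Home")
                (p "Request method: "
                   ,(bytes->string/utf-8 (request-method req)))))))

;; The extra argument is the path segment matched by (string-arg).
(define (greet req name)
  (response/xexpr `(html (body (p "Hello, " ,name "!")))))

(define (not-found req)
  (response/xexpr #:code 404
                  `(html (body (p "No such page.")))))

;; start is the servlet itself; url-for builds paths for the same rules.
(define-values (start url-for)
  (dispatch-rules
   [("") home-page]
   [("hello" (string-arg)) greet]
   [else not-found]))

(module+ main
  ;; Hypothetical settings: send every path to start, on port 8000.
  (serve/servlet start
                 #:servlet-regexp #rx""
                 #:launch-browser? #f
                 #:port 8000))
```

Running the file starts the server; calling (url-for greet "Racket") constructs the path that the second rule parses, so links and routes cannot drift apart.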

Continuations are useful to implement interactions that take place over multiple HTTP requests. Because HTTP is a stateless protocol, when sending a response, you have to somehow persist any local variables and other state you need and find them again when you get a corresponding new request. This creates an "inversion of control" and often means needing to program in something like continuation-passing style and invent ad-hoc serialization schemes. In particular, it is easy to write code with subtle bugs in the face of browser features like the "Back" button and "Duplicate Tab".

If you choose to use functions like send/suspend and send/suspend/dispatch, the Racket web server manages all of this automatically, allowing you to program as though sending an HTTP response and awaiting a follow-up HTTP request were a normal, synchronous operation. Conceptually, the fact that these functions are implemented using continuations is something of an implementation detail. However, it is relevant because the Racket web server offers multiple implementation strategies with different trade-offs:
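As a rough sketch of what "programming as though it were synchronous" looks like, the servlet below asks for two numbers on two separate pages and then adds them. It uses the send/suspend exported by web-server/servlet in a #lang racket/base module; the helper ask-number, the form field name n, and the port are assumptions made for this example.

```racket
#lang racket/base
(require web-server/servlet
         web-server/http/bindings ; request-bindings, extract-binding/single
         web-server/servlet-env)

;; Sends a form and returns only after the user has submitted it.
;; k-url is the URL that send/suspend generates for resuming this computation.
(define (ask-number label)
  (define req
    (send/suspend
     (lambda (k-url)
       (response/xexpr
        `(html (body (form ([action ,k-url] [method "post"])
                           ,label ": "
                           (input ([name "n"]))
                           (input ([type "submit"])))))))))
  (define text (extract-binding/single 'n (request-bindings req)))
  ;; Fall back to 0 when the input is not a number.
  (or (string->number text) 0))

;; Two HTTP round trips, written as two ordinary function calls.
(define (start initial-request)
  (define a (ask-number "First number"))
  (define b (ask-number "Second number"))
  (response/xexpr
   `(html (body (p "Sum: " ,(number->string (+ a b)))))))

(module+ main
  (serve/servlet start
                 #:servlet-regexp #rx""
                 #:launch-browser? #f
                 #:port 8000))
```

Note that a and b are plain local variables: nothing is stored in a session or hidden form field by hand.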

Stateful Servlets

If you write your servlet in #lang racket or similar and use the variants of send/suspend et al. from (require web-server/servlet), you are writing a stateful servlet using native continuations. In this case, functions like send/suspend create continuations in the ordinary Racket sense. The continuations are basically saved in a hash table with an automatically generated key. (This is a slight simplification; more complex scenarios are supported.) The key is embedded into the "path parameter" part of the URL send/suspend gives to your response-generating callback. The web server, in one of the layers I will explain later, automatically looks for such a path parameter in incoming requests and, if one is present, resumes the corresponding continuation with the request, which becomes the return value of send/suspend.
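The "hash table of continuations" idea can be modeled in a few lines of plain Racket. This is a toy model under stated assumptions, not the web server's actual code: toy-send/suspend, run, and resume are hypothetical names, "pages" are just pairs of a key and a string, and "requests" are just numbers.

```racket
#lang racket/base

;; Toy model only: NOT the web server's implementation.
(define servlet-tag (make-continuation-prompt-tag 'toy-servlet))
(define saved (make-hash)) ; key -> saved continuation
(define counter 0)

;; Run servlet code until it produces its next "page".
(define (run thunk)
  (call-with-continuation-prompt thunk servlet-tag (lambda (page) page)))

;; Stand-in for send/suspend: save the rest of the computation under a
;; fresh key, then escape to the prompt with the page built by make-page.
(define (toy-send/suspend make-page)
  (call-with-composable-continuation
   (lambda (k)
     (set! counter (add1 counter))
     (hash-set! saved counter k)
     (abort-current-continuation servlet-tag (make-page counter)))
   servlet-tag))

;; Stand-in for the dispatcher: find the continuation named by the key
;; and resume it, so that `request` becomes toy-send/suspend's result.
(define (resume key request)
  (run (lambda () ((hash-ref saved key) request))))

(define (adder)
  (let* ([a (toy-send/suspend (lambda (key) (cons key "First number?")))]
         [b (toy-send/suspend (lambda (key) (cons key "Second number?")))])
    (format "Sum: ~a" (+ a b))))

(module+ main
  (define page1 (run adder))
  (define page2 (resume (car page1) 10))
  (displayln (resume (car page2) 5))
  ;; "Back button": answer the first page again with a different number.
  (define page2* (resume (car page1) 100))
  (displayln (resume (car page2*) 5)))
```

Running it prints "Sum: 15" and then "Sum: 105". The saved continuation for the first page is still in the table after it has been used once, which is why going "back" works, and also why the table only ever grows unless something expires its entries.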

The main advantage of this strategy is that you can use all of Racket freely to implement your servlet, with the usual semantics and no special constraints.

However, there are two significant, closely-related disadvantages:

  1. The native continuations exist only as runtime state within the Racket process. If the server restarts, all previously-created continuations expire. There is no direct support for running multiple instances of the server behind a load balancer.

  2. Each saved native continuation uses memory on the server. Because users may keep references to the corresponding URLs in bookmarks, links from other pages, or even their brains, there is no generally-correct way for garbage collection to automatically reclaim them. The server provides a "manager" interface for defining an expiration policy, with built-in support for timeouts, memory limits, and LRU caching. Still, there is a fundamental tradeoff between using excessive memory or continuations expiring annoyingly quickly. As a community, we probably also have limited experience with what configurations and resource limits work well now, as opposed to 13 years ago.
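As one possible way to set an expiration policy, the sketch below passes an LRU manager to serve/servlet through its #:manager argument. The 256 MB threshold, the wording of the expired page, and the port are assumptions chosen for illustration, not recommendations.

```racket
#lang racket/base
(require web-server/servlet
         web-server/servlet-env
         web-server/managers/lru) ; make-threshold-LRU-manager

;; Shown when a request names a continuation that has been discarded.
(define (expired-page req)
  (response/xexpr
   `(html (body (p "This page has expired. ")
                (a ([href "/"]) "Start over")))))

;; Hypothetical budget: the manager treats about 256 MB as its memory limit.
(define manager
  (make-threshold-LRU-manager expired-page (* 256 1024 1024)))

;; A trivial stateful servlet, so that there is a continuation to manage.
(define (start req)
  (send/suspend
   (lambda (k-url)
     (response/xexpr
      `(html (body (a ([href ,k-url]) "Continue"))))))
  (response/xexpr `(html (body (p "Done.")))))

(module+ main
  (serve/servlet start
                 #:manager manager
                 #:servlet-regexp #rx""
                 #:launch-browser? #f
                 #:port 8000))
```

Whatever numbers you pick, the trade-off described above remains: a larger threshold keeps old URLs alive longer at the cost of memory.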

This is the implementation strategy described in the paper Implementation and Use of the PLT Scheme Web Server and the even older papers The Continue Server (or, How I Administered PADL 2002 and 2003) and Programming the Web with High-Level Programming Languages. While I found all of these papers useful in providing context and a conceptual overview, do note their age: none of them are exactly current best practice for the Racket web server, even if you use stateful servlets.

Stateless Servlets

If you use the variants of send/suspend et al. from #lang web-server or #lang web-server/base, you are writing a stateless servlet using serializable continuations. This implementation strategy addresses the disadvantages of stateful servlets by creating Automatically RESTful Web Applications. (As Jay explains in the first footnote of the paper, representational state transfer means storing state on the client; it does not necessarily imply some particular URL naming scheme.) In this implementation strategy, after macro expansion, #lang web-server performs a whole-module transformation based on A-normal form, making continuations explicit and transforming closures to use serializable structs. The stateless version of send/suspend uses the replacement versions of call/cc etc. that operate on #lang web-server's serializable continuations. When send/suspend generates a URL, instead of a key referring to a native continuation in memory, it directly embeds the serialized continuation, which includes serializing all of the values the continuation closes over. A serialized continuation thus consumes no resources on the server.
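For comparison, here is a sketch of a two-page adder written as a stateless servlet. The helper ask-number, the form field name n, and the port are assumptions for this example; the parts that matter are the #lang web-server line and the #:stateless? #t argument to serve/servlet. The form uses POST so that submitted fields do not replace whatever send/suspend put into the URL.

```racket
#lang web-server
(require web-server/http
         web-server/http/bindings ; request-bindings, extract-binding/single
         web-server/servlet-env)  ; serve/servlet

;; Sends a form and returns the submitted number (0 if it does not parse).
;; Here send/suspend is the #lang web-server variant, so the continuation
;; captured at this point is serialized rather than kept in server memory.
(define (ask-number label)
  (let* ([req (send/suspend
               (lambda (k-url)
                 (response/xexpr
                  `(html (body (form ([action ,k-url] [method "post"])
                                     ,label ": "
                                     (input ([name "n"]))
                                     (input ([type "submit"]))))))))]
         [text (extract-binding/single 'n (request-bindings req))])
    (or (string->number text) 0)))

;; Everything the continuation closes over (a string, then a number)
;; is serializable, so no special care is needed here.
(define (start initial-request)
  (let* ([a (ask-number "First number")]
         [b (ask-number "Second number")])
    (response/xexpr
     `(html (body (p "Sum: " ,(number->string (+ a b))))))))

;; Starts the server when the module is run; this call blocks.
(serve/servlet start
               #:stateless? #t
               #:servlet-regexp #rx""
               #:launch-browser? #f
               #:port 8000)
```

The servlet body reads the same as it would with native continuations; the difference is in where the suspended computation lives between requests.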

(Occasionally, particularly if the continuation is very large, it may be preferable not to actually send the whole serialized continuation to the client. The stuffer abstraction, among many other features, provides optional support for using content-addressed storage in the filesystem or a database and sending only a hash of the serialized continuation to the client. Strictly speaking, this option does use some server resources, but it still avoids the problematic resource issues with stateful servlets.)
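A small sketch of the content-addressed option, using web-server/stuffers outside of any servlet so the round trip is visible. The storage directory and the sample value are made up for this example; in a real application a stuffer like this could be passed to serve/servlet as #:stuffer alongside #:stateless? #t.

```racket
#lang racket/base
(require racket/file        ; make-directory*
         web-server/stuffers)

;; Hypothetical storage location for the stored continuations.
(define storage-dir
  (build-path (find-system-path 'temp-dir) "continuation-store"))
(make-directory* storage-dir)

;; Serialize the value to bytes, store the bytes in a file under
;; storage-dir keyed by their MD5 hash, and hand back only the hash.
(define hashing-stuffer
  (stuffer-chain serialize-stuffer
                 (md5-stuffer storage-dir)))

;; "in" is what would be sent to the client; "out" recovers the value.
(define token ((stuffer-in hashing-stuffer) '(cart "apple" 3)))
(define restored ((stuffer-out hashing-stuffer) token))

(module+ main
  (printf "token sent to client: ~a\n" token)
  (printf "value recovered on the server: ~s\n" restored))
```

The token stays short no matter how large the stored value is, at the price of the files accumulating on disk.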

There are several small differences between the implementation strategies, but two notable limitations mean that stateless servlets are not strictly better than stateful ones:

  1. The most significant requirement is that all of the values closed over by the continuation must be serializable in the sense of racket/serialize. Usually this is not too onerous and amounts to using serializable-struct instead of struct. Still, it's something you have to consider even in parts of your application that otherwise might be quite independent of web interaction.

  2. A more obscure limitation is that only continuation frames transformed by #lang web-server (or #lang web-server/base, or other languages that use their variant of #%module-begin) are serializable, so the continuation to be serialized must not include any frames from untransformed modules.

    For a concrete example, consider map. Most uses of map in #lang web-server are perfectly fine. However, you must not use send/suspend inside the callback that you pass to map. If you did so, the continuation you attempted to capture would include, between your call to map and map's call to your callback, frames from inside the implementation of map. Since map is not written in #lang web-server, that portion of the continuation would not be serializable.

    This problem does not come up very often in practice, and usually it can be avoided without much trouble. On those few occasions when you really want to do this, you can use serial->native, native->serial, and/or define-native to create a hybrid continuation that stores the native part as with stateful servlets and serializes the serializable parts: see Jay's paper The Two-State Solution: Native and Serializable Continuations Accord for details.
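The first limitation in the list above can be illustrated with racket/serialize alone. In this sketch the struct names plain-cart and cart are hypothetical; the point is only the difference between struct and serializable-struct.

```racket
#lang racket/base
(require racket/serialize)

;; An ordinary struct: instances cannot be serialized, so a stateless
;; continuation that closed over one could not be sent to the client.
(struct plain-cart (items))

;; The same data declared with serializable-struct.
(serializable-struct cart (items) #:transparent)

(define plain-ok? (serializable? (plain-cart '("apple"))))
(define cart-ok? (serializable? (cart '("apple"))))

;; Round trip: roughly what happens to every value a serialized
;; continuation closes over.
(define restored
  (deserialize (serialize (cart '("apple" "pear")))))

(module+ main
  (printf "plain-cart serializable? ~a\n" plain-ok?)
  (printf "cart serializable? ~a\n" cart-ok?)
  (printf "restored items: ~s\n" (cart-items restored)))
```

In practice this usually means changing struct to serializable-struct in the modules whose values can be live across a send/suspend.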


I've got to go for now, but I'll plan to follow up with more about the other layers of the web server.
