URL-builder macro

Hi, Racket Discourse.

I have been tinkering with this idea for the past couple of weeks, because I realized I was abstracting the URLs of API endpoints at the wrong level in my HTTP requests library, which made it hard to patch features as they became necessary.

The new system allows one to extend URLs incrementally (via copying), which was not possible with the previous iteration.

I haven't gotten around to properly implementing the "file" aspects of URL parsing in the url library, but for now the feature-set is complete enough to start using again.

There is quite a bit of room for optimization left on the table, in terms of the way values are converted to and from strings in the macro's internals, but that will hopefully become more elegant as time goes on.


The macro is called url-builder and looks something like this:

(url-builder
 #:http (as sky) (at www) (on 801)
 #:path / cgi-bin / finger [xyz #false]
 #:query [name "shriram"] [host "nw"]
 #: top)

;=> (url "http" "sky" "www" 801 #t (list (path/param "cgi-bin" '()) (path/param "finger" '("xyz"))) '((name . "shriram") (host . "nw")) "top")

Using the example URL from the docs, we can see that in this macro:

  • the scheme is indicated by the keyword, #:http,
  • the user, sky, is indicated by the (as ...) syntax,
  • the host, www, is indicated by the (at ...) syntax,
  • the port, 801, is indicated by the (on ...) syntax,
  • the path is reasonably close to a normal URL path, except that the parameters look like:
   [name arg ...]
   ;; which is equivalent to "name=arg, ..."
  • the query follows the same logic, except that its parameters have no path-element component,
  • the fragment is indicated by the empty keyword, #:.
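
For comparison, the same value can be written out by hand with the plain net/url constructors. This is only a sketch of the struct the macro call above evaluates to, not of its actual expansion; the name example-url is made up for illustration.

```racket
#lang racket/base
(require net/url)

;; Hand-written equivalent of the url-builder call above.
;; Field order: scheme user host port path-absolute? path query fragment
(define example-url
  (url "http" "sky" "www" 801 #t
       (list (path/param "cgi-bin" '())
             (path/param "finger" '("xyz")))
       '((name . "shriram") (host . "nw"))
       "top"))

;; Render it back to a string to check it by eye.
(displayln (url->string example-url))
```

Each clause of the macro corresponds to one positional field here, which is the part that becomes hard to read once several fields are #f.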

One is free to use strings and identifiers as literal values in the syntax, such as sky or www, but unquoting and splicing values are also permitted:

(define subdomain 'www)
(define domain    'host)
(define tld       'com)

(define path*  '(cgi-bin finger))
(define param* '(() ("xyz")))

(url-builder
 #:http (as "sky") (at ,subdomain ,domain ,tld) (on 801)
 #:path / ,path* ,param* ... ...
 #:query [name "shriram"] [host "nw"]
 #: top)

;=> (url "http" "sky" "www.host.com" 801 #t (list (path/param "cgi-bin" '()) (path/param "finger" '("xyz"))) '((name . "shriram") (host . "nw")) "top")
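
One way to picture the double-ellipsis splice is as an element-wise pairing of the two lists, with the host parts joined by ".". The sketch below assumes that reading, based on the result shown above; zip-path and join-host are hypothetical helpers, not part of the macro.

```racket
#lang racket/base
(require net/url racket/string)

(define path*  '(cgi-bin finger))
(define param* '(() ("xyz")))

;; Hypothetical helper: symbols become strings, strings pass through.
(define (->string v)
  (if (symbol? v) (symbol->string v) v))

;; Hypothetical helper: pair each path element with its parameter list.
(define (zip-path elems params)
  (for/list ([elem  (in-list elems)]
             [param (in-list params)])
    (path/param (->string elem) param)))

;; Hypothetical helper: join host parts with ".".
(define (join-host . parts)
  (string-join (map ->string parts) "."))

(define zipped (zip-path path* param*))
(define host   (join-host 'www 'host 'com))
```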

Furthermore, one may unpack the values of a URL:

(define a-url (string->url "http://sky@www:801/cgi-bin/finger;xyz?name=shriram;host=nw#top"))

(url-builder
 #:scheme `(scheme ,a-url) (as `(user ,a-url)) (at `(host ,a-url)) (on `(port ,a-url))
 #:path / `(path ,a-url) ...
 #:query `(query ,a-url) ...
 #: `(fragment ,a-url))

;=> (url "http" "sky" "www" 801 #t (list (path/param "cgi-bin" '()) (path/param "finger" '("xyz"))) '((name . "shriram") (host . "nw")) "top")

Attempting to use an invalid field is a syntax error:

(url-builder
 #:scheme `(scheme ,a-url) (as `(user ,a-url)) (at `(host ,a-url)) (on `(port ,a-url))
 #:path / `(path ,a-url) ...
 #:query `(thing ,a-url) ...
 #: `(fragment ,a-url))

;=>
url-builder: expected a foreign clone expression from `url-query'
  parsing context: 
   while parsing a sequence of url query parameters, optionally ending on a query splice expression
   while parsing an optional, keyword-delimited url query in: (quasiquote (thing (unquote a-url)))

In a case such as the above, where one is only copying from a single URL, a shorthand exists to use it as a "prototype":

(url-builder
 #:use a-url
 #:scheme `* (as `*) (at `*) (on `*)
 #:path / `* ... #:query `* ... #: `*)

;=> (url "http" "sky" "www" 801 #t (list (path/param "cgi-bin" '()) (path/param "finger" '("xyz"))) '((name . "shriram") (host . "nw")) "top")

Further miscellanies include that:

  • the scheme can be set via an unquote after the #:scheme keyword,
  • the host may contain multiple values, which are concatenated by ".",
  • the host may specify an IPv6 value, as in (at [2001:db8::7]),
  • the path can be spliced in three different ways:
    • <elem> <param> ... (one path element, multiple parameters)
    • <elems> <params> ... ... (multiple path elements, multiple parameters)
    • <path> ... (complete path/param list)
  • non-obvious prototype path-copying includes:
   `(elems ,x) `(params ,x) ... ...
   
   ;; as opposed to
   
   `(path ,x) ...
  • relative paths can be indicated by dropping the leading /:
   (url-builder
    #:https (at www)
    #:path `(path ,a-url) ...)

   ;=> (url "https" #f "www" #f #f (list (path/param "cgi-bin" '()) (path/param "finger" '("xyz"))) '() #f)
  • the query may end in a query-splice expression:
   (url-builder
    #:https (at www)
    #:path #:query [param #false] `(query ,a-url) ...)

   ;=> (url "https" #f "www" #f #f '() '((param . #f) (name . "shriram") (host . "nw")) #f)
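
For reference, the query-splice case corresponds roughly to the following when written against net/url directly. It is a sketch of the resulting value only, assuming the new parameter is simply prepended to the copied query; spliced-url is an illustrative name.

```racket
#lang racket/base
(require net/url)

(define a-url
  (string->url "http://sky@www:801/cgi-bin/finger;xyz?name=shriram;host=nw#top"))

;; A relative, empty path, plus one new parameter ahead of the copied query.
(define spliced-url
  (url "https" #f "www" #f #f
       '()
       (cons (cons 'param #f) (url-query a-url))
       #f))
```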

There is still some work to be done with the unquoting operations, to ensure that they do not fail silently (which is very confusing, to say the least), but apart from that--and the inefficient conversions--it seems pretty solid.

Pros:

  • the syntax is more "tangible" than using strings, and may catch some errors before runtime,
  • the syntax allows one to freely interpolate strings, symbols and in certain cases numbers,
  • the result is a (hopefully) valid url struct, which is used ubiquitously.

Cons:

  • @-syntax exists, if interpolating strings is that big of a deal,
  • the procedures are not very efficient at this point in time,
  • the syntax is much more elaborate than a URL-string, although the jury is out on whether the added sophistication outweighs this cost,
  • no exhaustive test-suite as of yet, which makes a lot of these statements speculation, at best.

So, what do you think about this syntax: Does it add anything to the problem of building URLs that you would consider to be a boon, or do you feel like this is much ado about nothing?

Have you come across similar macros for building URLs, or perhaps implemented some of your own?

P.S. I have uploaded the code to a gist, but I emphasize that it will probably not remain this way for very long.


This looks cool! A few miscellaneous thoughts:

  • In my experience there's usually value to both "extremes" of representation:

    • a hot bowl of porridge (a DWIM string or lang-in-a-string like URL strings or regular expression strings)

    • a cold bowl of porridge (a struct or dict or tuple where the parts are super obvious)

    But not usually a warm bowl. Anyway, it sounds like you want to make "a better cold bowl", which sounds great.


  • Tiny first reaction 1: I wondered why at, as, on weren't also keywords #:at, #:as, #:on?

  • Tiny first reaction 2: #:use seems to be "struct-copy by other means", but that's probably fine.


  • For structs generally, sometimes a keyword constructor can be more readable, especially when a struct has many members. Adding one for url could be handy. Names are hard; let's pencil in url/kw.

  • Given such a url/kw plain old function... even just that might suffice for some people.

  • Wrapping that in a macro could add syntactic sugar. Keyword args of plain old functions take just one value, so a path must be passed as a list, e.g. #:path '("path" "to" "here"). Whereas, as you know, a macro could consume multiple values up to the next keyword, like #:path "path" "to" "here" (and allow unquoted symbols like #:path path to here, as you do).

  • IOW this might be one of those times for a division of labor between function and syntax wrapper, and provide both.
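
To make the function half of that division concrete, here is a minimal sketch of what such a keyword constructor might look like. Everything about it is an assumption: url/kw is only the name pencilled in above, and the defaults and the string-to-path/param convenience are choices made for illustration.

```racket
#lang racket/base
(require net/url)

;; Hypothetical keyword constructor for the url struct from net/url.
;; Plain strings in #:path are wrapped as parameterless path/param values.
(define (url/kw #:scheme         [scheme #f]
                #:user           [user #f]
                #:host           [host #f]
                #:port           [port #f]
                #:path-absolute? [path-absolute? #t]
                #:path           [path '()]
                #:query          [query '()]
                #:fragment       [fragment #f])
  (url scheme user host port path-absolute?
       (for/list ([elem (in-list path)])
         (if (path/param? elem)
             elem
             (path/param elem '())))
       query
       fragment))

(define kw-url
  (url/kw #:scheme "https"
          #:host "www"
          #:path '("path" "to" "here")))
```

A syntax wrapper could then expand into a call to this function, keeping the conversions in one ordinary procedure.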

Sorry for the brain-dump! I hope at least some of that might be useful?


Hi, @greghendershott.

What a wonderful analogy, you might even say it's just right!

So, I kind of did start out with a warm-bowl-of-porridge approach. I was annoyed by the fact that the arguments to parameters in a url are strings, and set out to make an intermediate struct called urx which stored the parameters more uniformly in both the path and query fields as association-lists.

But I soon realized that this was equally annoying, because I then had to convert back to normal url structs in any case, which seemed to be six of one and half a dozen of the other.

I ended up reasoning that what I actually wanted was to have a lightweight (but more structured) way of filling in a template for a url instead of necessarily having something which was editable after the fact. If I can make them faster than I can break them, why break them at all.

As for why as, at and on are not keywords: it was "accidental". I started out with the authority sequence looking like:

<user> #:at <host> #:on <port>

where the user and host were strings and the port a natural number, or unquoted values. This follows directly from the way it appears in a URL.

I had also by this time decided to spell the scheme as a keyword named after it (such as #:http), or as #:scheme ,x for unquoting.

Then, when I decided to allow the host to possibly contain separate parts, I thought that perhaps the bare sequences like:

#:http ,sub ,domain ,tld #:on ,port ...

might become too flat and noisy, and therefore defeat the "more structured" goal somewhat.

Shotgun in hand, I wrapped up all those bad boys in named parentheses and called it a day; the scheme kind of acting as a keyword (when not unquoting) for the authority sequence, if you squint.

I was not aware of struct-copy, honestly. That's crazy. It would be trivial to attach a flag to the values being copied (or not) and then use this to fill-in in the struct-copy macro, instead of deriving identifiers on the fly. Good reference.
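
For anyone else who had not met it, a minimal sketch of struct-copy applied to a url value. It assumes the url struct exported by net/url can be used with struct-copy as-is; secure-url is an illustrative name.

```racket
#lang racket/base
(require net/url)

(define a-url
  (string->url "http://sky@www:801/cgi-bin/finger;xyz?name=shriram;host=nw#top"))

;; Copy a-url, replacing only the listed fields; the rest carry over unchanged.
(define secure-url
  (struct-copy url a-url
               [scheme "https"]
               [port 443]))
```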

As for my first, warm-bowl approach, this is in fact what I should have considered next. A constructor procedure can do what it wants: it can convert any intermediate values as required, which somewhat resolves the tension between "editing" and "structure", combined with the make-don't-break concept.

The sugar on top is then that, on top.

Thank you for the brain-dump. Good thoughts.

My reaction here is: cool, I need to remember to try this out the next time I go on a bender and start writing something related to web apis. Oh! I'm doing some (ugh, blecch) canvas work right now, maybe I can use this... but not until it's a library, probably.

@greghendershott , I'm curious about your dislike for warm porridge. (I'm also intrigued by the unauthorized extension of this metaphor to the idea that hot porridge gradually becomes cold porridge, I'm not so sure about that, but ...) Specifically, can you give instances where warm porridge is bad?

I'm kind of thinking of @ryanc 's scramble/regexp, which seems like a really nice warm bowl of porridge, where I think of hot porridge as the native string regexp syntax and cold porridge as Shivers' SRE's as realized in Alex Shinn's irregex, which was nice but I couldn't get it to run fast enough.

The weekend is still young :stuck_out_tongue:


In my (sloppy) analogy, the "hot" bowl would be the regular expression string language that everyone knows (or thinks they know, until they double check the docs for some edge case).

Something like Emacs rx or @ryanc 's scramble/regexp would be the "cold" bowl -- where the structure or "AST" is expressed, maybe less concisely, but arguably more directly and less ambiguously.

I think both "extremes" can be great, depending on the situation.

But, in the past, if I've explored something too far from either extreme, sort of muddling both -- that's what I've usually found less rewarding.

p.s. My take on @bakgatviooldoos was they were sticking pretty close to the one, "cold" extreme, actually, and (FWIW) I think that's great.


Sounds good to me! PS: happy discourse birthday...
