That is a great question. I'm not sure I'll have a good answer since I wouldn't really call myself an expert here, but I'll do my best.
There are two main kinds of DSLs: embedded, and hosted. These are distinguished primarily by their "interface macros". Interface macros are just the set of macros which you use to implement a DSL (thus forming the "interface" between two languages). For instance, with Sawzall, the interface macros are aggregate, slice, where, among many others. With Racket's language for specifying contracts (racket/contract), the interface macros are and/c, or/c, -> and more. With Qi, the interface macro (singular) is flow.
Languages that have more than one interface macro are called "embedded," and languages that have just one are called "hosted."
Embedded languages are the most seamless since the DSL appears to be part of the host language. E.g. once you do (require sawzall), you can use the entire DSL just like any other Racket forms.
The tradeoff is that embedded DSLs must:
- share their binding space with the host language, and so cannot have forms named `and` and `if`, for example.
- always rely on the host language expansion and compilation process since such languages are wired directly as extensions of the host language (we'll see why this matters soon)
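To make both the seamlessness and the naming constraint concrete, here is a minimal sketch using racket/contract, the embedded DSL mentioned earlier. The function `halve` is a made-up example, not something from the paper or from any library. Note how the contract combinators are named and/c and or/c: they sit in the same binding space as Racket's own `and` and `or`, so they cannot simply reuse those names.

```racket
#lang racket
;; `#lang racket` already includes racket/contract.

;; `halve` is a hypothetical function used only for illustration.
;; The contract forms (->, and/c) are used like any other Racket binding,
;; with no wrapper form around them.
(define/contract (halve n)
  (-> (and/c integer? even?) integer?)
  (quotient n 2))

(halve 10) ; => 5
;; (halve 3) would raise a contract violation, since 3 is not even.
```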
Hosted languages add an extra level of indirection -- e.g. once you (require qi) you can use the entire language in your code, but only by wrapping the expressions in (flow ...). This adds a seam, but in exchange the language is freed from both of the constraints listed above.
First, naming: inside its interface macro a hosted language can reuse names that the host already defines. For instance, Qi has an and form which composes predicates by conjoining them (e.g. (and positive? integer?)). Likewise, Racket's pattern matching language (another hosted DSL) also has and and or forms.
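As a small illustration of the naming point, here is a sketch using racket/match, since it ships with Racket. The function `classify` is a made-up example. Inside a pattern, `and` and `or` are pattern combinators; outside the pattern they are still Racket's ordinary forms.

```racket
#lang racket/base
(require racket/match)

;; `classify` is a hypothetical function used only for illustration.
(define (classify v)
  (match v
    ;; Pattern-language `and`: v must match every sub-pattern.
    [(and n (? integer?) (? positive?)) (list 'positive-integer n)]
    ;; Pattern-language `or`: v must match at least one sub-pattern.
    [(or (? string?) (? symbol?)) 'name-like]
    [_ 'other]))

(classify 5)     ; => '(positive-integer 5)
(classify "qi")  ; => 'name-like
(classify -2.5)  ; => 'other

;; Outside of a pattern, `and` is the usual Racket form.
(and (integer? 5) (positive? 5)) ; => #t
```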
Second -- and now we're getting to the heart of what the paper talks about -- expansion. A hosted language gains more control here because the entire language is specified as subforms of a single macro (flow, in the case of Qi) whose responsibility is to generate Racket code from whatever the user has written. How we do that is up to us. Naively, the interface macro (e.g. flow) can just invoke itself recursively on its subforms until only Racket code remains. In that case it is a simple extension of the Racket expander and isn't doing anything fancy. This is how Qi 1.0 worked.
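The naive approach can be sketched roughly as follows. This is not Qi's actual implementation: `toy-flow` and its `thread` form are hypothetical names invented for this example, and only two forms are supported. The point is just that the single interface macro handles each form by expanding straight into Racket code and calling itself on the subforms.

```racket
#lang racket/base
(require (for-syntax racket/base syntax/parse))

;; toy-flow is a hypothetical single interface macro, not Qi's `flow`.
(define-syntax (toy-flow stx)
  (syntax-parse stx
    ;; Match the DSL's form names as plain symbols, so the DSL's `and`
    ;; is independent of Racket's `and` binding.
    #:datum-literals (and thread)
    ;; (and f ...): a predicate that holds when every sub-flow holds.
    ;; The `and` in the template is Racket's own `and`.
    [(_ (and f ...))
     #'(lambda (x) (and ((toy-flow f) x) ...))]
    ;; (thread f g ...): pass the value through each sub-flow in order.
    [(_ (thread))
     #'(lambda (x) x)]
    [(_ (thread f g ...))
     #'(lambda (x) ((toy-flow (thread g ...)) ((toy-flow f) x)))]
    ;; Anything else is treated as an ordinary Racket function expression.
    [(_ f:expr) #'f]))

(define positive-integer? (toy-flow (and positive? integer?)))
(define inc-then-check (toy-flow (thread add1 (and positive? integer?))))

(positive-integer? 5)   ; => #t
(positive-integer? 2.5) ; => #f
(inc-then-check 0)      ; => #t
(inc-then-check -1)     ; => #f
```

Every clause goes directly from DSL syntax to Racket code, so there is never a point at which the whole DSL program exists in a small, analyzable form.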
The paper proposes that, instead, the interface macro should do its job in two phases: (1) expansion to a core language, (2) compilation of that core language. This allows the language to be faster and smaller, while also supporting extension by users via macros.
In retrospect, we can see that in Qi 1.0, these two phases were conflated and there was no separation between them.
In order to achieve this goal, the paper proposes changing the implementation of the language itself from a "flat" layout (where all of the syntax is specified in a single macro) to a two-level layout: a small core language on the lower level, and any number of macros on the higher level which simply expand into the core forms. The "expansion" part of the process is now all about expanding the macros (upper level) into the core syntax (lower level) of the language. Then the compilation phase kicks in and compiles the core language into Racket.
This is exactly the architecture of Racket itself: expansion of macros to a core language (fully expanded Racket), followed by compilation to a lower level language (Racket bytecode).
Having this architecture allows us to implement optimizations at the level of abstraction of the DSL. Some optimizations are possible there that could no longer be done reliably at the level of Racket expressions, since some information may be lost by that stage -- just as the Racket compiler does its own optimizations even though lower level languages in the stack (e.g. C) will be compiled too. This allows your DSL to potentially recoup the performance it might otherwise lose when its idioms deviate from those of the host language.
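Here is a rough sketch of that two-level pipeline. To keep it short it works on quoted S-expressions at run time rather than on syntax objects inside a macro, so it is an illustration of the shape of the idea and not how Qi or the paper implements it. All names are hypothetical: the core language is `(thread f ...)`, `(map f)` and plain function names, `twice` is an upper-level form that is pure sugar, and the one optimization fuses adjacent `map` stages so that no intermediate list is built.

```racket
#lang racket/base
(require racket/match racket/list)

;; Phase 1: expand upper-level sugar into the small core language.
(define (expand-to-core form)
  (match form
    [`(twice ,f)       (expand-to-core `(thread ,f ,f))] ; sugar
    [`(thread ,fs ...) `(thread ,@(map expand-to-core fs))]
    [`(map ,f)         `(map ,(expand-to-core f))]
    [(? symbol? f)     f]))

;; DSL-level optimization on core forms: flatten nested threads, then
;; fuse (map f) (map g) into (map (thread f g)). The fusion assumes f and
;; g are pure, which the DSL can promise but plain Racket code cannot.
(define (optimize core)
  (match core
    [`(thread ,fs ...)
     (let* ([flat  (flatten-threads (map optimize fs))]
            [fused (fuse-maps flat)])
       (if (= (length fused) 1) (car fused) `(thread ,@fused)))]
    [`(map ,f) `(map ,(optimize f))]
    [f f]))

(define (flatten-threads fs)
  (append-map (lambda (f)
                (match f
                  [`(thread ,gs ...) gs]
                  [_ (list f)]))
              fs))

(define (fuse-maps fs)
  (match fs
    [(list `(map ,f) `(map ,g) rest ...)
     (fuse-maps (cons `(map ,(optimize `(thread ,f ,g))) rest))]
    [(cons f rest) (cons f (fuse-maps rest))]
    ['() '()]))

;; Phase 2: compile the core language into Racket code (as data here).
;; A real interface macro would build syntax objects and get hygiene;
;; this sketch uses fixed variable names v and vs for readability.
(define (compile-core core)
  (match core
    [`(thread ,fs ...)
     `(lambda (v) ,(for/fold ([acc 'v]) ([f (in-list fs)])
                     `(,(compile-core f) ,acc)))]
    [`(map ,f) `(lambda (vs) (map ,(compile-core f) vs))]
    [(? symbol? f) f]))

(define (toy-compile form)
  (compile-core (optimize (expand-to-core form))))

(define surface '(thread (twice (map add1)) (map number->string)))

(optimize (expand-to-core surface))
;; => '(map (thread add1 add1 number->string))

(toy-compile surface)
;; => '(lambda (vs) (map (lambda (v) (number->string (add1 (add1 v)))) vs))
```

Three list traversals in the surface program become one in the generated code. Once the program has been turned into nested Racket lambdas and calls to `map`, the fact that these stages are pure and adjacent is much harder to recover, which is the kind of lost information described above.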
Incidentally, when I wrote Qi I didn't set out to write a particular kind of DSL. It's only upon reading the paper that I, like Monsieur Jourdain, can say that I have been speaking a hosted DSL all along.
And as it happened, that was exactly the kind of DSL that the paper is applicable to. Finally, hosted DSLs can also be embedded into the host language, to get the best of both worlds.
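One way to read that last point, using the pattern matching language mentioned earlier rather than Qi: the same hosted pattern language is reachable not only through `match` but also through forms such as match-define and match-lambda, which read like ordinary Racket definition and function forms. The names `norm-squared` and `swap` below are made up for this sketch.

```racket
#lang racket/base
(require racket/match)

;; The pattern language entered through its usual wrapper, `match`.
(define (norm-squared p)
  (match p
    [(list x y) (+ (* x x) (* y y))]))

;; The same pattern language embedded in host-style forms:
;; a definition form and a function form that accept patterns directly.
(match-define (list a (and b (? even?))) '(1 2))
(define swap (match-lambda [(cons x y) (cons y x)]))

(norm-squared '(3 4)) ; => 25
(list a b)            ; => '(1 2)
(swap '(1 . 2))       ; => '(2 . 1)
```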
References for further reading:
Macroexpand anywhere with local-apply-transformer! -- a blog post by Alexis King covering how to expand subforms on demand within the context of macro expansion.
Qi compiler -- describes the planned Qi compiler which will complete the second phase described in the paper.