The term is introduced in the first subsection of the introduction:
The conventional solution is to decorate "loadP.scm" and similar files with eval-when annotations [7, 23] that designate the intended phase of an expression:
[7] R. K. Dybvig. Chez Scheme User’s Guide. Cadence Research Systems, 1998.
[23] G. L. Steele Jr. Common Lisp: The Language. Digital Press, second edition, 1990.
I might mention that in pre-Lisp languages with macro processors,
completely different notations were used for the macro-time and
the run-time sublanguages. The macro language was used to generate
a textual program, which was then fed into the compiler as a file
in the nonmacro language.
The first conventional language to even try to break out of this
mold was PL/1, which used a syntax somewhat resembling (but different
from) its regular PROCEDURE syntax to express macros.
That was also the age that saw the appearance of language-independent
macro processors like ML/1 and m4. They took advantage of the fact that
programming languages were expressed as text files and could be
processed as if they had very little structure beyond that.
Lisp was different in that the macro language was the same as the
subject language. So a new term had to be invented.
I offer no opinion on the use of the word "phase", but this discussion reminds me of Section 9.11.3, entitled "Les niveaux d'evaluation" ("Evaluation Levels"), of Christian Queinnec's Les Langages Lisp (1994), released in English as Lisp in Small Pieces (1996), where he describes a "Tower of Evaluators". The macros at each level are expanded using macros and functions from the level above, and each level in turn defines the macros and functions used to expand the macros at the level below. (Apologies to Christian if I didn't get the explanation completely correct.)
This is a textbook, so I doubt the book introduced the idea originally, but it made a big impression on me.
When the question was asked, I immediately thought of 3-LISP and Brian Smith, but he also speaks of "levels" in his hierarchy; the copy I have with me is dated February 1982.
Bruce Duba and I studied macro expansion while we were PhD students and decided that the commingling of values across the macro-expansion and compilation/interpretation processes was a mistake. We thought that exploring a system that separated those two pieces would be worthwhile. We started using the word “passes” because Scheme 84 was a compiler with several passes. We then used the temporal word “phase” because it distinguishes two different points in time within one compilation step. (We all experimented with 2-Lisp, 3-Lisp, Brown, Blond, and similar languages at the time, and we were thus aware of the word “level”. My own dissertation research suggested that the word “level” as used by Brian Smith was the result of a misinterpretation of language semantics, that is, of the “theory of” a language rather than of a specific language or feature.)
Once I encountered the idea of a “phase separation theorem” in the world of type systems (ML), I truly began to prefer this word over others because what I had wanted was a “phase separation” of macro expansion and compilation/interpretation.