Identifying all functions used by a function

Given a function f I would like to be able to extract all the functions that it touches (either by calling directly or just passing as an argument to another function), recursively, without running f. In other words I want to do a static analysis. Concessions to try to make this tractable:

  • No dynamic imports / module loading
  • f is not higher order (it has no arguments that are themselves functions)
  • It's fine if I can only detect the use of functions that have been defined with a macro e.g. tracking-defun or a custom #lang
  • Would also be fine to require Typed Racket, if that somehow could help

Is this possible in Racket? I can picture making tracking-defun use local-expand so that it can know that all the sexps inside are either special forms (which I would ignore) or calls, but you can reference a function without calling it directly (even though f isn't higher order it might still pass some g into say map), so I can't just look at the first element of each sexp.

Given an identifier I would need to be able to query its lexical context to figure out what it is actually going to be bound to when the function runs, which can be affected by what is imported. Does Racket provide a way to do that query? Or would I need a #lang that manually reimplements scoping and requires so it can answer queries about identifiers in scopes?

1 Like

Hi, @jgarvin.

Interesting question! This is just a stab at the problem, and not very lucid, but is this sort of what you have in mind?

#lang racket

(require (for-syntax
          syntax/parse
          syntax/parse/lib/function-header))

(begin-for-syntax
  (define-syntax-class (check-ex bound)
    #:attributes ([bind 1])

    ; is the pattern an id in the bound symbols?
    (pattern ex:id
      #:when (member (syntax->datum #'ex) bound)
      #:with (bind ...) #'(ex))

    ; is the pattern a nested form with bound symbols?
    (pattern ((~var ex (check-ex bound)) ...+)
      #:with (bind ...) #'(ex.bind ... ...))

    ; otherwise, ignore (keywords, numbers, etc.)
    (pattern _
      #:with (bind ...) #'())))

(define-syntax (define-checked stx)
  (define bound (syntax-bound-symbols stx))
  
  (syntax-parse stx
        ; used for the bindings, but is just for explanation, really
    [(_ [bindings:id]
        ; normal function header signature
        head:function-header
        ; the function's body, which we check using the bound symbols above
        (~var body (check-ex bound)) ...)
     #'(begin
         (define bindings '(body.bind ... ...))
         (define head
           body ...))]))

(define-checked [binds] (fun)
  (define g (list 1 2 3))
  (sort (map number->string g)
        string<?
        #:key identity))

binds
;=> '(define list sort map number->string string<? identity)
(fun)
;=> '("1" "2" "3")

It doesn't do exactly what you want, from what I understood, but it might be a step in the right direction.

The more experienced Racketeers might very well be able to fill in the gaps.


Edit: too many edits :upside_down_face:

1 Like

I'm worried about corner cases, like

(define f +)
(define (this-function x y)
  (map f x y))
(set! f -)
(display (this-function '(5 6) '(1 1))) ;==> '(4 5)

Avoiding using map is probably too restrictive, but perhaps you can forbid calling mutable variables.

1 Like

Do the arrows in Dr Racket's UI give the kind of answers that you want?

If so, you can access that analysis in your own program via drracket/check-syntax.

This doesn't run your program, or f. (It analyzes fully expanded code, and expanding entails "running" code > phase 0 -- but it doesn't evaluate f or any other phase 0 a.k.a. runtime code.)


The arrows are for all bindings. If you're strictly interested in bindings of function values, maybe you can filter the arrows based on use of #%app in the fully expanded code (or something like that)??

3 Likes

Since I'm fine with only tracking calls and passing of functions that are defined with my-defun is it possible to prevent set! from working on the bindings created by that macro?

I played with Racket a couple of years ago but most of my macro experience is still in other lisps and I haven't used define-syntax-class, could you describe at a high level what your approach is here? That would give me context for going through the docs for it and the other pieces.

Certainly. Bear in mind, I am, how you say, just some guy.

Good examples of using syntax-classes are helpfully found at Syntax-Parse Examples.

syntax-parse is a very powerful tool; it is very easy to use once you get the hang of it, but I also have to remind myself not to use it for everything.

My strategy, here and in many cases, is to define the "terms" I would like to parse using syntax-classes, and then string them out nicely using the syntax-parse form.

That being said, what a define-syntax-class expression does, is define a disjoint set of patterns which can be matched against syntax, much like match is used for runtime values, using literally defined patterns and other syntax classes, and a trove of other machinery exposed by the form. It also has a friend called define-splicing-syntax-class which is used for sequences of syntax, but this isn't really relevant here, apart from the fact that define-syntax-class matches against atomic syntax, so some atomic value like an id or a number, and so on, or (...).

In this case, we have three cases I came up with:

(pattern ex:id
  #:when (member (syntax->datum #'ex) bound)
  #:with (bind ...) #'(ex))
  • the pattern is an id--a symbol in syntax-clothing, so to speak--which would correspond to the identifier for a function, for example, bound in the scope(s) of the syntax of the define-checked macro. This is the reason for the :id suffix. The #:when clause specifies that the pattern should match iff the datum (symbol) from the id expression is a member of the bound symbols for the scope(s). The #:with clause associates the pattern with the attribute (bind ...) which has an ellipsis-depth of 1; hence the 1 in its #:attributes definition (which means that all of the patterns must share this attribute and have the same ellipsis-depth). This is nice, because when one needs to then use this attribute, it can be "unpacked" using ? ..., which I use to "elide" the identifiers and other atomic values we want to ignore, in the final pattern branch.
(pattern ((~var ex (check-ex bound)) ...+)
  #:with (bind ...) #'(ex.bind ... ...))
  • the pattern is a nested form, so (...). Note the use of ~var here; this is because the syntax-class takes an argument! so we use this more explicit syntax to pass the bound argument down. For our previous example, we could have written (~var ex id) in this manner. Also note the double-ellipsis in the #:with clause: this is because the pattern ex itself is an ellipsis pattern, the ...+ meaning "one or more", and the relationship between the number of ellipses must be maintained, therefore the double.
(pattern _
  #:with (bind ...) #'())
  • finally, and otherwise, we can have any atomic pattern (i.e. keywords, numbers, strings, etc.), which we will ignore the specifics of as indicated by _. Also note the empty assignment (bind ...) #'() in the #:with clause. This is what I was alluding to above, which allows us to elide or erase the trace of this pattern, since it has nothing of value to us.

Now, to the syntax-parse side of things, we have:

(define-syntax (define-checked stx)
  (define bound (syntax-bound-symbols stx))
  
  (syntax-parse stx
        ; used for the bindings, but is just for explanation, really
    [(_ [bindings:id]
        ; normal function header signature
        head:function-header
        ; the function's body, which we check using the bound symbols above
        (~var body (check-ex bound)) ...)
     #'(begin
         (define bindings '(body.bind ... ...))
         (define head
           body ...))]))

The bound definition makes use of the syntax-bound-symbols procedure (there are a number of definitions in that section that might be interesting to you).

In a nutshell, this is going to give us all of the symbols (datum forms of identifiers) which are bound in the scopes which are associated with stx, which is the syntax of the form when one calls the macro.

Out of convenience, I use the :function-header syntax-class from syntax/parse/lib/function-header, which matches valid function header signatures. The module also provides other utilities for working with parameters and keywords and so on for the header's formals (the complete parameters).

Again, we see the use of ~var and here we unpack the collected .bind ... attributes from the body ... (again, two ellipses), into a list. You'll notice the identifiers in this list are quoted, because of the fact that the values bound to the syntax are being quoted via '(stx ...). This is the same as writing (list 'stx ...).

Finally, we make sure to assemble the function as it would have been otherwise, using the (define head body ...) form.

Hopefully I didn't miss too much there :sweat_smile:


Edit:
To demonstrate some more of the functionality, if, say, you wanted to exclude certain identifiers from being included in the bindings, you might use what is called a syntax-parameter. This nifty little doodad can allow us to control some of the context of the macro's expansion.

(require racket/stxparam
         racket/splicing)

(define-syntax-parameter ignore-symbols (list))

(define-syntax (define-checked stx)
  ; edit the definition of `bound` to exclude our ignored symbols
  (define bound
    (filter (lambda (s) (not (member s {syntax-parameter-value #'ignore-symbols})))
            (syntax-bound-symbols stx)))
  ...)

And then we can parameterize our macro-call using splicing-syntax-parameterize, which splices the syntax into the surrounding context, as you would splice something into a list, but with the parameter bound to the value we assign for any syntax in its body.

(splicing-syntax-parameterize ([ignore-symbols '(define list)])
  (define-checked [bind] (fn)
    (define g (list 1 2 3))
    (sort (map number->string g)
          string<?
          #:key identity)))

bind
;=> '(sort map number->string string<? identity)
(fn)
;=> '("1" "2" "3")

Of course, this could be made to look prettier, but this is just to demonstrate some of the cool ideas we have to work with.


Edit:
To be clear, the reason for the use of splicing-syntax-parameterize, as opposed to simply syntax-parameterize, is because the result of our macro call produces a definition-expression in the tail-position, owed to the #'(begin ...) at the end there.

I mean, I understand all of this probably still sounds like gibberish (and is kind of), but if it happens to come up, that would be why.

1 Like