Helpful: suggests the closest name on an unbound identifier error

Some tools in other programming languages can suggest the closest name when an unbound identifier error occurs.

(Image taken from the Stack Overflow question "can't tokenize a string and pass it to struct in c++".)

I attempted to implement a similar feature for Racket.

Documentation: helpful: suggests a closest variable name on unbound identifier error.
Code: GitHub - sorawee/helpful: Helpfully suggest a closest variable name on unbound identifier error

Discussion

syntax-debug-info

syntax-debug-info is what makes this feature possible. It's also useful for @AlexKnauth's debug-repl and yjqww6's drcomplete. However, as @rocketnia mentioned in Discord:

it has "debug" in its name, so I keep disregarding it as something I could build anything stable upon.

I wholeheartedly agree with this. So the questions are: should this procedure be renamed to something else? Is it stable enough for real use?

Non-standard expansion

Looking up module bindings and local bindings is relatively easy via syntax-debug-info, and if that's all we want, we can raise the unbound identifier error as soon as we encounter one.

The challenge is that, to also provide suggestions from imported identifiers (brought in either via require or via the module language), I need to use syntax-local-module-required-identifiers, which can only be invoked in a provide transformer. So my approach is:

  • Call (syntax-local-lift-provide #'(expand (please-raise-unbound-id-error the-id))) when an unbound identifier the-id is encountered, to defer the error until other expansion is done.
  • Make please-raise-unbound-id-error a transformer that calls syntax-local-module-required-identifiers and uses the information to report the error.

This is of course very hacky and might cause unexpected results. In particular, the reported error could be different after importing this module.

#lang racket/base
an-unbound-id ; <- an-unbound-id: unbound identifier
(let ()
  ())

#lang racket/base
(require helpful)
an-unbound-id
(let ()
  ()) ; <- #%app: missing procedure expression

Can we do any better?

Levenshtein distance?

Is Levenshtein distance the right metric to use? Levenshtein distance counts each insertion, deletion, and substitution as one edit. But does substitution make sense here? Should I compute the residue of the longest common subsequence instead (the characters left over in both names after removing their longest common subsequence)?
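To make the difference concrete, here is a small sketch in Racket. The names `edit-distance`, `levenshtein`, and `lcs-residue` are illustrative and not part of helpful. The LCS residue, |a| + |b| - 2 * LCS(a, b), is the same as an edit distance in which a substitution costs 2 (one deletion plus one insertion), so both metrics can share one dynamic program:

```racket
#lang racket/base
;; Sketch: Levenshtein distance vs. LCS residue, sharing one DP.
;; edit-distance, levenshtein, and lcs-residue are illustrative names,
;; not part of the helpful package.

;; Row-by-row edit-distance DP. Insertions and deletions cost 1;
;; a substitution costs sub-cost (a matching character costs 0).
(define (edit-distance a b sub-cost)
  (define n (string-length b))
  (define last-row
    (for/fold ([prev (build-vector (add1 n) values)]) ; row 0: 0, 1, ..., n
              ([ca (in-string a)] [i (in-naturals 1)])
      (define cur (make-vector (add1 n) i))           ; column 0 holds i
      (for ([cb (in-string b)] [j (in-naturals 1)])
        (vector-set! cur j
                     (min (add1 (vector-ref prev j))          ; delete ca
                          (add1 (vector-ref cur (sub1 j)))    ; insert cb
                          (+ (if (char=? ca cb) 0 sub-cost)   ; match or substitute
                             (vector-ref prev (sub1 j))))))
      cur))
  (vector-ref last-row n))

;; Insertion, deletion, and substitution all cost 1.
(define (levenshtein a b) (edit-distance a b 1))

;; With substitution costing 2, only insertions and deletions matter, so
;; this equals |a| + |b| - 2 * LCS(a, b): the residue of an LCS.
(define (lcs-residue a b) (edit-distance a b 2))
```

For example, (levenshtein "cdr" "car") is 1 but (lcs-residue "cdr" "car") is 2, while both give 1 for "cadr" vs "car". In other words, the LCS residue ranks a missing or extra character as closer than a wrong character.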

-- Sorawee (Oak)

This looks awesome!

The following might be "the perfect (or better) is the enemy of the good" material; if so, I hope you ignore it!!

I wonder whether there should be some protocol to expose one or more resolutions. A default handler could print them (as in your example), but a tool could present a choice to the user, and even act on that choice.

  1. Sometimes I mistype an imported identifier (as in your example above). There may be one or more imported identifiers that I meant. I could choose which one, and the tool would correct it for me.

  2. Sometimes I type a valid identifier... the problem is I forgot to import it. There may be one or more modules that export it. I could choose which, and the tool will add the missing require.

It's possible 1 by itself might try to match something already imported, when the actual problem is 2.
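One way such a protocol might look, as a purely hypothetical sketch: none of the names below (`rename-to`, `add-require`, `exn:fail:syntax:unbound/resolutions`, `raise-unbound/resolutions`) exist in helpful or in Racket. The idea is to subtype Racket's real exn:fail:syntax:unbound, so existing handlers keep working, while a tool that knows the subtype can read structured resolutions covering both cases 1 and 2:

```racket
#lang racket/base
;; HYPOTHETICAL resolution protocol: none of these names exist in helpful
;; or in Racket; they only illustrate the idea.

(struct rename-to (symbol))         ; case 1: "did you mean X?"
(struct add-require (module-path))  ; case 2: "did you forget to require M?"

;; Subtype the real exn:fail:syntax:unbound so that handlers which only
;; know the base type keep working unchanged.
(struct exn:fail:syntax:unbound/resolutions exn:fail:syntax:unbound
  (resolutions))

(define (raise-unbound/resolutions id resolutions)
  (raise (exn:fail:syntax:unbound/resolutions
          (format "~a: unbound identifier" (syntax-e id))
          (current-continuation-marks)
          (list id)
          resolutions)))

;; What a default handler might print, or a tool might offer as actions.
(define (describe-resolution r)
  (cond
    [(rename-to? r) (format "replace with `~a'" (rename-to-symbol r))]
    [(add-require? r) (format "add (require ~s)" (add-require-module-path r))]))

;; A tool catches the subtype and gets structured data instead of a string.
(define resolutions
  (with-handlers ([exn:fail:syntax:unbound/resolutions?
                   exn:fail:syntax:unbound/resolutions-resolutions])
    (raise-unbound/resolutions
     (quote-syntax ->)
     (list (rename-to '-) (add-require 'racket/contract/base)))))

(map describe-resolution resolutions)
```

Because the exception is still an exn:fail:syntax:unbound, code that only checks for the base type behaves as before; only tools that opt in see the resolutions.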


Part of me feels like this would fit well with check-syntax... but maybe that assumes too much for all tools that would want to use this.


This reminds me of some work @notjack did on a general framework for "linting", IIRC.


Again, apologies if this is an unwelcome detour, if so of course please just ignore me.

That lifting strategy is very clever. Can you use it to report multiple unbound identifier errors instead of just the first one?

Also re: linting framework: yup that's Resyntax. It wouldn't work for this case I think, but it's in the same vein.

Finding what module must be imported to fix the error is a great idea! I will experiment with it. (This reminds me of all those memes where people install an exception handler to search Stack Overflow.)

I consider repairing the mistake to be out of scope for this package. But, say, another DrRacket plugin package could definitely cooperate with this package to perform the actual repair.

Yes! You can use it to report multiple unbound identifier errors. But be careful what you wish for:

(matc '(1)
  [(list a) a])

would report three unbound identifiers: matc, a, and a.

One heuristic that might help: if an unbound id is in head position, treat the whole list as blank. That might help with this case, but there are other tricky cases too:

(def a 1)
(println a)

would report three unbound identifiers: def, a, and a. The first a could be elided by the above heuristic, but you will still get the second a error.
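A toy model of the head-position heuristic on plain S-expressions shows why it isn't enough. This is only a sketch: `collect-unbound` is a hypothetical helper, and the `bound` list is a hand-written stand-in for what the expander actually knows; real code would work on syntax objects and bindings.

```racket
#lang racket/base
;; Toy model of the "unbound id in head position" heuristic on plain
;; S-expressions. collect-unbound is a hypothetical helper, and `bound` is
;; a hand-written stand-in for what the expander actually knows.

(define (collect-unbound form bound #:skip-unbound-head? [skip? #t])
  (define (unbound? x) (and (symbol? x) (not (memq x bound))))
  (let loop ([form form])
    (cond
      [(symbol? form) (if (unbound? form) (list form) '())]
      [(and (pair? form) (eq? (car form) 'quote)) '()] ; quoted data is not code
      [(pair? form)
       (if (and skip? (unbound? (car form)))
           (list (car form))              ; report the head, blank out the rest
           (apply append (map loop form)))]
      [else '()])))                       ; numbers, strings, etc.

(define bound '(list println))

;; Naive walk over (matc '(1) [(list a) a]): reports matc, a, and a.
(collect-unbound '(matc '(1) [(list a) a]) bound #:skip-unbound-head? #f)
;; With the heuristic: only matc.
(collect-unbound '(matc '(1) [(list a) a]) bound)
;; (def a 1) (println a): def is reported, but so is the a in (println a).
(apply append (for/list ([f (in-list '((def a 1) (println a)))])
                (collect-unbound f bound)))
```

The three calls produce '(matc a a), '(matc), and '(def a). The heuristic removes the spurious a errors inside the matc form, but it cannot know that def was meant to bind a, so the a in the separate (println a) form is still reported.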

This is also why I think the syntax-local-module-required-identifiers approach, which defers the error and keeps expanding, is actually incorrect. For example,

(syntax-par #'()
  [() 1])

will result in the error #%app: missing procedure expression, which is less helpful than the error you get without the module: syntax-par: unbound identifier.

Version 1.0 released.

This version utilizes the recently added syntax-bound-symbols in Racket 8.7. There's no more non-standard expansion (and we no longer use syntax-debug-info).

I still haven't done the module import suggestion, but it looks feasible. Reporting multiple errors, on the other hand, is unlikely to happen because of the issue I outlined earlier.

Version 2.0 released.

This version suggests potential module paths to require to fix the unbound id error.

Here’s an example:

Looks like the example got chopped off by the email parser?

Related: it seems interesting to me that this is enabled by a 'require', rather than by a DrRacket tool. I guess it's strange to me that the error message associated with an unbound identifier is actually part of the #lang definition, but I see how that would make sense. I ... think I'd be more likely to use something I can blanket-enable in the editor, though. It's an interesting area of crossover between the language and the editor. Thoughts about this?

This is very cool! (I'm sorry I missed it the first time!)

I wonder if the unbound identifier exception could be changed to include a syntax object that would have the information you'd want to use? It seems like the expander should have more information than it's currently reporting at that stage.

What does DrRacket do when you enable the debug mode? I think it injects errortrace into the module. Can a tool do something similar, or does it need too much magic?

Ah -- I'm behind the times! The information is all there already:

(define e (with-handlers ([values values]) (expand #'(module m racket (+++ 13)))))
(length (syntax-bound-symbols (car (exn:fail:syntax-exprs e))))

shows that the list has 2615 elements! 🙂

Ouch. Thanks for letting me know. Here's the content:

> (module test racket/base
    (require helpful)
    ->)

results in:

->: unbound identifier
  in: ->
  suggestion: do you mean `-'?
  alternative suggestion: do you want to import one of the following modules, which provides the identifier?
   `racket/contract/base' or `racket/contract' or `racket'
   `lang/htdp-advanced'
   `mzlib/contract'
   `ffi/unsafe'
   `typed/racket/base' or `typed/racket'
   `deinprogramm/sdp/beginner'
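As a rough, brute-force sketch of how a list like this could be produced (not necessarily how helpful does it), one can ask each module in a hand-picked candidate list whether it exports the symbol at phase 0, using module-declared? and module->exports from racket/base. The candidate list and the helper names `exports-at-phase-0?` and `modules-providing` are assumptions for illustration:

```racket
#lang racket/base
;; Brute-force sketch: which candidate modules export `sym` at phase 0?
;; The candidate list and helper names are assumptions for illustration;
;; this is not necessarily how the helpful package finds its suggestions.

;; module->exports returns two lists (variables, then syntax), each mapping
;; a phase to a list of (list symbol origins).
(define (exports-at-phase-0? mod sym)
  (define-values (vars stxs) (module->exports mod))
  (for/or ([phase+ids (in-list (append vars stxs))])
    (and (eqv? (car phase+ids) 0)
         (assq sym (cdr phase+ids))
         #t)))

(define (modules-providing sym candidates)
  (for/list ([mod (in-list candidates)]
             #:when (and (module-declared? mod #t) ; #t: load the declaration if needed
                         (exports-at-phase-0? mod sym)))
    mod))

(modules-providing '-> '(racket/list racket/string racket/contract/base))
```

Here the call keeps only racket/contract/base. A realistic candidate set would be far larger, and loading every module declaration on each error could be slow, so a precomputed index of exports is probably the more practical design.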

Thanks to @greghendershott again for the recommendation to suggest modules to import.

Code: GitHub - sorawee/helpful: Helpfully provide suggestions on unbound identifier error
Documentation: helpful: providing suggestions on unbound identifier error.

That's a great question!

My original intention was that a #lang might want to provide this capability by default, so I made this a library to allow that.

But I can see that making this a tool (e.g. raco helpful) could also be useful. I might add that next.

It's unclear how to integrate it nicely with DrRacket as a plugin. The debugging component in DrRacket, which invokes the errortrace that @gus-massa mentioned, is tightly integrated with DrRacket. Though in the worst case I can add my own "Run" button, I guess...