Debugging memory leaks

Hi all. I run R16, a Discord bot that allows users to evaluate snippets of code using racket/sandbox. However, I see issues with memory usage building up over time while the bot is running. The bot has essentially no state persisted between trick invocations, so I'm not really sure what is going on.

Additionally, there appear to be zero tools for debugging memory leaks. I know of dump-memory-stats, but it's a blunt and kludgy tool, and doesn't show me much.

Here is such a dump: foo.

The only thing I can see that I don't expect is that a lot of racket/draw and pict related objects are hanging around. However, there is no request active, and we discard each sandbox after a request ends, so this is a problem. Unfortunately, it's pretty much impossible to debug from here. I'm looking for tools such as Eclipse Memory Analyzer Tool (MAT) for Java, that can read a heap dump, compute on the object graph, and show me what exactly is holding onto what, and what objects retain the most bytes recursively. Something like it is a vital tool for any GC-ed production-grade language.

Does such a thing exist?

3 Likes

What happens if you run 3 times (collect-garbage) before (dump-memory-stats)?

(Usually, it's not necessary to run (collect-garbage) three times, but it's good for benchmarks because the second time allows cleaning the objects with finalizers and I don't remember why the third time is useful. My guess is that it may be helpful here to ensure every object that can be clean is cleaned.)

1 Like

When building Racket BC with --enable-backtrace option, dump-memory-stats allows interactive querying:

> (dump-memory-stats)
Begin Dump
Begin Racket3m/BT
...
End Racket3m/BT
Generation 0: 5783888 of 16777216 bytes used
Generation 0.5: 0 bytes used
Generation 1 [tagged]: 16665608 bytes used of 16665608 in 1028 pages
Generation 1 [atomic]: 3724376 bytes used of 3724376 in 259 pages
Generation 1 [array]: 1038288 bytes used of 1038288 in 70 pages
Generation 1 [pair]: 2143328 bytes used of 2143328 in 132 pages
Generation 1 [big]: 7488616 bytes used of 7488616 in 139 pages
Generation 1 [medium]: 258 [1/2064] 1026 [1/8208]
Generation 1 [medium atomic]:

Current memory use: 36853592
 part current use from child places: 0
Peak memory use before a collection: 40152376
Peak memory use after a collection: 31069704
Allocated (+reserved) page sizes: 77479936 (+1884160)
Phantom bytes: 0
Past allocated memory:   86671872
Past allocated tagged:   64692416
Past allocated array:     3296928
Past allocated atomic:   18682528
# of major collections: 0
# of minor collections: 15
# of installed finalizers: 7
# of traced ephemerons: 106
# of immobile boxes: 0
# of page-modify unprotects: 1125
# of old regions scanned during minor GCs: 895/6691
JIT-generated code: 0
Marshaled code: 4978897
Begin Help
 (dump-memory-stats 'struct) - show counts for specific structure types
 (dump-memory-stats spec) - prints path to instances, where spec is
     sym : prints paths to objects of type named by sym
           Example: (dump-memory-stats '<pair>)
     num : prints paths to objects with tag num
     -num : prints paths to objects of size num
     (list 'struct sym) : print paths to structs of type named by sym
   ** Backtraces depend on the most recent major GC **
 (dump-memory-stats spec 'new) - show only objects new since last dump
 (dump-memory-stats spec num) - limits backtrace path length to num
 (dump-memory-stats spec 'cons) - builds list instead of showing paths
 (dump-memory-stats spec any num) - record only each numth object
 (dump-memory-stats 'count sym) - return number of instances of type named by sym
   Example: (dump-memory-stats 'count '<pair>)
 (dump-memory-stats 'peek num v) - returns value if num is address of object, else v
 (dump-memory-stats 'fnl) - prints not-yet-finalized objects
 (dump-memory-stats 'next v) - next tagged object after v, #f if none; start with #f
 (dump-memory-stats 'addr v) - returns the address of v
 (dump-memory-stats thread) - shows information about the thread
End Help
End Dump

Is this functionality helpful enough for you?

It looks like SenoraGC might even be able to provide more information, but I failed to compile it.

$ ../configure --enable-bcdefault --enable-sgc --enable-sgcdebug --enable-backtrace
$ make
...
Error [ASSIGN] 857 in ../bc/src/salloc.c: suspicious assignment with a function call, LHS ends tagged.
Error [ARITH] 2221 in ../bc/src/salloc.c: suspicious arithmetic, LHS ends a.
Error [ARITH] 2220 in ../bc/src/salloc.c: suspicious arithmetic, LHS ends c.
Error [ARITH] 2219 in ../bc/src/salloc.c: suspicious arithmetic, LHS ends t.
Error [DECL] 2612 in ../bc/src/salloc.c: Variable declaration (int () not at the beginning of a block.
Error [DECL] 2615 in ../bc/src/salloc.c: Variable declaration (int 0) not at the beginning of a block.
Error [DECL] 2616 in ../bc/src/salloc.c: Variable declaration (int 0) not at the beginning of a block.
Error [DECL] 2617 in ../bc/src/salloc.c: Variable declaration (int 0) not at the beginning of a block.
Error [DECL] 2628 in ../bc/src/salloc.c: Variable declaration (int 10000) not at the beginning of a block.
gc at 54560928 (71761920): 0 after 129 msecs
done  35637096 (66125824); recovered 18923832 in 19 msecs
Error [LOOP] 2882 in ../bc/src/salloc.c: while/do/for with body not in braces.
Error [CALL] 2879 in ../bc/src/salloc.c: Bad place for function call (in decls), starting tok is scheme_get_param.
Error [ASSIGN] 2879 in ../bc/src/salloc.c: suspicious assignment with a function call, LHS ends m.
Error [CALL] 2810 in ../bc/src/salloc.c: Bad place for function call, starting tok is scheme_get_type_name.
Error [CALL] 2723 in ../bc/src/salloc.c: Bad place for function call (in decls), starting tok is scheme_get_type_name_or_null.
Error [ASSIGN] 2723 in ../bc/src/salloc.c: suspicious assignment with a function call, LHS ends tn.
Error [SHADOW] 3604 in ../bc/src/salloc.c: Pointerful variable ht shadowed in decls.
xform: Errors converting
  context...:
   /Users/sorawee/projects/racket-new/racket/collects/compiler/private/xform.rkt:8:2: xform
failed
 in xform
 in build-one
 in loop
 in module->hash
make: *** [all] Error 1

2 Likes

Funnily enough, I've run into this leak issue with R16 in the past, and switching to BC seemed to alleviate it. I switched back to CS a couple releases ago because the issue seemed to be less frequent, but it just today again, which is what prompted this post.

For Racket CS, @samth has GitHub - samth/object-backtrace: find out what keeps an object alive. If it doesn't provide everything you want, you could probably use a similar approach to access the underlying Chez Scheme functionality, e.g. the inspector: racket/debug.stex at ce64c04d26f97eebfaea128ec5564bcb569527c7 · racket/racket · GitHub (Some relevant functionality was added in Racket's branch of Chez Scheme to support Racket's per-custodian memory accounting.)

3 Likes