Help with FFI errors (for #lang clingo integration)

Heya!! This is a long shot, but I figured I'd post here to see if anyone could offer some advice on an FFI integration that I'm currently building, where I'm running into some really mind-bending errors.

The repository is here: GitHub - Gopiandcode/clingo-lang: #lang clingo for Racket (WIP)

For context, this package implements a binding to the libclingo ASP solver. The file unsafe.rkt lists my fairly exhaustive FFI binding for all of the functions from clingo.h.

The file main.rkt lists my initial experiments with using the FFI API to make queries from Racket.

The core of my program is:

;; create a context
(define ctrl (clingo-control-new '[] #f #f 20))
(configure-to-enumerate-all-models ctrl)
;; add facts to database
(clingo-control-add ctrl "base" '[] "a :- not  b. b :- not a. - a :- b.")
(clingo-control-ground ctrl `[,(make-clingo-part "base" #f 0)] #f #f)

;; ask the solver to solve this database
(define solve-handle (clingo-control-solve ctrl 'clingo-solve-mode-yield '[] #f #f))

;; loop over solutions and print them
(define (loop)
  (clingo-solve-handle-resume solve-handle)
  (define model (clingo-solve-handle-model solve-handle))
  (when model
    (print-model model)
    (loop)))
(loop)

;; clean up the solver handle
(define solve-result (clingo-solve-handle-get solve-handle))
(println solve-result)
(clingo-solve-handle-close solve-handle)

This actually works pretty well, and I can send queries, and print out the results (using a function from the api to convert solutions to strings) and they match with what I expect.

Inside print-model, I then started writing some code to translate from clingo's internal representation back to a more Racket-friendly encoding, via a function symbol->term, where term is defined as:

(define term?
  (flat-rec-contract term?
   (or/c
    infinum?              ;; #inf
    supremum?             ;; #sup
    number?               ;; 1
    string?               ;; "hello"
    symbol?               ;; a
    (list/c '-
            (or/c
             symbol?
             (cons/c symbol? (listof term?)))) ;; (- a) or (- (f x y))
    (cons/c symbol? (listof term?))))) ;; (f x y)

My problem is that my function symbol->term is not behaving properly, and I'm getting really weird results.

(define/contract (symbol->term sym)
  (-> any/c term?)
  (define symbol-type (clingo-symbol-type sym))
  (match symbol-type
    ...
    ['clingo-symbol-type-function
     (define is-positive (clingo-symbol-is-positive sym))
     (define is-negative (clingo-symbol-is-negative sym))
     (define f (string->symbol (clingo-symbol-name sym)))
     (define args (map symbol->term (clingo-symbol-arguments sym)))
     (define expr (if (null? args) f (cons f args)))
     (if is-negative
         (list '- expr)
         expr)]))

Above I have the relevant snippet from symbol->term; the errors relate to the outputs of the functions clingo-symbol-is-negative and clingo-symbol-is-positive, which seem to sometimes return the "wrong" results (sometimes they both return #true, which is definitely wrong).

I double checked my bindings, and I can't see anything wrong there:

(define-clingo clingo-symbol-is-positive
  (_fun _clingo-symbol (positive : (_ptr o _bool)) -> (res : _bool) ->
        (if res positive (raise-clingo-error))))

(define-clingo clingo-symbol-is-negative
  (_fun _clingo-symbol (negative : (_ptr o _bool)) -> (res : _bool) ->
        (if res negative (raise-clingo-error))))

I've looked at the source of the corresponding functions in the C code, and at the very least they should never return the same result:

extern "C" bool clingo_symbol_is_negative(clingo_symbol_t val, bool *sign) {
    GRINGO_CLINGO_TRY {
        clingo_expect(Symbol(val).type() == SymbolType::Fun);
        *sign = Symbol(val).sign();
    } GRINGO_CLINGO_CATCH;
}

extern "C" bool clingo_symbol_is_positive(clingo_symbol_t val, bool *sign) {
    GRINGO_CLINGO_TRY {
        clingo_expect(Symbol(val).type() == SymbolType::Fun);
        *sign = !Symbol(val).sign();
    } GRINGO_CLINGO_CATCH;
}

I assume I've wrapped something wrong, and now am getting some kind of stack corruption :scream: :scream: , but for the life of me, I don't know where... :sob:

Could anyone give some advice as to what I might be doing wrong? Or how I could go about debugging this issue?

Hm... If I call the same functions multiple times in a row, I end up getting different results sometimes:

     (println "start")
     (define is-positive (clingo-symbol-is-positive sym))
     (define is-negative (clingo-symbol-is-negative sym))
     (println [list is-positive is-negative])
     (set! is-positive (clingo-symbol-is-positive sym))
     (set! is-negative (clingo-symbol-is-negative sym))
     (println [list is-positive is-negative])
     (println "done")

which outputs:

"start"
'(#t #t)
'(#t #f)
"done"

which seems to suggest that the error isn't on the clingo-symbol-is-negative function side? If it were stack corruption, I would have expected one of the other C FFI functions to accidentally end up overwriting the values for is-positive and is-negative. Maybe I've messed something up and the garbage collector is causing this??

hmmm.... changing (_ptr o _bool) to (_box _bool) seems to make the results consistent, but I'm pretty sure I haven't actually solved the problem here, just kicked it down the line a bit...

(define is-negative-result-box (box #false))
(define-clingo clingo-symbol-is-negative-unsafe
  (_fun _clingo-symbol (negative : (_box _bool)) -> (res : _bool) ->
        (if res negative (raise-clingo-error)))
  #:c-id clingo_symbol_is_negative)
(define (clingo-symbol-is-negative symbol)
  (clingo-symbol-is-negative-unsafe symbol is-negative-result-box)
  (unbox is-negative-result-box))

Does it help to use _stdbool instead of _bool?

Huh. Yeah, it does? (Well, I went back to getting consistently correct results when I changed to (_ptr o _stdbool).) Do you know why this was happening? What's the explanation? I would have thought the size of _bool and _stdbool would be the same?

The size of _stdbool is 1 byte (on all systems I know about), and the size of _bool is 4 bytes (everywhere that Racket runs).

See also the _stdbool docs.
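
The size mismatch can be reproduced in plain C, which may make the intermittent results easier to picture. The sketch below is not clingo or Racket code: write_flag is a stand-in for any C function with a `bool *` out-parameter, and read_back_as_int / read_back_as_bool are hypothetical helpers that mimic a binding reading the slot back as a 4-byte int versus a 1-byte bool. It assumes sizeof(bool) is smaller than sizeof(int), as on common desktop platforms, and the `fill` byte simulates whatever happened to be in the slot beforehand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a C function with a `bool *` out-parameter:
   it stores exactly sizeof(bool) bytes and nothing more. */
static void write_flag(bool *out, bool value) {
    *out = value;
}

/* Hypothetical helper: reserve an int-sized slot, let the callee write
   its bool, then read the WHOLE int back and test it against zero.
   `fill` simulates leftover bytes already present in the slot. */
static bool read_back_as_int(bool value, unsigned char fill) {
    void *slot = malloc(sizeof(int));
    assert(slot != NULL);
    memset(slot, fill, sizeof(int));
    write_flag((bool *)slot, value);

    int wide;
    memcpy(&wide, slot, sizeof wide);   /* also picks up the leftover bytes */
    free(slot);
    return wide != 0;
}

/* Same slot, but only the bytes the callee actually wrote are read back. */
static bool read_back_as_bool(bool value, unsigned char fill) {
    void *slot = malloc(sizeof(int));
    assert(slot != NULL);
    memset(slot, fill, sizeof(int));
    write_flag((bool *)slot, value);

    bool narrow;
    memcpy(&narrow, slot, sizeof narrow);
    free(slot);
    return narrow;
}
```

Under those assumptions, read_back_as_int(false, 0xAB) comes back true because the bytes the callee never touched are nonzero, while read_back_as_int(false, 0x00) comes back false. That dependence on leftover memory would be consistent with seeing '(#t #t) on one call and '(#t #f) on the next. read_back_as_bool gives the written value regardless of the fill byte.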

Ah, gotcha; I figured it might be due to a size difference.

Is there a good rule of thumb for when to use which? The docs mention that _bool is for pre-C99 conventions, so should I prefer _stdbool over _bool?

The library I'm binding to is C++ based. Could I take that as an indicator that I should use _stdbool instead of _bool? (I checked the sources and they don't #include <stdbool.h>, but I assume bool can be taken for granted in C++ code?)

Yes, if a library uses the bool type in modern C or C++, then that's _stdbool.

Otherwise, the library probably says int or it defines something like some_lib_bool_t as an alias, typically for int. For example, gboolean (in GLib) is defined as int.
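
When it isn't obvious which case a header falls into, one option is to compare sizes on the C side. The sketch below uses some_lib_bool_t as a hypothetical int alias in the spirit of gboolean; the two helper functions are illustrative names only. On common platforms the first reports 1 and the second reports the size of int, but sizeof(bool) is implementation-defined, so treat the numbers as something to check rather than assume.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pre-C99 style alias, similar in spirit to GLib's gboolean. */
typedef int some_lib_bool_t;

/* Width of the C99/C++ style boolean: the case that calls for _stdbool. */
static size_t c99_bool_width(void) {
    return sizeof(bool);
}

/* Width of the int-alias boolean: the case that calls for _bool. */
static size_t int_alias_bool_width(void) {
    return sizeof(some_lib_bool_t);
}
```

If the two widths differ, a binding that picks the wrong one will read or write more bytes than the C function expects.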