Back to all articles
Formal Methods5 min read

What Is FMToolkit?

FMToolkit answers one question: do these two functions behave identically? What it does, what it does not, and where it fits.

By David Kovar


FMToolkit (FMT) answers one question: do these two functions behave identically?

You have a function. You change it - for performance, for clarity, to port it to a new language, because an AI assistant rewrote it. The question that follows every such change is the same: did you preserve the behavior? Tests give you partial evidence. FMToolkit gives you a proof.

What FMToolkit does

FMToolkit takes two versions of a function and determines whether they are functionally equivalent - that is, whether they produce identical outputs for every possible input. It does this using formal methods: it encodes both functions as logical formulas and asks a solver to find any input where they disagree. If none exists, the functions are proven equivalent. If one exists, FMToolkit returns it as a concrete counterexample - the specific input where they diverge.

Two endpoints cover the main use cases:

The CI endpoint checks equivalence on every change. Integrate it into your pull request workflow and every merge either carries a proof that behavior was preserved, or is stopped by a counterexample before it ships.

The Audit endpoint analyzes an entire codebase and classifies every function by how much formal coverage is achievable today - and what it would take to extend that coverage.

What FMToolkit does not do

FMToolkit proves functional equivalence, not correctness in an absolute sense. If you compare a new function to an old one, and the old one has a bug, FMToolkit can prove the new one is equivalent to that bug. Equivalence is only as meaningful as the reference you measure against - which is why choosing the right baseline is a real design decision.

FMToolkit does not verify performance, memory usage, or side effects outside what it models. It is additive to tests, not a replacement for them. Tests cover the inputs you wrote down. FMToolkit covers the ones you didn't.

Where it fits

FMToolkit is most valuable where the cost of undetected behavioral change is high:

Modernization and porting. Prove that a rewrite - new language, new framework, new runtime - preserves the original behavior, function by function. The old implementation is the reference; the new one must match it.

Refactoring. Every performance optimization and structural cleanup can carry a machine-checkable proof that behavior was preserved. The proof becomes part of the change record.

AI-generated code. LLMs produce plausible output, not proven output. FMToolkit closes the gap between "it looks right" and "it is right." Trust but verify: a neuro-symbolic optimization loop

Reference standards. Write correct behavior as a formal specification, independent of any implementation, and prove all implementations against it.

An example

Calendar date arithmetic is representative. You have a function that computes the number of days between two dates. You rewrite it. How do you know the rewrite is correct?

Testing covers the dates you thought of. The century-year edge cases - year 100 is not a leap year, year 400 is - involve a handful of inputs out of roughly three million valid dates in any reasonable range. Most test suites don't include them. FMToolkit finds them without being told to look. Why formal methods matter: the calendar problem

Want to go deeper?

Learn how URSA Secure brings formal verification to your most critical software.

Get in touch