KCL · Machine Learning · Week 22

Formally Verifying Neural Networks I

Proving — not testing — that a network behaves: robustness, safety and consistency, the precondition/postcondition language for stating them, and what a counterexample really is.

KCL · ML ≈40 min read 17 self-tests 18 unique slides covered
Part I

What Formal Verification Is

Before we touch neural networks, we need to be clear what it means to prove something about a program — and why writing down "what we want" is itself the hard part.

This is a new strand of the module, gl red" x="98" y="222">θ Intruder v_int ψ ρ

The ACAS Xu geometry (Katz et al.): v_own, v_int are the two aircraft's velocities, ρ the range (distance) between them, ψ the intruder's heading angle, θ the relative angle at the ownship. The NN maps these to advisories like "turn left/right."
QFor the aircraft collision-avoidance NN (Katz et al.), what counts as a "bad state," and why was formal verification necessary rather than optional?

A "bad state" is any situation in which the two aircraft would collide. Verification was necessary (not just nice-to-have) because the system is safety-critical: it must be certified before deployment, and certification for such systems requires a mathematical guarantee — proven safety on all inputs — not merely good test performance. The inputs include heading angles and vertical separation; outputs are advisories like turn left/right.

10 · Property #3 — Consistency

Definition — Consistency

Consistency is especially important for generative models: the model's outputs should not contradict each other. For example, we'd like an LLM to not simultaneously state that Block A is on B, B is on C, and C is on A.

A B C C is on A ?! (impossible cycle)
"A on B, B on C, C on A" cannot all be true at once — a consistency violation. We want generative models verified not to assert such cycles.

The lecturer closes Part II by promising that the coming parts will show how to mathematically express these three properties — which is exactly Part III.

QName the three desirable NN properties introduced this week, and give the kind of model each is most relevant to.

Robustness — small input change ⇒ small output change; most relevant to classifiers. Safety — never enter a "bad" state; most relevant to NNs controlling cyber-physical systems. Consistency — outputs don't contradict each other; most relevant to generative models (e.g. LLMs).

·  ·  ·
Part III

A Language for Specifying NN Properties

English is fine for intuition but useless for proving theorems. We now meet a compact precondition/postcondition language — Hoare-triple style — and use it to formalise every property from Part II.

We discussed the properties in English, but English is not useful if we want to prove theorems that the properties hold. So we adopt a formal language. Crucially, we want it to handle specifications that involve interactions between several NNs and multiple inputs and outputs — not just one network on one input.

11 · The general form (precondition / postcondition)

Specification { Precondition } NN(s) computes something { Postcondition }
Definition — the specification triple

Pre- and postconditions are written with:

Have you seen this before?

This is exactly a Hoare triple from program verification, \(\{P\}\,C\,\{Q\}\): assume \(P\) of the inputs, run the computation \(C\), and \(Q\) must hold of the outputs. The lecturer notes this is only one way to model specifications — it is based on a presentation in Aws al-Barghouthi's book (on neural-network verification).

12 · Every NN is a function \(f:\mathbb{R}^n \to \mathbb{R}^m\)

Convention

In specifications, every NN is treated as a function \(f:\mathbb{R}^n \to \mathbb{R}^m\). Henceforth:

How to read the whole triple

"For all input vectors that make the precondition true, the postcondition will hold after the NN computes on one of those inputs." The precondition filters which inputs we care about; the postcondition is the promise we must keep for every one of them.

QIn this specification language, what symbols denote inputs vs outputs, and how is a full specification read aloud?

Inputs: \(x,y,z,\dots\). Outputs: \(a,b,c,\dots\). Each NN is a function \(f:\mathbb{R}^n\to\mathbb{R}^m\). Reading: "for all input vectors satisfying the precondition, after the NN computes its output, the postcondition holds." The precondition restricts the inputs considered; the postcondition is the guaranteed property of the outputs.

13 · Formalising image robustness

Recall the English version: "if \(x\) is slightly darker/lighter than \(y\), then \(x\) is classified the same as \(y\)." In the language:

Specification { dist(x, y) ≤ 0.01 } a ← f(x) b ← f(y) { class(a) = class(b) }

where \(f:\mathbb{R}^n\to\mathbb{R}^m\). Two inputs that are close (distance \(\le 0.01\)) must receive the same class. The two clauses \(a\leftarrow f(x)\) and \(b\leftarrow f(y)\) show the same network \(f\) being applied to two inputs — exactly the "multiple inputs" capability we wanted.

The two pieces: dist and class

Distance between vectors

dist(x,y) is a notion of distance. Two examples on the slide:

Euclidean distance: \[\operatorname{dist}(x,y)=\sqrt{\sum_{0\le i\le \dim(x)} (x_i - y_i)^2}\]

Chebyshev distance: \[\operatorname{dist}(x,y)=\max_{0\le i\le \dim(x)} \lvert x_i - y_i\rvert\]

(Note: \(x_i\) is the \(i\)-th component of vector \(x\).)

Classification from the output

class(a) is the class predicted from the NN's output vector \(a\). If there are \(n\) classes, the output has \(n\) dimensions (one per class), and \[\operatorname{class}(a)=\arg\max_{0\le i\le n-1} a_i.\] i.e. the index of the largest output component — the "most activated" class.

QWrite the Euclidean and Chebyshev distances between two vectors, and state what \(\operatorname{class}(a)\) computes.

Euclidean: \(\sqrt{\sum_i (x_i-y_i)^2}\). Chebyshev: \(\max_i |x_i - y_i|\) (the largest single-coordinate gap). \(\operatorname{class}(a)=\arg\max_{0\le i\le n-1} a_i\): the index of the largest component of the output vector — i.e. the predicted class is whichever output neuron is highest.

QWrite the full formal robustness specification for image classification, and explain what each line does.

\(\{\operatorname{dist}(x,y)\le 0.01\}\) — precondition: the two images are within distance 0.01. Then \(a\leftarrow f(x)\) and \(b\leftarrow f(y)\) — apply the same network to both. \(\{\operatorname{class}(a)=\operatorname{class}(b)\}\) — postcondition: they must get the same predicted class. Together: any two sufficiently-close inputs are classified identically.

14 · Formalising NLP robustness (synonyms)

English: "if two sentences differ in one word, and those words are synonyms, the classification is unchanged." Formally:

Specification { 0 ≤ i < dim(x) ∧ x[i ↦ y_i] = y ∧ x_i ≡ y_i } a ← f(x) b ← f(y) { class(a) = class(b) }
Reading the precondition

Postcondition: same classification.

Exercise (from the slide)

"Change the specification to allow for a difference in two words."

QSlide exercise: rewrite the NLP synonym specification to permit the two sentences to differ in two word positions (both replaced by synonyms).

Introduce two distinct positions \(i\) and \(j\) and require both substitutions to recover \(y\), with both pairs being synonyms. One clean version:

Specification { 0 ≤ i < dim(x) ∧ 0 ≤ j < dim(x) ∧ i ≠ j ∧ x[i ↦ y_i][j ↦ y_j] = y ∧ x_i ≡ y_i ∧ x_j ≡ y_j } a ← f(x) b ← f(y) { class(a) = class(b) }

The key changes: a second position \(j\) (with \(i\neq j\)); apply both replacements to \(x\) to obtain \(y\); and add the second synonym constraint \(x_j \equiv y_j\). Everything else (apply \(f\) to both, demand equal class) is unchanged.

15 · Formalising safety (the aircraft)

The collision-avoidance NN can output several advisories; one of them is "do nothing." A sensible safety property: if the intruder is very far (\(>10\text{km}\)) and our aircraft is much faster, the system should advise "do nothing" — i.e. not panic.

Specification { ν_own > 5·ν_int , ρ > 10000 } a ← f(ν_own, ν_int, ρ, ...) { class(a) = do_nothing }
Reading it

\(\nu_{\text{own}} > 5\,\nu_{\text{int}}\): our speed exceeds five times the intruder's. \(\rho > 10000\): range exceeds 10,000 (≈10 km). The comma between the two conditions is a conjunction (both must hold). Under those circumstances, the recommended action must be do nothing.

Exam note

The lecturer's takeaway: for sophisticated systems, coming up with the right property to verify is itself hard and can require domain experts. The maths of the triple is easy; knowing that "\(\nu_{\text{own}}>5\nu_{\text{int}}\) and \(\rho>10000\) ⇒ do nothing" is the safe behaviour requires aviation knowledge. (Recall the sorting trap — the same lesson.)

16 · Formalising consistency (two networks)

Consistency here is stated as: two networks \(f\) and \(g\) should produce the same output on any given input.

Specification { true } a ← f(x) b ← g(x) { class(a) = class(b) }
Reading it

The precondition is literally true — which holds for all possible inputs, so there is no restriction at all. The same input \(x\) is fed to two different networks \(f\) and \(g\). The postcondition demands their classifications agree. This is the "interaction between different NNs" the language was designed to express.

QIn the consistency specification, what does the precondition true signify, and what makes this specification structurally different from the robustness ones?

true is always satisfied, so the precondition imposes no restriction — the property must hold for every input. Structurally, robustness applied one network \(f\) to two inputs \(x,y\); consistency applies two different networks \(f\) and \(g\) to the same input \(x\). This is the multi-network case the language was built to handle.

QWhose textbook presentation is this specification language based on, and what is the equivalent concept from classical program verification?

It is based on a presentation in Aws al-Barghouthi's book on neural-network verification. The equivalent classical concept is the Hoare triple \(\{P\}\,C\,\{Q\}\): precondition \(P\), command/computation \(C\), postcondition \(Q\).

·  ·  ·
Part IV

What Can We Do With a Specification?

Once a property is written down, there are exactly two outcomes: we disprove it with a counterexample, or we prove it holds. This week stops at the framing; the proof techniques are for next time.

Given a specification, we have one of two options. Either we show it does not hold (find a counterexample), or we show it holds (formally verify it by proof).

17 · Option 1 — disproof by counterexample

Definition — Counterexample

A counterexample is a specific input assignment \(\{x\mapsto \text{val}_1,\; y\mapsto \text{val}_2,\; z\mapsto \text{val}_3,\dots\}\) showing the specification fails: the precondition holds for it, yet the postcondition does not hold for the NN's output on it.

The slide's illustration uses a trivial "network" \(f(x)=x+1\):

Specification { x ≤ 0.1 } a ← f(x) # f(x) = x + 1 { a ≤ 1.05 }

The lecturer states this specification does not hold, and gives the counterexample \(\{x \mapsto 1\}\).

Exam trap — read the definition carefully (likely slide typo)

By the definition on the very same slide, a counterexample must satisfy the precondition. But \(x=1\) does not satisfy \(x\le 0.1\) — so strictly, \(x\mapsto 1\) is not a valid counterexample to this triple. The cleanest correct counterexample is \(\boxed{x = 0.1}\): it satisfies the precondition (\(0.1 \le 0.1\)), and gives \(a=f(0.1)=1.1\), which violates \(a\le 1.05\). So the spec genuinely fails — just witness it with \(x=0.1\), not \(x=1\).

If your exam asks you to produce a counterexample, always check it satisfies the precondition and violates the postcondition. (The point the slide is making — that the spec is false — is correct; only the chosen witness is off.)

QFor \(f(x)=x+1\) and the spec \(\{x\le 0.1\}\;a\leftarrow f(x)\;\{a\le 1.05\}\), give a valid counterexample and verify both conditions.

Take \(x = 0.1\). Precondition: \(0.1 \le 0.1\) ✓ (holds). Compute: \(a = f(0.1) = 1.1\). Postcondition: need \(a \le 1.05\), but \(1.1 \le 1.05\) is false ✗. Precondition holds and postcondition fails, so \(x=0.1\) is a valid counterexample and the specification does not hold. (Any \(x\in(0.05,\,0.1]\) works, since then \(1.05 < x+1 \le 1.1\).)

18 · Option 2 — the specification holds, so prove it

The second possibility is that the specification is true, in which case we can formally verify it by mathematically proving it. The slide's example:

Specification { x ≤ 0.1 } a ← f(x) { a ≥ 0.1 }
What this slide is really illustrating

This is presented as the contrast case: instead of disproving with a counterexample, here the property holds and our job is to construct a proof. The lecturer uses it to make the conceptual point — "the other outcome is: it's true, now prove it" — rather than to grind through the algebra. (Whether it literally holds depends on the exact network \(f\) and the allowed input domain; treat it as a schematic example of the "spec-holds" branch.)

The week ends on a cliff-hanger: what remains is discussing the techniques that let us prove or disprove a specification in a machine-checkable way. That is the content of "Formally Verifying Neural Networks II."

QGiven any NN specification, what are the only two things we can do with it, and which one yields a machine-checkable proof?

(1) Disprove it by exhibiting a counterexample — an input satisfying the precondition but whose output violates the postcondition. (2) Prove it holds by formally verifying it. The proof branch is what produces a machine-checkable certificate of correctness; a counterexample is itself easily checkable too (just run \(f\) on it), but it certifies failure rather than correctness.

QSynthesis: walk through the full pipeline this week set up, from an English wish to a machine-checkable result.

(1) Decide what property you want (robustness / safety / consistency) — hard, may need domain experts, and the "obvious" wording is often too weak (the sorting trap). (2) Represent it in the precondition/postcondition language: \(\{P\}\) inputs, NN(s) compute, \(\{Q\}\) outputs, using \(\wedge,\vee,\Rightarrow\), arithmetic and comparisons, with each NN as \(f:\mathbb{R}^n\to\mathbb{R}^m\). (3) Then either find a counterexample (precondition holds, postcondition fails) to disprove it, or prove it holds. (4) The proof/check is machine-checkable, giving certainty over all inputs. Step (3)'s actual techniques are next week.

·  ·  ·
Part V

Wrap-up & Exam Readiness

The lecturer's own summary: this week introduced formal verification and specifications, discussed specifications for NNs, and showed how to represent them mathematically so they can be verified.

That maps cleanly onto our three big questions: we answered "what properties?" (robustness, safety, consistency) and "how to represent them?" (the precondition/postcondition language). The third question — "how to verify?" — is deliberately left open for the techniques lecture.

The week in one breath

Formal verification means proving — and machine-checking — a property over every input rather than testing a few; the hard part is writing the property down (even "sort" needed a "same-elements" clause), so for neural networks we target three intrinsic, provable properties — robustness, safety, consistency — and encode each as a precondition/postcondition triple over networks-as-functions \(f:\mathbb{R}^n\to\mathbb{R}^m\); given such a spec we either disprove it with a counterexample or prove it holds.

Exam-readiness checklist

  1. Define formal verification and explain why proofs are "usually machine-generated, always machine-checkable" — and why the second clause matters more.
  2. Use the sorting example to argue that specifying intent is harder than proving it; explain why "output is sorted" is too weak and state the "same elements" fix.
  3. State the three questions formal verification of NNs decomposes into, and which two this week answers.
  4. Explain why "agrees with most humans on most pandas" is not a verifiable property.
  5. Define robustness; describe the Goodfellow et al. 2014 panda→gibbon example (FGSM: \(\epsilon\cdot\operatorname{sign}(\nabla_x J(\theta,x,y))\), confidences 57.7% → 99.3%) and the STOP-sign safety motivation.
  6. Define safety and a "bad state"; describe the Katz et al. aircraft collision-avoidance NN (inputs, outputs, why certification needs proof).
  7. Define consistency and give the "A on B, B on C, C on A" contradiction; say why it matters for generative models.
  8. Write the general specification triple \(\{P\}\;\text{NN(s)}\;\{Q\}\) and list the connectives/operators allowed (link it to Hoare triples / al-Barghouthi).
  9. State the NN-as-function convention \(f:\mathbb{R}^n\to\mathbb{R}^m\), the \(x,y,z\)/\(a,b,c\) naming, and read a triple aloud correctly.
  10. Write the image-robustness spec and define dist (Euclidean & Chebyshev formulas) and class (\(\arg\max\) over output dims).
  11. Write the NLP synonym spec, explain \(x[i\mapsto y_i]=y\) and \(x_i\equiv y_i\), and do the two-word extension exercise.
  12. Write the safety spec \(\{\nu_{\text{own}}>5\nu_{\text{int}},\ \rho>10000\}\Rightarrow\) do nothing, and the consistency spec with precondition true and two networks \(f,g\).
  13. Define a counterexample precisely, and for \(f(x)=x+1\), \(\{x\le0.1\}\{a\le1.05\}\) produce a valid witness (\(x=0.1\)) — checking it satisfies the precondition and violates the postcondition.
  14. State the two things you can do with any specification (disprove via counterexample / prove it holds) and note that the verification techniques are the "Part II" still to come.