How FOS Watch works

Rules first. AI second. Every number reproducible.

FOS Watch is the analytics layer that turns public Financial Ombudsman Service (FOS) decisions into structured insight for risk and compliance, MI, and complaints leadership teams. This page explains how the numbers are produced: where the data comes from, how it gets structured, how the dashboards are built, and where we do and do not use generative AI.

The headline principle is simple: every number on a FOS Watch dashboard is produced by deterministic rules running over a versioned taxonomy. Generative AI is used selectively on top — for search, drafting and pattern surfacing — never to produce the figures themselves. If a number changes, you can trace it back to the decisions that moved it.

Step 1

The data and taxonomy

FOS Watch reads the public FOS decision database. We don't host or republish the decisions themselves — for specific decision text, FOS Watch links the user to the FOS's own database. The structured insight we provide is built on top of that public record.

FOS Watch analyses the decisions published each week and captures the firm named in each one, along with the following dimensions:

Product: mortgages, credit cards, motor insurance, and so on.
Reason: what the complaint is about, such as affordability, mis-sale or claim handling.
Outcome: upheld or not upheld, as classified by the FOS (partial upholds are also captured where appropriate).
Rules: the regulatory rules and codes the ombudsman cited (DISP, ICOBS, CONC, Consumer Duty outcomes, the CRM Code, and others).

In addition we capture:

Raised by: consumer direct, or via a professional representative.
Escalated by: consumer-driven or firm-driven escalation to ombudsman stage.
Firm defence: what the firm argued.
Evidence: what documents or proof the firm or consumer provided, or the ombudsman requested.
Directions: what the ombudsman directed the firm to do.
APP subtype: the scam type in authorised push payment (APP) cases, such as purchase scam, investment scam or romance scam. Subtype tagging of this kind can be extended on client request to capture other complaint types in more granular detail.

The taxonomy is written in plain English. It was designed by former ombudsman casework leaders, building on feedback from analysts, risk and compliance leads, and case handlers at Tier 1 banks and insurers. It is versioned: every change is dated and recorded, so a dashboard read on a given date can be reproduced months later.
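To make the reproducibility point concrete, here is a minimal sketch of a dated change log and a lookup for the version in force on a given date. The version labels, dates, notes and the function name version_in_force are all invented for illustration; they are not FOS Watch's actual change history or code.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TaxonomyVersion:
    version: str          # hypothetical label
    effective_from: date  # date the change took effect
    note: str             # what changed


# Hypothetical change log, kept sorted by effective date.
CHANGE_LOG = [
    TaxonomyVersion("1.0", date(2024, 1, 8), "Initial taxonomy"),
    TaxonomyVersion("1.1", date(2024, 3, 4), "Added a new dimension"),
    TaxonomyVersion("1.2", date(2024, 6, 17), "Split one Reason value in two"),
]


def version_in_force(on: date, log=CHANGE_LOG) -> TaxonomyVersion:
    """Return the latest version whose effective date is on or before `on`."""
    dates = [v.effective_from for v in log]
    idx = bisect_right(dates, on)
    if idx == 0:
        raise ValueError(f"No taxonomy version was in force on {on}")
    return log[idx - 1]


print(version_in_force(date(2024, 4, 1)).version)  # 1.1
```

Re-running a dashboard "as at" a past date then means tagging with the version this lookup returns, rather than with whatever is current.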

Step 2

The tagging: deterministic rules, reviewed by humans

Tags are applied using rule-based methods: pattern matching and regular expressions run across the text of each decision. The same rules produce the same tags every time. No probabilistic model decides that a decision "mostly looks like" a mis-sale; either the patterns fire or they don't.
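As an illustration of what "the patterns fire or they don't" means, the sketch below applies a handful of regular expressions to a piece of decision-style text. The patterns, the rule list and the sample sentence are invented for this example and are far simpler than production rules would need to be.

```python
import re

# Hypothetical rules: (dimension, tag, compiled pattern). Illustrative only.
RULES = [
    ("Reason", "affordability",
     re.compile(r"\b(?:unaffordable|affordability)\b", re.I)),
    ("Reason", "claim handling",
     re.compile(r"\b(?:declined|rejected)\s+(?:the|his|her|their)\s+claim\b", re.I)),
    ("Rules", "CONC", re.compile(r"\bCONC\b")),
    ("Rules", "ICOBS", re.compile(r"\bICOBS\b")),
]


def tag_decision(text: str) -> list[tuple[str, str]]:
    """Return every (dimension, tag) whose pattern matches, in rule order."""
    return [(dim, tag) for dim, tag, pattern in RULES if pattern.search(text)]


sample = ("The lender should have realised the loan was unaffordable. "
          "It did not carry out the checks CONC requires.")
print(tag_decision(sample))
# [('Reason', 'affordability'), ('Rules', 'CONC')]
```

Because the function has no random or learned component, calling it again on the same text with the same rule list always returns the same tags.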

Every tagging rule is reviewed and signed off by a human before it goes into production. Sample audits are run on each new dimension and each rule change, scoring precision and recall against a hand-labelled set of decisions. We publish the precision and recall numbers internally and share them with customers on request.
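For readers less familiar with the two measures: precision is the share of decisions a rule tagged that the hand-labelled set agrees with, and recall is the share of hand-labelled decisions the rule found. A minimal sketch of the scoring for one tag, using made-up decision references:

```python
def precision_recall(predicted: set[str], labelled: set[str]) -> tuple[float, float]:
    """Score one tag. Both arguments are sets of decision references."""
    true_pos = len(predicted & labelled)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(labelled) if labelled else 0.0
    return precision, recall


# Hypothetical references, not real FOS decision numbers.
rule_hits = {"D-001", "D-002", "D-003", "D-004"}             # tagged by the rule
hand_labels = {"D-002", "D-003", "D-004", "D-005", "D-006"}  # tagged by a reviewer

p, r = precision_recall(rule_hits, hand_labels)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.60
```

Here the rule was right on three of its four hits and found three of the five decisions the reviewer labelled.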

When a rule is updated, we record the taxonomy version, the decisions affected, and the before-and-after tag count. This means the path from a decision's text to a tag — and from a tag to a number on a dashboard — is fully traceable.
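A rule-change record of this kind could be assembled by running the old and new patterns over the same decisions and comparing the results. The sketch below assumes a hypothetical RuleChange structure and invented decision text; it shows the shape of the record, not the production pipeline.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleChange:
    taxonomy_version: str
    tag: str
    affected: tuple[str, ...]  # decisions whose tag was added or removed
    count_before: int
    count_after: int


def record_change(tag: str, old: re.Pattern, new: re.Pattern,
                  decisions: dict[str, str], taxonomy_version: str) -> RuleChange:
    """Compare the old and new rule over the same decisions."""
    before = {ref for ref, text in decisions.items() if old.search(text)}
    after = {ref for ref, text in decisions.items() if new.search(text)}
    affected = tuple(sorted(before ^ after))  # symmetric difference
    return RuleChange(taxonomy_version, tag, affected, len(before), len(after))


# Hypothetical decision references and text.
decisions = {
    "D-001": "The firm declined the claim.",
    "D-002": "The insurer rejected her claim.",
    "D-003": "The loan was unaffordable.",
}
old_rule = re.compile(r"\bdeclined the claim\b", re.I)
new_rule = re.compile(r"\b(?:declined|rejected) (?:the|his|her|their) claim\b", re.I)

change = record_change("claim handling", old_rule, new_rule, decisions, "1.2")
print(change.affected, change.count_before, change.count_after)
# ('D-002',) 1 2
```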

Step 3

The aggregations: arithmetic you can audit

The numbers on a FOS Watch dashboard — uphold rates, peer comparisons, theme volumes, month-on-month movement — are produced by structured queries over the tagged decisions, running on a relational database. A dashboard tile showing "consumer hire agreements: 27.8% uphold rate over the last 12 months" is a single query. Two consequences follow:

The same query run twice produces the same number. No model-output drift between runs.
Aggregates appear on the same view as a sample of the most recent underlying cases, so the user can check that the decisions feeding the analysis match their search intent. Each case shows its FOS decision reference, firm, decision date, and outcome, and links to the original on the FOS site.
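To show what "a single query" can look like, here is a minimal sketch using SQLite in memory. The table layout, column names and rows are assumptions made for the example, and the query counts only outcomes recorded as "upheld"; how partial upholds feed the rate is not covered here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE decisions (
        ref TEXT PRIMARY KEY, firm TEXT, product TEXT,
        decided_on TEXT, outcome TEXT
    )
""")
# Hypothetical rows for illustration only.
conn.executemany("INSERT INTO decisions VALUES (?, ?, ?, ?, ?)", [
    ("D-001", "Firm A", "motor insurance", "2024-02-10", "upheld"),
    ("D-002", "Firm A", "motor insurance", "2024-03-05", "not upheld"),
    ("D-003", "Firm B", "motor insurance", "2024-03-19", "not upheld"),
    ("D-004", "Firm B", "motor insurance", "2024-04-02", "upheld"),
    ("D-005", "Firm B", "credit cards", "2024-04-09", "upheld"),
])

# The tile's figure: one aggregate over the tagged decisions.
UPHOLD_RATE = """
    SELECT ROUND(100.0 * SUM(outcome = 'upheld') / COUNT(*), 1)
    FROM decisions
    WHERE product = ? AND decided_on BETWEEN ? AND ?
"""
# The sample of underlying cases shown next to the figure.
RECENT_CASES = """
    SELECT ref, firm, decided_on, outcome
    FROM decisions
    WHERE product = ? AND decided_on BETWEEN ? AND ?
    ORDER BY decided_on DESC
    LIMIT 3
"""

params = ("motor insurance", "2024-01-01", "2024-12-31")
rate = conn.execute(UPHOLD_RATE, params).fetchone()[0]
recent = conn.execute(RECENT_CASES, params).fetchall()
print(rate)                     # 50.0
print([row[0] for row in recent])  # ['D-004', 'D-003', 'D-002']
```

Both queries share the same filter, so the sample cases a user sees are drawn from exactly the population behind the percentage.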

The FOS publishes decisions six to eight weeks after they are issued. The most recent months are therefore always incomplete, and partial reads can be volatile, particularly on smaller case populations where a 0% or 100% uphold rate would be misleading. The Insight Hub suppresses incomplete months on monthly trends, and the fortnightly Risk, Compliance & MI Briefings follow the same rule. The Case Update Digest works on a sliding fortnightly window and so isn't subject to it; we note that on the digest itself.
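One way to express a suppression rule like this is sketched below. It assumes a month counts as complete once eight weeks (the upper end of the publication lag) have passed since its last day; that threshold and the function names are assumptions for illustration, not the exact rule used in the Insight Hub.

```python
from datetime import date, timedelta

PUBLICATION_LAG = timedelta(weeks=8)  # assumption: upper end of the stated range


def month_end(year: int, month: int) -> date:
    """Last calendar day of the given month."""
    first_of_next = date(year + (month == 12), month % 12 + 1, 1)
    return first_of_next - timedelta(days=1)


def is_complete(year: int, month: int, today: date) -> bool:
    """True once the full lag has elapsed since the month ended."""
    return month_end(year, month) + PUBLICATION_LAG <= today


def visible_months(months: list[tuple[int, int]], today: date) -> list[tuple[int, int]]:
    """Keep only the months that are safe to show on a monthly trend."""
    return [(y, m) for y, m in months if is_complete(y, m, today)]


months = [(2024, 1), (2024, 2), (2024, 3), (2024, 4)]
print(visible_months(months, today=date(2024, 5, 1)))
# [(2024, 1), (2024, 2)]
```

On 1 May, March and April are held back because decisions issued late in those months may not yet have been published.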

Step 4

Where generative AI sits

Generative AI is used in three places, all on top of the structured layer:

Natural-language search

A user can ask "show me motor insurance cases where the firm relied on a policy exclusion" and the search interprets the question into a structured filter over the taxonomy. The cases returned come from a deterministic query — the AI translates the question, it doesn't pick the cases.
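The division of labour can be sketched as follows. The model's output is stood in for by a plain dictionary; everything after that point is ordinary filtering. The taxonomy values, decision rows and function names are hypothetical.

```python
# Hypothetical slice of the taxonomy and of the tagged decisions.
TAXONOMY = {
    "product": {"motor insurance", "mortgages", "credit cards"},
    "firm_defence": {"policy exclusion", "time bar", "non-disclosure"},
}
DECISIONS = [
    {"ref": "D-001", "product": "motor insurance", "firm_defence": "policy exclusion"},
    {"ref": "D-002", "product": "motor insurance", "firm_defence": "non-disclosure"},
    {"ref": "D-003", "product": "mortgages", "firm_defence": "time bar"},
]


def validate(filter_: dict[str, str]) -> dict[str, str]:
    """Reject any dimension or value that is not in the taxonomy."""
    for dimension, value in filter_.items():
        if value not in TAXONOMY.get(dimension, set()):
            raise ValueError(f"Not in taxonomy: {dimension}={value!r}")
    return filter_


def run_filter(filter_: dict[str, str]) -> list[str]:
    """Deterministic selection: a decision is returned only if every tag matches."""
    validate(filter_)
    return [d["ref"] for d in DECISIONS
            if all(d[key] == value for key, value in filter_.items())]


# Stand-in for the AI's translation of the example question above.
translated = {"product": "motor insurance", "firm_defence": "policy exclusion"}
print(run_filter(translated))  # ['D-001']
```

If the translation step produced a value outside the taxonomy, validate would raise an error rather than let the query guess.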

Briefing drafting

The fortnightly Risk, Compliance & MI Briefings start from structured aggregations. AI drafts the prose around them. Every figure in the briefing is traceable to the underlying query.
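One simple safeguard consistent with this principle is to check a draft's quoted percentages against the query results it was built from. The sketch below is an assumption about how such a check could work, not a description of the actual briefing pipeline; the result names and draft sentence are invented.

```python
import re


def untraceable_figures(draft: str, query_results: dict[str, float]) -> list[str]:
    """Return percentages quoted in the draft that match no query result."""
    allowed = {f"{value:.1f}%" for value in query_results.values()}
    quoted = re.findall(r"\d+(?:\.\d+)?%", draft)
    # Normalise each quoted figure to one decimal place before comparing.
    return [q for q in quoted if f"{float(q[:-1]):.1f}%" not in allowed]


# Hypothetical aggregates and draft text.
results = {"uphold_rate_motor_12m": 50.0, "uphold_rate_cards_12m": 33.3}
draft = "Motor insurance upheld at 50% over 12 months; credit cards at 35.0%."

print(untraceable_figures(draft, results))  # ['35.0%']
```

A non-empty result would send the draft back for correction before a human reviewer sees it.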

Pattern surfacing

AI flags clusters of decisions that may warrant a deeper look — for example, an unusual concentration of a particular firm defence in a particular product line. These are surfaced as candidate themes for human review, not as conclusions.

What we do not use generative AI for: producing the figures on dashboards; classifying decisions into tags; making regulatory or legal judgements.

Step 5

Audit and traceability

For a Tier 1 firm, the question "how do I know the numbers are real?" is a governance question, not a product question. FOS Watch is built so the answer is straightforward:

Every number traces to a query.
Every query runs over a versioned taxonomy.
Every taxonomy version is dated.
Every tag was produced by a rule that a human signed off.
Each underlying decision retains a link to the original on the FOS site.

If your internal audit team or second-line function wants to validate a specific figure, we provide the calculation, the taxonomy version in force, and the underlying decisions. We expect to be asked. The methodology isn't proprietary in a way that obstructs review — the value is in doing the work, not in keeping it opaque.
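As a rough picture of what such a validation pack could contain, the sketch below bundles a figure with its calculation, taxonomy version and underlying decisions into one record. The FigureProvenance structure, field names and values are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FigureProvenance:
    label: str
    value: float
    query: str                      # the calculation that produced the value
    taxonomy_version: str           # the version in force when it ran
    decision_refs: tuple[str, ...]  # the decisions the query counted


# Hypothetical example values.
record = FigureProvenance(
    label="Motor insurance uphold rate, last 12 months",
    value=50.0,
    query=("SELECT ROUND(100.0 * SUM(outcome = 'upheld') / COUNT(*), 1) "
           "FROM decisions WHERE product = 'motor insurance'"),
    taxonomy_version="1.2",
    decision_refs=("D-001", "D-002", "D-003", "D-004"),
)

print(json.dumps(asdict(record), indent=2))
```

Serialising the record makes it easy to hand to an audit team, who can re-run the query against the listed decisions and compare the result.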

What FOS Watch is not

FOS Watch is an analytics layer, not a decision database. We do not host or republish FOS decisions; for specific decision text, the link from FOS Watch takes the user to the FOS's own database.

FOS Watch is not a regulatory advice service. We surface what ombudsman decisions show; we do not interpret what a firm should do in response. That judgement sits with your risk, compliance, and legal functions.

FOS Watch is not a black box. If a number on the dashboard looks wrong, we expect customers to ask, and we expect to show the working.

Look under the hood.

Book a 15-minute demo and we'll walk you through the taxonomy, the tagging pipeline, and a worked example of how a single number on a dashboard is produced — start to finish.

Book a 15-minute demo, or try four weeks of email briefings.

No charge, no commitment