Sigil Data produces sovereign, expert-labelled legal training data for the next generation of artificial-intelligence systems. We take open-licensed UK judgments, tribunal decisions, regulatory notices and inquiry materials, and we apply a structured labelling apparatus, designed and supervised by UK-qualified legal professionals, that turns public-but-unusable source text into training-grade structured data. Every record we publish carries our sigil: a measurable, auditable mark of quality.
The United Kingdom holds one of the deepest bodies of publicly licensed legal material in the world. The Open Government Licence grants sweeping commercial-use rights over Crown court judgments, tribunal decisions, regulatory enforcement notices and the records of statutory inquiries. The constraint on the next generation of legal artificial intelligence is not access to this material; it is the absence of a credible, jurisdictionally clean apparatus to convert it into the structured, expert-annotated training data that modern alignment and retrieval methods require.
The incumbents who could have built that apparatus are structurally locked out. The dominant commercial legal databases are foreign-owned and operate under licences that explicitly prohibit the use of their corpora for artificial-intelligence training. The offshore-labour annotation industry cannot make a credible sovereignty claim. The academic releases are partial, dated and not commercial-grade. The gap is a market.
Per-corpus structured datasets: Employment Tribunal, Tax Chamber, Property Chamber, Information Commissioner enforcement, financial-regulatory notices and public inquiry records. Each Codex is delivered as a Croissant-compatible JSON Lines package with full case metadata, citation graph, ratio decidendi, reasoning chain, controlled-vocabulary outcome and quantum, and verified anonymisation.
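To make the delivery format concrete, the sketch below shows what one record in such a package might look like and how it could be written and read back as JSON Lines. It is illustrative only: the field names, the placeholder case identifier and the outcome value are assumptions, since this overview names the content categories but not the schema. The accompanying Croissant metadata descriptor is not shown.

```python
import json

# Hypothetical schema: every field name below is an assumption for illustration.
REQUIRED_FIELDS = {
    "case_id", "corpus", "metadata", "citations", "ratio_decidendi",
    "reasoning_chain", "outcome", "quantum_gbp", "anonymisation_verified",
}

# Placeholder record; none of these values describe a real case.
record = {
    "case_id": "ET-EXAMPLE-0001",
    "corpus": "Employment Tribunal",
    "metadata": {"decision_date": "2020-01-01", "licence": "OGL-3.0"},
    "citations": ["ET-EXAMPLE-0000"],          # edges of the citation graph
    "ratio_decidendi": "Placeholder text for the ratio.",
    "reasoning_chain": ["Placeholder step 1.", "Placeholder step 2."],
    "outcome": "claim_upheld",                 # assumed controlled-vocabulary term
    "quantum_gbp": 1000.0,
    "anonymisation_verified": True,
}


def to_jsonl(records):
    """Serialise records as JSON Lines: one JSON object per line."""
    lines = []
    for rec in records:
        missing = REQUIRED_FIELDS - rec.keys()
        if missing:
            raise ValueError(f"record missing fields: {sorted(missing)}")
        lines.append(json.dumps(rec, ensure_ascii=False, sort_keys=True))
    return "\n".join(lines) + "\n"


def from_jsonl(text):
    """Parse JSON Lines text back into a list of dictionaries."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]
```

Calling `from_jsonl(to_jsonl([record]))` returns a list containing an equal record, and a record with a required field removed is rejected before anything is written.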
The proprietary labelling manual under which the Codex is produced. A five-stage pipeline, executed by junior paralegals and trainee solicitors under defined senior-lawyer review, with measurable inter-annotator agreement, calibration thresholds and a versioned change log. The Apparatus is the firm's principal intellectual asset.
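As an illustration of how inter-annotator agreement can be measured, the sketch below computes Cohen's kappa for two annotators labelling the same records. This is an assumption made for the example: the text above does not say which agreement statistic the Apparatus uses, and the threshold value and outcome labels shown are hypothetical.

```python
from collections import Counter

# Hypothetical calibration threshold; the actual thresholds are not stated here.
CALIBRATION_THRESHOLD = 0.8


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of records on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Two annotators assign an assumed controlled-vocabulary outcome to four records.
annotator_1 = ["upheld", "dismissed", "upheld", "dismissed"]
annotator_2 = ["upheld", "dismissed", "dismissed", "dismissed"]

kappa = cohens_kappa(annotator_1, annotator_2)   # 0.5 for these labels
needs_recalibration = kappa < CALIBRATION_THRESHOLD
```

Here the annotators agree on three of four records, but half of that agreement is expected by chance, so kappa is 0.5 and the batch would fall below the assumed threshold.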
A published benchmark evaluation set against which a customer's legal-AI system can be measured. A small, gold-standard reference corpus, hand-labelled by senior practitioners and a retired judicial reviewer, and held as the calibration point for every record we sigil.
UK source, UK quality assurance, UK soil. Every record we publish is drawn from a UK-government-licensed source. Every labelling judgement is made by a UK-qualified legal professional resident in the United Kingdom. Every dataset is held, processed and delivered from infrastructure inside UK jurisdiction. No record is sent to any third-party artificial-intelligence service for processing.
We do not use BCIS subscription data, Westlaw or LexisNexis corpora, or any other source whose licence prohibits training-data use. We attribute every record to its public-sector origin under the Open Government Licence v3.0. Where any reporting restriction applies to a source document we honour it in full, and we do not publish material outside the scope the original tribunal or court intended.
Sigil Data is a venture of CSQS Ltd, a chartered quantity surveying practice in its sixth year, regulated by the Royal Institution of Chartered Surveyors. The firm is held without external equity. Initial discussions with prospective customers, partner law firms and senior legal practitioners are welcomed at the address below.