Content to follow.
Blackstone& proposal for Serco Limited — AI Lab establishment, infrastructure foundations, and first use case delivery under the Framework Agreement.
Content to follow.
Most organisations struggle with AI adoption not because the technology fails, but because they build the wrong things, in the wrong order, without knowing what's already available. The result is a graveyard of disconnected proofs of concept — each one built in isolation, none of them connecting to shared infrastructure, and no discipline for stopping work that isn't delivering value.
At the same time, many organisations struggle to translate the unconstrained potential of AI — what the technology could do — into solutions that can operate within the constraints of an enterprise environment: security, governance, data access, and operational support.
We take a different approach. Our model explicitly bridges these two worlds:
Our end-to-end AI delivery model is structured, artefact-driven, and designed to move consistently from idea to production — while building reusable, enterprise-grade AI capabilities that make every subsequent use case cheaper and faster to deliver.
The model operates on two layers:
These layers run concurrently. The portfolio view continuously reprioritises as new use cases are submitted, experiments produce evidence, and capabilities are built.
We start with the real problem, not just the idea.
Before any formal intake, we conduct targeted stakeholder conversations to understand the operational context behind each AI opportunity. This is deliberate: the best use cases come from people who understand the problem deeply but may not frame it in AI terms.
In these conversations, we:
This ensures every use case that enters the pipeline is grounded in real user needs and operational reality, not abstract ideas or technology-first thinking. It also builds the relationship between the AI Lab and the business units it serves — the Lab is a partner in solving problems, not a ticket queue.
The output of this step is a clear understanding of the opportunity, ready to be structured through the formal intake process.
We create clarity from day one.
All use cases — whether newly discovered or pre-existing — enter through a single, governed front door. This is a deliberate design choice. A single intake point standardises inputs across the organisation, ensures every opportunity is assessed on the same basis, prevents duplication of effort, and provides full visibility of the pipeline.
Each use case is systematically captured across five pillars:
| Pillar | What It Captures | Why It Matters |
|---|---|---|
| Value | Problem statement, business impact, strategic alignment, scale and frequency of the problem | Ensures we're solving problems worth solving |
| Understanding | Current process maturity, workflow clarity, whether success metrics are defined | Reveals whether the problem is well-enough understood to act on |
| Data | Data types needed, where data lives, data quality, sensitivity level | Determines what's technically feasible and what governance is required |
| Capability | AI pattern required (classification, RAG, forecasting, etc.), platform capabilities needed, integration requirements | Maps the use case to the technical capabilities it demands |
| Readiness | Infrastructure readiness, governance requirements, team capability, blockers and dependencies | Shows whether the organisation is ready to support this use case |
The Front Door is initially human-led: the AI Lab team works directly with business users to capture and structure each opportunity. Over time, this evolves into a self-serve portal where business users can submit use cases directly, guided by an AI-assisted intake process that asks the right questions and ensures completeness.
Output: A standardised Use Case Card — scored across all five pillars, with a dependency category assigned, required capabilities identified, and blockers surfaced.
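To make the artefact tangible, the sketch below shows one way a Use Case Card could be represented in code. It is illustrative only: the field names, scoring scale, and example values are our assumptions, not the final intake schema.

```python
from dataclasses import dataclass, field

# Illustrative Use Case Card structure; fields and the 0-5 scoring scale are
# assumptions for the purpose of this sketch, not the agreed intake schema.
@dataclass
class UseCaseCard:
    name: str
    pillar_scores: dict                      # value, understanding, data, capability, readiness
    dependency_category: str                 # e.g. "ready-now", "capability-gap", "data-blocked"
    required_capabilities: list = field(default_factory=list)
    blockers: list = field(default_factory=list)

    def total_score(self) -> int:
        return sum(self.pillar_scores.values())

card = UseCaseCard(
    name="Collaboration Hub",
    pillar_scores={"value": 5, "understanding": 4, "data": 3, "capability": 4, "readiness": 3},
    dependency_category="capability-gap",
    required_capabilities=["retrieval (RAG)", "summarisation", "access control"],
    blockers=["SharePoint permission mapping"],
)
print(card.total_score())  # 19
```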
We prove value before we build.
Prioritised use cases do not go straight to development. They enter a structured experimentation process designed to validate — or invalidate — the core assumptions before any meaningful investment is made.
This process operates as two connected cycles:
Ideate
We start with a clear understanding of the problem, user context, and desired outcomes — ensuring focus on real operational challenges.
Working sessions with stakeholders, domain experts, and delivery teams rapidly explore solution options. Problems are reframed into opportunity statements, AI intervention points are identified, and multiple approaches are explored in parallel using proven patterns such as RAG, classification, and workflow automation.
Ideas are quickly assessed against desirability and viability, creating a disciplined funnel from opportunity to testable concept.
Business Prototype
Promising ideas are translated into lightweight, working prototypes — not to prove technical perfection, but to test value.
Prototypes are built rapidly using reusable components aligned to the AI Capability Library (e.g. retrieval patterns, summarisation, speech interfaces), combined with representative data and simple interfaces grounded in real workflows. This allows us to assemble working solutions quickly, rather than building from first principles.
Development is strictly time-boxed to maintain pace and avoid over-engineering.
Each prototype is designed to answer three questions:
Stakeholders engage directly with the prototype, enabling fast feedback, refinement, or rejection before further investment.
Assess
Each use case is evaluated across three lenses:
At this stage, desirability and viability take priority. Feasibility is not treated as a hard gate — allowing high-value opportunities to progress even if capabilities are not yet in place, and enabling informed, portfolio-level investment decisions.
Hypothesise
Each experiment is defined with precision:
Experiments use real data, with success criteria defined upfront.
To ensure consistency and repeatability, we apply a standardised evaluation test suite, aligned to common AI capability patterns (e.g. retrieval quality, summarisation accuracy, workflow outputs). This allows experiments to be assessed objectively, rather than relying on subjective judgement.
All results are captured in an Experimentation Log, providing a transparent, auditable record of what was tested, learned, and decided.
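As a flavour of what a standardised evaluation test suite can look like, the sketch below scores retrieval quality against a small labelled dataset and appends the result to an experimentation log. The dataset shape, metric, and threshold are illustrative assumptions rather than the actual harness.

```python
# Minimal sketch of a reusable evaluation harness for retrieval quality.
# The test-case shape, metric, and threshold are illustrative assumptions.
def retrieval_hit_rate(test_cases, retrieve, k=5):
    """Fraction of questions whose expected source appears in the top-k results."""
    hits = 0
    for case in test_cases:
        results = retrieve(case["question"], k=k)
        if case["expected_source"] in results:
            hits += 1
    return hits / len(test_cases)

def log_experiment(log, hypothesis, metric_name, value, threshold):
    """Append a structured, auditable record to the Experimentation Log."""
    log.append({
        "hypothesis": hypothesis,
        "metric": metric_name,
        "value": round(value, 3),
        "passed": value >= threshold,
    })

# Example usage with a stubbed retriever
test_cases = [{"question": "What is the contract end date?", "expected_source": "contract_023.pdf"}]
stub_retrieve = lambda q, k: ["contract_023.pdf", "summary_023.docx"]
experiment_log = []
score = retrieval_hit_rate(test_cases, stub_retrieve)
log_experiment(experiment_log, "Retrieval surfaces the right contract", "hit_rate@5", score, threshold=0.8)
```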
Experiment & Learn
Experimentation builds confidence over time — it is not a single pass/fail step.
After each experiment, confidence is updated across desirability and viability. Multiple targeted experiments are run to reduce uncertainty. Strong signals increase confidence; weak or negative signals trigger refinement or alternative approaches.
The evaluation test suite is reused and extended across experiments, ensuring results are comparable as the solution evolves.
If confidence cannot be raised to an acceptable level, the use case is stopped.
Decide
We make evidence-based decisions.
Each use case reaches a formal decision point:
Kill discipline is intentional. Stopping weak ideas early protects investment and prevents accumulation of low-value solutions.
Experimentation runs on a structured cadence to maintain pace and transparency:
This ensures continuous learning, visible progress, and shared ownership of decisions.
We formalise before we scale.
Validated use cases are translated into a Build Readiness Pack — a structured, evidence-based handoff from experimentation to delivery.
This is developed collaboratively with business, architecture, security, data protection, and operations teams to ensure alignment with enterprise standards from the outset.
The pack defines:
This ensures delivery begins with clarity, alignment, and agreed constraints — not assumptions.
Governance is by design, not added later. Existing forums (architecture, security, data protection) are used to validate decisions early, avoiding late-stage blockers.
We move directly from validated use case to controlled build.
Once approved, the use case enters delivery via the roadmap's "Now" horizon. The focus shifts from validation to execution.
Development begins with an MVP built on production-aligned architecture, governed data access, and reusable platform components. This is not a prototype — it is the foundation of a scalable solution.
The Build Readiness Pack feeds directly into delivery, generating structured engineering epics covering data, models, applications, security, and operational readiness. Teams can begin work immediately with clear scope and ownership.
Delivery progresses through three distinct stages:
The solution is deployed to real users with real data. The objective is to demonstrate measurable value in a live context, not a simulated one.
Before wider rollout, the solution is validated with enterprise stakeholders (architecture, security, DPO, operations).
It is then scaled progressively using controlled release strategies. Performance, reliability, and adoption are monitored closely, with decisions to expand, refine, or halt based on evidence.
Scaling is addressed across two dimensions:
This ensures solutions are not only technically robust, but capable of delivering value at enterprise scale.
The solution transitions into a managed product with defined service levels, monitoring, incident management, and clear ownership across teams.
All solutions are built using reusable, standardised components — including shared pipelines, integration patterns, and observability frameworks.
The same discipline applied in experimentation continues through delivery. If a solution does not demonstrate expected value, adoption, or performance, it is refined or stopped — not scaled.
Delivery is accelerated through a set of reusable assets embedded across each stage of the lifecycle. These are not standalone tools, but integrated components used during discovery, experimentation, and production delivery.
| Stage | Reusable Assets | Purpose |
|---|---|---|
| Discovery & Intake | Use Case Card template, structured intake framework | Standardises inputs, ensures consistent evaluation and prioritisation |
| Experiment Engine | Prompt templates, RAG prototypes, evaluation harness, Experiment Library | Rapidly test ideas using proven patterns and measurable criteria |
| Business Prototype | Low-code UI patterns, workflow templates, sample datasets | Quickly create interactive prototypes aligned to real workflows |
| Build Readiness | Build Readiness Pack template, architecture patterns | Translate validated ideas into delivery-ready specifications |
| MVP Build | Reference architectures, reusable pipelines, orchestration patterns | Accelerate development using production-aligned components |
| Scaling & Operate | Monitoring frameworks, logging standards, evaluation pipelines | Ensure reliability, performance, and continuous improvement in production |
While the Use Case Layer validates individual opportunities, the Portfolio Layer determines what to build and when across the estate. It brings all use cases into a single view, enabling informed, evidence-based investment decisions.
The objective is simple: maximise value by investing in the capabilities that unlock the most impact.
Each validated use case defines a set of required capabilities — such as retrieval (RAG), classification, forecasting, or workflow automation.
When aggregated, these requirements create a clear, structured view of demand across the organisation. This allows us to identify common patterns, shared dependencies, and opportunities for reuse — shifting the focus from individual solutions to underlying capabilities.
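As an illustration, aggregating demand can be as simple as counting required capabilities across Use Case Cards; the card shape here is an assumption consistent with the earlier intake sketch.

```python
from collections import Counter

# Illustrative aggregation of capability demand across Use Case Cards.
def capability_demand(cards: list) -> Counter:
    """Counts how many use cases need each capability, turning individual
    requirements into a portfolio-level demand view."""
    return Counter(cap for card in cards for cap in card["required_capabilities"])

cards = [
    {"name": "Collaboration Hub", "required_capabilities": ["retrieval (RAG)", "summarisation"]},
    {"name": "Bid Scanner", "required_capabilities": ["retrieval (RAG)", "classification"]},
]
print(capability_demand(cards).most_common(1))  # [('retrieval (RAG)', 2)]
```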
In parallel, we assess the current technology landscape — including existing AI solutions, data platforms, integrations, and infrastructure.
This is not a static inventory. Capabilities are evaluated for scalability, governance, and reusability, providing a clear view of:
By comparing demand with supply, we identify the capability gap — the set of capabilities that must be built, enhanced, or standardised to support the portfolio.
This reframes the investment question from "Which use case should we build next?" to "Which capability unlocks the most value across multiple use cases?"
Feasibility is assessed at this level, considering dependencies such as data platforms, infrastructure, security, and governance.
Use cases are not progressed in isolation. As part of portfolio management, we translate prioritised opportunities into structured business cases, aligned to Serco's investment and governance processes.
Tiered Business Case Model:
Each business case includes:
Where multiple use cases rely on the same underlying capabilities, we group them into investment bundles rather than assessing them independently.
For example:
This allows capability costs to be amortised across multiple use cases, produces stronger, more compelling business cases, and avoids duplicated investment.
Instead of funding isolated use cases, Serco invests in capabilities that unlock multiple outcomes.
The portfolio layer continuously integrates new evidence from the use case layer — validated use cases, experiment results, capability maturity assessments — into an evolving, evidence-based roadmap.
This approach enables Serco to move beyond isolated AI initiatives and instead build a coherent, scalable AI capability. This end-to-end model delivers five outcomes:
What are the key considerations for establishing the target architecture and implementation approach for Serco's global AI infrastructure? Responses should, as a minimum, address the areas below and state any assumptions and prerequisites.
Our high-level implementation approach moves from analysis, through solution delivery, and finally to platform expansion:
A critical input to the target architecture is identifying which existing Serco infrastructure we can scale (the bright spots) versus what must be built new.
Our maturity assessment (Section 1) evaluates these bright spots across all four divisions.
We leverage those bright spots and add additional capabilities by delivering a prioritised use case into production.
This helps us understand exactly what it takes to ship an AI product at Serco, through our defined approach and methodology.
Through the assessment and delivery, we build out repeatable patterns and core infrastructure.
The delivery team evolves into the AI Platform Team, supporting accelerated delivery of use cases that adhere to core standards and guardrails.
The recommended future-state operating model — who owns what, and how the platform serves product teams building AI agents and solutions — is defined first in this section. Combined with our approach and methodology for delivering use cases, it is what will enable Serco's vision for this engagement.
We then introduce the AI Capability Library, which operates as the backbone for our evolvable reference architecture. Building on our experience of establishing Internal Developer Platforms for central government and beyond, this is a key enabler for accelerating product delivery while maintaining key guardrails.
Finally, we outline how we evaluate the core capabilities of the reference architecture, and the questions we may need to ask along the way.
Our recommended operating model is driven by our experience in deploying Product and Platform teams across the public and private sectors. This approach provides product team autonomy and accelerated product delivery while still operating within the required guardrails. An example of how this could be introduced at Serco is visualised below:
There are clearly demarcated ownership boundaries and interaction modes in this approach.
The product teams will consume capabilities from the library, aligning to an InnerSource model. The platform team curates the library, operates the infrastructure, and embeds governance structurally so that product teams operate within guardrails by default. Product teams can choose not to use a specific capability, but in general that approach will be slower for them and they will need to justify the deviation at Build Readiness.
The Serco AI Platform Team will have five core responsibilities:
The team would be a joint partnership with Serco: Blackstone& will provide expertise and accelerators, including the AI Capability Library; Serco will provide domain knowledge and additional architectural resources. Over the engagement, the ratio shifts through paired delivery. We lead, then co-lead, then support, then step back. The end state is a Serco-owned platform team operating without external dependency. This is outlined further in Knowledge Transfer.
Product Teams interact with the Platform Team and Capability Library throughout the AI Adoption Framework outlined in Section 1.
This creates a continuous feedback loop. Product teams will surface real-world needs, the platform team evaluates and responds, and the library evolves from delivery experience — not from a purely theoretical architecture.
AI products can degrade gradually through quality drift, retrieval relevance decay, or model provider changes that alter output characteristics. Monitoring tools such as DataDog and LangSmith capture not just token usage and latency but also drift, which we track as part of our delivery pipelines.
The platform provides the monitoring and evaluation tooling; the product team defines what "good" looks like for their domain and acts on the signals.
Escalation between levels is defined: if product-level quality degrades and the root cause is a platform capability (e.g., vector search latency, model performance regression) then it escalates to the platform team. If the root cause is product-specific (e.g., outdated grounding data, prompt drift) then the product team resolves it using the platform's evaluation and versioning tools.
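A minimal sketch of that escalation rule is shown below; the root-cause categories and routing targets are assumptions standing in for an agreed operational runbook.

```python
# Illustrative escalation routing; cause names and routing targets are assumptions.
PLATFORM_CAUSES = {"vector_search_latency", "model_performance_regression", "gateway_error_rate"}
PRODUCT_CAUSES = {"outdated_grounding_data", "prompt_drift", "domain_eval_failure"}

def route_quality_alert(root_cause: str) -> str:
    if root_cause in PLATFORM_CAUSES:
        return "platform-team"       # a platform capability is degrading
    if root_cause in PRODUCT_CAUSES:
        return "product-team"        # resolved with the platform's evaluation and versioning tools
    return "joint-triage"            # unclear ownership: investigate together

print(route_quality_alert("prompt_drift"))  # product-team
```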
Post go-live, the platform team also manages the evolution cycle. Quarterly tech radar reviews, golden path updates, capability deprecation with migration support — ensuring that live products are not disrupted by platform evolution.
A static reference architecture document can become outdated the moment it is published, regardless of how well it is designed. Model capabilities are advancing on a quarterly basis and new patterns (agentic workflows, tool-use orchestration, multi-modal reasoning) emerge faster than an ARB (Architecture Review Board) can evaluate them. The reference architecture must be a living, consumable, evolvable artefact rather than a static document.
An architecture that maintains currency means Serco is never locked into yesterday's decisions as new opportunities emerge. For this to be truly effective, developed solutions need to maintain evolvability as a key architectural principle, enabling them, for example, to swap out models via a simple config change.
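The sketch below illustrates what config-driven model evolvability could look like, assuming a model gateway abstraction; the provider names, model identifiers, and interface are placeholders, not committed technology choices.

```python
# Illustrative config-driven model selection; providers and model names are
# placeholder assumptions, not technology decisions.
MODEL_CONFIG = {
    "summarisation": {"provider": "bedrock", "model": "claude-sonnet", "max_tokens": 1024},
    "classification": {"provider": "azure-openai", "model": "gpt-4o-mini", "max_tokens": 256},
}

def get_model(task: str) -> dict:
    """Product code asks for a task; the config decides which model serves it.
    Swapping a model is a config change, not a code change."""
    return MODEL_CONFIG[task]

print(get_model("summarisation")["model"])
```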
Assessed, approved, production-ready.
The recommended, supported way to build AI products. Pre-approved security posture, established pipelines, shared documentation. Deviation permitted but governed.
The living reference architecture.
Agentic patterns and atomic capabilities with data classification governance. What the platform team builds is surfaced here for product teams to consume.
Top-down — Continuous evaluation of emerging models, tools, and patterns.
Bottom-up — Demand signals from product teams at every lifecycle stage — sandbox experiments, build gaps, live performance.
Our Capability Library enables organisations to act on what's new without destabilising what's already working. Top-down and bottom-up radars capture what's emerging and what's relevant to Serco's context. The radar helps separate signal from noise.
As new product teams come through the adoption lifecycle, they surface capability gaps at build readiness. Some gaps are fast-tracked into the golden path. Some are added as niche capabilities. Some are product-specific and stay that way. The platform team makes a deliberate decision for each.
In parallel, the platform team systematically evaluates emerging models, tools, and patterns on a quarterly cycle (assess, trial, adopt, hold). This ensures the platform stays current with advances in the field, not just reactive to product team requests.
Over time, the capability library and golden path become richer, the agentic patterns become more mature, and new product teams get to production faster because more of what they need already exists.
Blackstone& developed the AI Capability Library as a curated catalogue of composable, pre-approved patterns and capabilities that product teams can browse, select from, and build on.
Serving the same purpose for AI delivery that Spotify's Backstage serves for standard software development, the library is where the AI Platform Team curates a set of "golden paths" that help teams navigate the complexity of their solution build.
When the Platform Team operationalises a new capability (for example, a new model via the gateway, a new retrieval approach, or a new guardrails capability), it appears in the library as something teams can consume. The library is always the current state of what is available and approved.
The library is an existing accelerator we will bring to this engagement.
Agentic Patterns
Reusable templates for building agents. Each pattern pre-wires the orchestration, memory, tool access, and guardrails an agent type needs. A product team picks a pattern, configures it for their domain, and gets a working agent with governance already embedded. Based on the provided use cases, possible starting points could be:
| Pattern | What It Does | What Comes Pre-Wired | Serco Use Cases It Suits |
|---|---|---|---|
| Knowledge Worker | Answers questions from a document corpus | RAG orchestration, source citation, confidence scoring, human escalation | Collaboration Hub, Resource Mapping Agent |
| Analytical | Monitors, analyses, and reports on data | Scheduled triggers, dashboard integration, threshold alerting, reporting | Finance Genie, Contract Risk Agent, Operational Management Genie |
| Process Automation | Executes multi-step workflows with system integrations | Approval gates, audit logging, rollback, human-in-the-loop at defined decision points | HR Agent, Complaints Processing, Smart Payroll |
| Scanner | Continuously watches sources and surfaces relevant information | Continuous ingestion, relevance scoring, notification routing | Bid Scanner, Regulation Scanner, Market Scan Agent |
These are suggested starting points. As Serco delivers more use cases, patterns will be refined and new ones will emerge from delivery experience.
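To illustrate how a product team might consume one of these patterns, the sketch below configures a hypothetical Knowledge Worker pattern for the Collaboration Hub domain; the registry shape and field names are assumptions about how the library could be exposed, not its actual API.

```python
# Illustrative pattern configuration; defaults and field names are assumptions.
KNOWLEDGE_WORKER_DEFAULTS = {
    "orchestration": "rag",
    "source_citation": True,
    "confidence_scoring": True,
    "human_escalation_threshold": 0.6,   # escalate answers below this confidence
}

def configure_pattern(defaults: dict, domain_overrides: dict) -> dict:
    """A product team takes the pre-wired pattern and adapts it to its domain;
    governance defaults stay in place unless explicitly overridden."""
    agent = dict(defaults)
    agent.update(domain_overrides)
    return agent

collaboration_hub_agent = configure_pattern(
    KNOWLEDGE_WORKER_DEFAULTS,
    {"corpus": "serco-contracts", "allowed_classification": "official"},
)
print(collaboration_hub_agent["human_escalation_threshold"])  # 0.6: guardrail retained
```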
Atomic Capabilities
The individual building blocks that agentic patterns compose from. These are what the infrastructure surfaces as consumable services and are outlined in more depth in the Core Capabilities section.
| Category | Capabilities | Powered By |
|---|---|---|
| Retrieval & Reasoning | Document Q&A, semantic search, summarisation, multi-document reasoning | RAG / knowledge infrastructure |
| Extraction & Action | Structured data extraction, classification, tool calling | Models and orchestration |
| Safety Enablers | PII redaction, data classification, anonymisation | Enabler capabilities that unlock others safely |
| Agent Runtime | Agentic workflows, agent memory, state management | Orchestration and knowledge infrastructure |
Each capability is classified by maturity tier (Enabler, Foundational, Desirable or Niche) reflecting how broadly proven and supported it is.
Every capability in the library is tagged with the data classification levels it's approved for. The same capability (document Q&A, for example) works differently depending on how sensitive the data is:
This means security and compliance decisions are built into the library, not managed through separate review processes. In practice:
Serco's specific classification scheme will be established through an initial discovery period and aligned to the library in the first weeks of the engagement.
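A minimal sketch of how classification tagging can be enforced in code is shown below; the capability entries and classification labels are placeholders until Serco's scheme is aligned during discovery.

```python
# Illustrative classification-aware capability lookup; labels are placeholders.
CAPABILITY_LIBRARY = {
    "document-qa": {
        "tier": "Foundational",
        "approved_classifications": {"public", "internal", "official"},
    },
    "pii-redaction": {
        "tier": "Enabler",
        "approved_classifications": {"public", "internal", "official", "official-sensitive"},
    },
}

def is_approved(capability: str, data_classification: str) -> bool:
    """Compliance check built into the library rather than a separate review step."""
    entry = CAPABILITY_LIBRARY.get(capability)
    return entry is not None and data_classification in entry["approved_classifications"]

print(is_approved("document-qa", "official-sensitive"))  # False: not approved at this level
```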
The golden path concept originates from platform engineering, as pioneered by companies like Spotify through their Backstage internal developer platform. It's the recommended, supported way to build. Not a mandate, but a strong default that makes the right thing the easy thing.
For Serco, the golden path will be the combination of capabilities, patterns, and infrastructure that the platform team has validated and proven through delivery. Teams who follow it get:
This matters in Serco's operating environment where building outside proven, governed patterns creates risk. The golden path reduces that risk by default.
Deviation is permitted. A computer vision use case has genuinely different needs to a document Q&A agent. But deviation is governed: product teams must justify their rationale at Build Readiness, and the platform team tracks it. If multiple teams deviate in the same direction, that's a signal to update the golden path.
Before any technology choices are made, we agree a set of principles with Serco that govern all subsequent decisions. The architecture evolves continuously; the principles do not.
| Category | Principle | What it means in practice |
|---|---|---|
| Ethical | Human oversight for consequential decisions | Agents don't make high-impact decisions alone and humans stay in the loop where it matters |
| Ethical | Transparency proportionate to risk | The higher the stakes, the more explainable the AI must be |
| Ethical | Fairness and bias monitoring | Outputs are continuously checked for bias, not just at launch |
| Ethical | Privacy by design | Data protection is built in from the start, not bolted on later |
| Architectural | Composability over monoliths | Small, swappable components — not large, tightly coupled systems |
| Architectural | Abstraction at interfaces | Product teams are insulated from infrastructure changes happening underneath |
| Architectural | Data sovereignty by default | Data stays in-jurisdiction unless explicitly approved otherwise |
| Architectural | Open standards preferred | Avoid vendor lock-in; make it possible to change direction |
| Operational | Everything observable | If it runs, it's monitored — no black boxes |
| Operational | Everything auditable | If an agent acts, there's a record of what it did and why |
| Operational | Progressive rollout | Changes are deployed gradually, not all at once |
| Operational | Capability building over dependency | Serco teams own the outcomes — we build capability, not reliance |
The fastest and most reliable way to establish Serco's core AI capabilities is to build them through the delivery of a real use case, starting, for example, with the Collaboration Hub.
The AI platform team begins by delivering the use case end-to-end. Through that delivery, our team — working in partnership with Serco — makes real technology choices, builds real infrastructure, and solves real problems.
The components that emerge (RAG pipelines, vector stores, model routing, guardrails, monitoring) become the first capabilities in the library and the foundation of the golden path.
This approach means:
Once the use case is live, the team pivots from product delivery to platform operation. It supports the next wave of product teams (Bid Agent, Contract Risk, etc.), building out additional patterns and capabilities as demand requires. The capability library grows from delivery experience, not from a theoretical roadmap.
In parallel, the platform team establishes the technology radar for systematic evaluation of emerging capabilities beyond what current delivery demands.
Serco's use case list is heavily agent-focused. Deploying agents at enterprise scale introduces cost and quality challenges that aren't addressed in a standard infrastructure checklist, but matter enormously in production.
A single agent request can trigger 3 to 10 model calls. Without active management, costs spiral. We have seen implementations where expensive models were hardwired in, so the cost to serve made no economic sense.
Our approach: intelligent model routing, prompt caching, and cost-per-outcome tracking rather than cost-per-token.
In our experience, these techniques can reduce agent operating costs by 70 to 90 percent compared to naive implementations.
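The sketch below illustrates the three techniques in simplified form: routing routine calls to a cheaper model, caching repeated prompts, and tracking cost per outcome rather than per token. Prices, model names, and the cache stand-in are assumptions for illustration.

```python
import functools

# Illustrative cost controls; prices and model names are placeholder assumptions.
PRICES_PER_1K_TOKENS = {"small-model": 0.0002, "large-model": 0.01}

def route_model(task_complexity: str) -> str:
    """Send routine calls to a cheaper model; reserve the expensive one for hard steps."""
    return "large-model" if task_complexity == "high" else "small-model"

@functools.lru_cache(maxsize=1024)
def cached_prompt(prompt: str) -> str:
    """Stand-in for prompt caching: identical prompts never hit the model twice."""
    return f"response-to:{prompt}"  # placeholder for a real model call

def cost_per_outcome(call_log: list, outcomes: int) -> float:
    """Track what each completed business outcome costs, not just each token."""
    total = sum(PRICES_PER_1K_TOKENS[c["model"]] * c["tokens"] / 1000 for c in call_log)
    return total / max(outcomes, 1)

calls = [{"model": route_model("low"), "tokens": 1200}, {"model": route_model("high"), "tokens": 3000}]
print(round(cost_per_outcome(calls, outcomes=1), 5))  # 0.03024
```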
Prompt engineering gets the attention, but context engineering determines quality. How retrieval results, system instructions, memory, and user input are assembled matters as much as the prompt itself.
Poor context assembly is the most common cause of hallucination. Our platform provides tooling to inspect, debug, and optimise context construction.
Static agents degrade over time. We advocate architectures where agents observe their own outcomes, detect what's working, and refine strategies through structured feedback.
Persistent memory across sessions, reflexion patterns, and continuous evaluation against quality baselines. The goal: agents that get better with use, not worse.
Serco's agents need more than single-session context. A Bid Agent that remembers successful patterns, a Contract Risk Agent that builds knowledge of recurring risks.
A layered memory system: working (current session), episodic (past interactions), semantic (learned knowledge). Memory management is an active design concern, not an afterthought.
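A minimal sketch of that layered memory model follows; the structure and retention rules are illustrative assumptions rather than a committed design.

```python
from dataclasses import dataclass, field

# Illustrative layered agent memory; retention rules are assumptions.
@dataclass
class AgentMemory:
    working: list = field(default_factory=list)    # current session context
    episodic: list = field(default_factory=list)   # summaries of past interactions
    semantic: dict = field(default_factory=dict)   # learned, reusable knowledge

    def end_session(self, summary: str):
        """Working memory is distilled into an episodic record, then cleared."""
        self.episodic.append(summary)
        self.working.clear()

    def learn(self, key: str, fact: str):
        """Durable knowledge (e.g. a bid pattern that worked) is promoted to semantic memory."""
        self.semantic[key] = fact

memory = AgentMemory()
memory.working.append("User asked about contract X renewal risks")
memory.learn("bid-pattern", "Framework bids with early clarification questions score higher")
memory.end_session("Reviewed renewal risks for contract X")
```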
These are considerations we will address through delivery of the initial use cases, building the right patterns into the platform from the start rather than retrofitting them later.
Ultimately, our recommendations will acknowledge Serco's existing infrastructure (AWS, Databricks) and be validated through discovery and our initial use case delivery.
| Capability area | What it provides | Decisions to validate |
|---|---|---|
| RAG / knowledge layer | Answer questions from documents. Find relevant content by meaning. Summarise and reason across multiple sources. | Vector store: Databricks Vector Search vs pgvector vs OpenSearch. Knowledge graph: Neo4j vs Amazon Neptune. Chunking strategy tuned per document type. |
| Data foundations | AI-ready data from Serco's existing estate. Classification before data enters the AI layer. | Integration boundary with Databricks programme. Classification taxonomy aligned to Serco's governance. Which source systems to connect first. |
| Model strategy | Right model for the task. Governed by data classification level. Product teams never choose a model directly. | Hosting per classification level. Cost/performance benchmarks on Serco tasks. Sovereignty requirements per jurisdiction. |
| LLMOps / MLOps | Version-controlled prompts and chains. Automated evaluation before deployment. Rollback if quality drops. | Evaluation criteria per domain. Prompt and context engineering tooling. Integration with Serco's existing CI/CD. |
| Security and compliance | Protection embedded in every layer. Every interaction logged and auditable. IP controls on what reaches external models. | Serco's data classification scheme. Compliance requirements per jurisdiction. IP policies per model provider. Supplier access model. |
| Responsible AI | Ethical controls built into the platform, not bolted on. Human-in-the-loop where it matters. Guardrails on every interaction. | Policy alignment with Serco's risk and ethics teams. Autonomy boundaries per domain (justice, health, defence). EU AI Act applicability. Guardrails baseline configuration. |
| Monitoring and observability | Visibility into quality, cost, reliability, and drift. Early warning before users notice degradation. | Quality baselines established through early delivery. SLO targets (start internal, tighten over time). Alerting thresholds per capability. Integration with Serco's observability stack (DataDog?). |
This section details how we would deliver the Collaboration Hub as the first use case through our methodology — from problem definition through to cost estimate. Each area below addresses a specific RFP requirement, demonstrating how our approach applies in practice to a real use case that we have already prototyped.
Serco's contract portfolio represents a unique institutional knowledge base built over decades. Today, that knowledge is fragmented across contracts, teams, and regions — inaccessible when it matters most.
The Collaboration Hub transforms this into a searchable, clearance-aware intelligence platform — enabling knowledge reuse, faster decision-making, and cross-contract learning at scale.
The objective is not just to surface information, but to enable users to synthesise insight, apply reasoning, and take action across fragmented systems. This aligns to our Discovery & Framing and Use Case Intake steps, ensuring opportunities are assessed consistently and aligned to Serco's AI platform and capability model.
As described in our methodology, we would typically engage with a broad set of stakeholders to properly assess build readiness, intended value, and the most effective way to build — understanding data, integrations, readiness, and capability dependencies.
In the absence of that discovery, we have made assumptions based on the use case documentation provided in the RFP and the Databricks data programme architecture. What follows would be validated and refined through the Discovery & Framing phase.
Using our assumptions about the outcome of the discovery work around the use case, we have created the Collaboration Hub Build Readiness Pack, which can be seen here: use-case-ingest.blackstoneand.com/build-readiness/uc-023
Unifies access to distributed enterprise knowledge — enabling users to retrieve, synthesise, and act on information across systems through a single interface.
Built as part of Serco's AI platform, using reusable ingestion, retrieval, and access control capabilities rather than point-to-point integrations.
API-based ingestion from SharePoint, Teams, and data platforms. Support for PDF, Word, Excel, structured tables.
Document parsing, chunking, metadata enrichment (type, contract, date, owner), and entity normalisation.
Vector store embedding for semantic retrieval with metadata for filtering, security trimming, and traceability.
Batch ingestion for MVP. Event-driven or scheduled updates for near real-time refresh as the solution scales.
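The sketch below ties these ingestion steps together in simplified form; the chunking strategy, metadata fields, and embedding and vector-store interfaces are placeholders to be validated during discovery.

```python
# Simplified ingestion flow: parse -> enrich metadata -> chunk -> embed -> index.
# Chunk size, metadata fields, and interfaces are illustrative assumptions.
def ingest_document(raw_doc: dict, embed, vector_store: list, chunk_size: int = 1000):
    metadata = {
        "type": raw_doc.get("type"),
        "contract": raw_doc.get("contract"),
        "date": raw_doc.get("date"),
        "owner": raw_doc.get("owner"),
        "classification": raw_doc.get("classification", "internal"),
    }
    text = raw_doc["content"]  # assumed already parsed from PDF/Word/Excel upstream
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    for n, chunk in enumerate(chunks):
        vector_store.append({
            "id": f"{raw_doc['id']}#{n}",
            "embedding": embed(chunk),       # semantic retrieval
            "text": chunk,
            "metadata": metadata,            # filtering, security trimming, traceability
        })

# Batch ingestion for the MVP: loop over the corpus with a stubbed embedder
store = []
ingest_document({"id": "contract-023", "content": "Example contract text...", "contract": "C-023"},
                embed=lambda t: [0.0], vector_store=store)
print(len(store))  # 1
```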
Access is governed through enterprise identity systems, aligned to Serco's security model:
Access control is enforced before retrieval and before agent execution. Implemented as a deterministic policy layer (not prompts or model inference), ensuring reliability and auditability.
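A sketch of what a deterministic, pre-retrieval policy check could look like is below; the roles, classification labels, and mapping are placeholder assumptions pending alignment with Serco's security model.

```python
# Illustrative security trimming applied before retrieval results reach the model.
# Roles and classification labels are placeholder assumptions.
ROLE_CLEARANCE = {"contract-manager": {"internal", "official"}, "analyst": {"internal"}}

def allowed_chunks(user_role: str, candidate_chunks: list) -> list:
    """A policy decision from identity plus metadata, never model inference."""
    clearance = ROLE_CLEARANCE.get(user_role, set())
    return [c for c in candidate_chunks if c["metadata"]["classification"] in clearance]

chunks = [
    {"text": "...", "metadata": {"classification": "official"}},
    {"text": "...", "metadata": {"classification": "internal"}},
]
print(len(allowed_chunks("analyst", chunks)))  # 1: the official chunk is trimmed out
```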
Standard connectors to SharePoint, Teams, and data platforms. Reusable services exposed as platform capabilities.
Trigger updates when documents are created or modified. Enable timely refresh of indexed knowledge.
Expose via UI (as in the prototype). Optional integration into Teams, internal portals, and existing tools.
We explicitly account for real-world enterprise data challenges: variability in document quality, inconsistent metadata, permission complexity (e.g. SharePoint inheritance), latency between updates and availability, and inconsistent classification across systems.
Addressed through metadata enrichment, controlled initial scope (thin slice), progressive pipeline refinement, entity normalisation, and caching strategies.
Outcome: This approach enables a unified, secure view of enterprise knowledge with grounded, traceable AI outputs. It supports both retrieval-based responses and agent-driven workflows, scales across additional data sources without re-architecture, and establishes a reusable data foundation for future AI capabilities across Serco.
The Collaboration Hub is implemented as a modular, scalable AI capability that enables users to retrieve, synthesise, and act on enterprise knowledge.
It is delivered as a reusable, agent-driven architecture, forming part of Serco's broader AI platform and capability library.
The solution follows a layered architecture to ensure separation of concerns, scalability, and reuse.
Delivery is accelerated using proven components: pre-built ingestion and retrieval pipelines, agent orchestration patterns, structured output templates (summaries, risks, actions), evaluation frameworks, and reusable UI patterns. Enables rapid progression from prototype to MVP.
The Collaboration Hub establishes reusable platform capabilities: common ingestion framework, shared knowledge and retrieval layer, standard agent orchestration pattern, and centralised monitoring and evaluation. Result: faster delivery of future use cases, reduced duplication, and consistent governance and control.
Outcome: This approach enables the Collaboration Hub to deliver immediate value through contract reporting, risk identification, and client outputs, support both retrieval and action-oriented workflows, scale across users, data sources, and use cases, and form the foundation of a reusable, enterprise AI platform.
The Collaboration Hub is delivered through a structured, iterative approach that moves from problem definition to production in controlled stages.
Delivery is centred on proving value early through thin, end-to-end slices, using real workflows and data, before scaling. All delivery aligns to Serco's AI platform, leveraging reusable capabilities and contributing back to the capability library.
We define the Collaboration Hub in operational terms with Serco stakeholders.
Activities: Engage contract managers, analysts, and commercial teams. Map workflows (reporting, risk identification, client communication). Identify friction points. Assess across five pillars. Define success criteria.
Outputs: Use Case Card, initial desirability & viability scoring.
We rapidly design and demonstrate how the Collaboration Hub supports real workflows.
Activities: Define key interaction patterns. Explore agent-based solution patterns (retrieval, orchestration, workflows). Build a lightweight prototype demonstrating querying enterprise knowledge, generating summaries, risks, and actions.
Outputs: Clickable prototype, early user feedback, validation of desirability and usability.
We validate that the solution delivers reliable, governed outputs using real data.
Activities: Test retrieval across selected data sources. Validate orchestration. Verify access control and governance enforcement. Evaluate output quality. Run targeted experiments to reduce uncertainty.
Outputs: Experimentation Log, updated confidence, Build Readiness Pack, decision to proceed / iterate / stop.
We deliver a working Collaboration Hub through incremental, end-to-end slices.
Approach: Build vertical slices delivering complete user workflows. Test-first development. AI-augmented engineering with human oversight. Leverage platform golden path capabilities.
Example slices: Slice 1 — query + summary (SharePoint). Slice 2 — source traceability. Slice 3 — risk identification. Slice 4 — structured outputs (reports, emails).
Outputs: Working MVP deployed to a controlled user group, early usage and feedback data.
The Collaboration Hub is deployed to pilot users within live operational contexts.
Activities: Enable contract managers and analysts to use the solution in real workflows. Monitor usage, output quality, and performance. Gather structured user feedback. Refine prompts, retrieval, and workflows.
Outputs: Validated solution in real-world usage, evidence of adoption and user satisfaction.
The Collaboration Hub is scaled progressively across users, teams, and use cases.
Approach: Controlled release (phased rollout, environment promotion). Ongoing monitoring of performance, reliability, and adoption.
Scaling: Vertical — performance, reliability, cost optimisation. Horizontal — new users, contracts, and use cases. Each new use case builds on the same platform capabilities, not a new solution.
Stakeholders see working software early and continuously, not at the end.
Identified early and managed as part of the delivery plan.
Progress is demonstrated through working functionality, not documentation:
Outcome: This delivery approach ensures that the Collaboration Hub delivers value early through real workflows, is validated with users before scaling, evolves incrementally into a robust, enterprise-grade capability, and contributes to and benefits from Serco's shared AI platform and capability library.
The Collaboration Hub is designed to deliver trusted, auditable, and high-quality outputs. Given its role in supporting contract performance, risk identification, and client communication, structured controls are applied across the full lifecycle — from generation through to ongoing operation.
All outputs are grounded in enterprise data and subject to deterministic controls.
Critically: access control is enforced before retrieval and before agent execution. Policies are applied through deterministic controls (identity, RBAC, data policies) — not model inference. This ensures outputs are both evidence-based and compliant by design.
We implement a structured evaluation framework to continuously measure and improve output quality.
Evaluation dimensions:
Approach: Define evaluation datasets based on real Collaboration Hub scenarios (e.g. contract reporting, risk identification). Combine automated evaluation (pattern and metric-based scoring) with human review (SMEs validating outputs in context). This ensures quality is measured systematically and tied to real operational use.
Once deployed, the Collaboration Hub is actively monitored to detect issues and drive improvement.
Insights are fed back into prompt refinement, retrieval tuning, and orchestration and workflow adjustments. This creates a continuous improvement loop aligned to the broader experimentation and operating model.
We implement explicit guardrails to manage risks associated with AI-driven outputs.
Models and prompts are actively managed to maintain performance, reliability, and cost efficiency.
This ensures the Collaboration Hub remains effective as data, usage, and requirements evolve.
All interactions with the Collaboration Hub are traceable and auditable.
This provides a clear audit trail, supporting both internal assurance and external scrutiny.
Outcome: This approach ensures that the Collaboration Hub delivers reliable, evidence-based outputs, operates within clear governance and risk controls, is continuously measured and improved, and builds user trust — critical for sustained adoption and business impact.
Knowledge transfer is embedded throughout delivery — not treated as a final handover. The approach is designed to enable Serco to independently operate, extend, and scale the Collaboration Hub, while building the internal capability to deliver future AI use cases on the shared AI platform.
Delivery is executed through a blended team model, where Serco teams work alongside our delivery team across all phases — from discovery through to scaling.
For the Collaboration Hub, this includes:
This ensures knowledge is transferred through active participation in real delivery, not documentation alone, and builds confidence in operating AI-enabled workflows in practice.
Each stage of the methodology is designed to build specific capabilities within Serco:
This creates a repeatable model that Serco can apply beyond the Collaboration Hub.
For this use case, we focus on transferring the capabilities required to operate and extend the Hub as a reusable enterprise service. This includes:
All components are delivered in a way that is transparent, documented, and reusable, forming part of Serco's AI capability library.
We provide full access to all artefacts created during delivery, including:
These artefacts are structured for reuse across future use cases and aligned to platform standards.
As the Collaboration Hub moves from MVP to scale, ownership progressively transitions to Serco teams. This includes:
We support Serco in establishing a Centre of Excellence (CoE) or equivalent function to govern and scale AI delivery.
The Collaboration Hub is not only a consumer of platform capabilities, but a contributor to their evolution. As part of delivery: reusable patterns are contributed to the AI Capability Library, gaps and constraints are fed into the platform backlog, and evaluation datasets and learnings are shared across use cases. This ensures that each delivery strengthens the overall platform, accelerating future AI initiatives.
The target state is for Serco to operate the Collaboration Hub as a managed, evolving product within a broader AI platform. In this model:
Outcome: This approach ensures that Serco builds internal capability, not dependency, can extend the Collaboration Hub independently, establishes a repeatable model for AI delivery at scale, and continuously evolves its AI platform through real-world usage and feedback.
The Collaboration Hub is delivered incrementally, with effort aligned to each phase.
| Phase | Duration | Key Activities |
|---|---|---|
| Discovery & Framing | 2 weeks | Stakeholder engagement, workflow mapping, use case definition |
| Ideation & Prototype | 1 week | Interaction design, prototype build, early validation |
| Experimentation | 2 weeks | Data validation, retrieval testing, orchestration validation |
| MVP Build (Thin Slices) | 6-8 weeks | Slice-based delivery, continuous deployment, weekly demos |
| Pilot | 4-6 weeks | Real user usage, monitoring, refinement |
| Scale (Initial rollout) | 4-8 weeks | Controlled rollout, performance optimisation |
Total initial delivery: ~5-6 months
This approach ensures early value delivery (within weeks), controlled risk reduction before scaling, and predictable progression from concept to production.
Please see the Commercials section for team shape, rate card, and cost details.
Blackstone& is a senior delivery team. Every person on the engagement delivers work directly — there are no management layers between the team and the output. The team is structured in two tiers: a fractional engagement team providing strategic direction, governance, and specialist advisory; and a full-time delivery team embedded in Serco's AI Lab on a day-to-day basis.
Senior leadership available on a fractional basis — providing strategic direction, methodology oversight, operating model design, and data platform advisory without the cost of full-time senior rates.
| Role | Person | Location | Basis | Primary Focus |
|---|---|---|---|---|
| Engagement Lead / AI Strategist | Kieran Blackstone | UK / UAE | Fractional | Strategy, methodology, stakeholder engagement, quality assurance |
| Target Operating Model / Delivery Lead | Wayne Palmer | UK | Fractional | Operating model design, delivery governance, capability building |
| Data Strategy Advisor | Suranga Fernando | UK | Fractional | Data platform strategy, Databricks architecture, data engineering advisory |
Embedded in Serco's AI Lab, delivering day-to-day. Between them, Ras and Don cover the full delivery stack — from business analysis and product ownership through to production infrastructure.
| Role | Person | Location | Basis | Primary Focus |
|---|---|---|---|---|
| AI Product Lead / Business Analyst | Ras | UK | Full-time | Business analysis, AI product ownership, experimentation, use case lifecycle |
| Data/ML DevOps Engineer | Don | UK | Full-time | Machine learning engineering, data engineering, DevOps, CI/CD, LLMOps |
Security clearance: All four fully UK-based team members (Wayne, Ras, Don, and Suranga) have been SC cleared within the last few years. None currently hold active SC/DV. Blackstone& is able to provide SC cleared resources at scale through its vetted subcontractor network.
Strategy, methodology, stakeholder engagement, quality assurance. Owns the engagement relationship and overall delivery quality. Ensures the methodology is applied consistently and the team delivers against Serco's objectives.
Kanad Hospital, Abu Dhabi — Defined AI adoption strategy using the same methodology proposed for Serco. Built the hospital's first AI prototype. Currently delivering production use cases: customer support agents (AWS Bedrock) and website development agent, both integrating with Microsoft Fabric. Designed AI roadmap aligned to UAE healthcare regulation and sensitive patient data governance.
HMRC — Hawk Platform — Built a microservices platform giving businesses in trade a single self-serve interface. Components pre-approved by governance, security, data, and architecture boards — directly comparable to the AI Capability Library's golden path approach.
HMRC — GVMS — Oversaw delivery of the UK imports/exports trade system for UK ports post-Brexit. Introduced agile contracting principles. Matured the platform to Critical National Infrastructure standards.
DWP — DevOps Capability Delivery — Delivered DWP's first DevOps maturity assessment, leading to capability delivery in the Fraud, Error & Debt directorate. Built capability directly for DWP rather than creating consulting dependency.
DSIT — GenAI Product — Created DSIT's first GenAI-powered product on Salesforce Einstein. First use of generative AI in a production-facing government context.
Three Mobile — AI Labs Function — Engagement lead for the AI Lab service rollout, working alongside Ras Fernando and Don Capito on the build. Established the AI Labs methodology and delivery framework that forms the foundation of the approach proposed for Serco.
DfE, MOD (via DESA) — Software development, DevOps services, and Salesforce rescue/transition across multiple government departments.
Founded Blackstone& — Built the Collaboration Hub prototype, AI Capability Library, and Agile Contracting Toolkit before this bid using the same rapid delivery approach proposed for Serco.
Deep experience in governance, delivery frameworks, and organisational design for technology functions across UK government. Involved in the majority of the engagements listed above, bringing complementary delivery and operating model expertise to Kieran's programme leadership. Current focus is designing and deploying AI-native product and platform operating models that drive employee engagement and accelerate productivity across organisations.
Morae Global — Augmented Product & Platforms Operating Model — Designed and rolled out a globally distributed operating model for this legal technology organisation. Brought teams across multiple time zones into a cohesive way of working and introduced augmented engineering techniques that allowed teams to accelerate delivery while working within highly regulated domains. Adoption was initially driven by regional hackathons to build awareness and buy-in.
Morae Global — Legal Intelligence Product Mobilisation — Working with a legal technology company to define and validate the team, governance and architecture for an AI intelligence layer. This includes a multi-agent system (orchestrator, contract analysis, eDiscovery, reporting agents) built on Azure AI Search, Neo4j knowledge graph, Databricks, and LangGraph orchestration. This system is the initial product that will begin to develop their AI Platform.
DSIT — Product & Platforms Operating Model — Analysed, designed and rolled out the Target Operating Model for DSIT as part of their departmental restructure. With a heavy focus on change management and role definition, created core work management backbones and set up core events to manage and route work effectively across the organisation.
Mastercard — DevOps Transformation — Led the transformation of their faster payments product to be orientated towards cross-functional teams with a focus on engineering excellence and fast flow. Stopped a failing re-architecture programme and pivoted resources to a modern architecture.
Cross-Government Delivery — Transformation roles across HMRC, DWP, DfE, and MOD engagements. Consistent focus on governance structures, delivery frameworks, and the organisational design required to make technology functions work after the consultants leave.
Business analysis, AI product ownership, experimentation, use case lifecycle. Runs the day-to-day delivery — from stakeholder discovery through experimentation to production handoff. Owns the Use Case Cards, Experiment Logs, and Build Readiness Packs.
Turner & Townsend (Current) — Supporting AI-enabled contract workflows with embedded guardrails, human-in-the-loop decisioning, and scalable product design within commercial processes. Directly comparable to the Collaboration Hub's contract intelligence use case.
Three Mobile — AI Labs Function — Developed the Blackstone AI Labs methodology: a structured approach to identifying, validating, and scaling AI use cases across an enterprise. This methodology forms the foundation of the approach proposed for Serco's AI Lab.
HSBC — AI Labs Function — Built and delivered an AI Labs function supporting a global customer base and distributed engineering teams. Governance, risk, and controlled experimentation within a heavily regulated, globally distributed organisation.
DWP — DevOps Capability & Delivery — Led capability assessment and development programme in the Fraud, Error & Debt directorate. Translated transformation strategy into measurable delivery outcomes. Same challenge of building internal capability alongside external delivery.
Machine learning engineering, data engineering, DevOps, CI/CD, LLMOps. Builds and operates the AI platform infrastructure. Bridges the gap between data science teams and production systems. Upskills Serco engineers through paired delivery.
Genentech (Roche) — AI/HPC Platform Engineering — Built and scaled cloud-native AI/ML infrastructure on AWS supporting 200+ data scientists across US and Europe. Deployed next-generation AI/HPC platform replacing on-premise clusters — 3x cost reduction, Nvidia GPU instances (B200, H200, H100) for deep learning. Established observability with Grafana, Prometheus, OpenTelemetry. Upskilled L1/L2 support engineers on AI platform operations.
IAVI — Trusted Research Environment — Lead DevSecOps delivering a TRE for medical research institutions globally, handling personal and sensitive data. AWS well-architected framework with security controls, automated with Terraform/CDK. Led cross-functional team of 10 including data scientists.
Three Mobile — AI Labs Function — Automated Databricks, Unity Catalog, and Delta Live Tables CI/CD in Azure. Delivered scalable AlteryxServer cluster integrated to Snowflake. 10x horizontal scaling, deployment frequency from monthly to weekly. Upskilled DevOps and Data Engineers while delivering.
Imperial College London — Research Computing — Led multi-disciplinary team to deliver Trusted Research Environment in AWS. Design through to MVP delivery for researchers.
Security clearance: SC cleared (lapsed). UK-based.
Every asset listed below is working software. Not a slide deck. Not a template. Evaluators can click through each one.
| Asset | What It Does | How It Accelerates Delivery |
|---|---|---|
| Collaboration Hub Prototype | Working prototype of the exact use case being tendered. Cross-border contract intelligence, AI-powered search, quality scoring, agentic enrichment. | Stakeholders interact with the solution concept on day one. No weeks of discovery before anything is visible. |
| AI Capability Library | 156 AI capabilities mapped across 14 domains and 6 data classification levels. Living reference architecture with strategic data exposure analysis. | Maps infrastructure requirements for any use case. Identifies Enabler capabilities that reduce data classification for downstream deployments, cutting cost and expanding scope. |
| Build Readiness Backlog | Interactive tool mapping Serco's 34 identified use cases against infrastructure maturity levels. | Shows what is buildable now versus what is blocked by infrastructure gaps. Auto-generates a prioritised roadmap. |
| Use Case Submission Portal | 5-pillar intake tool for structured use case assessment. Produces scored Use Case Cards. | Standardises inputs across business units. Evolves to self-serve intake as Serco's AI function matures. |
| Experimentation Hub | Library of premade experiments and test harnesses for AI use case validation. | Accelerates hypothesis validation. Reduces time from idea to evidence. Reproducible and auditable. |
| Agile Contracting Toolkit | Interactive commercial model demonstration showing how cost, risk, and scope are managed in agile delivery. | Builds commercial trust through transparency. Demonstrates the hybrid fixed-price/agile model proposed for this engagement. |
| AI Adoption Framework | End-to-end methodology from discovery through production, with working tooling at every step. | Not a methodology document — a structured, artefact-driven process backed by the tools listed above. |
The core team of senior professionals described above is the foundation. The Framework Agreement's three-year term allows the team to scale as workload demands, using two mechanisms:
Specialist subcontractors. For specific capability needs — Databricks engineering, UX research, domain-specific data engineering — Blackstone& brings in vetted specialists. All subcontractors are assessed for security clearance eligibility and delivery quality before engagement.
Progressive ownership. Serco's own team is the primary scaling mechanism. As capability transfers through paired delivery and structured knowledge transfer, Serco engineers take on delivery directly. The Blackstone& team shifts from hands-on delivery to advisory and quality assurance. This is by design: the goal is a self-sustaining AI Lab, not a permanent consulting dependency.
What we do not do: fill seats with junior staff to meet a headcount target. Every person on this engagement adds delivery value from day one. Every day rate buys output, not overhead.
The measure of this engagement is not what we build. It is whether Serco can build the next one without us.
Knowledge transfer is not a phase that happens at the end of delivery. It is a property of how we work — embedded in every sprint, every ceremony, every artefact, and every decision from day one. We do not transfer knowledge of what we built. We transfer the capability to build, adapt, and evolve independently.
This distinction matters. Technology changes. Models improve. New use cases emerge. If we hand over documentation of a system we built, Serco has a snapshot. If we hand over the methodology, the tools, and the institutional knowledge to adapt them, Serco has a capability that compounds over time.
We structure knowledge transfer around four mechanisms that work together. No single mechanism is sufficient on its own — embedded working builds skills, the Centre of Excellence provides structure, reusable components reduce reinvention, and progressive ownership creates accountability for independence.
Our team works alongside Serco engineers, product leads, and business stakeholders within delivery squads. There is no isolated consultancy layer. If we are in a meeting, Serco is in that meeting. If we are writing code, a Serco engineer is pairing on it. If we are making an architecture decision, Serco's technical lead is in the room.
This is not observation. Every sprint includes Serco team members as active participants — contributing to hypotheses, making design choices, reviewing outputs, and owning artefacts. The work is shared from the start, which means there is nothing to "hand over" later.
Serco's AI ambition spans 700+ contracts across four global divisions. Individual project teams will not sustain this at scale. We support Serco in establishing a central AI capability function — a Centre of Excellence that owns standards, tooling, governance, and the methodology for delivering AI use cases across the organisation.
This includes defining:
Training is tailored by audience, because a Serco engineer needs different capabilities than a business stakeholder:
| Audience | Transfer Method | Capability Developed |
|---|---|---|
| Engineering | Pair delivery, code reviews, architecture decision records | Build and operate AI products independently |
| Data | Pipeline development, data quality frameworks, Databricks integration patterns | Design and maintain data foundations for AI workloads |
| Product | Learning ceremonies, experimentation interpretation, roadmap formation | Identify, validate, and prioritise AI use cases using evidence |
| Risk & Governance | Responsible AI framework, HITL design, risk assessment methodology | Evaluate AI risk proportionately and govern responsibly |
| Business Users | Decision forums, Use Case Card submission, interpreting AI outputs | Commission AI work, make evidence-based investment decisions, apply critical judgement to AI-generated insights |
Every method, tool, and asset we use in delivery is designed for reuse and handed over to Serco. These are not locked behind our IP — they become Serco's operational toolkit:
These compound. Each use case delivered adds patterns, prompts, and learnings to the shared library. By the fourth use case, Serco's teams are drawing on a substantial internal knowledge base that did not exist before.
Transfer follows a defined four-stage model. Each use case progresses through these stages, with clear criteria for transition. This is not a theoretical framework — it is how we structure every engagement, and how we hold ourselves accountable for making ourselves replaceable.
Two things make this work in practice:
Transition is per-capability, not big-bang. Some capabilities transfer faster than others. A Serco engineer may reach Stage 3 on pipeline development while still at Stage 2 on architecture decisions. We track this at the individual and team level, so we can target support where it is genuinely needed.
Transition criteria are observable, not subjective. "Serco team contributing to experiments" is verifiable in sprint artefacts. "Running sprints independently" is visible in ceremony records. We do not declare transfer complete based on hours of training delivered — we declare it complete based on what Serco's team can demonstrably do.
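As an illustration of what "observable, not subjective" can mean in practice, the sketch below checks stage transition criteria as simple predicates over a summary of sprint and ceremony records. The criteria names, thresholds, and field names are illustrative assumptions, not the agreed definitions.

```python
from typing import Callable

# Illustrative check of observable transition criteria. Each criterion is a
# predicate over delivery artefacts (sprint records, ceremony logs), so stage
# progression is demonstrated rather than declared. All names are examples.
STAGE_CRITERIA: dict[int, list[tuple[str, Callable[[dict], bool]]]] = {
    2: [("contributing to experiments", lambda a: a["experiments_contributed"] > 0)],
    3: [("leading sprint planning", lambda a: a["plannings_led"] >= 2),
        ("authoring readiness packs", lambda a: a["readiness_packs_authored"] >= 1)],
    4: [("running sprints independently", lambda a: a["sprints_run_without_support"] >= 2)],
}

def highest_stage(artefact_summary: dict) -> int:
    """Return the highest stage for which every criterion is met in the artefacts."""
    stage = 1
    for candidate, criteria in sorted(STAGE_CRITERIA.items()):
        if all(check(artefact_summary) for _, check in criteria):
            stage = candidate
        else:
            break
    return stage

summary = {"experiments_contributed": 5, "plannings_led": 3,
           "readiness_packs_authored": 1, "sprints_run_without_support": 0}
print(highest_stage(summary))  # 3
```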
Our five delivery ceremonies are not just project management. Each one is designed to build a specific capability in Serco's team:
| Ceremony | Frequency | What Serco Learns by Participating |
|---|---|---|
| Planning | Weekly | How to identify testable hypotheses, decompose work, and prioritise experiments based on evidence and risk |
| Standup | Daily | How to surface blockers early, coordinate across disciplines, and maintain delivery momentum |
| Learning | Weekly | How to interpret experimental evidence, generate insights, identify patterns, and revise strategy. This is the primary transfer mechanism. |
| Retrospective | Bi-weekly | How to build continuous improvement habits — what to keep, what to change, what to try |
| Decision Forum | Monthly | How to make evidence-based AI investment decisions. When to kill work that is not delivering value. When to scale what is. |
The Weekly Learning ceremony deserves emphasis. This is not a status report. It is a shared analysis session where Serco team members — alongside us — examine what the latest experiments have revealed, debate what the evidence means, and decide what to do next. This builds the analytical and decision-making capability that Serco needs to run the AI Lab independently. Over time, Serco's team leads these sessions. We move from facilitator to participant to observer.
The Monthly Decision Forum transfers the hardest capability of all: the discipline to stop work that is not delivering value. In our experience, organisations that succeed with AI at scale are not the ones that start the most initiatives — they are the ones that kill the wrong ones early and double down on the right ones. The Decision Forum builds this muscle in Serco's leadership team, using real evidence from real experiments, with real consequences.
Transfer is not complete when we have trained people. It is complete when they can do it without us.
We track transfer through observable indicators, not training hours or satisfaction scores:
| Indicator | What It Demonstrates | Target Stage |
|---|---|---|
| Serco team members leading sprint planning | Delivery capability | Stage 3 |
| Engineers making architecture decisions with support, not direction | Technical independence | Stage 3 |
| Build Readiness Packs authored by Serco with Blackstone& review | Assessment capability | Stage 3 |
| New use cases entering the pipeline without Blackstone& involvement in intake | Discovery and prioritisation capability | Stage 4 |
| Decision Forums running with Blackstone& in advisory, not facilitator role | Governance and kill discipline | Stage 4 |
| AI Capability Library updated by Serco team — new capabilities evaluated, classified, and added | Evolving market knowledge | Stage 4 |
| Next use case deployed end-to-end without external support | Full independence | Stage 4 |
These indicators are reviewed monthly. When all Stage 4 indicators are met, the engagement has succeeded on its own terms.
Formal transfer through delivery and ceremonies builds deep capability in the core team. But Serco's AI ambition is organisational — 700+ contracts, four divisions, thousands of potential users. We recommend investing in broader capability acceleration alongside core delivery:
Internal community building. Hackathons, show-and-tells, and lighthouse demonstrations create energy and awareness beyond the delivery team. When a contract manager in the Middle East sees what a team in UK & Europe built in two sprints, that is more powerful than any training programme.
Working out loud. We recommend recording key sessions — architecture decisions, learning ceremonies, experiment reviews — and making them available through a simple, searchable internal knowledge base. A lightweight AI agent can handle PII scrubbing and indexing. This turns tacit knowledge into institutional knowledge, and it means new team members can onboard by watching real decisions being made, not reading sanitised documentation.
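A minimal sketch of such a pipeline is shown below, assuming simple regex-based redaction and a keyword index. In practice the agent would use a proper PII detection service and a richer search layer; all patterns, names, and session identifiers here are illustrative.

```python
import re
from collections import defaultdict

# Illustrative pipeline: scrub obvious PII from session transcripts, then build
# a simple inverted index so sessions are searchable by keyword.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone": re.compile(r"(?:\+44|\b0)(?:\s?\d){9,10}\b"),
    "nino": re.compile(r"\b[A-Z]{2}\d{6}[A-D]\b"),  # UK National Insurance number shape
}

def scrub(text: str) -> str:
    """Replace matched PII with a typed placeholder, e.g. [REDACTED:email]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{label}]", text)
    return text

def index(sessions: dict[str, str]) -> dict[str, set[str]]:
    """Build a keyword -> session-id inverted index over scrubbed transcripts."""
    inverted: dict[str, set[str]] = defaultdict(set)
    for session_id, transcript in sessions.items():
        for token in re.findall(r"[a-z]{4,}", scrub(transcript).lower()):
            inverted[token].add(session_id)
    return inverted

if __name__ == "__main__":
    sessions = {
        "adr-017-review": "Chunking strategy agreed. Contact jane.doe@serco.com for data access.",
        "learning-wk12": "RAG evaluation results reviewed; call 020 7946 0000 to book follow-up.",
    }
    idx = index(sessions)
    print(sorted(idx["chunking"]))  # ['adr-017-review']
```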
Role-specific learning journeys. Not everyone needs the same depth. A business user submitting a Use Case Card needs a 20-minute walkthrough. An engineer joining the delivery squad needs a structured onboarding path covering architecture, tooling, and ways of working. We design these paths and hand them over as part of the CoE toolkit.
Builder profiles. We track what capabilities each team member has developed — across technical skills, methodology understanding, and domain knowledge. This makes capability gaps visible and manageable, and gives Serco's leadership a clear picture of where the organisation is strong and where it needs investment.
Everything described above is how we transfer knowledge. But we also want to propose a use case of our own — one that would enter Serco's AI pipeline alongside the 34 identified opportunities, validated through the same methodology, and delivered using the same approach.
The progressive ownership model relies on human judgement to assess readiness. That works — but it requires our team to be present, observing, and making those calls. What if an AI system could do this continuously, independently, and at a level of detail that no human observer can sustain?
The use case: an agentic system that monitors the engagement itself — not just what is being built, but the capabilities required to build, operate, and evolve it — and compares those requirements against the real capability profiles of every team involved.
The system operates across three layers:
1. Capability Demand Mapping. The agent monitors delivery artefacts in real time — architecture decision records, ceremony recordings and transcripts, sprint outputs, code commits, infrastructure configurations, Build Readiness Packs, and operational runbooks. From these, it extracts the specific capabilities being exercised: which AI patterns are in use, which data engineering techniques, which governance frameworks, which operational practices. Every capability is tagged and tracked against our AI Capability Library taxonomy.
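As a sketch of how this first layer might work, the example below tags artefact text against a small illustrative taxonomy using keyword matching. The real system would use the full AI Capability Library taxonomy and likely LLM-assisted extraction; the capability names and signal terms here are assumptions.

```python
from collections import Counter

# Illustrative capability taxonomy, loosely modelled on the AI Capability Library.
# The real taxonomy, artefact sources, and matching approach would be defined
# during delivery; this sketch uses simple keyword matching.
TAXONOMY = {
    "rag_architecture": ["vector store", "chunking", "retrieval", "embedding"],
    "llm_ops": ["prompt version", "eval harness", "model monitoring"],
    "data_engineering": ["delta live tables", "unity catalog", "ingestion pipeline"],
    "responsible_ai": ["hitl", "risk assessment", "data classification"],
}

def capability_demand(artefacts: list[str]) -> Counter:
    """Count how often each capability is exercised across delivery artefacts."""
    demand = Counter()
    for text in artefacts:
        lowered = text.lower()
        for capability, signals in TAXONOMY.items():
            if any(signal in lowered for signal in signals):
                demand[capability] += 1
    return demand

artefacts = [
    "ADR-017: chunking strategy for contract ingestion, Databricks vector store",
    "Sprint 12 learning notes: eval harness results for summarisation prompts",
    "Runbook: Unity Catalog permissions for the ingestion pipeline",
]
print(capability_demand(artefacts))
# rag_architecture, llm_ops, and data_engineering each counted once
```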
2. Builder Profile Comparison. Every team member — ours, Serco's, and any other third-party consultancy — has a builder profile: a structured record of demonstrated skills, domain knowledge, methodology familiarity, and delivery experience. The agent continuously compares the capability demands of the engagement against the builder profiles of the internal Serco team. The output is a live, evolving gap analysis — not what Serco's team was trained on, but what they can demonstrably do versus what the work actually requires.
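A corresponding sketch of the comparison layer follows, assuming a simple ordinal competence scale and an illustrative "Proficient" minimum bar. The real gap analysis would weigh evidence quality and recency rather than a single level; all profile data below is invented for the example.

```python
# Illustrative gap analysis: compare the capabilities the work demands against
# the strongest demonstrated level held by the internal team.
LEVELS = ["None", "Aware", "Familiar", "Proficient", "Expert"]

team_profiles = {
    "engineer_a": {"rag_architecture": "Proficient", "agentic_workflows": "Aware"},
    "engineer_b": {"rag_architecture": "Familiar", "llm_ops": "Familiar"},
}

def capability_gaps(demand: dict[str, int], profiles: dict[str, dict[str, str]],
                    minimum: str = "Proficient") -> dict[str, str]:
    """Return capabilities where no internal team member meets the minimum level."""
    gaps = {}
    for capability in demand:
        best = max(
            (p.get(capability, "None") for p in profiles.values()),
            key=LEVELS.index,
        )
        if LEVELS.index(best) < LEVELS.index(minimum):
            gaps[capability] = best
    return gaps

demand = {"rag_architecture": 4, "llm_ops": 2, "agentic_workflows": 1}
print(capability_gaps(demand, team_profiles))
# {'llm_ops': 'Familiar', 'agentic_workflows': 'Aware'}
```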
3. Gap Closure Recommendations. For each identified gap, the system recommends one of three paths:
| Path | When It Applies | Example |
|---|---|---|
| Train | The capability is best held by a human — judgement-intensive, context-dependent, or requiring stakeholder trust | Architecture decision-making, responsible AI assessment, stakeholder negotiation |
| Automate | The capability can be reliably handled by an agent — repetitive, pattern-based, or requiring consistency at scale | Pipeline monitoring, prompt evaluation, documentation generation, data quality checks |
| Train + Automate | The human needs to understand the capability (for oversight and edge cases) but day-to-day execution is agent-assisted | Code review with AI-assisted analysis, security scanning with human sign-off, experiment log analysis |
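One way the routing could be expressed is as a small set of rules over gap attributes, sketched below. The attributes and rules are assumptions for illustration; the actual classifications would be agreed with Serco's governance function and refined as evidence accumulates.

```python
from dataclasses import dataclass

# Illustrative routing of an identified gap to a closure path.
@dataclass
class Gap:
    capability: str
    judgement_intensive: bool   # context-dependent, relies on stakeholder trust
    pattern_based: bool         # repetitive, benefits from consistency at scale
    oversight_required: bool    # humans must understand it for edge cases

def closure_path(gap: Gap) -> str:
    if gap.judgement_intensive and not gap.pattern_based:
        return "Train"
    if gap.pattern_based and gap.oversight_required:
        return "Train + Automate"
    if gap.pattern_based:
        return "Automate"
    return "Train"

for gap in [
    Gap("architecture decision-making", True, False, True),
    Gap("pipeline monitoring", False, True, False),
    Gap("code review", False, True, True),
]:
    print(gap.capability, "->", closure_path(gap))
# architecture decision-making -> Train
# pipeline monitoring -> Automate
# code review -> Train + Automate
```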
This system removes the single biggest risk in any knowledge transfer engagement: the consultancy marking its own homework. The agent tracks capability transfer independently of us. It does not rely on our assessment of whether Serco's team is ready. It watches what Serco's team actually does — in ceremonies, in code, in decisions — and measures that against what the work demands.
It also solves a problem that no amount of traditional training addresses: capability drift. As the AI landscape evolves, the capabilities required to operate Serco's AI estate will change. New model architectures, new security requirements, new regulatory frameworks. The agent continuously updates the demand side of the equation, so Serco always knows where the gaps are — even after we have left.
Finally, it provides Serco's leadership with something they cannot get from timesheets or training records: an honest, real-time view of organisational AI capability — who can do what, where the dependencies are, and what it would take to close each gap.
This is not a standard offering. It is a capability we would build during the engagement, using the same methodology and platform we use for any other use case. It would enter the pipeline through the AI Front Door, be validated through the Experiment Engine, and — if it proves value — be deployed as a production tool that Serco owns and operates independently.
The builder profile is the foundation of the entire system. Every person involved in the engagement — our team, Serco's engineers, business stakeholders — gets one. It is not a CV. It is a living calibration tool that tracks what someone can demonstrably do, how they work, and where their boundaries are.
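One way such a profile might be represented as data is sketched below. Field names mirror the example profiles that follow, but the structure itself is illustrative and would be refined during delivery.

```python
from dataclasses import dataclass, field
from datetime import date

# A minimal sketch of a builder profile as a living data record.
@dataclass
class LogEntry:
    when: date
    progression: str          # e.g. "RAG Architecture: Familiar -> Proficient"
    evidence: str             # link to ADR, sprint artefact, or ceremony record

@dataclass
class BuilderProfile:
    name: str
    role: str
    organisation: str
    competence: dict[str, str] = field(default_factory=dict)       # capability -> level
    ownership_stage: dict[str, str] = field(default_factory=dict)  # capability -> stage
    growth_log: list[LogEntry] = field(default_factory=list)

    def record_progression(self, capability: str, level: str, evidence: str) -> None:
        """Update a competence level and append the evidence to the growth log."""
        previous = self.competence.get(capability, "None")
        self.competence[capability] = level
        self.growth_log.append(
            LogEntry(date.today(), f"{capability}: {previous} -> {level}", evidence)
        )
```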
Below are two examples showing how profiles work for very different roles.
Role: Senior AI Engineer | Organisation: Serco — UK AI Lab (Internal)
Profile Created: 2026-05-12 | Last Updated: 2026-07-18
Background: 8 years across data engineering and machine learning, joining the AI Lab from Serco's Data Platform team. Deep experience with Databricks, Python, and traditional ML pipelines. Newer to LLM-based systems, agentic architectures, and RAG patterns — areas of active growth through the engagement.
Tech Stack Competence:
| Tool / Capability | Level | Notes |
|---|---|---|
| Python | Expert | Primary language, 8 years |
| Databricks | Expert | 3 years, built production ML pipelines on Serco's DSML platform |
| SQL | Expert | Complex queries, performance tuning, data modelling |
| LLM APIs (Claude, GPT) | Proficient | Comfortable with prompt engineering and API integration |
| RAG Architecture | Familiar | Has built one prototype RAG pipeline, needs guidance on production patterns |
| Agentic Workflows | Aware | Understands the concept, hasn't built one independently |
| LLMOps / Evaluation | Familiar | Can run basic evaluations, needs guidance on systematic eval harnesses |
| Responsible AI Frameworks | Familiar | Understands principles, has not led a risk assessment independently |
Responsible Building Controls — Competence Boundary Detection:
Stretch work is encouraged, not blocked. For critical domains (security architecture, data classification, responsible AI guardrails), stretch work requires sign-off before deployment — not just review after the fact.
Growth & Learning Log (extract):
| Date | Progression | Evidence |
|---|---|---|
| 2026-07-18 | RAG Architecture: Familiar → Proficient | Led chunking strategy redesign for contract ingestion pipeline. Architecture decisions documented in ADR-017. |
| 2026-06-22 | LLM APIs: Familiar → Proficient | Built production prompt pipeline for contract summarisation. Versioned prompts, fallback handling, runbook written independently. |
| 2026-06-22 | Responsible AI: Aware → Familiar | Participated in risk assessment for Collaboration Hub. Contributed to HITL design. |
| 2026-06-01 | RAG Architecture: First hands-on build | Prototype search pipeline using Databricks vector store + Claude API. Needed step-by-step guidance. |
Ownership Stage:
| Capability Area | Current Stage | Evidence |
|---|---|---|
| Data pipeline development | Stage 4 — Independent | Deployed contract ingestion pipeline end-to-end without external support |
| RAG architecture | Stage 3 — Lead with Support | Leading chunking redesign, architecture decisions reviewed |
| Agentic workflows | Stage 1 — Learning | Observing and contributing, hasn't led independently |
Role: Director of Contract Operations | Organisation: Serco — UK & Europe Division
Profile Created: 2026-05-05 | Last Updated: 2026-07-15
Background: 20 years in large-scale programme delivery, responsible for a portfolio of 200+ contracts spanning justice, health, and citizen services. No technical background in AI or software development — and doesn't need one. Her role is as a commissioner of work, an investment decision-maker, and a champion of AI adoption across her division.
Tech Stack Competence:
| Tool / Capability | Level | Notes |
|---|---|---|
| AI concepts (LLMs, RAG, agents) | Familiar | Understands what they do and where they apply, cannot build or configure them |
| Use Case Card submission | Proficient | Has submitted 6 use cases through the AI Front Door |
| Experimentation interpretation | Familiar | Can read experiment summaries, sometimes needs help distinguishing signal from noise |
| AI Capability Library | Familiar | Can navigate and filter, understands tier structure |
| Commercial modelling for AI | Proficient | Can model ROI, build business cases, assess cost-benefit at portfolio level |
| Technical architecture | None | Not her role — routes to engineering leads |
Responsible Building Controls — Commission-Safe Defaults:
For any use case Sarah submits through the AI Front Door, technical feasibility assessment is automatically routed to engineers rated Proficient or above in the relevant capability areas.
Growth & Learning Log (extract):
| Date | Progression | Evidence |
|---|---|---|
| 2026-07-15 | Kill discipline milestone | First time Sarah voted to kill a use case she had personally championed (contract compliance scanning — viable but low impact relative to alternatives). |
| 2026-06-20 | AI Capability Library: Aware → Familiar | Used the library independently to assess a proposal from the Middle East division. Correctly identified missing Enabler-tier infrastructure and recommended sequencing. |
| 2026-06-05 | Use Case Card: Familiar → Proficient | 5th submission. Problem statements specific, impact estimates grounded in contract data, data sensitivity classifications correct without review. |
Ownership Stage:
| Capability Area | Current Stage | Evidence |
|---|---|---|
| Use case identification & submission | Stage 3 — Lead with Support | Submitting quality use cases independently |
| Evidence-based investment decisions | Stage 3 — Lead with Support | Leading Decision Forum discussions, killed a use case on evidence |
| Portfolio prioritisation | Stage 2 — Co-Deliver | Uses Capability Library for sense-checking, not yet leading independently |
| Technical feasibility assessment | Stage 1 — Learning | Appropriately routes to engineers (and should not assess this herself) |
This is how we work. It is not a section we added to a proposal — it is the model we have applied repeatedly across UK government.
Department for Work and Pensions — During our DevOps Capability Delivery programme, we trained DWP apprentices alongside experienced engineers in live delivery. The apprentices were not observers. They were building production services, supported by our team, developing capability that remained in DWP long after we left.
Ministry of Defence — We transitioned responsibility from an underperforming incumbent by first documenting the dependency the existing supplier had created, then building a streamlined service model that reduced external reliance. The goal was not to replace one supplier with another — it was to give MOD the ability to operate independently.
Department for Science, Innovation and Technology — Our team designed and rolled out the Target Operating Model for DSIT's technology function, defining the roles, responsibilities, and ways of working that the department continues to operate under.
In each case, the engagement ended with the client's team running the capability. That is the only outcome we consider successful, and it is the outcome we commit to delivering for Serco.
Our proposed team shape for the Collaboration Hub engagement combines fractional strategic leadership with full-time embedded delivery. This model provides senior expertise without the cost of full-time senior rates, while ensuring day-to-day delivery is consistent and embedded within Serco's operations.
The team scales based on delivery phase — lighter during discovery and experimentation, fuller during MVP build and scaling. As capability transfers to Serco, the Blackstone& team contracts and Serco's internal team expands.
| Name | Role | Basis | Day Rate (£) |
|---|---|---|---|
| Kieran Blackstone | Engagement / Delivery Lead | Fractional (collectively up to 3 days/week) | 1,200 |
| Wayne Palmer | Operating Model / Strategy Execution | Fractional (collectively up to 3 days/week) | 1,200 |
| Suranga Fernando | Data Strategy & Databricks Advisor | Fractional (collectively up to 3 days/week) | 1,200 |
| Ras Fernando | AI Product Lead / Business Analyst | Full-time (5 days/week) | 900 |
| Don Capito | Data/ML DevOps Engineer | Full-time (5 days/week) | 900 |
The Collaboration Hub use case will be delivered across six phases over approximately 18 weeks. The initial core team (outlined above) will be present throughout the full engagement. From the MVP phase onwards, we add a nearshore tester to support quality assurance through build, pilot, and scale — an additional 13 weeks of coverage.
| Phase | Duration | Team |
|---|---|---|
| 1. Discovery | 1 week | Core team |
| 2. Ideation & Prototyping | 2 weeks | Core team |
| 3. Experimentation | 2 weeks | Core team |
| 4. MVP Build | 3 weeks | Core team + Nearshore Tester |
| 5. Pilot | 4 weeks | Core team + Nearshore Tester |
| 6. Scale | 6 weeks | Core team + Nearshore Tester |
| Item | Cost |
|---|---|
| Phases 1-3: Core team (5 weeks × £12,600) | £63,000 |
| Phases 4-6: Core team (13 weeks × £12,600) | £163,800 |
| Phases 4-6: Nearshore Tester (13 weeks × £1,500) | £19,500 |
| Total Collaboration Hub Delivery | £246,300 |
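The arithmetic behind these figures follows directly from the day rates in the team table above. A short sketch reproducing the build-up, purely for transparency:

```python
# Reproducing the cost build-up from the rates quoted in this proposal.
DAYS_PER_WEEK = 5

full_time_weekly = 2 * 900 * DAYS_PER_WEEK        # Ras + Don at £900/day, 5 days/week
fractional_weekly = 1_200 * 3                     # fractional leads, up to 3 days/week collectively
core_team_weekly = full_time_weekly + fractional_weekly   # £12,600

phases_1_to_3 = 5 * core_team_weekly              # £63,000
phases_4_to_6 = 13 * core_team_weekly             # £163,800
nearshore_tester = 13 * 1_500                     # £19,500

total = phases_1_to_3 + phases_4_to_6 + nearshore_tester
print(f"£{total:,}")                              # £246,300
```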
| Role | UK Day Rate (£) | Nearshore Day Rate (£) |
|---|---|---|
| Engagement / Delivery Lead | 995 | — |
| Product Lead | 995 | — |
| Security Architect | 1,100 | — |
| Data Architect | 1,100 | — |
| Platform Architect | 1,100 | — |
| Principal AI Engineer | 1,100 | 605 |
| Senior AI Engineer | 950 | 495 |
| AI Engineer | 875 | 500 |
| Principal Full Stack Engineer | 895 | 485 |
| Senior Full Stack Engineer | 825 | 465 |
| Full Stack Engineer | 750 | 435 |
| Principal Data Engineer | 1,150 | 610 |
| Senior Data Engineer | 910 | 505 |
| Data Engineer | 875 | 500 |
| Principal DevOps Engineer | 950 | 515 |
| Senior DevOps Engineer | 850 | 480 |
| DevOps Engineer | 750 | 435 |
| Principal QA Engineer | 825 | 430 |
| Senior QA Engineer | 700 | 385 |
| QA Engineer | 600 | 345 |
| Senior Business Analyst | 800 | 435 |
| Business Analyst | 700 | 395 |
| Junior Business Analyst | 500 | 290 |
| Fractional Subject Matter Experts (AI Strategy, Data Strategy, TOM SME, Security SME) | 1,500 | — |
We propose a phased commercial approach that allows both parties to build confidence in the engagement, the partnership, and the ways of working before committing to a long-term commercial structure.
Supporting tool: agile-contracting-toolkit.pages.dev
The first 90 days run on pure time and materials (T&M), during which scope is clarified, dependencies are mapped, and ways of working are established. T&M is the right model at this stage: scope is still being defined, dependencies are not yet understood, and teams need time to establish rhythms, so premature milestones would create friction. The focus is on building a shared understanding of the problem space, the delivery landscape, and the partnership itself.
For the second 90 days, T&M remains the billing model: all invoicing continues on a day-rate basis, with no change to the commercial relationship. In parallel, the Hybrid Agile model runs as a comparison: sprint milestones are defined, delivery is tracked, and costs are calculated — without money changing hands differently.
This dual-track phase answers the questions that matter. Can we define meaningful milestones together? How does the hybrid model perform financially versus T&M? What is the right milestone allocation percentage, based on real delivery rather than assumptions? At the end, both parties have 90 days of comparative data — a concrete, evidence-based foundation for the long-term model.
From the six-month point, the engagement transitions to Hybrid Agile Contracting, with milestone allocation, sprint cadence, and deliverable definitions informed by six months of real engagement data.
Serco may equally decide to remain on T&M; that remains a perfectly valid choice. The dual-track phase ensures the decision is informed, not forced.
| Assumption | Impact If Not Met |
|---|---|
| Access to contract data within first 2 weeks | Discovery delayed; prioritisation based on incomplete picture |
| Cloud infrastructure and Databricks workspace access provisioned | Build phase cannot begin; team idle time |
| SSO/IAM integration available or scheduled within discovery | Workaround needed for access control |
| Named business stakeholders available 2-4 hrs/week | Decisions deferred; sprint velocity reduced |
| Security clearance guidance and sponsorship in first week | Team access to classified environments delayed |
| Existing governance and classification frameworks shared at start | Duplicate effort defining controls already in place |
| Exclusion | Clarification |
|---|---|
| Databricks platform build or management | We integrate; we do not own it |
| Data cleansing or migration | We work with data as provided; flag quality issues for Serco's data team |
| Custom hardware procurement | All delivery uses Serco-approved cloud infrastructure |
| Legal or regulatory advice | We identify requirements; Serco legal provides interpretation |
| Microsoft Copilot configuration | Separate programme |
| Penetration testing or formal security certification | We build to standards; formal testing is Serco's responsibility |
| Division | Key Regulations | Considerations |
|---|---|---|
| UK & Europe | UK GDPR, Data Protection Act 2018, EU AI Act | Majority of initial use cases; well-understood landscape |
| Middle East | Data localisation requirements (UAE, KSA, Qatar each distinct) | In-country hosting likely required |
| North America | ITAR (potential), PIPEDA, state privacy laws | US defence contracts may restrict model/hosting choices |
| Australia & NZ | Privacy Act 1988, Australian Government ISM | Five Eyes alignment simplifies some cross-border considerations |
Where subcontractors are used, they are disclosed and approved by Serco in advance, bound by the same security and data handling requirements, cleared to the same level, covered by Blackstone&'s contractual commitments (single point of accountability), and given no access without written Serco approval.
| Category | Ownership | Detail |
|---|---|---|
| Bespoke outputs | Serco | All deliverables created specifically for Serco — full ownership, unrestricted use |
| Methodology & frameworks | Blackstone& | Perpetual, royalty-free licence to Serco for internal use |
| Open-source components | Per licence terms | Identified, catalogued, licence-compatible. No copyleft contamination. |
| Item | Our Position |
|---|---|
| Uncapped liability provisions | Seek cap proportionate to contract value |
| Breadth of supplier IP licence | Clarify scope: bespoke outputs, not pre-existing IP |
| Termination notice period | Align with sprint cadence for orderly handover |
Standard negotiation points — these do not represent objections to Serco's Terms and Conditions.
Blackstone& operates as a remote-first, lean consultancy. Our delivery model minimises environmental impact by design:
We commit to procuring an EcoVadis Sustainability Assessment within 6 months of contract execution, as required under the Framework Agreement, and to sharing results via the EcoVadis portal.
Our engagement model directly supports Serco's social value objectives:
Blackstone& will procure an EcoVadis Sustainability Assessment within the contractual timeframe and work with Serco to meet or exceed the Minimum Rating Score. We will share all assessment results promptly via the EcoVadis portal and implement any required corrective actions within agreed milestones.