The default answer in the market is a grade. A model reads a response, applies a mark scheme, and returns a number. That can look tidy in a demo. It is not enough for any marking operation that has to survive a candidate complaint or a regulator's audit, or simply keep the long-term trust of the people sitting the exam.

What a co-pilot delivers

Vision Marker marks every response that's sent to us. For each one, the API returns a mark, a structured rationale that traces back to the mark scheme criteria, a confidence score, and (where helpful) a draft feedback response in the client's voice. Across a cohort, we return the analytical layer the operation needs: the distribution of marks against the mark scheme, flagged anomalies, low-confidence responses surfaced for review, and consistency signals across markers.

What the client does with that output is the client's decision. Some operations surface every mark to an examiner for sign-off. Some escalate only the responses below a configured confidence threshold. Some return feedback directly to the learner without intermediate review. Some batch for moderation. We do not dictate that, and we should not. The integrator knows their own stakes, their own appeals processes, their own regulators. Our job is to deliver the marks and the analytical apparatus that the operation needs in order to decide, with evidence, where its human oversight should live.
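To make the "configured confidence threshold" option concrete, here is a minimal client-side sketch of that routing decision. It is not the Vision Marker API. The `MarkResult` shape, its field names, the `Route` values, and the assumption that confidence lies between 0 and 1 are all hypothetical, chosen only for illustration.

```python
from collections.abc import Iterable
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    # Hypothetical downstream destinations chosen by the integrator.
    RELEASE = "release"                  # mark/feedback goes straight to the learner
    EXAMINER_REVIEW = "examiner_review"  # an examiner signs off first


@dataclass(frozen=True)
class MarkResult:
    # Hypothetical result shape; confidence is assumed to be in [0, 1].
    response_id: str
    mark: int
    confidence: float


def route(result: MarkResult, threshold: float) -> Route:
    """Escalate any result whose confidence falls below the client's threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold must be in [0, 1], got {threshold}")
    if not 0.0 <= result.confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {result.confidence}")
    return Route.EXAMINER_REVIEW if result.confidence < threshold else Route.RELEASE


def partition(results: Iterable[MarkResult], threshold: float) -> dict[Route, list[MarkResult]]:
    """Group a cohort's results by destination, preserving input order."""
    buckets: dict[Route, list[MarkResult]] = {r: [] for r in Route}
    for result in results:
        buckets[route(result, threshold)].append(result)
    return buckets


cohort = [MarkResult("a", 4, 0.95), MarkResult("b", 2, 0.55)]
buckets = partition(cohort, threshold=0.8)
```

With a threshold of 0.8, response "a" is released and response "b" goes to an examiner. An operation that wants every mark signed off would use a threshold of 1.0. One that returns all feedback directly would skip the routing step entirely.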

The "co-pilot" in Co-Pilot for Marking® is what we are to the client's marking operation. We do the marking and surface the signals. The client stays in command of what happens next.

What it isn't

Most "AI marking" in the market today is a model in a textbox. A response goes in, a grade comes out. The mark scheme is sometimes a system prompt. The audit trail is a generation log. The framing is "look how fast it is."

This is not a marking operation. It is a guess at a number, with no reliable way to explain how the number was reached, no practical way to show that repeated marking would stay within a controlled standard, and no audit trail a serious operation can lean on.

The issue is not capability. It is reproducibility, auditability, and trust. None of those come for free. They have to be designed in from the first request, not bolted on after the model has spoken.

Why the framing matters for the people integrating it

If you are building a learning platform, running a certification body, or scaling a training programme, the question is not whether AI marking can be made to work for the simplest case. It can. The question is whether the marking will hold up when a case ends up in front of an appeals panel.

Co-Pilot for Marking® is built around that case. Every mark traces back to the mark scheme criteria it was assessed against. Every confidence score shows how confident the model was in that mark, so the operation can see where it was sure and where it was not. The marking path is built to be repeatable under controlled conditions, with a stable record of the response, criteria, rationale, and confidence behind the result. Every output is explainable: the rationale comes back with the mark, not as an afterthought. When something is challenged, the operation can show exactly how the decision was reached.
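A minimal sketch of how an integrator might store the record described above, so that it can be checked later during an audit or appeal. This is an illustration under stated assumptions, not Vision Marker's internal mechanism. The `AuditRecord` fields and the `traces_to_scheme` helper are hypothetical. The fingerprint is a SHA-256 hash over canonical JSON, so any change to the stored response, criteria, mark, rationale, or confidence produces a different fingerprint.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuditRecord:
    # Hypothetical fields mirroring the record described in the text.
    response_text: str
    criteria: tuple[str, ...]                # mark scheme criterion IDs
    mark: int
    rationale: tuple[tuple[str, str], ...]   # (criterion ID, explanation) pairs
    confidence: float

    def fingerprint(self) -> str:
        """SHA-256 over canonical JSON: identical records give identical hashes."""
        payload = json.dumps(asdict(self), sort_keys=True, ensure_ascii=False)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def traces_to_scheme(record: AuditRecord) -> bool:
    """True if every rationale entry cites a criterion that is in the mark scheme."""
    known = set(record.criteria)
    return bool(record.rationale) and all(cid in known for cid, _ in record.rationale)


record = AuditRecord(
    response_text="Photosynthesis converts light energy into chemical energy...",
    criteria=("C1", "C2"),
    mark=3,
    rationale=(("C1", "Names the energy conversion."), ("C2", "Links it to glucose.")),
    confidence=0.91,
)
stored_hash = record.fingerprint()  # keep alongside the record for later verification
```

Storing the fingerprint at marking time means the record presented to an appeals panel can be shown to match the one produced when the mark was issued.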

That is slower to demo. It is harder to market. It is less impressive in a thirty-second video. We can live with that.

Where this position comes from

Vision Marker is built by a father-and-son team of two: co-founders Barry (35 years of teaching and 25 as a Chief Examiner, marking and standard-setting for awarding bodies that run national qualifications) and James (PhD in Machine Learning). Co-Pilot for Marking® is the product of years of arguments about what marking actually demands and what AI can be made to do about it.

With over a million scripts marked for clients, Vision Marker now works with national Ministries of Education and integrates with leading international awarding bodies. We sponsor the AI in Marking Award at the 2026 International e-Assessment Awards.

What this looks like as an integration

Vision Marker is a white-label API. You send us a response and a mark scheme. We return a mark, a structured rationale, a confidence score, cohort-level signals, and (where helpful) a draft feedback response in your voice. Your platform decides what to do with that output. We do not see, and do not need to see, what your downstream workflow looks like.

If you are building a platform, running a certification body, or thinking about how to handle written-response marking at scale, please get in touch.