The default answer in the market is a grade. A model reads a response, applies a mark scheme, and returns a number. That is a perfectly serviceable demo. It is also a terrible answer for any marking operation that has to survive a candidate complaint, a regulator's audit, or simply the long-term trust of the people sitting the exam.
We call our approach Co-Pilot for Marking®, and we mean the name as design.
What a co-pilot delivers
Vision Marker marks every response that is sent to us. For each one, the API returns a mark, a structured rationale that traces back to the mark scheme criteria, a confidence score, and (where helpful) a draft feedback response in the client's voice. Across a cohort, we return the analytical layer the operation needs: distribution against the mark scheme, flagged anomalies, low-confidence responses surfaced for review, and consistency signals across markers.
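To make the cohort layer concrete, here is a minimal sketch of two of those signals: the mark distribution and the list of low-confidence responses. The `MarkedResponse` type, its field names, the 0 to 1 confidence scale, and the `review_below` cut-off are all assumptions made for illustration, not the actual Vision Marker schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkedResponse:
    """Hypothetical per-response result; field names are assumptions."""
    response_id: str
    mark: int
    confidence: float  # assumed to lie between 0.0 and 1.0


def summarise_cohort(results, review_below=0.7):
    """Return the mark distribution and the ids of low-confidence responses."""
    distribution = Counter(r.mark for r in results)
    needs_review = [r.response_id for r in results if r.confidence < review_below]
    return {
        "distribution": dict(sorted(distribution.items())),
        "needs_review": needs_review,
    }


cohort = [
    MarkedResponse("r1", 4, 0.95),
    MarkedResponse("r2", 4, 0.62),
    MarkedResponse("r3", 2, 0.88),
]
summary = summarise_cohort(cohort)
print(summary)
```

In this sketch only "r2" falls below the cut-off, so it is the only response surfaced for review.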
What the client does with that output is the client's decision. Some operations surface every mark to an examiner for sign-off. Some escalate only the responses below a configured confidence threshold. Some return feedback directly to the learner without intermediate review. Some batch for moderation. We do not dictate that, and we should not. The integrator knows their own stakes, their own appeals processes, their own regulators. Our job is to deliver the marks and the analytical apparatus that the operation needs in order to decide, with evidence, where its human oversight should live.
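Because that routing decision lives on the client side, it can be as small as a single function in the integrator's own code. The sketch below is one way to express three of the workflows described above; the policy names, the `Route` enum, and the 0.8 default threshold are hypothetical and not part of any Vision Marker interface.

```python
from enum import Enum


class Route(Enum):
    EXAMINER_SIGN_OFF = "examiner_sign_off"
    RELEASE_TO_LEARNER = "release_to_learner"


def route_mark(confidence, policy, threshold=0.8):
    """Decide where a marked response goes next under a client-chosen policy."""
    if policy == "review_all":
        # Every mark is surfaced to an examiner for sign-off.
        return Route.EXAMINER_SIGN_OFF
    if policy == "release_directly":
        # Feedback goes to the learner without intermediate review.
        return Route.RELEASE_TO_LEARNER
    if policy == "escalate_low_confidence":
        # Only responses below the configured threshold are escalated.
        if confidence < threshold:
            return Route.EXAMINER_SIGN_OFF
        return Route.RELEASE_TO_LEARNER
    raise ValueError(f"unknown policy: {policy!r}")


print(route_mark(0.55, "escalate_low_confidence"))
print(route_mark(0.93, "escalate_low_confidence"))
```

The point of the sketch is where the code sits: the policy and the threshold belong to the operation, and changing them requires nothing from us.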
The "co-pilot" in Co-Pilot for Marking® is what we are to the client's marking operation. We do the marking and surface the signals. The client stays in command of what happens next.
What it isn't
Most "AI marking" in the market today is a model in a textbox. A response goes in, a grade comes out. The mark scheme is sometimes a system prompt. The audit trail is a generation log. The framing is "look how fast it is."
This is not a marking operation. It is a guess at a number, with no way to explain how the number was reached and no way to demonstrate that the same response would receive the same mark next week. There is nothing to audit, because a generation log is not an audit trail.
The issue is not capability. It is reproducibility, auditability, and trust. None of those come for free. They have to be designed in from the first request, not bolted on after the model has spoken.
Why the framing matters for the people integrating it
If you are building a learning platform, running a certification body, or scaling a training programme, the question is not whether AI marking can be made to work for the simplest case. It can. The question is whether the marking will hold up to the case that ends up in front of an appeals panel.
Co-Pilot for Marking® is built around that case. Every mark traces back to the mark scheme criteria it was assessed against. Every confidence score is honest about where the model was sure and where it was not. Every output is reproducible: the same response and the same mark scheme produce the same result, every time. Every output is explainable: the rationale comes back with the mark, not as an afterthought. When something is challenged, the operation can show exactly how the decision was reached.
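An integrator can check the reproducibility claim on their own side. One possible approach, sketched here, is to fingerprint each (response, mark scheme) pair and confirm that no fingerprint in the audit log ever maps to two different marks. The function names and the shape of the log entries are assumptions for illustration; this is not a description of how Vision Marker implements reproducibility internally.

```python
import hashlib
import json


def input_fingerprint(response_text, mark_scheme):
    """Hash the response and mark scheme in a canonical, key-order-independent form."""
    canonical = json.dumps(
        {"response": response_text, "mark_scheme": mark_scheme},
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_consistent(audit_log):
    """Return True if no fingerprint in the log maps to more than one mark."""
    seen = {}
    for entry in audit_log:
        previous = seen.setdefault(entry["fingerprint"], entry["mark"])
        if previous != entry["mark"]:
            return False
    return True


scheme = {"max_mark": 4, "criteria": ["identifies cause", "explains effect"]}
fp = input_fingerprint("Erosion removed the topsoil.", scheme)
log = [
    {"fingerprint": fp, "mark": 3},
    {"fingerprint": fp, "mark": 3},  # same inputs marked again a week later
]
print(is_consistent(log))
```

Stored alongside the rationale, a fingerprint like this gives an appeals panel a direct answer to the question of whether the same inputs have ever produced a different mark.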
This is slower to demo. It is harder to market. It is considerably less impressive in a thirty-second video. We are fine with that.
Where this position comes from
Vision Marker is built by a father-and-son team of two co-founders: Barry (35 years of teaching and 25 as a Chief Examiner, marking and standard-setting for awarding bodies that run national qualifications) and James (PhD in Machine Learning). Co-Pilot for Marking® is the product of years of arguments about what marking actually demands and what AI can be made to do about it.
More than a million marked scripts later, Vision Marker works with national Ministries of Education and integrates with leading international awarding bodies. We sponsor the AI in Marking Award at the 2026 International e-Assessment Awards.
What this looks like as an integration
Vision Marker is a white-label API. You send us a response and a mark scheme. We return a mark, a structured rationale, a confidence score, cohort-level signals, and (where helpful) a draft feedback response in your voice. Your platform decides what to do with that output. We do not see, and do not need to see, what your downstream workflow looks like.
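As a rough picture of that exchange from the platform's side, the sketch below builds a request body and parses a per-response result. Every name in it is hypothetical: the JSON keys, the `MarkingResult` type, and the sample reply are illustrative assumptions rather than the real API contract, and no endpoint or authentication is shown. Cohort-level signals are left out for brevity.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MarkingResult:
    """Hypothetical per-response result; not the actual API schema."""
    mark: int
    rationale: list  # assumed: one entry per mark scheme criterion
    confidence: float
    draft_feedback: Optional[str] = None


def build_request(response_text, mark_scheme):
    """Serialise the two inputs the platform sends: a response and a mark scheme."""
    return json.dumps({"response": response_text, "mark_scheme": mark_scheme})


def parse_result(body):
    """Turn a JSON reply into a typed result; draft feedback is optional."""
    data = json.loads(body)
    return MarkingResult(
        mark=data["mark"],
        rationale=data["rationale"],
        confidence=data["confidence"],
        draft_feedback=data.get("draft_feedback"),
    )


# A made-up reply, used only to exercise the parser.
sample_reply = json.dumps({
    "mark": 3,
    "rationale": [
        {"criterion": "identifies cause", "met": True},
        {"criterion": "explains effect", "met": False},
    ],
    "confidence": 0.91,
})
result = parse_result(sample_reply)
print(result.mark, result.confidence, result.draft_feedback)
```

Everything after `parse_result` is the platform's own code, which is the boundary this section describes.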
If you are building a platform, running a certification body, or thinking about how to handle written-response marking at scale, please get in touch.