This is part of a series on rethinking ISO 27001 compliance from first principles. The previous article posed six auditor questions that challenge conventional thinking. This one asks: what happens when you take those questions and use them to build an audit programme that tests itself, continuously, structurally, and with a level of rigour that most internal audits never approach?
I once sat through an internal audit that lasted forty-five minutes.
The auditor (an internal resource, well-meaning, competent in their domain) opened a spreadsheet, asked twelve questions, ticked twelve boxes, and produced a report that said “no findings.” Clause 9.2 satisfied. Internal audit complete. See you next year.
The auditor hadn’t opened a single admin portal. Hadn’t checked whether the Conditional Access policies described in the access control policy actually existed. Hadn’t asked a single follow-up question. Hadn’t tested whether the risk assessment methodology produced consistent results when applied by different people, because only one person had ever applied it.
The report was technically correct: no findings were identified. But no findings were possible in a forty-five-minute review that never looked beyond the documentation layer. The audit tested whether the documents existed. It didn’t test whether the ISMS worked.
This is the internal audit problem that nobody discusses, not because people don’t know, but because the alternative seems impossibly expensive.
The Clause 9.2 paradox
ISO 27001 Clause 9.2 requires internal audits. The standard is specific about what they must achieve: audits shall be conducted at planned intervals, shall assess whether the ISMS conforms to the organisation’s own requirements and the standard’s requirements, and shall determine whether the ISMS is effectively implemented and maintained. Clause 9.2.2 goes further: auditors must be selected, and audits conducted, so that the process is objective and impartial, which in practice means nobody audits their own work.
Most organisations interpret this as “review all controls annually, document the results, keep the reports.” The minimum viable interpretation. And for good reason: rigorous internal auditing is expensive. It requires trained auditors, access to systems, time to investigate, and the organisational willingness to find things that are broken.
But here’s the paradox: the standard explicitly envisions that internal audits will find things. A programme that consistently reports zero findings isn’t evidence of a perfect ISMS. It’s evidence of an audit that isn’t looking hard enough. Findings that are identified, tracked through corrective action, and resolved are evidence of a functioning system. Zero findings is a red flag, not a green one.
The external auditor knows this. When they review your internal audit reports and see clean results across every control, their next question is: “How do I know your audits are rigorous?”
I’ve been building an answer to that question.
Scheduling as architecture
The first thing most internal audit programmes get wrong is scheduling. Controls are reviewed on a fixed annual cycle: Q1 gets access controls, Q2 gets operations, Q3 gets physical security, Q4 gets governance. The schedule is set during implementation and rarely revisited.
This approach has two problems. First, it treats all controls as equally important and equally stable. A.8.1 (endpoint devices) in a rapidly changing environment deserves more frequent attention than A.7.1 (physical security perimeters) in an office that hasn’t changed its layout in three years. Second, it disconnects the audit from the evidence. If your evidence is collected daily but your audit happens annually, your audit conclusions can be up to twelve months old while the data underneath them changes every day.
The audit programme I built generates quarterly schedules automatically, but the coverage isn’t uniform. Controls with recent evidence failures, open corrective actions, or compliance scores below threshold are scheduled more frequently. Controls that have been consistently compliant for multiple cycles are reviewed less often, but they’re never skipped entirely, because the standard requires full coverage over the audit cycle.
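A minimal sketch of how that kind of risk-weighted scheduling might be expressed. The field names, the score threshold of 80, the two-cycle stability rule, and the four-quarter audit cycle are all assumptions made for illustration, not the programme's actual values.

```python
from dataclasses import dataclass


@dataclass
class ControlStatus:
    """Hypothetical per-control inputs to the scheduler."""
    control_id: str
    compliance_score: float        # assumed to be on a 0-100 scale
    recent_evidence_failures: int
    open_corrective_actions: int
    consecutive_clean_cycles: int


def review_interval_quarters(
    control: ControlStatus,
    score_threshold: float = 80.0,  # assumed threshold
    max_interval: int = 4,          # assumed four-quarter audit cycle
) -> int:
    """Return how many quarters may pass between reviews of this control."""
    needs_attention = (
        control.recent_evidence_failures > 0
        or control.open_corrective_actions > 0
        or control.compliance_score < score_threshold
    )
    if needs_attention:
        return 1  # review every quarter
    if control.consecutive_clean_cycles >= 2:
        # Stable controls are reviewed less often, but the interval is capped
        # so every control is still covered once per audit cycle.
        return max_interval
    return 2


def due_this_quarter(control: ControlStatus, quarters_since_last_review: int) -> bool:
    """True when the control should appear in the next quarterly schedule."""
    return quarters_since_last_review >= review_interval_quarters(control)
```

The cap on the interval is the important design choice here: frequency can drop for stable controls, but it never drops to zero.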
Each audit entry tracks the auditor, the scope (which specific controls are in the review), the audit type (technical, management, or compliance), planned and completed dates, and, critically, whether the auditor has declared independence from the controls being reviewed. That last element matters more than most programmes acknowledge: in a small team, the person who configured the Conditional Access policy shouldn’t be the one auditing it. The programme enforces this by tracking auditor assignments against control ownership and flagging conflicts.
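As an illustration, the audit entry and the conflict check could look something like the sketch below. The `AuditEntry` shape, the `control_owners` mapping, and the function name are hypothetical; they only mirror the fields listed above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class AuditEntry:
    """Hypothetical record for one scheduled audit."""
    auditor: str
    scope: list[str]                 # control IDs under review
    audit_type: str                  # "technical", "management", or "compliance"
    planned_date: date
    completed_date: Optional[date] = None
    independence_declared: bool = False


def independence_conflicts(
    entry: AuditEntry,
    control_owners: dict[str, set[str]],
) -> list[str]:
    """Return the in-scope controls that the assigned auditor also owns."""
    return [
        control_id
        for control_id in entry.scope
        if entry.auditor in control_owners.get(control_id, set())
    ]
```

Any non-empty result would be flagged for reassignment before the audit starts, regardless of whether the auditor has ticked the independence declaration.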
Per-control conversations
Here’s where the programme diverges from anything I’ve seen in the compliance industry.
When an auditor starts reviewing a specific control, say A.8.5 (Secure Authentication), they don’t open a checklist. They start a conversation: a chat session scoped to that specific control, backed by the full evidence corpus for A.8.5, including the relevant policy sections, the risk analysis, and the current compliance snapshot.
The auditor asks questions in natural language. “What’s the current MFA coverage?” “Are there any service accounts excluded from Conditional Access?” “When was the last time someone reviewed the authentication strength assignments?”
The system retrieves structured evidence and produces answers with citations. Not generic knowledge: specific, current data. “As of the latest collection run, 98.7% of users have MFA enforced. Three service accounts are excluded via exception group EXC-SVC-ACCOUNTS, last reviewed on 15 February 2026. Authentication strength policy ‘Phishing-Resistant MFA’ was last modified on 3 January 2026.”
But the conversation doesn’t stop at retrieval. The system probes.
The qualifying question depth
This is the capability that changes the quality bar. The system doesn’t just answer questions; it asks them. Structured qualifying questions that probe deeper than most human auditors go, because the questions are compiled from hundreds of real audit interactions and calibrated by difficulty level.
For A.8.5 (Secure Authentication), the conversation might proceed:
“MFA is enforced for 98.7% of users. Is the authentication method phishing-resistant (FIDO2 keys, certificate-based auth), or is it app-based push notifications?”
The auditor answers. The system follows up:
“Are your three excluded service accounts protected by workload identity Conditional Access policies restricting them to specific IP ranges? Or do they authenticate with client secrets that could be used from any location?”
Then deeper:
“When was the last time you verified that a compromised session token couldn’t bypass your MFA policy through token replay? Does your Conditional Access configuration enforce token binding or continuous access evaluation?”
Each question is a qualifying gate. The opening coverage question establishes whether the control exists. The first follow-up tests implementation quality. The second tests operational resilience. The third tests whether the control has been validated, not merely configured.
This is the rigour that distinguishes a Big 4 engagement from a compliance checkbox exercise. The difference is that this rigour is encoded in a system; it doesn’t depend on whether the auditor in the room happens to know about token binding or workload identity policies. The question bank contains 788 questions, each classified by difficulty level, tagged with specific clauses and controls, and annotated with the expected evidence pattern. The system draws from this bank based on the control being reviewed, the current evidence state, and the conversation context.
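To make the idea of a classified question bank concrete, here is a simplified sketch. The `Question` fields, the numeric difficulty levels, and the "easiest unasked question first" rule are assumptions; this version selects on the control and the conversation so far, and leaves out the evidence state entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Question:
    """Hypothetical shape of one entry in the question bank."""
    question_id: str
    control_id: str
    difficulty: int          # assumed: 1 = existence ... 4 = validation
    text: str
    expected_evidence: str


def next_question(
    bank: list[Question],
    control_id: str,
    asked_ids: set[str],
) -> Optional[Question]:
    """Pick the lowest-difficulty question for this control not yet asked."""
    candidates = [
        q for q in bank
        if q.control_id == control_id and q.question_id not in asked_ids
    ]
    # Escalate gradually: each answered gate unlocks the next, harder one.
    return min(candidates, key=lambda q: q.difficulty, default=None)
```

Returning `None` when the gates are exhausted gives the conversation a natural stopping point for that control.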
From conversation to observation
A conversation that probes but doesn’t record is just a chat. The audit programme converts conversations into structured observations automatically.
When the compliance reasoning agent produces an assessment during the conversation, it generates a structured output: a conformity classification (Conforming, Opportunity for Improvement, Minor Non-Conformity, or Major Non-Conformity), a confidence score, a justification citing specific evidence, the audit methods used (document review, testing, interview), and references to the evidence that supports the assessment.
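One way to picture that structured output is as a typed record. The class and field names below are illustrative, and the 0.0 to 1.0 confidence range and the rule that at least one evidence reference is required are assumptions rather than documented behaviour.

```python
from dataclasses import dataclass
from enum import Enum


class Conformity(Enum):
    CONFORMING = "Conforming"
    OFI = "Opportunity for Improvement"
    MINOR_NC = "Minor Non-Conformity"
    MAJOR_NC = "Major Non-Conformity"


class AuditMethod(Enum):
    DOCUMENT_REVIEW = "document review"
    TESTING = "testing"
    INTERVIEW = "interview"


@dataclass
class Assessment:
    """Hypothetical structured assessment produced during a conversation."""
    control_id: str
    conformity: Conformity
    confidence: float                # assumed range: 0.0 to 1.0
    justification: str
    methods: list[AuditMethod]
    evidence_refs: list[str]         # identifiers of the cited evidence items

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")
        if not self.evidence_refs:
            # An assessment without citations cannot be traced by an auditor.
            raise ValueError("assessment must cite at least one evidence item")
```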
This assessment appears as a draft observation, automatically populated but not automatically finalised. The auditor reviews the draft, confirms or modifies the conformity classification, adds their own notes if needed, and approves the observation. The system drafts; the human decides. That separation is deliberate: the AI handles recall and evidence assembly; the auditor provides judgment and accountability.
The confirmed observation is linked to the specific chat session that produced it, creating a complete audit trail: the questions asked, the evidence retrieved, the assessment generated, and the auditor’s decision. When the external auditor asks “show me the working for this observation,” the answer isn’t a summary someone wrote after the fact. It’s the actual conversation, with citations.
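The draft-then-approve separation can be sketched as follows. The `Observation` fields and the `approve` helper are hypothetical; the point is only that the drafted value is preserved next to the auditor's final decision, together with the session that produced it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """Hypothetical observation record linked to its originating chat session."""
    control_id: str
    chat_session_id: str
    drafted_conformity: str                  # what the system proposed
    final_conformity: Optional[str] = None   # what the auditor decided
    approved_by: Optional[str] = None
    auditor_notes: str = ""

    @property
    def overridden(self) -> bool:
        """True when the auditor's decision differs from the system draft."""
        return (
            self.final_conformity is not None
            and self.final_conformity != self.drafted_conformity
        )


def approve(
    obs: Observation,
    auditor: str,
    conformity: Optional[str] = None,
    notes: str = "",
) -> Observation:
    """Finalise a draft; passing `conformity` records an override."""
    obs.final_conformity = conformity or obs.drafted_conformity
    obs.approved_by = auditor
    obs.auditor_notes = notes
    return obs
```

Because the draft is never overwritten, the trail shows both what the system suggested and what the human concluded.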
The auto-draft mechanism
For controls where evidence is fully automated, meaning compliance scores and corrective action data are current, the programme can generate draft observations without a conversation at all.
The logic is straightforward. If the compliance score is below 50 or there are critical open corrective actions, the draft assessment is Major Non-Conformity. If the score is between 50 and 80 or there are open corrective actions, it’s Minor. Between 80 and 100, it’s an Opportunity for Improvement. A score of 100 with no open corrective actions produces a Conforming assessment.
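The rules above translate into a few ordered checks. The function below is a sketch, not the programme's code: the parameter names are invented, `open_actions` is assumed to count every open corrective action including critical ones, and a score of exactly 80 is assumed to fall into the Opportunity for Improvement band, since the prose leaves that boundary open.

```python
def draft_conformity(
    score: float,
    open_actions: int,
    critical_open_actions: int,
) -> str:
    """Suggest a draft classification from a 0-100 score and open actions."""
    # Checks run from most to least severe, so the first match wins.
    if score < 50 or critical_open_actions > 0:
        return "Major Non-Conformity"
    if score < 80 or open_actions > 0:
        return "Minor Non-Conformity"
    if score < 100:
        return "Opportunity for Improvement"
    return "Conforming"
```

Note that a perfect score with any open corrective action still drafts as a Minor Non-Conformity, because the second rule fires before the score is compared against 100.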
These drafts are generated for every in-scope control at the start of an audit cycle, giving the auditor a dashboard of suggested assessments before they begin reviewing. It’s a starting point, not a conclusion. The auditor can disagree with every draft, and the system records their override with the same audit trail rigour.
The value isn’t in the automation. It’s in the coverage. A human auditor reviewing 93 controls manually might spend most of their time on the first twenty and rush through the rest. Auto-drafted observations ensure every in-scope control gets at least an evidence-based starting assessment, and the auditor’s time is directed to the controls that need human investigation: the ones flagged as non-conforming, the ones where the evidence is ambiguous, and the ones where the qualifying questions reveal gaps the score doesn’t capture.
The findings lifecycle
When an observation identifies a non-conformity, whether from conversation, auto-draft, or manual review, it becomes a finding. Findings have their own lifecycle: identification, root cause analysis, corrective action planning, remediation, and closure with evidence.
Each finding tracks its severity (Major, Minor, or Observation), the related control, a description of what was found, the root cause analysis, the planned corrective action with a due date, and, when resolved, closure evidence and resolution notes documenting what was done and who verified it.
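A small state machine captures both the lifecycle and the tracked fields. This is a sketch under assumptions: the stage names follow the lifecycle described above, while the field names, the strictly linear progression, and the rule that closure needs both evidence and a named verifier are illustrative choices.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional


class Stage(Enum):
    IDENTIFIED = 1
    ROOT_CAUSE_ANALYSIS = 2
    ACTION_PLANNED = 3
    REMEDIATION = 4
    CLOSED = 5


@dataclass
class Finding:
    """Hypothetical finding record with a linear lifecycle."""
    control_id: str
    severity: str                    # "Major", "Minor", or "Observation"
    description: str
    root_cause: str = ""
    corrective_action: str = ""
    due_date: Optional[date] = None
    closure_evidence: list[str] = field(default_factory=list)
    resolution_notes: str = ""
    verified_by: str = ""
    stage: Stage = Stage.IDENTIFIED

    def advance(self) -> Stage:
        """Move to the next stage; closing requires evidence and a verifier."""
        if self.stage is Stage.CLOSED:
            raise ValueError("finding is already closed")
        next_stage = Stage(self.stage.value + 1)
        if next_stage is Stage.CLOSED and not (self.closure_evidence and self.verified_by):
            raise ValueError("closure requires evidence and a named verifier")
        self.stage = next_stage
        return self.stage
```

Refusing to close without evidence is what turns the lifecycle from a status field into a control in its own right.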
Findings link to the corrective action system. When a non-conformity is identified in the audit programme, it can create a corresponding corrective action that enters the same closed-loop remediation workflow described in an earlier article: detection, ticketing, remediation, two-check verification, and auto-closure. The audit finding and the corrective action are cross-referenced, so the external auditor can trace from the finding through remediation to verified closure without switching systems.
This closed loop is what most internal audit programmes lack. They identify findings. They record them. They might track them in a spreadsheet. But they don’t connect findings to the remediation system that actually fixes them, and they certainly don’t provide evidence that the fix was verified through re-evaluation of the same evidence rules that identified the problem.
The paradox of findings, revisited
I want to return to the point about perfect results being suspicious, because the audit programme provides concrete evidence for this claim.
In the tenants I manage, the first full audit cycle using this programme produced findings. Not because the ISMS was failing (it wasn’t) but because the qualifying questions probed deeper than previous audits. They found service accounts with overly broad permissions, authentication policies that were configured but not validated against token replay, exception groups with review dates that had lapsed, and improvement opportunities in how management review outputs were documented.
Every one of those findings was legitimate. Every one led to a corrective action. Every corrective action was tracked through remediation and verified. And every verified closure is now evidence: evidence that the audit programme has teeth, that the organisation is willing to find imperfection, and that the corrective action process works.
When the external auditor asked “How do I know your internal audits are rigorous?”, the answer was a dashboard showing: 93 controls reviewed, 14 non-conformities identified, 21 opportunities for improvement documented, all tracked through corrective action, all with conversation transcripts showing the qualifying questions that surfaced them.
That’s not a clean audit report. It’s a functioning management system.
The question I’ll leave you with
When was the last time your internal audit found something that genuinely surprised you?
Not a procedural gap you already knew about. Not a documentation deficiency you’d been meaning to fix. Something the audit discovered: a risk you hadn’t considered, a control that was configured but not working, an assumption baked into your evidence that nobody had questioned.
If the answer is “never,” the audit isn’t testing your ISMS. It’s confirming your assumptions. And untested assumptions are the most dangerous kind of risk: the kind that doesn’t appear in any register because nobody thought to look.
The standard asks for internal audits that determine whether the ISMS is “effectively implemented and maintained.” Effectiveness requires testing. Testing requires questions that probe beyond documentation. Questions that probe beyond documentation require either an expert auditor with deep technical knowledge and the time to investigate, or a system that embodies that expertise and applies it consistently, across every control, every cycle.
I built the system because the expert auditor model doesn’t scale. But the system works because the expertise was captured first (question by question, control by control, qualification by qualification) before it was encoded. The 788 questions in the bank aren’t generic compliance prompts. They’re the residue of years of real audit interactions, distilled into a structure that can be applied repeatedly, consistently, and with a depth that no forty-five-minute checklist review will ever match.