AI in internal audit: the modern audit plan, powered by models

TL;DR

AI is reshaping every phase of the internal audit lifecycle — from risk scoping and sampling to fieldwork automation, anomaly detection, and continuous monitoring. This deep dive examines the use cases gaining traction, the vendor categories enabling them, and the questions audit leaders should be asking before they buy.

The audit plan is no longer just an annual document. AI is turning it into a living system.

Internal audit has long operated on a familiar rhythm: agree the universe of auditable entities, score risks using last year's data, select a sample, send fieldwork teams into the business, draft findings, and archive the workpapers. The cycle repeats annually — or quarterly for higher-risk areas. That rhythm served the function well when data was scarce and manual review was the only option. Neither constraint holds today.

AI tooling — ranging from classical machine learning applied to transaction logs, to Generative AI drafting audit procedures from control frameworks, to agentic AI orchestrating multi-step testing workflows — is compressing that cycle and expanding its coverage. The result is not a different audit methodology so much as the same methodology executed faster, at greater scale, and with more of the available evidence actually reviewed.

This page maps the use cases that are moving from pilot to production, the vendor categories enabling each phase, and the questions audit leaders should press in vendor evaluations. It is aimed at Chief Audit Executives, audit transformation leads, and the IT and data teams supporting them.

Why this intersection is under pressure now

Three converging pressures are pushing internal audit toward AI adoption faster than most other business functions have moved.

Coverage gaps are visible and documentable. Boards and audit committees increasingly receive dashboards showing what percentage of the auditable universe was covered in a given cycle. When that figure is low — and in large, complex organizations it often is — the function is exposed. AI-assisted continuous monitoring and automated testing raise that coverage number without requiring proportional headcount growth.

Regulatory expectations are rising. Financial services regulators in multiple jurisdictions have published guidance on model risk, third-party risk, and operational resilience that expects audit functions to provide assurance over AI systems themselves. Audit teams that lack AI-capable tooling struggle to audit AI, which compounds the problem.

Data availability has outpaced audit methodology. Enterprise data estates now contain structured transaction logs, unstructured communications, procurement records, HR data, and access logs at a scale no sample-based manual process can meaningfully cover. The gap between what the data could reveal and what manual audit actually reviews has become strategically significant.

Use cases across the audit lifecycle

The following use cases are organized by audit phase. Each describes what the AI does, what data it requires, the vendor category that addresses it, and the type of outcome a function can expect.

Risk assessment and audit universe scoping

Dynamic risk scoring. Machine learning models ingest financial, operational, and external signal data — ERP transaction volumes, headcount changes, regulatory filings, news sentiment — and re-score each auditable entity continuously rather than annually. Data required: ERP exports, HR records, external news feeds, prior audit findings. Vendor category: risk intelligence platforms with ML scoring engines. Outcome: audit plans that reflect current exposure rather than last year's judgment.
Audit universe gap detection. Natural language processing models parse process documentation, org charts, and prior workpapers to surface auditable areas that have never appeared in a formal plan or have not been reviewed in a defined window. Data required: process inventory documents, prior audit reports (unstructured text). Vendor category: document AI / knowledge extraction platforms. Outcome: meaningful reduction in coverage blind spots.
Third-party risk prioritization. Models score the vendor population on financial health, contract exposure, operational dependency, and public-source signals to flag which third parties warrant audit attention in a given cycle. Data required: contract management system exports, accounts payable data, vendor financial filings. Vendor category: third-party risk management platforms with AI scoring. Outcome: more targeted allocation of limited third-party audit capacity.
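
As a rough illustration of the re-scoring idea above, the sketch below ranks a small audit universe by a weighted sum of signals. It is a deliberately simple stand-in for a trained ML model: the signal names, the weights, and the assumption that every signal is already scaled to 0..1 are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical weights; a real scoring engine would learn or calibrate these.
WEIGHTS = {
    "txn_volume_change": 0.4,
    "headcount_change": 0.2,
    "open_findings": 0.3,
    "news_sentiment": 0.1,
}

@dataclass
class Entity:
    name: str
    signals: dict  # assumed: each signal already scaled to the range 0..1

def risk_score(entity: Entity) -> float:
    """Weighted sum of scaled signals; a missing signal counts as 0."""
    return sum(w * entity.signals.get(k, 0.0) for k, w in WEIGHTS.items())

def rank_universe(entities):
    """Order auditable entities from highest to lowest current score."""
    return sorted(entities, key=risk_score, reverse=True)

universe = [
    Entity("Treasury", {"news_sentiment": 0.3, "open_findings": 0.1}),
    Entity("Payroll", {"txn_volume_change": 0.2, "headcount_change": 0.8}),
    Entity("Accounts payable", {"txn_volume_change": 0.9, "open_findings": 0.5}),
]
ranked = [e.name for e in rank_universe(universe)]
```

Re-running `rank_universe` whenever the signal feeds refresh is what turns an annual ranking into a continuous one; the scoring logic itself can stay this simple or be replaced by a model.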

Sampling and fieldwork preparation

Risk-based population stratification. Rather than random or judgmental sampling, clustering and anomaly detection models stratify a transaction population by risk profile before a sample is drawn, ensuring outlier clusters receive proportional representation. Data required: full transaction population from ERP or sub-ledger. Vendor category: data analytics platforms with statistical sampling modules. Outcome: samples that are more likely to surface control failures than random selection.
Automated control walkthrough documentation. Large language models ingest process narratives, control matrices, and prior workpapers, then draft the control understanding and walkthrough procedure sections of the working paper. Auditors review and confirm rather than write from scratch. Data required: prior workpapers, control framework documentation. Vendor category: Generative AI writing assistants with audit-specific fine-tuning or prompt templating. Outcome: material reduction in fieldwork preparation time.
Intelligent document request management. AI models classify, route, and track documents received from the auditee against a structured request list, flagging missing items as well as unexpected documents that may indicate undisclosed process changes. Data required: document request list, received document metadata. Vendor category: workflow automation platforms with document classification. Outcome: fewer fieldwork delays from incomplete evidence packages.
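
The stratification step described above can be sketched in a few lines. This example assumes fixed amount bands as the strata, which is a simplification: the clustering or anomaly models described here would assign the stratum labels instead. The band thresholds and per-stratum sample sizes are illustrative only.

```python
import random
from collections import defaultdict

def stratified_sample(transactions, stratum_of, per_stratum, seed=0):
    """Draw up to per_stratum[label] transactions from each stratum."""
    rng = random.Random(seed)  # fixed seed so the draw can be re-performed
    strata = defaultdict(list)
    for txn in transactions:
        strata[stratum_of(txn)].append(txn)
    sample = {}
    for label, items in strata.items():
        k = min(per_stratum.get(label, 0), len(items))
        sample[label] = rng.sample(items, k)
    return sample

def amount_band(txn):
    # Hypothetical bands; a risk model could supply the label instead.
    if txn["amount"] >= 50_000:
        return "high"
    if txn["amount"] >= 5_000:
        return "medium"
    return "low"

amounts = [120, 800, 4_900, 5_000, 7_500, 12_000, 49_999, 50_000, 98_000, 310]
population = [{"id": i, "amount": a} for i, a in enumerate(amounts)]

# Oversample the small high-value stratum relative to its share of the population.
picked = stratified_sample(population, amount_band,
                           {"high": 2, "medium": 2, "low": 1})
```

Recording the seed alongside the population extract lets a reviewer reproduce the exact selection later.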

Testing and anomaly detection

Full-population transaction testing. Rather than sampling, AI models test 100% of a transaction population for defined control attributes — approval thresholds, segregation of duties violations, duplicate payments, round-dollar amounts, weekend processing. Data required: full ERP transaction log. Vendor category: continuous controls monitoring platforms. Outcome: population-level assurance rather than sample-based inference.
Journal entry anomaly detection. Models trained on the organization's own journal entry patterns flag entries that deviate statistically — unusual account combinations, entries posted outside business hours, preparer–approver patterns that suggest collusion risk. Data required: general ledger journal entry detail with metadata. Vendor category: AI-augmented forensic analytics or continuous controls monitoring. Outcome: higher detection rate for manual journal entry manipulation.
Access and identity anomaly flagging. Models monitor user access logs to flag accounts accessing systems outside normal hours, dormant accounts that suddenly become active, or privilege escalations that were not accompanied by a documented change request. Data required: identity and access management (IAM) logs, change management records. Vendor category: user and entity behavior analytics (UEBA) platforms. Outcome: earlier detection of access control failures and potential insider threat indicators.
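
To make full-population testing concrete, here is a minimal sketch that checks every payment against the control attributes listed above. The field names, the approval limit, and the definition of a "round" amount are assumptions made for illustration, not features of any particular platform.

```python
from collections import Counter
from datetime import date

APPROVAL_LIMIT = 10_000  # hypothetical threshold

def run_control_tests(payments):
    """Test every payment and return {payment id: [exception labels]}."""
    def dup_key(p):
        return (p["vendor"], p["invoice"], p["amount"])

    seen = Counter(dup_key(p) for p in payments)
    exceptions = {}
    for p in payments:
        flags = []
        if seen[dup_key(p)] > 1:
            flags.append("possible duplicate")
        if p["amount"] > APPROVAL_LIMIT and p["second_approver"] is None:
            flags.append("missing second approval")
        if p["preparer"] == p["approver"]:
            flags.append("segregation of duties")
        if p["amount"] >= 1_000 and p["amount"] % 1_000 == 0:
            flags.append("round amount")
        if p["posted"].weekday() >= 5:  # 5 = Saturday, 6 = Sunday
            flags.append("weekend posting")
        if flags:
            exceptions[p["id"]] = flags
    return exceptions

payments = [
    {"id": 1, "vendor": "Acme", "invoice": "INV-1", "amount": 2_500,
     "preparer": "ana", "approver": "ben", "second_approver": None,
     "posted": date(2024, 3, 4)},
    {"id": 2, "vendor": "Acme", "invoice": "INV-1", "amount": 2_500,
     "preparer": "ana", "approver": "ben", "second_approver": None,
     "posted": date(2024, 3, 5)},
    {"id": 3, "vendor": "Bolt", "invoice": "INV-7", "amount": 15_000,
     "preparer": "cy", "approver": "cy", "second_approver": None,
     "posted": date(2024, 3, 9)},
    {"id": 4, "vendor": "Dune", "invoice": "INV-9", "amount": 742,
     "preparer": "ana", "approver": "ben", "second_approver": None,
     "posted": date(2024, 3, 6)},
]
exceptions = run_control_tests(payments)
```

Each label is a signal for follow-up rather than a finding: payment 3 trips four rules at once, which makes it a natural first item for auditor review.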

Full-population testing changes the audit's epistemic position. When every transaction has been reviewed against defined criteria, the finding is a fact — not an inference from a sample.

— Xither editorial analysis

Reporting and issue management

AI-assisted finding drafting. Large language models ingest testing results, control criteria, and finding templates to produce draft audit observations — condition, criteria, cause, effect, recommendation. Auditors review, revise, and approve. Data required: testing results, control criteria documentation, prior finding language. Vendor category: Generative AI writing assistants with audit report templates. Outcome: faster report issuance without quality reduction.
Management response tracking and pattern analysis. NLP models monitor management action plans across the issues register, flag overdue items, and cluster recurring themes — identifying systemic root causes that span individual audit findings. Data required: issues management system, management response text. Vendor category: GRC platforms with AI analytics layers. Outcome: CAE-level insight into whether remediation is substantive or nominal.
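
A simplified version of the tracking logic might look like the following. It assumes each action plan already carries root-cause tags; in the use case above, an NLP model would derive those themes from free-text management responses. The record layout and tag names are hypothetical.

```python
from collections import Counter
from datetime import date

def overdue(actions, today):
    """Ids of action plans that are not closed and whose due date has passed."""
    return [a["id"] for a in actions
            if a["status"] != "closed" and a["due"] < today]

def recurring_themes(actions, min_count=2):
    """Root-cause tags that appear on at least min_count action plans."""
    counts = Counter(tag for a in actions for tag in set(a["tags"]))
    return {tag: n for tag, n in counts.items() if n >= min_count}

actions = [
    {"id": "A1", "status": "open", "due": date(2024, 1, 31),
     "tags": ["access review", "training"]},
    {"id": "A2", "status": "closed", "due": date(2024, 1, 15),
     "tags": ["access review"]},
    {"id": "A3", "status": "open", "due": date(2024, 6, 30),
     "tags": ["vendor onboarding"]},
]
late = overdue(actions, today=date(2024, 3, 1))
themes = recurring_themes(actions)
```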

Continuous monitoring

Always-on control environment monitoring. Agentic AI workflows — distinct from simple chatbots or copilots in that they execute multi-step tasks autonomously, querying systems, evaluating results, and escalating findings — run predefined control tests on a scheduled or triggered basis, surfacing exceptions to audit staff without manual intervention. Data required: live or near-real-time ERP and operational data feeds. Vendor category: continuous auditing platforms with agentic orchestration. Outcome: transition from annual point-in-time assurance toward a continuous coverage model.

Agentic AI — a clarification

Agentic AI refers to AI systems that plan and execute multi-step tasks autonomously — querying a database, evaluating the output, deciding whether to escalate, and logging the result — without a human approving each step. This is materially different from a copilot (which responds to a human prompt) or a chatbot (which answers questions). In audit, agentic workflows are most relevant to continuous monitoring and automated testing pipelines.
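
The query, evaluate, escalate, and log sequence described above can be sketched as a plain loop. This is an illustrative skeleton under stated assumptions, not a real agent framework: the `ControlTest` and `RunLog` types, the escalation threshold, and the sample tests are all hypothetical, and a production agent would also plan which tests to run.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ControlTest:
    name: str
    query: Callable[[], list]         # fetches the records to evaluate
    evaluate: Callable[[list], list]  # returns the records that are exceptions
    escalate_at: int = 1              # exceptions needed before a human is notified

@dataclass
class RunLog:
    entries: list = field(default_factory=list)

def run_cycle(tests, log, notify):
    """Run each test end to end with no human approval between steps."""
    for t in tests:
        records = t.query()
        exceptions = t.evaluate(records)
        escalated = len(exceptions) >= t.escalate_at
        if escalated:
            notify(t.name, exceptions)
        # Every run is logged, including clean ones, so coverage is evidenced.
        log.entries.append({"test": t.name, "reviewed": len(records),
                            "exceptions": len(exceptions), "escalated": escalated})

alerts = []
log = RunLog()
tests = [
    ControlTest(
        "dormant accounts reactivated",
        query=lambda: [{"user": "svc-old", "days_idle": 400},
                       {"user": "ana", "days_idle": 2}],
        evaluate=lambda rows: [r for r in rows if r["days_idle"] > 180],
    ),
    ControlTest(
        "payments over limit",
        query=lambda: [{"amount": 900}],
        evaluate=lambda rows: [r for r in rows if r["amount"] > 10_000],
    ),
]
run_cycle(tests, log, notify=lambda name, exc: alerts.append((name, len(exc))))
```

The human enters only at the `notify` step, which is what separates this pattern from a copilot that waits for a prompt before each action.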

Vendor categories to evaluate

The AI-in-audit landscape is not served by a single platform. Audit leaders typically assemble a portfolio of capabilities across several categories. Understanding each category's boundaries helps avoid over-buying from one vendor and under-investing in another.

Vendor category	Core capability	Primary audit phase
Continuous controls monitoring (CCM) platforms	Automated testing of 100% transaction populations against defined control rules; exception alerting	Testing · Continuous monitoring
AI-augmented GRC platforms	Risk scoring, audit universe management, issues tracking with NLP-based pattern analysis	Risk assessment · Issue management
Generative AI writing assistants (audit-configured)	Drafting workpapers, procedures, findings, and reports from structured inputs	Fieldwork prep · Reporting
Document AI and knowledge extraction	Parsing unstructured documents — contracts, policies, prior reports — for structured data	Scoping · Fieldwork
User and entity behavior analytics (UEBA)	Access log monitoring, anomaly scoring, insider threat indicators	Testing · Continuous monitoring
Data analytics and visualization platforms (audit modules)	Population stratification, statistical sampling, ERP data querying and visualization	Sampling · Testing

Vendor categories relevant to AI-enabled internal audit. Most functions will use two to four of these categories simultaneously.

What to ask in vendor demos

Vendor demonstrations for audit AI tools tend to show the best case: clean data, a clearly defined control, an obvious anomaly. These questions pressure-test real-world performance.

Show me a false positive rate from a production deployment. Every anomaly detection system generates false positives. Ask how many flagged exceptions in a real deployment turned out to be immaterial or explainable — and how the system was tuned afterward. A vendor who cannot answer this is likely showing you a demo dataset.
How does your model explain its output to an auditor? Audit workpapers require documented rationale for conclusions. If the model flags a transaction as anomalous, can it produce a human-readable explanation that meets professional standards for evidence? Ask to see an actual model output, not a mockup.
What happens when my ERP data is messy? Real transaction data has missing fields, duplicate records, and inconsistent coding. Ask the vendor to demonstrate how their pipeline handles data quality issues — and whether exception handling is visible to the audit team.
How does the system handle changes to the control environment? Controls change. Thresholds are adjusted, new approval workflows are introduced, legacy systems are decommissioned. Ask how quickly the model or rule set can be updated, and whether that requires vendor involvement or can be managed by the audit team.
What access does your platform require to our systems, and how is that access governed? AI audit tools often require read access to sensitive financial and operational data. Understand the data residency model, the access control architecture, and how the vendor's own access is logged and auditable.
Can the platform produce output in a format that satisfies external auditor review? If internal audit results feed into the external audit process — as they increasingly do — the output format matters. Ask for examples of how workpaper documentation from the platform has been accepted by external auditors.
What is your model retraining cadence, and who controls it? For ML-based anomaly detection, the model's baseline matters. Ask how often the baseline is updated, whether retraining is triggered automatically, and whether the audit team can freeze a model version for the duration of a specific audit engagement.
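
For the first question in this list, it helps to agree on the arithmetic before the demo. The sketch below computes the share of flagged exceptions that auditors closed as something other than a confirmed issue. The disposition labels are hypothetical, and note that this measures false alarms among flagged items only; it says nothing about issues the system failed to flag.

```python
from collections import Counter

def alert_quality(dispositions):
    """Summarize how reviewed exceptions were closed out by auditors."""
    counts = Counter(dispositions)
    flagged = sum(counts.values())
    if flagged == 0:
        return None  # nothing reviewed yet, so no rate can be stated
    confirmed = counts["confirmed"]
    return {
        "flagged": flagged,
        "confirmed": confirmed,
        "false_alarm_share": (flagged - confirmed) / flagged,
    }

# Hypothetical outcomes recorded after auditor follow-up.
result = alert_quality(["confirmed", "explained", "immaterial", "explained"])
```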

Common pitfalls

Treating AI output as a finding rather than a signal. An anomaly score or a flagged transaction is the start of an audit procedure, not its conclusion. Functions that skip the human judgment step — reviewing the flagged item, understanding the business context, determining whether a control failure occurred — expose themselves to professional standards risk and to the reputational cost of false positive findings communicated to management.
Underinvesting in data readiness. AI audit tools are only as useful as the data they ingest. Functions that deploy continuous controls monitoring before establishing clean, consistent data feeds from source systems will spend most of their time managing data quality issues rather than generating audit insight. A data readiness assessment before tool selection is not optional.
Buying a platform for one use case and ignoring the rest. Several GRC and analytics vendors market broad AI-in-audit platforms but derive most of their customer value from a single capability — often CCM or risk scoring. Understand which capabilities are production-grade in the vendor's client base and which are roadmap features, and buy accordingly.
Neglecting the skills transition. AI tools change what auditors do, not whether auditors are needed. Audit teams that are not trained to interpret model output, challenge anomaly scores, and document AI-assisted conclusions will underutilize the tools and generate lower-quality workpapers. Change management and training are part of the implementation cost.
Failing to audit the AI. Once AI tools are embedded in the audit process, the audit function has a responsibility to assess whether those tools are operating as intended — whether the model is drifting, whether the rule set is still calibrated to the current control environment, whether outputs are being reviewed rather than rubber-stamped. The meta-audit obligation is real.
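
One way to put a number on drift is the population stability index (PSI), which compares how a model's scores were distributed across bins at baseline with how they are distributed now. PSI is a technique chosen here for illustration; the article does not prescribe it, and the bin counts below are invented.

```python
import math

def psi(baseline_counts, current_counts, floor=1e-6):
    """Population stability index across matching bins.

    Each share is floored to avoid taking the log of zero for empty bins.
    """
    b_total = sum(baseline_counts)
    c_total = sum(current_counts)
    score = 0.0
    for b, c in zip(baseline_counts, current_counts):
        b_share = max(b / b_total, floor)
        c_share = max(c / c_total, floor)
        score += (c_share - b_share) * math.log(c_share / b_share)
    return score

# Hypothetical counts of anomaly scores in low, medium, and high bins.
stable = psi([50, 30, 20], [50, 30, 20])
shifted = psi([50, 30, 20], [20, 30, 50])
```

A common rule of thumb treats a PSI above roughly 0.25 as a material shift, but that cutoff is an assumption the audit team should calibrate and document rather than adopt as given.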

Best practice

Before any AI tool touches live audit data, document the human review checkpoints in your methodology. Professional standards, including those issued by the IIA, PCAOB, and IAASB, do not currently allow AI to replace auditor judgment at key conclusions. Building the review layer into the workflow design from the start is easier than retrofitting it after a standards review.

Implications for the Chief Audit Executive

The strategic question for audit leadership is not whether to adopt AI — the coverage, speed, and analytical depth advantages are real enough that the function's credibility with the board will eventually depend on it. The question is sequencing.

Functions that start with continuous controls monitoring on a high-volume, well-structured data domain — accounts payable, payroll, access logs — build both capability and institutional confidence before extending to less structured use cases. Functions that start with Generative AI-assisted report drafting often see faster visible wins with lower data dependency, which can fund the slower data readiness work that CCM requires.

Either path works. What does not work is treating AI adoption as a technology project rather than a methodology change. The tools instrument the audit plan. The audit plan still reflects professional judgment about what matters most to the organization — and that judgment remains the CAE's job.

AI readiness checklist for internal audit functions

Data inventory complete: key source systems (ERP, IAM, payroll, procurement) mapped and data quality assessed
Control matrix documented in a structured format suitable for model ingestion
Professional standards review completed — documented human review checkpoints for AI-assisted conclusions
Vendor demo script built around real organizational data scenarios, not vendor-supplied demo datasets
Skills gap assessment done for audit team — training plan in place for model output interpretation
Audit committee briefing prepared on AI tool use in the audit process
Meta-audit plan defined — how will the audit function review its own AI tools annually?