
Measuring what counts

Why awareness programmes mostly report participation instead of knowledge, behaviour and risk reduction, how the research says you should measure it, and what logical place engaging formats such as games and escape rooms deserve.

Founder & Security Awareness Specialist · 2LRN4

In short

  1. Most awareness programmes report what is easiest to count: participation and completion. But completion measures activity, not outcome. An awareness programme exists to reduce the security and privacy risks in which employee behaviour plays a part, and that risk does not fall simply because someone finishes a module or plays a game.
  2. Knowledge, attitude and behaviour are three different things, and you measure them differently too. A knowledge test measures knowledge, not behaviour. The research puts that gap in numbers. Training raises knowledge roughly three times as strongly as behaviour (effect size d ≈ 1.02 against d ≈ 0.36). Measure only participation or knowledge and you miss precisely what counts.
  3. Behaviour can be measured, and at scale too: not with a three-hour game for thirty people, but with observable signals such as how people report suspicious messages, broken down by risk group and reported in the language of risk. Engaging formats such as games and escape rooms deserve a clear place in that, as an engine for engagement and conversation, not as proof that risk has fallen.

Recently I watched the presentation of an awareness game that an organisation of nearly eight thousand employees had built itself. It looked beautiful and a great deal of work had clearly gone into it. Groups of around thirty employees play it over three hours, and along the way exactly the conversation about security and privacy you want takes shape. The boards with the questions also hang in various spots around the building, so you solve one now and then as you pass by. A fine initiative, and one that should stay just as it is.

And yet a question lingered. Thirty employees per session is, out of nearly eight thousand, less than half a percent. Reach everyone that way and you are at it for years, and the same goes for the physical escape rooms, the cyber trucks and the roadshows that keep popping up everywhere. A second question follows. Can a game like that also measure whether behaviour changes? Because that is ultimately why an awareness programme exists: to reduce the security and privacy risks in which employee behaviour plays a part.
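A back-of-the-envelope sketch makes the reach problem concrete. The headcount of 8,000 (rounded up from "nearly eight thousand") and the group size of thirty come from the example above; the pace of one session per week and the 52 playable weeks a year are assumptions made purely for illustration.

```python
import math


def sessions_needed(headcount: int, group_size: int) -> int:
    """Number of sessions required for everyone to play once."""
    return math.ceil(headcount / group_size)


def years_to_reach_everyone(headcount: int, group_size: int,
                            sessions_per_week: float) -> float:
    """Calendar years needed at a given pace (assumes 52 playable weeks a year)."""
    return sessions_needed(headcount, group_size) / (sessions_per_week * 52)


headcount = 8000   # rounded up from "nearly eight thousand"
group_size = 30    # players per session, from the example
share_per_session = group_size / headcount

print(f"Reach per session: {share_per_session:.3%}")
print(f"Sessions needed: {sessions_needed(headcount, group_size)}")
print(f"Years at one session a week: "
      f"{years_to_reach_everyone(headcount, group_size, 1):.1f}")
```

Under these assumptions a single session reaches under 0.4 percent of staff, and one session a week would take roughly five years to reach everyone once.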

Behavioural science sums that task up in three conditions. To change behaviour you need capability, motivation and opportunity at the same time (Michie, van Stralen and West, 2011). Too often a programme starts with an e-learning or a presentation, and stops there as well. That way you do something about knowledge, but nothing about opportunity and motivation. And when the time comes to report, it is no longer even about knowledge, but about participation. This report is a literature study into the measuring and reporting of security awareness. It does not want to stop the fine initiatives, but to give them a logical place, because awareness is allowed to be fun. The only question is: what do you measure, and what do you report, so that you steer on risk and not on turnout?

About this study

Type
Literature study based on peer-reviewed research, supplemented with authoritative standards and industry figures.
Sources
Kirkpatrick's evaluation model, the ISO/IEC 27001:2022 and 27002:2022 standards, validated measurement instruments (KAB, HAIS-Q, SeBIS), empirical work on the knowledge-behaviour gap, a meta-analysis of 69 training studies, a large-scale phishing study among 14,733 employees, systematic literature reviews on gamification, and figures from ENISA, SANS and providers of awareness training.
Cut-off date
June 2026.
Main question
How do you measure and report security awareness so that you steer on risk reduction rather than on participation, and what place do engaging formats such as games and escape rooms have in that?

Sub-questions

  1. What do organisations actually report about their awareness programme in practice, and why that in particular?
  2. Which different things can you measure, and with which instruments?
  3. Why does measuring knowledge, or even participation, say too little about behaviour?
  4. Can behaviour be measured, and on the scale of an entire organisation at that?
  5. What place do engaging, small-scale formats deserve within a measurable programme?
  6. How do you set up your measuring and reporting so that it steers on risk?

01 · Finding · What we report is participation, not outcome

The standard model for evaluating training comes from Kirkpatrick (1959, collected in Kirkpatrick and Kirkpatrick, 2006) and has four levels: how participants react, what they have learned, whether their behaviour changes, and what results that delivers for the organisation. Those four levels form a ladder. The higher you climb, the more a measurement says about the real effect. In security awareness, however, most programmes get stuck on the bottom rung. A review of evaluation practice shows that organisations rely above all on easy measures close to level one, such as completion rates and satisfaction scores, while they rarely measure at the level of behaviour or results (Jayatilaka et al., 2021). Studies that do reach the behaviour level, such as that of Khan and colleagues (2023), are the exception that proves the rule.

That preference is understandable, and for that reason dangerous. A learning platform delivers the completion rate at the touch of a button, whereas measuring behaviour costs a study design, time and money. The number that is easiest to produce thus becomes, of its own accord, the number you report. The problem is that this number measures activity and not outcome. A completion rate of nearly a hundred percent feels like a win, but it is not the win you are after. In The participation paradox we already showed that a mandate reliably pushes participation up without behaviour following along of its own accord. So the completion rate says something about how many people went through the training, and almost nothing about whether the organisation became safer.

The number that is easiest to produce becomes, of its own accord, the number you report. And that number measures activity, not outcome.

The heart of the reporting problem

That awareness programmes stay stuck on this bottom rung is, moreover, no isolated case but a widespread pattern. The annual industry survey by SANS (2024), based on more than a thousand professionals across over seventy countries, shows that many programmes remain stuck in a phase of compliance and awareness, where it is all about attendance and being aware, and do not grow on into a phase where behaviour and culture take centre stage. The result is reporting that reassures the board without saying anything about the real risk.

By now the standard is catching up with this too. ISO/IEC 27001 requires, in clause 9.1, that the effectiveness of controls be evaluated, and with the 2022 revision awareness also became a standalone control (6.3) in the underlying ISO/IEC 27002, one that turns on staff who demonstrably behave in line with policy and not on a one-off training session. In audit practice that sets a harder bar. An attendance list or a completion rate no longer counts as sufficient evidence, precisely because it says nothing about effectiveness. With that, the minimum shifts. Where you once met the control with attendance and a finished e-learning, the standard now asks for evidence that behaviour and knowledge have genuinely changed. That has applied since the transition to ISO/IEC 27001:2022, which for existing certificates ran out at the end of October 2025. A programme that stays on the bottom two rungs does not deliver that evidence.

The evaluation ladder: Kirkpatrick's four levels; most programmes stay on the bottom rung.
  1 · Participation: reaction and completion ("how many people took part")
  2 · Knowledge: what was learned ("do they know it now")
  3 · Behaviour: what they do differently ("do they act more safely")
  4 · Risk: result for the organisation ("does the risk fall")
Annotations in the figure: "this is where most programmes stall"; "this is what to steer on". Scale: from "easy to count" to "harder to measure, but what counts".

Figure 1 Kirkpatrick's evaluation ladder with four levels: participation (reaction), knowledge (learning), behaviour and risk (result). The higher you climb, the harder the measuring becomes and the more it says. Most awareness programmes report on the bottom rung. Based on Kirkpatrick and Kirkpatrick (2006).

02 · Distinction · Knowledge, attitude and behaviour are not the same, and you do not measure them the same way

Anyone who wants to climb higher up the ladder soon discovers that awareness is not a single thing but a composite of three: what someone knows, how someone feels about it and what someone actually does. The research captures this in the so-called KAB model of knowledge, attitude and behaviour, a triptych we used earlier in The difference between security awareness and privacy awareness. Kruger and Kearney (2006) supplied the first practical scoring model for this, with which you express awareness not as a feeling but as a measurable, weighted score. The key insight from that model is that the three components can move independently of one another. Someone can know the rules, think them sensible and still not behave accordingly.
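The idea of a weighted score across three independently moving components can be sketched as follows. This is an illustration only: the 0 to 100 scale, the weights and the example person are hypothetical and are not taken from Kruger and Kearney's paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KabScores:
    """Scores per component on an assumed 0-100 scale."""
    knowledge: float
    attitude: float
    behaviour: float


# Hypothetical weights for illustration; they must sum to 1.
WEIGHTS = {"knowledge": 0.3, "attitude": 0.2, "behaviour": 0.5}


def weighted_awareness(scores: KabScores,
                       weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average of the three components."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(getattr(scores, name) * w for name, w in weights.items())


# Someone who knows the rules and agrees with them, but does not act on them.
knows_but_does_not_do = KabScores(knowledge=90, attitude=80, behaviour=40)
print(round(weighted_awareness(knows_but_does_not_do), 1))
```

Keeping the three components as separate fields, and only then combining them, is what lets a high knowledge score and a low behaviour score show up side by side instead of cancelling out unnoticed.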

Since then, validated instruments have measured these things reliably. The best known is the Human Aspects of Information Security Questionnaire, or HAIS-Q for short, which we also drew on in The vulnerable first months and which maps knowledge, attitude and self-reported behaviour across various themes (Parsons et al., 2014). In a second validation round the researchers also showed that higher scores on the questionnaire went hand in hand with better performance in a real phishing simulation, which gives the instrument a predictive value that a homemade quiz lacks (Parsons et al., 2017). For the behaviour side, Egelman and Peer (2015) developed the Security Behavior Intentions Scale, or SeBIS for short, a validated scale that measures the intention to behave securely along four dimensions, from updating software to handling passwords.

The practical lesson is as simple as it is uncomfortable. Proven instruments exist to measure beyond participation, so an organisation that wants to measure awareness seriously does not have to start from scratch. A homemade knowledge test at the end of the e-learning measures at most whether someone can still reproduce the information just shown. It measures no attitude, no behaviour, and you cannot compare it over time or between departments. The difference between measuring and measuring well is precisely the difference between such a test question and a validated instrument.

Three things, three ways of measuring: a knowledge test measures knowledge; for behaviour you need a different measure.
  • Knowledge (knowing): knowledge test, HAIS-Q (knowledge part); easy to measure
  • Attitude (thinking): HAIS-Q (attitude part), SeBIS intention; via validated questionnaire
  • Behaviour (doing): observable signals such as reporting and clicking; the measure that counts
The knowledge-behaviour gap: training raises knowledge roughly three times as strongly as behaviour. Effect size d ≈ 1.02 on knowledge against d ≈ 0.36 on behaviour. Based on Prümmer et al. (2024).

Figure 2 Knowledge, attitude and behaviour are three things, each with its own way of measuring. Knowledge and attitude you measure with validated questionnaires, behaviour with observable signals. The meta-analysis by Prümmer and colleagues shows how far knowledge and behaviour diverge.

03 · Explanation · Why measuring knowledge is not enough

Between knowing and doing yawns a gap that the research gave a name of its own: the knowing-doing gap. It is not a chance observation, but a repeatedly demonstrated phenomenon. Workman, Bommer and Straub (2008) and Cox (2012) gave the gap a theoretical structure and showed that people who know the threat and understand the protective measure still often fail to take it. More recent work confirms that picture. Zwilling and colleagues (2020) found, across four countries, ample knowledge of the threat but scarcely any protective behaviour, and Lee and Chua (2023) showed that knowledge does not predict behaviour directly, but only through intervening factors. The message for anyone who wants to measure is hard. A knowledge test measures knowledge, and knowledge does not predict behaviour of its own accord.

How large that gap is emerges most sharply from a meta-analysis of sixty-nine studies into the effect of cybersecurity training (Prümmer, van Steen and van den Berg, 2024), which we also drew on in our report on phishing simulations. Training raises knowledge considerably, with a large effect size of around 1.02, but behaviour far less, with a weak effect size of around 0.36. Put differently: the average training lifts knowledge roughly three times as strongly as behaviour. Measure only knowledge and you report precisely the thing that moves most and says least about the risk.
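The "roughly three times" follows directly from the two rounded effect sizes quoted above; a quick check, using only those two numbers:

```python
# Effect sizes as quoted in the text (rounded values).
d_knowledge = 1.02
d_behaviour = 0.36

ratio = d_knowledge / d_behaviour
print(f"Knowledge effect is about {ratio:.1f} times the behaviour effect")
```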

That the gap is not about a lack of information is shown mercilessly by a large-scale survey. In Proofpoint's annual phishing study (2024), seventy-one percent of users took a risky action, and ninety-six percent of them knew it was risky. More information would have changed little here, because the knowledge was already there. This explains why a programme that bets on knowledge alone stays stuck. The behaviour model COM-B sums up why. Behaviour only arises when capability, motivation and opportunity come together (Michie, van Stralen and West, 2011). An e-learning delivers the capability, that is, the knowledge, but the motivation and the opportunity have to come from the organisation itself. As we described in The buy-in problem, it is precisely that opportunity, the time during working hours, the budget and the priority, that lies almost entirely in the hands of management. A knowledge test measures none of this.

04 · Finding · Behaviour can be measured, and at scale

Here the two questions from the start come together, the one about measuring and the one about scale. A three-hour game for thirty people is a wonderful experience, but not a measurement instrument, and it never reaches a whole organisation. Fortunately, you can measure behaviour at scale, with signals the organisation gives off of its own accord during ordinary work. The best-known example is the phishing simulation, provided you look at the right number. The large-scale study by Lain, Kostiainen and Capkun (2022), carried out among 14,733 employees over a period of fifteen months, shows two things. First, a low click rate per email of barely six percent conceals that nearly a third of staff click at least once over that period. And second, training at the moment of clicking did not improve resilience, whereas the collective reporting of suspicious messages did deliver a durable and usable signal.
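Why a low per-email click rate can coexist with a large share of people clicking at least once is easiest to see in a simplified model. The sketch below assumes every email has the same click probability and that clicks are independent, which is a deliberate simplification and not the design or the analysis of the cited study; the numbers of emails are hypothetical.

```python
def share_clicking_at_least_once(click_rate_per_email: float,
                                 emails_sent: int) -> float:
    """Probability of one or more clicks, assuming independent emails
    that all share the same click rate (a simplification)."""
    return 1 - (1 - click_rate_per_email) ** emails_sent


# Hypothetical numbers of simulated emails per employee.
for emails in (1, 3, 6, 12):
    share = share_clicking_at_least_once(0.06, emails)
    print(f"{emails:>2} emails at 6% each: {share:.0%} click at least once")
```

Under these assumptions, six emails at six percent each already put about 31 percent of recipients in the "clicked at least once" group, which is why the per-email figure on its own understates exposure.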

With that, attention shifts from the click rate to the reporting rate, the share of employees who actively report a suspicious message. That is a better measure, because reporting goes further than being careful. Whoever reports protects not only themselves but warns the whole organisation. The click rate itself remains a murky measure, because whether someone clicks depends heavily on their attention at that moment and on the circumstances, and not only on what someone is capable of. In Proofpoint's study (2024) the average reporting rate stood at around nineteen percent against a failure rate in the simulation of just over nine percent, that is, a resilience factor of about two. That kind of number belongs to the upper rungs of the ladder, because it measures what people do and not whether they were present. The phishing simulation is, moreover, not the only behavioural signal. The number of reported incidents and the number of unsafe actions that become visible in systems, such as a data loss prevention rule that fires, also tell you something about what people really do. A peer-reviewed framework for measuring awareness underlines that behavioural metrics in particular, unlike participation figures, correlate with actual risk reduction (Chaudhary, Gkioulos and Katsikas, 2022), and ENISA (2021) too explicitly counts the measuring of behaviour among the core of an awareness strategy.
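The relation between reporting rate, failure rate and resilience factor can be written down in a few lines. The counts below are hypothetical and chosen only to land in the same order of magnitude as the figures quoted above; the function names are illustrative.

```python
def rate(count: int, total: int) -> float:
    """Share of recipients, as a fraction between 0 and 1."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total


def resilience_factor(reporting_rate: float, failure_rate: float) -> float:
    """Reports per failure: above 1 means more people report than fail."""
    if failure_rate == 0:
        return float("inf")
    return reporting_rate / failure_rate


# Hypothetical campaign: 1,000 recipients, 190 reported, 93 failed.
recipients = 1000
reporting = rate(190, recipients)
failure = rate(93, recipients)

print(f"Reporting rate: {reporting:.1%}")
print(f"Failure rate: {failure:.1%}")
print(f"Resilience factor: {resilience_factor(reporting, failure):.1f}")
```

A factor of about two means roughly two people report for every one who fails the simulation; tracking that ratio over time says more than either rate alone.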

A game is an excellent engine for engagement and conversation. It is just not a measurement instrument, and it does not lower the risk of eight thousand people.

On the place of engaging formats

And with that the fine initiatives get their logical place. The research into gamification and serious games is strikingly unanimous. Engaging formats reliably raise engagement, motivation and knowledge in the short term, but on lasting behaviour change little has yet been demonstrated. Gwenhure and Rahayu (2024) find mainly short-term effects, Ng and Hasan (2025) show that many games reach precisely the employees who are already skilled, which undercuts the promise of reach, and Chen and colleagues (2023) demonstrate most sharply that fun and immersion raise awareness, but do not translate into behavioural intention of their own accord. Translated into the COM-B model, a game, an escape room or a cyber truck delivers mainly motivation and part of the knowledge, and that is valuable. Such a format delivers no opportunity, no scale and no measurement of behaviour. So measure a format like that on what it does do: on reach, experience and the quality of the conversation, and not as though an intervention for thirty people should lower the risk of eight thousand. That this is no luxury problem is clear from the same SANS survey (2024), in which a lack of time and staff is the most cited obstacle and the most mature programmes soon take more than four full-time roles. You can spend your time and people only once, and that is exactly why the scalable behaviour measurement must not give way to the finest, but smallest, intervention.

Reach against measurability of behaviour: where the engaging formats sit, and where the scalable behaviour measure sits.
Axes: reach within the organisation (low to high); measurability of behaviour (low to high).
  • Game & escape room: high experience, small reach
  • E-learning + knowledge test: wide reach, measures mainly knowledge
  • Validated questionnaire: HAIS-Q, attitude and intention
  • Incident reports & DLP: signals derived from work data
  • Phishing simulation (click rate): also measures attention, not pure behaviour
  • Reporting rate: cleanest behaviour signal

Figure 3 Measurement formats placed by reach (horizontal) and the degree to which they make behaviour measurable (vertical). Games and escape rooms score high on experience but low on reach and behaviour measurement; an e-learning and a validated questionnaire reach everyone but measure mainly knowledge and attitude; the click rate of a phishing simulation also measures attention and circumstances, not pure behaviour, while the reporting rate and signals from everyday work data, such as incident reports and data loss prevention, give the cleanest, scalable behaviour signal.

05 · Approach · How to measure and report

The preceding findings translate into a measuring and reporting approach that steers on risk without throwing the engaging formats overboard. Five steps.

  1. Climb the evaluation ladder and do not stay on rung one. Measure deliberately at all four of Kirkpatrick's levels: participation, knowledge, behaviour and ultimately risk. Participation remains useful as a process measure, because without reach nothing changes, but never make it the outcome you report. For every number, ask yourself first: does this measure activity or outcome? Since ISO 27001:2022 the standard expects this too, namely evidence of effectiveness and not just of participation.
  2. Measure knowledge and attitude with a validated instrument. Use a proven measurement instrument such as the HAIS-Q and not a homemade quiz at the end of the e-learning (Parsons et al., 2017; Kruger and Kearney, 2006). Then your scores mean something, you compare them over time and between departments, and you see movement rather than a snapshot.
  3. Measure behaviour with observable signals, not with self-reporting alone. The most usable measure is how people report suspicious messages, not the bare click rate (Lain et al., 2022; Proofpoint, 2024). A rising reporting rate shows that people are not only careful but also take action, and that is exactly why the programme exists.
  4. Report in the language of risk, and per risk group. Break the figures down by department and role, and link them to the residual risk that the board must be able to weigh under its duty of care (see The buy-in problem). An organisation-wide completion rate of nearly a hundred percent hides precisely the one department where the real risk sits.
  5. Give the engaging formats their own place. A game, an escape room or a cyber truck is a strong engine for engagement, conversation and culture. So measure that format on what it delivers, namely reach, experience and the quality of the conversation. Keep engagement and behaviour change apart, and do not judge an intervention for thirty people on the risk of eight thousand. That way it stays fun and the story keeps holding up.

The measuring and reporting route: five steps to steer from turnout to risk.
  1 · Up the ladder: measure at four levels, not participation alone
  2 · Validated: knowledge and attitude via the HAIS-Q, no quiz
  3 · Measure behaviour: reporting rate instead of the click rate alone
  4 · Risk language: per risk group, linked to residual risk
  5 · Format in its place: game and escape room as engine of engagement
What you report determines what the organisation values. Report participation and you get participation; report behaviour and risk and you get steering.

Figure 4 The measuring and reporting route in five steps: from the evaluation ladder, via a validated instrument and a real behaviour measure, to reporting in the language of risk, with the engaging formats in their own place.
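Step four, breaking the figures down per risk group, can be sketched in a few lines. Everything here is hypothetical: the record layout, the department names and the numbers are invented for illustration and do not come from any cited study or tool.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class SimulationResult(NamedTuple):
    """One employee's outcome in one simulated phishing campaign."""
    department: str
    reported: bool
    clicked: bool


def rates_by_department(
    results: Iterable[SimulationResult],
) -> dict[str, dict[str, float]]:
    """Reporting rate and click rate per department."""
    counts = defaultdict(lambda: {"n": 0, "reported": 0, "clicked": 0})
    for r in results:
        c = counts[r.department]
        c["n"] += 1
        c["reported"] += r.reported   # True counts as 1, False as 0
        c["clicked"] += r.clicked
    return {
        dept: {
            "reporting_rate": c["reported"] / c["n"],
            "click_rate": c["clicked"] / c["n"],
        }
        for dept, c in counts.items()
    }


# Hypothetical data: Finance is small but reports far less than Operations.
results = (
    [SimulationResult("Operations", True, False)] * 30
    + [SimulationResult("Operations", False, False)] * 60
    + [SimulationResult("Operations", False, True)] * 10
    + [SimulationResult("Finance", True, False)] * 1
    + [SimulationResult("Finance", False, False)] * 12
    + [SimulationResult("Finance", False, True)] * 7
)

overall_reporting = sum(r.reported for r in results) / len(results)
print(f"Organisation-wide reporting rate: {overall_reporting:.0%}")
for dept, rates in sorted(rates_by_department(results).items()):
    print(f"{dept}: reporting {rates['reporting_rate']:.0%}, "
          f"click {rates['click_rate']:.0%}")
```

In this invented data set the organisation-wide reporting rate looks reasonable, while the small Finance group reports rarely and clicks often; only the per-group view makes that visible.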

06 · Conclusion · Measure what counts

In our report on phishing simulations we concluded that the problem does not lie with the employee, and in The buy-in problem that it usually does not lie with the e-learning either. This report adds a third insight. The problem often lies in what we measure and report. An awareness programme exists to reduce the security and privacy risks in which employee behaviour plays a part, but as long as we report participation, we steer on turnout. And what you report determines what the organisation values. Report participation and you get participation; report behaviour and risk and you get steering.

Fortunately it can be done better, without becoming duller. You measure behaviour at scale with signals the organisation gives off itself, you measure knowledge and attitude with instruments that have proven their worth, and you report in the language of risk, precisely the language a board is at home in. And the beautiful game for those thirty employees? It deserves a place, and keeps it too. Not as proof that the risk has fallen, but as what it really is: an infectious engine for engagement and the good conversation. Awareness is allowed to be fun. It just also has to be measurable.

Limitations

  • This report is a literature study that summarises existing research and contains no new research of its own.
  • Several of the cited measurement instruments, such as the HAIS-Q and SeBIS, measure self-reported knowledge, attitude and intention. They are validated, but remain self-reporting and do not observe behaviour directly.
  • Behavioural measures such as the reporting rate and the click rate approximate the real risk, but do not capture it fully.
  • The cited figures from Proofpoint, KnowBe4 and SANS come from providers of awareness training and may therefore be coloured; they give an impression of the order of magnitude, but provide no independent evidence.
  • The research into gamification and serious games shows mainly short-term effects on engagement and knowledge; about lasting behaviour change little is yet known.

Sources

  1. Chaudhary, S., Gkioulos, V., and Katsikas, S. (2022). Developing metrics to assess the effectiveness of cybersecurity awareness program. Journal of Cybersecurity, 8(1), tyac006. doi.org/10.1093/cybsec/tyac006
  2. Chen, H., Zhang, Y., Zhang, S., and Lyu, T. (2023). Exploring the role of gamified information security education systems on information security awareness and protection behavioral intention. Education and Information Technologies, 28(12), 15915–15948. doi.org/10.1007/s10639-023-11771-z
  3. Cox, J. (2012). Information systems user security: A structured model of the knowing–doing gap. Computers in Human Behavior, 28(5), 1849–1858. doi.org/10.1016/j.chb.2012.05.003
  4. Egelman, S., and Peer, E. (2015). Scaling the Security Wall: Developing a Security Behavior Intentions Scale (SeBIS). Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI '15), 2873–2882. doi.org/10.1145/2702123.2702249
  5. ENISA (2021). Raising Awareness of Cybersecurity: A Key Element of National Cybersecurity Strategies. European Union Agency for Cybersecurity. enisa.europa.eu/publications/raising-awareness-of-cybersecurity
  6. Gwenhure, A. K., and Rahayu, F. S. (2024). Gamification of Cybersecurity Awareness for Non-IT Professionals: A Systematic Literature Review. International Journal of Serious Games, 11(1), 83–99. doi.org/10.17083/ijsg.v11i1.719
  7. ISO/IEC (2022). ISO/IEC 27001:2022 - Information security, cybersecurity and privacy protection - Information security management systems - Requirements (clause 9.1). Geneva: International Organization for Standardization. iso.org/standard/27001
  8. ISO/IEC (2022). ISO/IEC 27002:2022 - Information security, cybersecurity and privacy protection - Information security controls (control 6.3). Geneva: International Organization for Standardization. iso.org/standard/75652
  9. Jayatilaka, A., Beu, N., Baetu, I., Zahedi, M., Babar, M. A., Hartley, L., and Lewinsmith, W. (2021). Evaluation of Security Training and Awareness Programs: Review of Current Practices and Guideline. arXiv:2112.06356. arxiv.org/abs/2112.06356
  10. Khan, N. F., Ikram, N., Murtaza, H., and Javed, M. (2023). Evaluating protection motivation based cybersecurity awareness training on Kirkpatrick's Model. Computers & Security, 125, 103049. doi.org/10.1016/j.cose.2022.103049
  11. Kirkpatrick, D. L., and Kirkpatrick, J. D. (2006). Evaluating Training Programs: The Four Levels (3rd ed.). San Francisco: Berrett-Koehler. (1st ed. 1994; original article series 1959–1960.)
  12. KnowBe4 (2024). 2024 Phishing by Industry Benchmarking Report. blog.knowbe4.com/knowbe4-2024-phishing-by-industry-benchmarking-report
  13. Kruger, H. A., and Kearney, W. D. (2006). A prototype for assessing information security awareness. Computers & Security, 25(4), 289–296. doi.org/10.1016/j.cose.2006.02.008
  14. Lain, D., Kostiainen, K., and Capkun, S. (2022). Phishing in Organizations: Findings from a Large-Scale and Long-Term Study. 2022 IEEE Symposium on Security and Privacy (SP), 842–859. doi.org/10.1109/SP46214.2022.9833766
  15. Lee, C. S., and Chua, Y. T. (2023). The Role of Cybersecurity Knowledge and Awareness in Cybersecurity Intention and Behavior in the United States. Crime & Delinquency, 70(9), 2250–2277. doi.org/10.1177/00111287231180093
  16. Michie, S., van Stralen, M. M., and West, R. (2011). The behaviour change wheel: a new method for characterising and designing behaviour change interventions. Implementation Science, 6:42. doi.org/10.1186/1748-5908-6-42
  17. Ng, C. Y., and Hasan, M. K. B. (2025). Cybersecurity serious games development: A systematic review. Computers & Security, 150, 104307. doi.org/10.1016/j.cose.2024.104307
  18. Parsons, K., McCormac, A., Butavicius, M., Pattinson, M., and Jerram, C. (2014). Determining employee awareness using the Human Aspects of Information Security Questionnaire (HAIS-Q). Computers & Security, 42, 165–176. doi.org/10.1016/j.cose.2013.12.003
  19. Parsons, K., Calic, D., Pattinson, M., Butavicius, M., McCormac, A., and Zwaans, T. (2017). The Human Aspects of Information Security Questionnaire (HAIS-Q): Two further validation studies. Computers & Security, 66, 40–51. doi.org/10.1016/j.cose.2017.01.004
  20. Proofpoint (2024). 2024 State of the Phish. proofpoint.com/us/resources/threat-reports/state-of-phish
  21. Prümmer, J., van Steen, T., and van den Berg, B. (2024). Assessing the effect of cybersecurity training on end-users: A meta-analysis. Computers & Security, 150, 104206. doi.org/10.1016/j.cose.2024.104206
  22. SANS Institute (2024). 2024 Security Awareness Report: Embedding a Strong Security Culture. sans.org/security-awareness-training/resources/reports/sar
  23. Uchendu, B., Nurse, J. R. C., Bada, M., and Furnell, S. (2021). Developing a cyber security culture: Current practices and future needs. Computers & Security, 109, 102387. doi.org/10.1016/j.cose.2021.102387
  24. Workman, M., Bommer, W. H., and Straub, D. (2008). Security lapses and the omission of information security measures: A threat control model and empirical test. Computers in Human Behavior, 24(6), 2799–2816. doi.org/10.1016/j.chb.2008.04.005
  25. Zwilling, M., Klien, G., Lesjak, D., Wiechetek, Ł., Cetin, F., and Basim, H. N. (2020). Cyber Security Awareness, Knowledge and Behavior: A Comparative Study. Journal of Computer Information Systems, 62(1), 82–97. doi.org/10.1080/08874417.2020.1712269