As companies seek to improve their cybersecurity postures, they are increasingly using a variety of metrics, scoring systems, and reputational rankings to measure their efforts. But in many cases, businesses are asking too much of the various systems that attempt to measure security.
The old saw says that you need to measure something to manage it, but many of the systems that have flourished, from the Common Vulnerability Scoring System (CVSS) to organizational security posture scoring and ratings for software development projects, are only sometimes successful at expressing measurable risk. Yet corporate boards are turning some security measurements into key performance indicators (KPIs), and some industries, such as insurance, are using them to determine risk. Their conclusion: Risk-scoring and reputation tools are imperfect but better than nothing.
Part of the reason is that companies look to manage risk, not just improve security, says Bruce Schneier, chief technology officer of Inrupt, a user-focused data management provider, and an adjunct lecturer at the Harvard Kennedy School. Schneier is critical of many attempts to measure security.
“Whenever I’ve had a company that could do it, I’ve always tried to build comparative metrics — how am I doing compared to everybody else that does this?” he says. “That does help. People do want to know how they compare to their peers, and that’s also good lawsuit protection.” They may say, “Yes, this is a problem, but look, everybody else is doing the same thing.”
From software and vulnerabilities to corporate security and human risk, efforts to assign scores and reputations to various components of the information technology ecosystem are growing. This week, detection and response platform Sweet Security inked a deal to use the early-stage startup Illustria to offer a package reputation service to detect risky changes to open source software packages. Providers of security posture ratings — such as Bitsight, SecurityScorecard, and UpGuard — have gained a following among cyber insurers, while human-risk management firms, such as Living Security and Mimecast, are increasingly assigning scores to users’ cybersecurity awareness.
Common Vexations of Scoring Security
CVSS, the standard way to grade the potential criticality of software flaws, highlights many of the issues that continue to dog rating and reputation systems. CVSS allows security researchers and software companies to assess the basic severity of vulnerabilities on a 10-point scale, but organizations still need to evaluate the vulnerabilities’ impacts in their own environments. This step is often overlooked, which gives critics significant fodder to attack the approach.
As a result, CVSS garners some praise but also a great deal of criticism. The scoring system is more like grading a high dive than tallying a baseball game, wrote Richard Brooks, co-founder and lead software engineer at consulting firm Business Cyber Guardian, in a tepid defense of the system that often veered into criticism.
“It’s highly subjective and each party needs to decide for themselves if there is risk from a vulnerability, based on their own circumstances and the information known about the vulnerability and its exploitation methods,” he stated.
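To make the point about environment concrete, here is a minimal Python sketch of the CVSS v3.1 base score equation, limited to the Scope: Unchanged case. The function and constant names are illustrative, not part of any official library. It scores the same flaw twice: once as published with a Network attack vector, and once with the attack vector set to Local, standing in for the "Modified Attack Vector" environmental metric an organization might apply when the affected system is not reachable over the network. The sketch assumes the security requirements and temporal metrics are left at Not Defined, in which case the environmental score should reduce to this same formula.

```python
import math

# CVSS v3.1 metric weights, Scope: Unchanged only (a deliberate simplification).
ATTACK_VECTOR = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
ATTACK_COMPLEXITY = {"L": 0.77, "H": 0.44}
PRIVILEGES_REQUIRED = {"N": 0.85, "L": 0.62, "H": 0.27}
USER_INTERACTION = {"N": 0.85, "R": 0.62}
CIA_IMPACT = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value):
    """Round up to one decimal place using the spec's integer-based method."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def score_scope_unchanged(av, ac, pr, ui, c, i, a):
    """Score a vector whose Scope is Unchanged (illustrative helper name)."""
    iss = 1 - (1 - CIA_IMPACT[c]) * (1 - CIA_IMPACT[i]) * (1 - CIA_IMPACT[a])
    impact = 6.42 * iss
    exploitability = (
        8.22
        * ATTACK_VECTOR[av]
        * ATTACK_COMPLEXITY[ac]
        * PRIVILEGES_REQUIRED[pr]
        * USER_INTERACTION[ui]
    )
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


# Same flaw, two contexts: as published, and on a host reachable only locally.
as_published = score_scope_unchanged("N", "L", "N", "N", "H", "H", "H")
local_only = score_scope_unchanged("L", "L", "N", "N", "H", "H", "H")
print(as_published, local_only)
```

The gap between the two numbers is the part of the assessment that only the organization running the software can supply.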
A major problem for any scoring system is that security is often subjective and frequently amounts to proving a negative, which makes metrics and scoring difficult to apply, says Inrupt’s Schneier.
Using scores to fuel checklists can help, he says. Checklists are used in environments where reliability is critical, such as airplanes, hospitals, and spacecraft. To some degree the software security community has pursued this approach, creating lists of vulnerabilities — such as the OWASP Top 10 and the CWE Top 25 lists — that are intended to focus remediation efforts.
“Checklists are a way to turn the unprovable negative into a demonstrable positive,” Schneier says. Yet we still have trouble creating metrics for security because “security is fundamentally not about capabilities. It’s not about functionality. It’s about denying functionality.”
Adoption by the Kings of Metrics (Insurers)
One group that’s hungry for scores and metrics is the insurance industry. Insurers aim to boil down events into data, and security events and cyberattacks are no different. Cyber insurers are increasingly collecting their own data to infer which products have good security and determine what to charge potential policyholders based on their use of those products.
Models that assign companies scores based on their observable cybersecurity posture can save insurance firms significant money by identifying the worst performers. Using information from Bitsight and internal data, for example, reinsurance firm Gallagher Re identified the bottom 20% of companies, which had a 3.17 times greater likelihood of suffering a loss. That approach could reduce insurance firm losses by about 16%, the reinsurer stated in a 2024 study. A second study by professional services firm Marsh McLennan and Bitsight found that the lowest-scoring tier of companies was nearly five times more likely to have a cybersecurity incident than the highest-scoring tier.
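The kind of comparison behind such figures can be sketched in a few lines of Python. This is a toy illustration only: the portfolio below is invented, the function name is hypothetical, and the choice to compare the bottom slice against the rest of the portfolio is an assumption, since the studies' exact baselines are not described here.

```python
def relative_loss_rate(portfolio, bottom_fraction=0.2):
    """Loss rate of the lowest-scoring slice divided by the rate of the rest.

    portfolio is a list of (score, had_loss) pairs.
    """
    ranked = sorted(portfolio, key=lambda entry: entry[0])
    cutoff = max(1, int(len(ranked) * bottom_fraction))
    bottom, rest = ranked[:cutoff], ranked[cutoff:]
    if not rest:
        raise ValueError("portfolio too small to split")
    bottom_rate = sum(1 for _, had_loss in bottom if had_loss) / len(bottom)
    rest_rate = sum(1 for _, had_loss in rest if had_loss) / len(rest)
    if rest_rate == 0:
        raise ValueError("no losses outside the bottom slice")
    return bottom_rate / rest_rate


# Invented data: ten policyholders with a posture score and a loss flag.
TOY_PORTFOLIO = [
    (480, True), (510, False), (600, False), (640, True), (660, False),
    (690, False), (710, False), (730, False), (760, False), (790, False),
]

ratio = relative_loss_rate(TOY_PORTFOLIO)
print(f"Bottom 20% loss rate is {ratio:.1f}x the rest")
```

With only ten made-up records the ratio means nothing statistically; an insurer would need a large book of policies and actual claims data before drawing conclusions.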
A scoring system works only if companies are using it to reach their end goals (more security) rather than trying to just increase their scores (compliance), says Stephen Boyer, co-founder and chief technology officer at Bitsight.
“I do think that as long as it’s communicating something that drives an action that ends up being risk-reducing, that’s good,” he says. “If it’s a regulatory focus and [the company] is doing that to optimize the score and is not actually reducing risk, then it is a wasted effort for everybody.”
Unsurprisingly, more regulated industries tend to score higher on organizational ratings. Financial firms, utilities, energy, and healthcare all average a score of 720 or higher, while communications services average a score of 630 and industrials a score of 690, according to a report on cybersecurity oversight of corporate boards.
Software Ratings Gain Traction
As software supply chain worries mount, companies and the open source community are aiming to rate the reputation and development processes of open source projects and assign scores to the components they produce. The OpenSSF Scorecard, for example, conducts a number of automated checks and gives a project a numerical score in each area, including whether the project has binary artifacts, whether branch protection is on, the cadence of commits, and whether the project shows signs of using automated tools and fuzzers. The popular machine-learning library TensorFlow currently has an overall score of 8.2, with low scores for its Code Review practices and for failing to pin dependencies.
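A per-check breakdown like this lends itself to simple triage. The Python sketch below picks out the weakest checks from a Scorecard-style result. It assumes the result is a dictionary with a "checks" list whose entries carry "name" and "score" fields, which is how the Scorecard tool's JSON output is understood to be shaped, and that a score of -1 marks an inconclusive check. The sample data is invented and does not reflect any real project's results; the function name and threshold are hypothetical.

```python
def weak_checks(result, threshold=5):
    """Return (name, score) pairs for checks scoring below the threshold.

    Checks with a negative score are treated as inconclusive and skipped.
    """
    weak = []
    for check in result.get("checks", []):
        score = check.get("score", -1)
        if 0 <= score < threshold:
            weak.append((check["name"], score))
    # Lowest scores first, so the most pressing items lead the list.
    return sorted(weak, key=lambda item: item[1])


# Invented sample shaped like a Scorecard result; not real project data.
SAMPLE_RESULT = {
    "repo": {"name": "github.com/example/project"},
    "checks": [
        {"name": "Binary-Artifacts", "score": 10},
        {"name": "Branch-Protection", "score": 8},
        {"name": "Code-Review", "score": 3},
        {"name": "Pinned-Dependencies", "score": 1},
        {"name": "Fuzzing", "score": -1},
    ],
}

for name, score in weak_checks(SAMPLE_RESULT):
    print(f"{name}: {score}/10")
```

Reading the individual checks rather than the single overall number is what lets a team decide which weaknesses actually matter for its own use of a component.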
In some ways, we have too much data, and often it’s not the right data, says Dylan Thomas, senior director of product and engineering at IT conglomerate OpenText.
“Because there’s so much more data, the biggest challenge is understanding that we’re using it in an effective way and that we’re using the right data to draw the right conclusions, [so we don’t] misrepresent a particular data point or metric or scoring system,” he says. “It’s one of the reasons that LLM-based machine-learning algorithms really can provide a lot of value to augment security decision-making [and] can synthesize the vast amounts of data into potential patterns that we can actually make sense of.”
The Open Source Select service offered by software supply chain security firm Debricked, part of OpenText, uses ratings for the contributors, the popularity, and the security of open source components to summarize their practices using a scale of 1 to 100, assigning a traffic-light color to each component. TensorFlow, for example, received green ratings for its contributors (score: 73) and popularity (score: 84) but only a yellow rating (score: 42) for security.
The ratings are not necessarily a way to detect whether a software component is dangerous but a way to automate the approval and intake process for the proposed use of open source components, speeding up decision-making, Thomas says.
“The benefit is, as a developer, I’m not waiting weeks to work through an open source intake process,” he says. “I can quickly get a decision in a subset — and, hopefully, a meaningful subset — of use cases. Either quickly not waste my time going through a long process for a particular component or get green-lit very quickly.”
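An automated intake rule of the kind Thomas describes might look like the following Python sketch. The color thresholds and the approval policy are hypothetical, chosen only so that they agree with the TensorFlow ratings quoted above; they are not Debricked's actual cutoffs or logic.

```python
# Hypothetical thresholds on the 1-to-100 scale; not the vendor's real cutoffs.
GREEN_MIN = 70
YELLOW_MIN = 40


def traffic_light(score):
    """Map a 1-to-100 rating to a traffic-light color."""
    if score >= GREEN_MIN:
        return "green"
    if score >= YELLOW_MIN:
        return "yellow"
    return "red"


def intake_decision(ratings):
    """Decide an intake outcome from a component's category ratings.

    Hypothetical policy: any red rejects, all green approves,
    and anything else goes to a human reviewer.
    """
    colors = {traffic_light(score) for score in ratings.values()}
    if "red" in colors:
        return "rejected"
    if colors == {"green"}:
        return "approved"
    return "manual review"


# Ratings for TensorFlow as quoted in the article.
tensorflow = {"contributors": 73, "popularity": 84, "security": 42}
print(intake_decision(tensorflow))
```

Under this policy only the middle cases wait for a person, which is where the time savings would come from.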
The question that companies should ask when they use metrics is whether those metrics are speeding up decision-making processes, and if not, why not.
“Part of what we need to do is make sure that we are not just measuring for the sake of measuring, but that we’re also taking time to measure the measuring stick,” Thomas says.