Mandatory annual security training has become ubiquitous for IT professionals. The security policies covered in that training define the expected behavior of associates in security-relevant situations and are backed by varying levels of disciplinary action for non-compliance.
Why is it so hard to get people to follow these policies? In many cases, it is because the unwritten policies embedded in the corporate culture make compliance more costly, politically or personally, for the employee asked to carry it out.
When policies conflict with project realities in such a way that there is no “right” action, people take the path of least harm, first to themselves and then to the company. The unwritten rules of the company’s culture create systemic incentives that shape the path those non-compliant actions take.
If we understand the motivations for non-compliant actions, we can reverse-engineer the cultural incentives in order to fix them. In my experience, this does far more to prevent security breaches than technical controls do.
Patch Un-Management
One of my past clients had a large messaging network with many nodes running back-level software. Estimates to upgrade the entire network ranged from $3.5M to $4M. To mitigate operational risk, the client purchased extended support from the vendor, but most of the back-level nodes were so old they were ineligible for support. An inventory found that about $1.5M of the yearly extended support bill was wasted on ineligible nodes.
In this case, the incentive program that rewarded waste reduction was available to business analysts but not to line-level support staff. There had been several cases in which the vendor declined to provide support, and it was an open secret among the front-line support team that these servers were orphaned. Much of the patching work fell to the same operations team, but they operated strictly on a triage basis and patching was always a low priority.
The incentives shaping the behavior of the Operations team were clearly communicated to associates during performance reviews. Bonuses were based on clearing requests as quickly as possible and keeping the backlog of work to a minimum. Adding a huge bolus of patch work to the schedule would reduce or eliminate bonuses across the team. Far from offering an incentive, the company’s system created a strong disincentive to doing the right thing.
Ideally, the Enterprise Security policies would have caught this, and in fact obtaining a patch policy exception carried a high initial cost, including several management approval gates. But a manager who made that investment of time saved the team a lot of work and minimized disruption to the schedule.
Subsequent re-approvals were routine, so running unpatched became even more cost effective over time. As an application grew, the project team became completely dependent on patching exceptions. That dependency then became the justification for re-approval.
Policy exceptions were always evaluated on a per-server basis. The potentially catastrophic aggregate risk of hundreds of unpatched business critical servers was never evaluated formally.
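To make the gap between per-server and aggregate evaluation concrete, here is a minimal sketch. It assumes, purely for illustration, that each unpatched server is compromised independently with the same probability p over some period; that probability and the server counts are hypothetical, not figures from the client. Under those assumptions, the chance that at least one server is breached is 1 - (1 - p)^n, which climbs quickly with n even when p looks acceptable on any single exception request.

```python
def aggregate_breach_probability(p_single: float, n_servers: int) -> float:
    """Probability that at least one of n_servers is compromised.

    Hypothetical model: each server is breached independently with the
    same per-server probability p_single over the period in question.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if n_servers < 0:
        raise ValueError("n_servers must be non-negative")
    # Complement of the event "no server is breached".
    return 1.0 - (1.0 - p_single) ** n_servers


if __name__ == "__main__":
    # Hypothetical 1% per-server risk, evaluated one exception at a time
    # versus across a growing fleet of unpatched servers.
    for n in (1, 10, 100, 300):
        print(f"{n:>3} servers: {aggregate_breach_probability(0.01, n):.1%}")
```

With these hypothetical inputs, a risk that reads as 1% on each individual exception becomes roughly a 63% chance of at least one breach across 100 servers. Real breaches are not independent, so this understates correlated risk; the point is only that per-server review never sees the total.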
Sticks and Carrots
Some time ago, I ran a request-driven Ops Support team, and our incentives were tied to customer satisfaction. Our internal customers cared deeply about turnaround time on the functional requirements of new server builds, but not at all about items they considered non-functional, such as security and monitoring. They would have been perfectly happy if we delivered a server with no security and no monitoring, and because incentives shape behavior, we often did.
This violated about a dozen policies and I wanted some insight as to how much trouble I might be in. How big was the backlog? What was the average line-item latency? Were we clearing the work or did we have servers that were never properly secured or monitored? Being a coder at heart, I wrote a program to track and report all this. Then I made the near-fatal mistake of showing it to my manager.
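The original tool is not shown, so the following is only a hypothetical sketch of the kind of report described: given the line items on each server build, it counts the open backlog, computes the average latency of completed items, and lists servers whose security or monitoring work was never finished. The `Task` record, its fields, and the category names are illustrative assumptions, not the author's data model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Task:
    # Hypothetical record for one line item on a server build request.
    server: str
    category: str  # illustrative values: "functional", "security", "monitoring"
    opened: datetime
    closed: Optional[datetime] = None  # None while the item is still open


def backlog(tasks: list[Task]) -> int:
    """Number of line items not yet completed."""
    return sum(1 for t in tasks if t.closed is None)


def average_latency_hours(tasks: list[Task]) -> Optional[float]:
    """Mean open-to-close time, in hours, of completed line items."""
    durations = [
        (t.closed - t.opened).total_seconds() / 3600
        for t in tasks
        if t.closed is not None
    ]
    return sum(durations) / len(durations) if durations else None


def servers_missing(tasks: list[Task], categories: set[str]) -> set[str]:
    """Servers with at least one open item in the given categories."""
    return {t.server for t in tasks if t.category in categories and t.closed is None}


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0)
    tasks = [
        Task("srv01", "functional", t0, datetime(2024, 1, 1, 17, 0)),
        Task("srv01", "security", t0),  # never finished
        Task("srv02", "functional", t0, datetime(2024, 1, 2, 9, 0)),
        Task("srv02", "monitoring", t0, datetime(2024, 1, 3, 9, 0)),
    ]
    print("backlog:", backlog(tasks))
    print("avg latency (h):", average_latency_hours(tasks))
    print("unsecured/unmonitored:", servers_missing(tasks, {"security", "monitoring"}))
```

Keeping the three questions as separate functions mirrors the questions in the text: how big the backlog is, how long items take, and which servers were never properly secured or monitored.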
We had never been evaluated on completion of security or monitoring tasks, but the moment they became visible we were, and we scored poorly on our own test. The team performed as well as before and the clients were just as happy, but performance reviews took a decidedly negative turn.
It wasn’t hard to write a new script that performed all of the install tasks rather than simply reporting on their status. Where it had taken most of a day to properly build out a node by hand, the new process took about 15 minutes. As soon as it was easier, faster, and far more accurate to build out a new server with the tool, all manual builds stopped and we started turning requests around in near-real time. That freed up time to work the backlog and we soon completed all outstanding build tasks. Everyone involved was happier, and we mitigated a lot of latent risk.
Lessons Learned
The problem in both scenarios was that the policies that defined the structure of the organization were in deep conflict with the culture. Both companies assumed that security compliance would be primarily driven by policies and penalties, but structured remuneration and recognition so that success depended on externalizing as much risk as possible.
The Ops team in the patching story knew about the wasted maintenance fees but would have had to work harder for less money had they reported them. The line-of-business (LOB) owner filed the exception requests to keep the project teams on time and on budget. The Enterprise Security team approved the exceptions at first because they were few, and later because they were so numerous that forcing remediation would have been political suicide.
After I reported the issue, it took two years and management turnover across multiple departments before the servers were fully patched. Because the systemic incentives remained intact, a new cohort of unpatched servers is about to go out of service and the cycle will likely repeat.
The stick-and-carrot story was about the break-even point of quality versus speed. Increased turnaround time on build requests and unhappy customers would exact a toll in recognition, remuneration, and team turnover, whereas the cost of policy non-compliance was largely hypothetical. Equilibrium consisted of balancing quality and speed until their costs were perceived as equal, which consistently compromised quality. The solution was to radically adjust the cost of properly building a server to near zero, after which risking disciplinary action over policy compliance made no economic sense.
Hacking Company Culture
In the long run, no amount of technical skill can overcome a company culture that opposes security. Fixing that requires hacking the culture. Incentives that work tend to have a few things in common:
- The intuitive action is the right one. A person given minimal guidance should tend toward the preferred action.
- More carrot than stick. Anticipation of reward is more motivating than fear of loss.
- They are in the moment. The most significant systemic incentives are those that affect the team’s day-to-day work. This is why risking termination by bypassing security controls to get work done is so common.
- They reward accountability. People like being caught doing the right thing. When people prefer anonymity, look for a perverse incentive.
- They are forgiving. Incentives should encourage prompt disclosure of mistakes and strongly discourage covering them up. The company can recover only from mistakes it knows about.
- They are cost-effective. When the combined cost of quality plus speed exceeds your team’s capacity, one or both must be compromised. The cost of doing the right thing should always be trending toward zero.
Most importantly though, incentives that work are nearly always deliberately engineered. Left to chance, cultural incentives always follow the path of least resistance and, as with water, that is always downhill.