The defensive side of Security technology is an interesting place to be at the moment, with a vast number of products and techniques trying to defend against an ever-changing attack landscape.
Where there is uncertainty, people want assurance and a lower likelihood of being breached. Is the information gathered real and actionable? Have we been breached or not? What is the probability?
Bayes’ Theorem is fashionable across a number of fields today, and the idea of ‘machine learning’ to solve a security problem seems compelling.
Bayes was an “amateur” mathematician and Church Minister in the 18th Century, so no knowledge of computers, but he set out to solve a fundamental problem and this is where lasting ideas come from.
So why Bayes?
If you have read Daniel Kahneman’s book “Thinking, Fast and Slow” (a highly entertaining read), you will be aware that humans are not always great at instinctive decisions based on statistics. And when you add context, it sometimes overrides the facts when it really shouldn’t.
Consider a drug being brought to market that definitely cures a disease 99.9% of the time. (I like it, where can I get it?) I know what 99.9% means, that means pretty much a sure thing? Hold on, how often does it fail and what are the consequences when it does? Well in a 40,000 seater stadium this fails for 40 people. What if then, when it fails, it kills the person 50% of the time? So 20 people in that stadium would die. Clearly not acceptable, the drug is shelved.
This extends to the base-rate fallacy. If I have a test for a disease that is 99% accurate, what are the odds I have the disease if I test positive? That is, 99 out of 100 people who have the disease will test positive, and 99 out of 100 who don’t will test negative. The answer depends on how common the disease is: if 1% of the people tested actually have it, the answer turns out to be around 50%. Take the test again and get a second positive, and the odds go back up to 99%. Take a 90% accurate test, and even after the 2nd positive test the odds are still not at 50%. As an exercise, once you have been through the below examples, and fully understand, I encourage you to take these numbers and plug them into the Bayes equation.
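The disease-test numbers can be checked with a short sketch. It assumes that 1% of the people tested have the disease, that the test is equally accurate for sick and healthy people, and that repeated tests are independent of each other. These are illustrative assumptions, and the function name `update` is made up for this example.

```python
def update(prior, accuracy):
    """Return P(disease | positive test) for a test that is right
    `accuracy` of the time for both sick and healthy people."""
    true_positive = accuracy * prior
    false_positive = (1 - accuracy) * (1 - prior)
    return true_positive / (true_positive + false_positive)


BASE_RATE = 0.01  # assumed prevalence: 1 in 100 people has the disease

for accuracy in (0.99, 0.90):
    after_first = update(BASE_RATE, accuracy)
    # The posterior from the first test becomes the prior for the second.
    after_second = update(after_first, accuracy)
    print(f"{accuracy:.0%} test: {after_first:.1%} after one positive, "
          f"{after_second:.1%} after two")
```

Under these assumptions the 99% test gives about 50% after one positive and 99% after two, while the 90% test gives roughly 8% and then 45%.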
Another example of this flaky judgement is the conjunction fallacy.
The classic example is below:
“Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.
Which is more probable?
- Linda is a bank teller.
- Linda is a bank teller and is active in the feminist movement.”
The idea is that some may think the second is more likely, when the “and” makes it less likely given the base rates. How many 31 year old females are Bank Tellers? How many 31 year old females are active in a feminist movement? Does the “AND” make sense?
The additional information often confuses our instinctive noodle.
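A quick sketch makes the point with numbers. The figures below are invented purely for illustration (they are not real statistics about anyone); whatever values you pick, the “and” statement can never be more probable than either part on its own.

```python
# Hypothetical, made-up probabilities for illustration only.
p_teller = 0.02                  # assumed P(Linda is a bank teller)
p_feminist_given_teller = 0.60   # assumed P(active feminist | bank teller)

# P(A and B) = P(A) * P(B | A), and P(B | A) can never exceed 1,
# so the conjunction is at most as likely as A alone.
p_teller_and_feminist = p_teller * p_feminist_given_teller

print(f"P(teller)              = {p_teller:.3f}")
print(f"P(teller and feminist) = {p_teller_and_feminist:.3f}")
```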
The whole idea with Bayes is we can add numbers to things that seem subjective or confusing, use any additional info, and get a more accurate read.
Bayes states that…
The prior odds times the likelihood ratio equals the posterior odds.
The formula for Bayes is not difficult; the hard part is what to plug into the formula and how. You need to decide on your tests and events. There is a test for a condition, and there is the event that someone actually has that condition.
You are saying “what is the probability of the event given the following test was positive?”
Also what is the probability of the positive test being accurate? (False positives – and you can use false negatives depending how you frame the test and the event.)
You can see how this can get a little confusing but let’s have a go anyway with Bayes below.
The point about Bayes is that you have some Data and you make a claim about the data, or a hypothesis.
So you have a Hypothesis and you want to know what the probability of that hypothesis is given the Data.
The notation P(A|B) can be summarised as the probability of A assuming B to be true.
In P(A|B), A is the hypothesis and B is the Data. You can read the “|” sign as the word “given” if you like. Altogether this is called the Posterior.
This is equal (=) to
P(A), the probability of the hypothesis – we call this the Prior
multiplied by
P(B|A), the probability of the Data given that hypothesis – call this the Likelihood
all divided by
P(B), the probability of the data itself.
Got that? Good. Let’s plug in some numbers with an example
We will do this in steps and then the equation.
We have a condition, let’s call it “Geekiness”. What if we had a test to try to identify Geekiness? (aside from writing blog posts about Bayes)
We are testing 100 students for this condition.
We know Geekiness affects 20% of the students tested
The test for Geekiness involves watching a Star Trek film trailer and seeing if the pupils dilate excessively.
Among the students with Geekiness, 90% of the pupils dilate when tested.
But among those without Geekiness, 30% also dilate when seeing the trailer.
So what is the probability that a student who tests positive actually has “Geekiness”?
Or as a hypothesis – the student has “Geekiness”, and we want the probability of that given a positive test result (getting excited at a Star Trek trailer).
Step 1: Find the probability of a true positive on the test. That is people who actually have Geekiness (20%) multiplied by true positive results (90%) = 0.18 (or 18 of the 100 students, taken from the 20 who have Geekiness)
Step 2: Find the probability of a false positive on the test. That equals people who don’t have the Geekiness (80%) multiplied by false positive results (30%) = 0.24 (or 24 people out of the 100)
Step 3: Figure out the probability of getting a positive result on the test. That equals the chance of a true positive (Step 1) plus a false positive (Step 2) = 0.18 + 0.24 = 0.42
Step 4: Finally find the probability of actually having Geekiness given a positive result. Divide the chance of having a real, positive result (Step 1) by the chance of getting any kind of positive result (Step 3) = 0.18/0.42 = 0.43, or 43%. So considerably less than the 90% that we started with. With that additional info, a test that starts as 90% accurate for those with the condition is less than 50% reliable on a positive result when you take into account everyone (the base rate).
Surprising for some but we get a real figure, and this is the power.
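The four steps can be sketched in a few lines of Python. The variable names are just illustrative choices; the numbers are the ones from the Geekiness example.

```python
students = 100
base_rate = 0.20            # 20% of students have Geekiness
true_positive_rate = 0.90   # pupils dilate in 90% of students with Geekiness
false_positive_rate = 0.30  # pupils dilate in 30% of students without it

# Step 1: probability of a true positive (has Geekiness AND tests positive)
p_true_positive = base_rate * true_positive_rate
# Step 2: probability of a false positive (no Geekiness BUT tests positive)
p_false_positive = (1 - base_rate) * false_positive_rate
# Step 3: probability of any positive result
p_positive = p_true_positive + p_false_positive
# Step 4: probability of Geekiness given a positive result
p_geek_given_positive = p_true_positive / p_positive

print(f"True positives:  {p_true_positive * students:.0f} of {students}")
print(f"False positives: {p_false_positive * students:.0f} of {students}")
print(f"All positives:   {p_positive * students:.0f} of {students}")
print(f"P(Geekiness | positive) = {p_geek_given_positive:.0%}")
```

Changing `base_rate` and rerunning is a quick way to see how strongly the final figure depends on how common the condition is.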
Let’s now plug the same info as above into the Bayes equation.
A Posterior, a Prior and a Likelihood walk into a bar….
P(A|B) is the probability the student has “Geekiness” given a positive test result. (Posterior)
P(A) = Probability of having Geekiness = 20% (Prior)
P(B|A) = Chance of a positive test result given student actually has Geekiness = 90%. (Likelihood)
P(B) = Chance of a positive test in the overall student population of 100, which is 42%
Now we have all of the information we need to put into the equation:
P(A|B) = P(B|A) * P(A) / P(B)
P(A|B) = 0.9 * 0.2 / 0.42 = 0.43 (43%)
P(A|B) = (90% * 20%) / 42% = 43%
Another way to express this:
Prior odds * Likelihood ratio = Posterior odds
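To check that the odds form gives the same answer for the Geekiness example, here is a small sketch. The helper names `odds` and `probability` are illustrative, and the ratio used is the chance of a positive test with Geekiness (90%) divided by the chance of a positive test without it (30%).

```python
def odds(p):
    """Convert a probability into odds in favour (e.g. 0.2 -> 0.25, i.e. 1:4)."""
    return p / (1 - p)


def probability(o):
    """Convert odds in favour back into a probability."""
    return o / (1 + o)


prior_odds = odds(0.20)                          # 20 with Geekiness : 80 without
likelihood_ratio = 0.90 / 0.30                   # positives are 3x as likely with Geekiness
posterior_odds = prior_odds * likelihood_ratio   # 18 true positives : 24 false positives
posterior = probability(posterior_odds)

print(f"Prior odds:       {prior_odds:.2f}")
print(f"Likelihood ratio: {likelihood_ratio:.2f}")
print(f"Posterior odds:   {posterior_odds:.2f}")
print(f"Posterior:        {posterior:.0%}")
```

The posterior odds of 0.75 are the same 18 to 24 split found in the steps, which converts back to 43%.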
So there you have it. Try some examples yourself, and be patient, don’t expect to be a whizz in 5 minutes.
There is more to Bayes than I have covered, but hopefully you should get a feel for how taking into account data, sample size, and accuracy can affect your probability. We need to be rigorous in questioning the data. A number of Security start-ups are using these techniques to better predict and detect anomalies or breaches, and although it doesn’t promise to be a panacea, I am excited to see where all this leads over the next few years.