Poisson Distribution: A Cybersecurity Defender’s Ally in Detecting Brute-Force Attacks
Cybersecurity is a constant battle of wits, and staying ahead of attackers requires both technical expertise and a smart approach to data. One of the most effective, yet often underutilized, tools in this fight is the Poisson distribution. This statistical workhorse can help us model normal network behavior and, by extension, spot the telltale signs of a brute-force attack or other malicious login attempts.
Think of the Poisson distribution as a way to model how many times a random, independent event happens in a fixed interval when it occurs at a roughly constant average rate. In our case, the “event” is a failed login attempt on a server. By tracking these attempts over a long enough period of “normal” operation, we can calculate an average rate, known as lambda (λ). This lambda is the heart of our model: it represents the number of failed login attempts we’d expect to see in a given time frame, say, per hour.
With our lambda established, we can use the Poisson formula to calculate the probability of observing a specific number of failed login attempts. The formula itself looks a bit intimidating:
P(X = k) = (λ^k * e^(-λ)) / k!
But what it’s really telling us is the likelihood of seeing exactly k failed logins, given our average rate λ. This allows us to establish a clear baseline of what’s considered “normal” traffic.
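To make the formula less abstract, here is a minimal sketch that evaluates it by hand and compares the result with R's built-in dpois() function. The rate of 2 and the range of k values are illustrative assumptions, not measurements from a real system.

```r
# Sketch: evaluate the Poisson formula by hand and compare with dpois().
# lambda = 2 is an illustrative rate, not a measured baseline.
lambda <- 2
k <- 0:6

# Direct translation of P(X = k) = (lambda^k * e^(-lambda)) / k!
manual_pmf <- (lambda^k * exp(-lambda)) / factorial(k)

# R's built-in Poisson probability mass function
builtin_pmf <- dpois(k, lambda = lambda)

# Side-by-side view of both calculations
pmf_table <- data.frame(k = k, manual = manual_pmf, builtin = builtin_pmf)
print(round(pmf_table, 4))
```

The two columns should agree, which is a quick way to convince yourself that dpois() is nothing more than the formula above.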
The real power of this model comes into play when we compare its predictions to what’s happening in real time. If our baseline tells us we should expect, on average, two failed logins per hour (λ = 2), and suddenly we see 10 failed attempts in a single hour, something is likely wrong. The Poisson distribution tells us the probability of seeing 10 or more failures by chance, which in this case is roughly 0.005%. A probability that low is a red flag: it suggests this isn’t just a handful of users mistyping their passwords, but more likely an automated attack.
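The λ = 2 example can be checked directly. This is a small sketch using ppois() with lower.tail = FALSE, which returns the upper-tail probability without the manual subtraction from 1; the variable names are only illustrative.

```r
# Sketch: how unlikely are 10 or more failed logins in an hour when lambda = 2?
lambda_example <- 2      # baseline rate from the example in the text
observed_example <- 10   # failed logins seen in a single hour

# P(X >= 10) equals P(X > 9); lower.tail = FALSE asks for P(X > q)
p_tail <- ppois(observed_example - 1, lambda = lambda_example, lower.tail = FALSE)
print(p_tail)
```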
We can use a simple script to put this into practice. Let’s look at how we might do this using R. First, we need to get our baseline data and figure out our average rate.
# Sample historical data of failed logins per hour
failed_logins_per_hour <- c(1, 0, 3, 2, 1, 0, 0, 4, 1, 2, 1, 3, 0, 2, 1, 0, 0, 2, 1, 3, 0, 0, 1, 2)

# Calculate the average rate (lambda)
average_rate <- mean(failed_logins_per_hour)
print(paste("Average rate of failed logins per hour:", average_rate))
Running this code gives us our lambda: 1.25 failed logins per hour. Now, imagine we’re monitoring our system and we suddenly see 7 failed logins in the last hour. We can use R to calculate the probability of this happening by chance. We’re interested in the probability of seeing 7 or more failed attempts, not exactly 7, because any count at least this high would be just as suspicious.
observed_failures <- 7

# Calculate the cumulative probability of seeing 0 to 6 failed attempts
probability_6_or_less <- ppois(observed_failures - 1, lambda = average_rate)

# The probability of seeing 7 or more is 1 minus that number
probability_7_or_more <- 1 - probability_6_or_less
print(paste("Probability of", observed_failures, "or more failed logins:", probability_7_or_more))
In this case, the probability of 7 or more failed logins is about 0.00032, or 0.032%, which is far below an alert threshold of 1% chosen in advance. An alert should therefore be raised immediately. A count this unlikely under the baseline is a strong indication that a brute-force or credential-stuffing attack may be underway.
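Rather than computing a probability for every observed count, we can turn the 1% threshold into a fixed count in advance. The sketch below assumes the same sample data and uses qpois() to find the smallest hourly count whose tail probability is at or below the threshold; the names alert_threshold and min_alert_count are illustrative.

```r
# Sketch: convert a probability threshold into a minimum alerting count.
failed_logins_per_hour <- c(1, 0, 3, 2, 1, 0, 0, 4, 1, 2, 1, 3,
                            0, 2, 1, 0, 0, 2, 1, 3, 0, 0, 1, 2)
average_rate <- mean(failed_logins_per_hour)

alert_threshold <- 0.01  # the 1% cutoff used in the text

# qpois(p, lambda) returns the smallest q with P(X <= q) >= p.
# With p = 1 - threshold, P(X > q) <= threshold, so the first count
# that should trigger an alert is q + 1.
min_alert_count <- qpois(1 - alert_threshold, lambda = average_rate) + 1

# Tail probability at that count, for reference
tail_at_min <- ppois(min_alert_count - 1, lambda = average_rate, lower.tail = FALSE)
print(paste("Alert at", min_alert_count, "or more failed logins per hour"))
print(paste("Tail probability at that count:", tail_at_min))
```

With the sample data this works out to 5 failed logins in an hour, so the 7 observed above is comfortably past the alerting point.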
By continuously monitoring login attempts and using the Poisson distribution to compare real-time data against a statistically sound baseline, we can build a robust defense mechanism. This approach doesn’t wait for a breach to be discovered; it flags suspicious activity as it happens, turning statistical analysis into a critical line of defense in our fight to secure digital assets.
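As a rough illustration of what continuous monitoring could look like, here is a minimal sketch of a helper that scores a batch of hourly counts against the baseline. The function flag_anomalous_hours, its alpha parameter, and the recent_counts values are all hypothetical, made up for this example rather than taken from a real system.

```r
# Hypothetical helper: flag hours whose failed-login count is improbably
# high under a Poisson baseline with the given rate.
flag_anomalous_hours <- function(counts, lambda, alpha = 0.01) {
  # Tail probability P(X >= count) for each hour
  p_values <- ppois(counts - 1, lambda = lambda, lower.tail = FALSE)
  data.frame(count = counts, p_value = p_values, alert = p_values < alpha)
}

baseline_rate <- 1.25               # mean of the sample data used earlier
recent_counts <- c(1, 0, 2, 7, 3)   # made-up hourly counts for illustration

monitoring_result <- flag_anomalous_hours(recent_counts, baseline_rate)
print(monitoring_result)
```

Only the hour with 7 failures falls below the 1% cutoff here. In practice the baseline would need to be recomputed periodically, and even normal traffic will occasionally cross the threshold, so alerts are best treated as prompts for investigation rather than proof of an attack.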