All writing
Notes on building

The 17-Day Silence

A green light over an empty room: how a job that reported success every night, while doing nothing, taught me to stop trusting status codes.

June 5, 2026 · 5 min

Every night, a small job on my system woke up, reached out to the database, and put the day's data away. It had done this without complaint for a long time. Then, for seventeen nights in a row, it lied to me.

Not loudly. That's the part that still gets under my skin. There was no crash, no red text, no alert on my phone at two in the morning. Every night the job ran. Every night the request went out, the server answered, and the answer was the cheerful little number that means "Created." Every night the log showed success. And every night, nothing was written. Zero rows. The data went out the door and into the air.

I want to be precise about what was happening, because the mechanism is the whole lesson. The database has a security layer that decides who is allowed to write to what. The role my script was using had quietly lost permission to add records to that one table. So when the script asked to write, the database did the most polite thing imaginable. It accepted the request. It returned success. And then it threw the data away. No error, because from the database's point of view, nothing went wrong. It had been asked to discard the write, and it discarded it, on time, every time.

A green light over an empty room.

What I believed

I believed what the log told me. That is the honest version. I had built a system that reported on itself, and the reports were good, so I trusted them. A success code felt like proof. The request was accepted. The status said "Created." What more do you want?

This is the kind of mistake that only an operator who is happy makes. When things are visibly broken, you go looking. When the dashboard is green and the briefing says nothing is pressing, you do the most natural thing in the world: you look elsewhere. The system was, in effect, reassuring me with a straight face. I had no reason to doubt it, and that absence of doubt was the whole problem.

What was actually true

It came apart not through any clever monitoring but through a human noticing. Me. I was looking at the data for some other reason, and it was thinner than it should have been. Sparser. The shape was wrong in a way I felt before I could name it. So I went digging, and the floor gave way.

About sixty-eight records were simply gone. They had never existed. There was no corrupt file to recover, no backup to restore, because nothing had ever been saved in the first place. I sat down and put them back by hand, one window of missing days at a time. That is the cost of a silent failure stated plainly: not just the lost work, but the work of reconstructing what you lost, after the trail has gone cold, from memory and fragments.

And I got lucky. Sixty-eight records is a number I could rebuild in an afternoon. Seventeen days is short. If I had not happened to glance at that data, if the silence had run for months, the hole would have been too big to climb out of by hand. The failure didn't announce itself. It waited to be discovered, and it would have kept waiting.

What I changed

I turned this into a rule, and the rule is short enough to keep in my head, which is the only kind of rule that survives. "Completed" is wrong if anything was skipped silently. A success code is not proof of success. It is proof that someone, somewhere, was willing to say yes.

So the systems no longer take yes for an answer. After a job claims to have written something, it goes back and checks that the thing is actually there. It counts the rows. It compares what it meant to write against what landed, and if those two numbers disagree, it says so, out loud, where I can see it. The status code is now treated as an opinion, not a fact. The fact is the count.

That sounds like a small engineering adjustment, and mechanically it is. But it changed how I think about every system I run, including the ones that have nothing to do with databases. I stopped asking whether a thing reported success and started asking whether I could see the result of it with my own eyes. Trust the artifact, not the announcement. The receipt that says the package shipped is not the package.

The part that generalizes

Here is the thing I keep coming back to. A loud failure is a gift. It is annoying, it ruins your evening, it pulls you away from whatever you'd rather be doing. But it tells you the truth immediately, while the problem is still small, while you still remember what you were doing when it broke. A loud failure costs you an hour and saves you a month.

A silent failure does the opposite. It compounds in the dark. Every night that nothing was written, the gap got wider, and every night the log told me it was fine. The most dangerous problem in any system you run is not the one that screams. It is the one that looks, from every angle you happen to be looking from, exactly like everything is working.

I run a restaurant and I build systems, and these turn out to be the same job. In both, the failures that hurt you are rarely the dramatic ones. They are the quiet drift no one is measuring, the thing that has been broken for a while and reporting success the whole time. The work is not to react faster when the alarm goes off. The work is to make sure the alarm is wired to the truth, and then to go and look at the empty room yourself, every so often, even when the light is green.

/ar/