The Bug That Hid in a Timestamp
A date silently failed to parse on one row out of thirteen. I caught it only because it sat next to twelve that were right — and the failures that survive a system are the ones clever enough to look normal alone.
I found this one by accident, which is the only way I ever find these, and that is the actual subject of this essay.
I was building a list. The system was going to start reminding me, every morning, about the things sitting in my inbox that I had never answered: the job applications, the customer messages, the leads gone cold. To do that, it had to print each one with how long it had been waiting. Ten days. Five days. Sixteen days. Simple. And when I ran it for the first time, the list came out clean except for a single line, where instead of "four days ago" there was just a raw date, naked, with no count beside it. One row out of thirteen, looking slightly wrong in a way I almost scrolled past.
I almost scrolled past it. That is the part I keep thinking about.
Here is what had happened, and it is so small it is almost funny. The database stores timestamps down to fractions of a second. Most of the time it writes six digits of that fraction. But for that one row, it had written five. And the version of the date-reading code my system runs on, an old one, quietly insists on exactly three digits or exactly six. Five, it refuses. Not loudly. It does not crash. It throws a small error, which my own code politely catches, shrugs, and falls back to showing the raw date with no count. So a timestamp with the wrong number of digits after the dot became a missing number on a screen, and nothing, anywhere, recorded that a thing had failed.
Five digits instead of six. That was the whole bug. A difference no human would ever notice in a value no human was ever meant to read, sitting in a format the database considered perfectly valid and the code considered garbage, and the disagreement between those two opinions resolved itself silently into a slightly wrong answer.
I fixed it in about four minutes once I saw it. That was never the hard part. The hard part is everything around the seeing.
Because think about how close I came to not seeing it. If that list had contained one item instead of thirteen, the broken one, I would have had nothing to compare it to. A single line with a raw date looks fine. A date is a reasonable thing for a program to show you. It is only wrong in contrast, only when it sits in a column next to twelve siblings that all say "five days ago" and it alone says nothing. The error was invisible in isolation and obvious in a crowd. I did not catch it because I was careful. I caught it because the layout happened to put the lie next to the truth, and my eye snagged on the one that did not match.
This is the same lesson I keep being handed, in a new disguise. I have written before about jobs that report success while doing nothing, about a green light over an empty room. I thought I had internalized it. And here it was again, wearing different clothes: not a job that failed loudly, not even a job that lied about succeeding, but a value that was just a little bit wrong, produced by a chain where every single link believed it had done its job correctly. The database wrote a valid timestamp. The code caught its own exception, exactly as written. The fallback did precisely what fallbacks are for. Every component behaved. The system, as a whole, was wrong. That is the kind of failure that does not get caught by checking whether things ran. They all ran. They ran themselves right into a quiet mistake.
So what do you actually do about a failure that only appears in contrast?
You manufacture the contrast. That is the entire move, and it took me embarrassingly long to name it. The reason I saw this bug is that I happened to render thirteen things in a column where uniformity was the expected state, so the one deviation popped. That was luck. The discipline is to stop relying on luck and build the column on purpose. Put the values next to each other. Show the count, the total, the distribution, not just the individual result, because an individual result has nothing to betray it but a row in a list of its peers cannot hide. The wrong number is camouflaged when it stands alone and exposed the instant it stands in line.
I think most real bugs in a running business are this kind. Not the dramatic crash that pages you at midnight, but the small, plausible, slightly-wrong value that every part of the pipeline signed off on. The payroll figure that is off by one category nobody reconciles. The total that is right except for the rows that silently got dropped before they were counted. The report that looks exactly like a report should look, and is wrong only in a way you would catch if you ever set this month beside last month and asked why the shape changed. These do not announce themselves. They sit in the data looking completely normal, and they stay there until something forces them next to a number they cannot match.
The uncomfortable truth under all of this is that my detection system is still, mostly, me, noticing. I would like it to be something better. But until it is, the least I can do is rig the environment so that noticing is easy, so that the things most likely to be wrong are displayed in the company most likely to expose them. A wrong value alone is a stranger you wave through. A wrong value in a lineup is the one whose face does not fit.
The timestamp is patched now. Five digits, six digits, three, it does not care anymore. But I am not really keeping the lesson about timestamps. I am keeping the lesson about the lineup: that the failures which survive in a system are the ones clever enough to look normal by themselves, and the only reliable way to catch them is to refuse to look at anything by itself.
/ar/