It was a fairly typical afternoon. I was knee-deep in debugging an issue with an SSIS package that a customer had raised (something about inconsistent row counts between source and target) when a Teams ping broke my focus:

The database maintenance job is missing on the ACME server, can you check?

Now, I’m not the DBA; I’m a software engineer. But when something odd happens to the database behind our Analytics platform, I tend to get the call.

I jumped onto the ACME SQL Server and did the usual: check SQL Server Agent job history, look for failed executions, dig through logs. Only this time… the job wasn’t there.

Not failed. Not disabled. Gone.

I stared at the empty job list for a few seconds, assuming I’d connected to the wrong server. I hadn’t.

Step 1: Questioning Reality

The job in question was our regular database maintenance task: index rebuilds, statistics updates, all the usual suspects. It’s installed as part of the product during initial installation and had been running for years on multiple servers without issue. This server had seen no recent deployments and no config changes. The job had just… vanished.

My first assumption was that someone had deleted it… accidentally, or otherwise. But there were no audit logs to confirm that. SQL Server Agent doesn’t log job deletions by default, and there was nothing useful in the SQL Server error logs either.

Step 2: Looking for Ghosts

A handy query I’ve used in the past came to the rescue:

SELECT *
FROM msdb.dbo.sysjobhistory
WHERE job_id NOT IN (SELECT job_id FROM msdb.dbo.sysjobs);

It checks for orphaned job history: records for jobs that no longer exist. Sure enough, there it was! Entries showed successful runs right up to the early hours of the previous morning, and then nothing for this morning.
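To pin down exactly when the history stops, the same orphaned rows can be aggregated. Here’s a rough sketch along the same lines; it leans on msdb’s agent_datetime helper (undocumented, but present on every instance I’ve come across) to turn the integer run_date and run_time columns into a proper datetime:

-- Last recorded run for each job that no longer exists in msdb.dbo.sysjobs
SELECT jh.job_id,
       MAX(msdb.dbo.agent_datetime(jh.run_date, jh.run_time)) AS last_recorded_run
FROM msdb.dbo.sysjobhistory AS jh
WHERE jh.job_id NOT IN (SELECT job_id FROM msdb.dbo.sysjobs)
GROUP BY jh.job_id;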

That told me two things:

  • The job had definitely existed.
  • It had been deleted sometime after its last run.

But by whom?

Step 3: The Hunt for Clues

We weren’t running SQL Server auditing at the time (lesson learned). No Extended Events session to help. No trigger in place on msdb.dbo.sysjobs (and yes, I know that’s unsupported—but desperate times, and all that).

I pulled the server’s Windows Event Logs and scanned for user logins and permission changes around the timeframe. Nothing stood out.

In the end, I couldn’t prove who deleted it. Might’ve been an accidental cleanup during maintenance. Could’ve been a silent misfire from an old deployment script. We’ll probably never know.

Step 4: Preventing Future Vanishings

I hate mysteries. So I took steps:

  • Nightly Job Export Script: I wrote a quick PowerShell script to export all job definitions to JSON files, compare them against the expected definitions, and e-mail an alert when anything changes. At least now we’ll know what changed, even if we still can’t prove who.
  • Change Control Reminder: We quietly reminded the team that jobs are production assets and shouldn’t be touched outside of planned releases.
  • Added Light Auditing: Nothing heavyweight, just an Extended Events session to catch job create/delete activity (a rough sketch follows this list).
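For anyone who wants to do the same, the session looks roughly like this. It’s a minimal sketch rather than our exact production definition: the session and file names are placeholders, and it simply watches for calls to sp_add_job and sp_delete_job and records who issued them.

-- Placeholder session name; catches sp_add_job / sp_delete_job
-- whether called as an RPC or inside an ad-hoc batch
CREATE EVENT SESSION [agent_job_changes] ON SERVER
ADD EVENT sqlserver.rpc_completed (
    ACTION (sqlserver.server_principal_name,
            sqlserver.client_app_name,
            sqlserver.client_hostname)
    WHERE [object_name] = N'sp_add_job'
       OR [object_name] = N'sp_delete_job'
),
ADD EVENT sqlserver.sql_batch_completed (
    ACTION (sqlserver.server_principal_name,
            sqlserver.client_app_name,
            sqlserver.client_hostname)
    WHERE sqlserver.like_i_sql_unicode_string([batch_text], N'%sp_add_job%')
       OR sqlserver.like_i_sql_unicode_string([batch_text], N'%sp_delete_job%')
)
-- Write to a file target in the default LOG directory
ADD TARGET package0.event_file (SET filename = N'agent_job_changes')
WITH (STARTUP_STATE = ON);

ALTER EVENT SESSION [agent_job_changes] ON SERVER STATE = START;

Next time a job quietly disappears, reading the .xel file back with sys.fn_xe_file_target_read_file should at least give us the login, host and application behind it.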

Final Thoughts

SQL Server Agent is brilliant when it’s working—but the moment something goes missing, you realise how little visibility it actually gives you.

If you’re not auditing changes to jobs (or exporting them regularly), you’re flying blind. In this case, no real damage was done. I recreated the job using an old script and it was back up and running that evening.

But it was a useful nudge to harden things up a bit. Because if something disappears in production and no one can prove it… it’ll probably be you who gets blamed.
