# How I Saved My Company from an HR-Induced Data Apocalypse (and Built a Slick Intranet in the Process)
What started as an intranet request turned into a full-scale exercise in data engineering and ghost-hunting.
If you've ever worked in tech, you know there is one universal law: human beings will always find a way to break a spreadsheet.

Recently, I was tasked with building a shiny new corporate intranet hub. It had all the bells and whistles: a beautiful, real-time “Townhall Directory” with smooth filtering, personal profile cards, and an automated “Birthday Spotlight” web part to make sure everyone gets their workplace cake notifications on time.
The web parts looked gorgeous. The React components were singing. I was ready to coast into the weekend.
But then, I opened the source data files provided by HR.
Cue the dramatic horror movie music. 🎬
## The Chaos: 50 Shades of “Accounts”
To feed our new Intranet directory, the system needed to read Excel files uploaded by the HR teams. Simple, right?
Except the data was fragmented. The team managing local staff kept their own spreadsheets, and the team managing expatriate staff kept theirs. And because humans are humans, consistency didn't exist.
When I looked under the hood at the department column, I didn't find a clean list. I found a wild west of typos and abbreviations:
Official names such as Finance & Accounts, Digital, and Dispatch each showed up under several spellings and abbreviations, alongside roughly 47 other creative interpretations of the same 31 official departments.
### Why This Mattered
Computer programs are incredibly literal. If my code is looking for Digital to put someone in the right department bucket on the frontend UI, and the spreadsheet says DTS, that employee simply vanishes into thin air. No profile card. No birthday cake announcement. Just digital oblivion.
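To make the vanishing act concrete, here is a minimal Python sketch of exact-match bucketing. It is an illustration only, not the actual frontend code; the row shape and the `bucket_by_department` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows; "DTS" is the abbreviation mentioned above.
rows = [
    {"name": "Employee A", "department": "Digital"},
    {"name": "Employee B", "department": "DTS"},
]

def bucket_by_department(rows):
    """Group employee names by the exact department string, with no cleanup."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row["department"]].append(row["name"])
    return dict(buckets)

buckets = bucket_by_department(rows)

# The UI asks for the "Digital" bucket, so Employee B never shows up.
digital_cards = buckets.get("Digital", [])
print(digital_cards)  # ['Employee A']
```

Nothing crashes and nothing is logged. The row simply lands in a bucket nobody ever renders, which is what makes this kind of bug so easy to miss.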
The steps below walk through the full pipeline I needed to build, and the places where it needed to protect itself.
## Step 1: Building a Translation Engine
(The Backend Trench Warfare)
Instead of complaining, I decided to build a bulletproof, self-healing backend data pipeline. If HR was going to feed the system chaos, my pipeline was going to act as a data filter.
I built a custom data dictionary layer. Think of it like a universal translator inside the ingestion engine. If a spreadsheet row came in saying Account, the engine intercepted it, said “Ah, you mean Finance & Accounts,” fixed it on the fly, and mapped it cleanly into our master database.
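The dictionary layer can be sketched in a few lines of Python. This is a simplified illustration, not the production flow: the `DEPARTMENT_ALIASES` table and `normalize_department` function are hypothetical names, and only the Account and DTS aliases come from the examples in this article.

```python
# Hypothetical alias table. Keys are lower-cased; values are official names.
# The real table would cover all 31 official departments.
DEPARTMENT_ALIASES = {
    "account": "Finance & Accounts",
    "finance & accounts": "Finance & Accounts",
    "dts": "Digital",
    "digital": "Digital",
    "dispatch": "Dispatch",
}

def normalize_department(raw):
    """Return the official department name, or None if the value is unknown."""
    if raw is None:
        return None
    # Collapse stray whitespace and ignore case before looking up the alias.
    key = " ".join(str(raw).split()).casefold()
    return DEPARTMENT_ALIASES.get(key)

print(normalize_department("Account"))  # Finance & Accounts
print(normalize_department(" DTS "))    # Digital
print(normalize_department("???"))      # None
```

Returning `None` for unknown values, rather than guessing, keeps the translator honest: anything it cannot map can be reported back instead of being silently filed under the wrong department.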
I even wrote code to handle invisible human errors — like when someone accidentally hits the spacebar at the end of a column header (looking at you, “EMPLOYEE CODE ”). My pipeline used fallback expressions to hunt down those trailing spaces and neutralize them before they could crash the server.
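A rough Python equivalent of that header fallback might look like the following. It assumes each spreadsheet row arrives as a dictionary keyed by the raw column headers; `clean_header` and `get_field` are hypothetical helpers, not the expressions used in the real flow.

```python
def clean_header(header):
    """Collapse whitespace and upper-case, so 'EMPLOYEE CODE ' matches 'EMPLOYEE CODE'."""
    return " ".join(str(header).split()).upper()

def get_field(row, name, default=None):
    """Look up a column by its cleaned header, falling back to a default."""
    wanted = clean_header(name)
    for header, value in row.items():
        if clean_header(header) == wanted:
            return value
    return default

# Note the trailing space in the first header.
row = {"EMPLOYEE CODE ": "E001", "Name": "Employee A"}

print(get_field(row, "EMPLOYEE CODE"))  # E001
print(get_field(row, "name"))           # Employee A
print(get_field(row, "DEPARTMENT"))     # None
```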
## Step 2: Ghost Hunting and Avoiding Accidental Layoffs
Just when I thought the pipeline was safe, two massive “boss battles” appeared in the data architecture.
### 👻 The 512 Ghost Rows
During testing, a tiny demo file with exactly one employee row took forever to run. Why? Because someone had scrolled down to row 500 in Excel weeks ago, hit backspace, and left. Excel permanently flagged those 512 empty rows as “active data.”
Poor Power Automate was spinning its wheels trying to onboard 512 blank ghosts with no names or codes! I had to write a pre-flight memory filter to instantly vaporize empty rows the second the file was dropped into the system.
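The filter itself is simple. Here is a minimal Python sketch of the idea, assuming rows arrive as dictionaries and that a ghost row is one where every cell is empty or whitespace; the function names are hypothetical.

```python
def is_blank(value):
    """True for None, empty strings, and whitespace-only cells."""
    return value is None or str(value).strip() == ""

def drop_ghost_rows(rows):
    """Keep only rows that have at least one non-blank cell."""
    return [row for row in rows if not all(is_blank(v) for v in row.values())]

# One real employee followed by 512 blank rows, as in the demo file.
rows = [{"EMPLOYEE CODE": "E001", "NAME": "Employee A"}]
rows += [{"EMPLOYEE CODE": None, "NAME": "  "} for _ in range(512)]

real_rows = drop_ghost_rows(rows)
print(len(rows), len(real_rows))  # 513 1
```

Running this before any per-row work means the expensive steps only ever see rows that could actually be employees.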
### 🛑 The Accidental Mass Layoff Bug
Because the local staff and expat staff files were uploaded at completely different times of the month, a massive logical hazard emerged.
Originally, when the expat file was uploaded, the system would scan the corporate directory, see that all 370+ local nationals were “missing” from that specific file, panic, and automatically toggle them all to Inactive.
I had essentially engineered an automated, accidental mass layoff engine! 😂
To fix this, I refactored the architecture to execute a dynamic scope check. The system now looks at the file's metadata first to see who is uploading. If it's an Expat file, it places a protective shield around the National database rows, and vice versa. Disaster averted.
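In Python terms, the scope check boils down to one extra condition on the deactivation query. This is a sketch under assumptions: the `staff_type` field, the `"National"` and `"Expat"` labels, and the `find_rows_to_deactivate` helper are all illustrative, not the real schema.

```python
def find_rows_to_deactivate(directory, uploaded_codes, upload_scope):
    """Return codes to mark Inactive, restricted to the uploaded file's scope."""
    return [
        emp["code"]
        for emp in directory
        if emp["staff_type"] == upload_scope   # the protective shield
        and emp["active"]
        and emp["code"] not in uploaded_codes
    ]

directory = [
    {"code": "N001", "staff_type": "National", "active": True},
    {"code": "N002", "staff_type": "National", "active": True},
    {"code": "X001", "staff_type": "Expat", "active": True},
    {"code": "X002", "staff_type": "Expat", "active": True},
]

# An Expat upload that only contains X001.
to_deactivate = find_rows_to_deactivate(directory, {"X001"}, "Expat")
print(to_deactivate)  # ['X002']
```

Without the `staff_type` condition, the same call would also return N001 and N002, which is exactly the mass-layoff behaviour described above.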
## Step 3: Fixing the Source
(Data Governance over Coffee)
Once the backend was self-healing and the frontend looked like a million bucks, I realized something important: good engineering doesn't just fix bad data downstream — it stops it at the source.
I packaged up my findings, counted up the 50+ wild naming variations I discovered, and condensed them into 31 official corporate department strings.
Instead of sending a passive-aggressive email, I set up a collaborative workshop with the HR team. I showed them how a single typo could make a teammate invisible on the hub, handed them a locked master template where Row 1 can't be messed with, and established a clear “data contract.”
Now, if an upload contains a completely unapproved department string, the system gracefully skips it and automatically sends a neat error report to the uploader telling them exactly what to fix.
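A minimal Python sketch of that skip-and-report behaviour follows. The `OFFICIAL_DEPARTMENTS` set lists only three of the 31 names, and `validate_upload`, the `DEPARTMENT` column name, and the message wording are assumptions for illustration; the real report is sent by the pipeline, not printed.

```python
# Only three of the 31 official names are shown here.
OFFICIAL_DEPARTMENTS = {"Finance & Accounts", "Digital", "Dispatch"}

def validate_upload(rows):
    """Split rows into accepted rows and human-readable error messages."""
    accepted, errors = [], []
    # Start at 2 because row 1 of the template is the locked header row.
    for line_no, row in enumerate(rows, start=2):
        dept = row.get("DEPARTMENT")
        if dept in OFFICIAL_DEPARTMENTS:
            accepted.append(row)
        else:
            errors.append(
                f"Row {line_no}: unknown department {dept!r}; "
                "please use a name from the official list."
            )
    return accepted, errors

rows = [
    {"NAME": "Employee A", "DEPARTMENT": "Digital"},
    {"NAME": "Employee B", "DEPARTMENT": "Digitl"},
]
accepted, errors = validate_upload(rows)
print(len(accepted))  # 1
print(errors[0])      # Row 3: unknown department 'Digitl'; please use a name from the official list.
```

Pointing at the exact spreadsheet row is the important part: the uploader can fix the file without having to ask anyone what went wrong.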
## The Outcome ✨
What started as a request for an intranet hub turned into a full-scale exercise in enterprise data engineering.
- ✓ 100% automated and stable pipeline in production
- ✓ Frontend UI crisp: filter by National or Expat with one click
- ✓ Birthdays roll out flawlessly
- ✓ Zero ghost rows, zero accidental offboards, zero invisible employees
## Key Takeaways
### 1. Design for the Unhappy Path
Code for the messy reality of human data entry, not the perfect fantasy of your sample files. Real users are creative in ways your tests never anticipated.
### 2. Empathy over Anger
When data is messy, don't get frustrated with the users. Sit down with them, show them why the system breaks, and give them the tools to succeed. Data governance is a people problem first.
Now, if you'll excuse me, I'm going to go check if it's anyone's birthday today. 🎂