Configuration Management: The Silent Technical Debt

MaGong··Engineering

Configuration Management: The Silent Technical Debt

Here's a simple question: How does your team manage configuration?

If the answer is "database," ".env file," or "we use Apollo," ask a follow-up: Who updates it? How do you audit changes? How do you roll back?

Most teams can't answer clearly.

I spent the last two months methodically removing .env files from a project because I couldn't stand the chaos anymore. In that process, I discovered something unsettling: the industry has no agreed-upon best practices for configuration management. Apollo, Nacos, AWS Parameter Store, LaunchDarkly — each tool solves a piece of the puzzle, but no one has articulated a complete methodology.

This article isn't a product pitch. And it's not a claim that I've solved this. Quite the opposite: the industry hasn't figured out configuration management yet.

Multiple Sources of Truth: A Comedy of Errors

The first problem is straightforward: too many places store configuration.

Take a Rails project. You see config.timeout = 30 in the code, but at runtime it might be 60 (overridden in .env), or 90 (overridden again in the database), or actually 30 (the code default). Engineers have to mentally run through the priority logic to know what will actually execute.

Business configuration is worse. Say you're running an online gaming service in the Netherlands with a list of blocked keywords updated monthly. Where should this live? Database? Configuration centre? Hardcoded? Ask 10 engineers, get 5 different answers: Apollo, Parameter Store, database table, hardcoded, or LaunchDarkly feature flags.

Every approach has advocates. Every approach has traps.

The .env Disaster

Most teams rely on .env files, a habit borrowed from the 12-Factor App principles (Heroku, 2011). It was progress at the time — better than hardcoding. But in 2025, .env is a ticking time bomb.

Consistency problem: .env is inherently host-scoped. You have one version on your dev machine, another on staging, another on production. One careless deployment and staging and production configs get swapped. Production goes down.

Security nightmare: .env exposes secrets through the /proc filesystem to any process with read permissions. Credentials are stored in plaintext, committed to git (despite .gitignore), and shared over Slack and email. Anyone with /proc access sees your API keys.

Obsolete design: .env solved the 2011 problem when Docker and Infrastructure as Code didn't exist. Back then, environment inconsistency was common, so you needed fallback mechanisms to keep servers running. Now that Docker and IaC make environment consistency trivial, missing configuration should fail fast, not silently fall back to defaults.

AWS Parameter Store: Solves the Wrong Problem

"Doesn't AWS Parameter Store fix this?"

Superficially, yes. Encryption, IAM permissions, multi-environment isolation. Sounds complete. But it doesn't solve the actual problems.

Who updates? PMs don't use AWS Console (and shouldn't have production IAM access). Engineers update manually? Then who audits? Parameter Store's version history is useless — you see "parameter updated," not the business context.

Lifecycle management: That Netherlands keyword list updates monthly. Who's responsible? What process? How do you roll back? Parameter Store doesn't care — it's just a key-value store.

Cost and complexity: Standard tier is free but limited (10,000 parameters, 4KB each). Advanced tier charges per API call. Scaling gets expensive.

The credential rotation trap: Parameter Store supports automatic rotation, which sounds magical. Then Terraform rotates your database password, old credentials die immediately, but 15 microservices haven't restarted yet. Production is down 2-5 minutes until all services restart.

Why? Traditional databases don't support dual-password grace periods. Cloud-native tools assume your entire stack is cloud-native, but reality is hybrid: cloud-native apps + legacy databases. So teams disable automatic rotation and coordinate manually.

Apollo/Nacos: Parameter Store in a Fancy UI

Many large Chinese tech companies use Apollo or Nacos. Honestly, these are just bloated Parameter Store clones — same configuration, moved to a different location.

AWS Parameter Store does everything Apollo and Nacos do, without requiring you to maintain your own infrastructure. But Apollo and Nacos require ops teams: deployment, high availability, version upgrades. Apollo doesn't even support custom encryption keys, which is a non-starter for compliance-heavy teams.

The real problem: Neither tool addresses the core issue — who updates? How do you audit? How do you manage lifecycle? They're just pretty UIs.

I've seen this play out: PM messages Slack, "update the Netherlands keyword list." Engineer logs into Apollo, manually changes config, clicks publish. How is this different from directly editing the database? It's not. No code review, no audit trail, all manual.

LaunchDarkly: Solves a Different Problem

LaunchDarkly is a good product, but people confuse it with configuration management.

Feature flags handle dynamic runtime switches: A/B tests, canary releases, kill switches. You need to flip a switch in real-time without redeploying.

Business configuration is static. The Netherlands keyword list updates monthly, not in real-time, and doesn't need percentage-based rollouts. Using LaunchDarkly for this is like driving a Ferrari to the grocery store.

LaunchDarkly's enterprise plan costs $70,000+ per year (minimum 50 users). To manage a list that changes once a month? Feature flags and configuration management are different problems. They need different tools.

The Root Issue: No Framework Exists

Behind all these tools lurks a bigger truth: the industry has no coherent configuration management framework. Every approach solves a local problem (hosting, encryption, access control), but no one has tackled the real challenge — managing configuration's entire lifecycle, cost-effectively.

Cost isn't just money. It's the time engineers spend debugging config, the process PMs navigate to update config, the friction in team collaboration.

Configuration needs categorization. Different types demand different approaches. But the industry lacks this taxonomy:

  • IT config (database connection strings) vs business config (market rules)
  • Credentials (API keys) vs plaintext (timeouts)
  • Frequently changing (daily) vs stable (quarterly)
  • Required (crash if missing) vs optional (has defaults)
  • Service-specific vs multi-service shared

For each combination, you need answers:

  • Who updates?
  • PR or UI?
  • Git, database, or Parameter Store?
  • How does config propagate?
  • How do you keep services in sync?
  • Who has permissions?

The industry has no systematic answers. Teams guess, hit problems, then refactor.

A Partial Solution: GitOps for Business Config

I haven't found a silver bullet. But for business configuration, there's a reasonable approach: treat configuration as code.

Hardcode business config into your codebase. PMs update via pull request. Engineers review, merge, CI/CD deploys automatically. Git gives you complete audit history — you see who changed what, when, and why.

This approach fits: business config (keywords, rules, pricing), infrequent updates (monthly, quarterly), audit requirements, and PMs who can use git or write text files.

Not suitable for: credentials (never commit secrets), real-time feature flags (use LaunchDarkly), infrastructure config (use Terraform), or rapidly changing config (use a database).

Plainly: this isn't a silver bullet. It solves one piece of the puzzle. But that piece is crucial, and most teams ignore it.

In the AI Era, This Matters More

Here's where it gets urgent. Last week, I worked with a legacy system where configuration scattered across four places. Even with Claude Code properly configured with AWS credentials, it got confused. The code was full of "likely": This setting is likely coming from.... Even AI can't figure out which config wins at runtime.

When config spreads across .env, databases, code defaults, and YAML files, AI can't judge priority. It just guesses "likely" because it can't determine what executes. This implicit override logic breaks AI reasoning. Humans error too.

In the AI era, configuration chaos multiplies. AI codes 10x faster, so configuration errors propagate 10x faster. AI can't navigate implicit logic — Rails' Convention over Configuration is a nightmare for LLMs. Worse: AI leaks secrets. Misconfigured .env files end up in code or commit history.

The fix is AI-friendly configuration: single source of truth, explicit precedence, fail-fast instead of fallback. GitOps is naturally AI-friendly — config in git, explicit, traceable, complete history.

Bluntly: if AI can't understand your configuration management, neither can humans, eventually.

Conclusion

Configuration management is the silent technical debt. We've automated CI/CD and infrastructure, but configuration is still manual craft. Current tools are fragmented: Apollo manages config, Parameter Store encrypts, LaunchDarkly handles feature flags. No one offers a complete methodology.

My GitOps approach is just one puzzle piece — useful for business configuration, irrelevant for credentials or real-time features. The industry needs a classification framework that gives systematic answers for different configuration types.

There's also the reality check: next year, CISOs won't let teams casually access production databases. Compliance and security will eventually force the industry to improve. Better to prepare now than scramble later.

If you're interested in exploring this problem, let me know. Let's figure this out together.


Originally written in Chinese. Translated by the author.