Today’s post is something I wish I had in my early days.
When I was preparing for security engineering roles, I wasted months jumping between courses, blogs, scattered GitHub repos, and random CTF writeups. None of them helped me understand the problems real security engineers solve inside big companies like Google, Netflix, Meta, or Amazon.
If I could go back, I would spend my time learning the fundamentals through real problem statements.
These are the exact challenges that come up at scale. These are the conversations you have during system design interviews. These are the issues you solve once you start the job.
If you want to break into security engineering in 2025, learn these by doing. Not by memorizing definitions. Not by watching a hundred videos. By understanding how systems actually work.
This guide covers 7 Core Pillars of Security Engineering. We will go deep into the architecture, the constraints, and the solutions for each.
1. Risk, Access, and Identity
The foundation of every secure system. If you don’t know who the user is, you cannot secure them.
➤ Design a secure authentication system for a billion users
Scaling authentication to a billion users isn’t just about checking a password. It is a distributed systems problem. When you operate at the scale of Gmail or Facebook, a 1% failure rate means 10 million angry users locked out.
The Core Challenges:
- Latency: Password hashing is intentionally slow (CPU intensive). How do you handle 100k logins per second without burning down your servers?
- Storage: Storing billions of rows of credential data requires sharding.
- Credential Stuffing: Attackers will throw millions of leaked username/password pairs at your API.
The Solution Architecture:
- Password Hashing:
- Algorithm: Use Argon2id. It is memory-hard and CPU-hard, making it resistant to GPU/ASIC cracking.
- Parameters: Tune your salt length (16 bytes min) and work factor/memory cost so verification takes ~500ms on your hardware.
- Peppering: Store a secret key (pepper) in an HSM (Hardware Security Module) or a secure vault (like AWS KMS). The hash becomes Argon2id(password + salt + pepper). If your DB is leaked, the hashes are useless without the pepper.
- Handling Scale:
- Async Hashing: For registration, do not block the main thread. Offload hashing to a worker queue.
- Caching Negative Results: If an IP fails login 5 times, cache that “block” at the edge (CDN or Load Balancer) to save your backend CPU.
- Credential Stuffing Defense:
- Rate Limiting: Implement “Leaky Bucket” algorithms per IP and per Username.
- IP Reputation: Check user IPs against threat intelligence feeds (e.g., GreyNoise, Spamhaus).
- Device Fingerprinting: If a user logs in from a new device and a new country simultaneously, trigger a challenge (MFA or CAPTCHA).
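The salt-plus-pepper idea can be sketched with Python's standard library alone. Argon2id is not in the standard library, so this sketch swaps in scrypt (also memory-hard) purely for illustration; a real deployment would call an Argon2id binding instead. The `PEPPER` constant, the cost parameters, and the function names are assumptions made for the example, and the pepper would really come from an HSM or KMS.

```python
import hashlib
import hmac
import os

# Hypothetical pepper; in a real system it is fetched from an HSM/KMS, never hard-coded.
PEPPER = b"example-pepper-loaded-from-kms"


def _stretch(password: str, salt: bytes, pepper: bytes) -> bytes:
    # Mix the pepper in with HMAC first, then run the memory-hard KDF with the per-user salt.
    peppered = hmac.new(pepper, password.encode("utf-8"), hashlib.sha256).digest()
    # scrypt stands in for Argon2id here; n, r, p are illustrative cost parameters.
    return hashlib.scrypt(peppered, salt=salt, n=2**14, r=8, p=1, dklen=32)


def hash_password(password: str, pepper: bytes = PEPPER) -> tuple[bytes, bytes]:
    salt = os.urandom(16)  # 16-byte random salt, unique per user
    return salt, _stretch(password, salt, pepper)


def verify_password(password: str, salt: bytes, expected: bytes,
                    pepper: bytes = PEPPER) -> bool:
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(_stretch(password, salt, pepper), expected)
```

Only the salt and digest go into the credentials table. Someone holding a database dump but not the pepper cannot even begin an offline guessing attack against these digests.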

➤ Build OAuth-based login for third-party apps
You are building the “Log in with Google” button for your company. You are now an Identity Provider (IdP).
The Protocol:
- OAuth 2.1 / OIDC (OpenID Connect): Do not use legacy OAuth 2.0 implicit flows.
- Authorization Code Flow with PKCE:
- Why PKCE (Proof Key for Code Exchange)? It prevents authorization code interception attacks, especially on mobile devices where custom URI schemes aren’t secure.
- The client creates a code_verifier (random string) and sends its hash (code_challenge) in the initial request.
- When exchanging the auth code for a token, it sends the raw code_verifier. The server hashes it and compares it to the stored challenge.
Critical Security Controls:
- Scope Limitation: Never grant admin or root scopes by default. Use “Least Privilege” scopes (e.g., read:profile only).
- Redirect URI Whitelisting: Strict matching is required. https://evil.com/callback must be rejected if only https://app.com/callback is registered. Avoid wildcard matching like *.app.com.
- Consent Screens: Clearly show the user exactly what data is being shared. “This app wants to read your private emails.”
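Strict redirect URI matching is simpler than it sounds: it is an exact string comparison against the registered values. A minimal sketch, where the registration set and function name are hypothetical:

```python
# Hypothetical client registration: the only redirect URI this client may use.
REGISTERED_REDIRECT_URIS = frozenset({"https://app.com/callback"})


def is_allowed_redirect(uri: str, registered: frozenset = REGISTERED_REDIRECT_URIS) -> bool:
    # Exact match only: no prefix, suffix, wildcard, or "same domain" logic.
    return uri in registered
```

Anything cleverer (prefix checks, regexes, subdomain wildcards) tends to be where open-redirect and code-theft bugs creep in.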
➤ Design a Zero Trust access model inside a large company
The castle-and-moat strategy (VPNs) is dead. In 2025, we assume the network is already compromised.
The Philosophy:
- Never Trust, Always Verify: Every request, even from inside the office, must be authenticated and authorized.
- Identity-Based Routing: Access is granted based on Who you are, not Where you are (IP address).
The Architecture (BeyondCorp Style):
- The Proxy: All internal apps sit behind an Identity Aware Proxy (IAP). No app is exposed directly to the private network.
- The Device Trust Tier:
- Is the laptop corporate-managed (MDM)?
- Is the disk encrypted?
- Is the OS patched?
- If No: Deny access, even if the user has the correct password.
- Short-Lived Certificates: Instead of long-lived SSH keys, users exchange their SSO login for a 1-hour SSH certificate.
Common Pitfall: “Break-glass” scenarios. If your Zero Trust IdP goes down, how do admins get in to fix it? You need a highly secured, monitored emergency access path (e.g., a hardware token kept in a physical safe).
➤ Create an automated privileged access review system
Auditors hate “standing access.” Why does Bob from Engineering still have root access to the Payments database if he switched to the Frontend team 6 months ago?
The Solution: JIT (Just-In-Time) Access
- No Permanent Admins: Bob has 0 permissions by default.
- Request Flow: Bob requests “Write access to Payments DB for 2 hours” via a Slack bot or CLI.
- Approval: The request triggers a notification to the Payments Team Lead. They click “Approve.”
- Ephemeral Grant: The system grants the role for exactly 2 hours.
- Auto-Revocation: At 2 hours and 1 second, a cron job or an EventBridge trigger revokes the permission.
- Audit Log: The request, approval, and revocation are logged for compliance.
[Image Description: A flowchart showing the Just-In-Time access lifecycle. User requests access -> Manager approves via Slack -> Lambda function calls IAM API to grant role -> Timer starts -> Timer expires -> Lambda revokes role.]
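The grant, expiry, and audit steps above can be sketched as an in-memory model. The class and field names are hypothetical; a real system would call the cloud provider's IAM API where this sketch just edits a list.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Grant:
    user: str
    role: str
    expires_at: float


@dataclass
class JitAccess:
    grants: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def approve(self, user: str, role: str, approver: str, duration_s: float,
                now: Optional[float] = None) -> Grant:
        now = time.time() if now is None else now
        grant = Grant(user, role, now + duration_s)
        self.grants.append(grant)
        self.audit_log.append((now, "granted", user, role, approver))
        return grant

    def has_access(self, user: str, role: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Expiry is checked on every access, so a late sweep never extends a grant.
        return any(g.user == user and g.role == role and g.expires_at > now
                   for g in self.grants)

    def revoke_expired(self, now: Optional[float] = None) -> int:
        # This is the part a cron job or timer-driven function would run.
        now = time.time() if now is None else now
        expired = [g for g in self.grants if g.expires_at <= now]
        self.grants = [g for g in self.grants if g.expires_at > now]
        for g in expired:
            self.audit_log.append((now, "revoked", g.user, g.role, "auto"))
        return len(expired)
```

Checking the expiry at access time as well as in the sweep is a deliberate choice: if the revocation job is delayed, the grant still stops working on schedule.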
➤ Build a secure session management system for a mobile app
Mobile devices are hostile environments. Users lose them, they get stolen, and they connect to hostile Wi-Fi.
Key Decisions:
- Session Tokens (Reference vs. Value):
- JWT (Value Token): Stateless, scales well. Hard to revoke immediately.
- Reference Token (Random String): Stored in Redis. Easy to revoke, but requires a DB lookup for every request.
- Hybrid: Use short-lived (15 min) JWTs for access and long-lived (30 day) Refresh Tokens for staying logged in.
- Token Storage:
- iOS: Store in the Keychain.
- Android: Store in EncryptedSharedPreferences (backed by the Keystore system).
- Never store tokens in localStorage or unencrypted SQLite.
- Rotation: Every time a Refresh Token is used to get a new Access Token, issue a new Refresh Token and invalidate the old one. This is called Refresh Token Rotation. If an attacker steals a refresh token, they can only use it once, and the legitimate user’s next attempt will flag the theft (because the old token is now invalid).
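Refresh Token Rotation with reuse detection can be sketched as a small in-memory store. The class name and the "revoke everything for that user" response are assumptions for the example; a production store would live in a database and track token families.

```python
import secrets


class RefreshTokenStore:
    def __init__(self) -> None:
        self._active = {}  # live refresh token -> user
        self._used = {}    # already-rotated refresh token -> user

    def issue(self, user: str) -> str:
        token = secrets.token_urlsafe(32)
        self._active[token] = user
        return token

    def rotate(self, token: str) -> str:
        """Exchange a refresh token for a new one; the old one becomes invalid."""
        if token in self._used:
            # A rotated token showing up again means two parties hold it: treat as theft.
            self.revoke_all(self._used[token])
            raise PermissionError("refresh token reuse detected")
        user = self._active.pop(token, None)
        if user is None:
            raise PermissionError("unknown refresh token")
        self._used[token] = user
        return self.issue(user)

    def revoke_all(self, user: str) -> None:
        self._active = {t: u for t, u in self._active.items() if u != user}
```

Once reuse is detected, every live token for that user is dropped, so both the attacker and the legitimate user must log in again.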
➤ Design an MFA service that works reliably worldwide
SMS is insecure (SIM Swapping) and unreliable (delivery failures in some regions).
The Hierarchy of Trust:
- Hardware Keys (YubiKey/Titan): FIDO2/WebAuthn. Unphishable. Highest security.
- Passkeys: Biometric-backed WebAuthn credentials synced via iCloud/Google Password Manager. The future of auth.
- App-Based TOTP: Google Authenticator/Authy. Secure, but can be phished (attacker sets up a fake site asking for the code).
- Push Notifications: “Is this you trying to sign in?” Susceptible to “MFA Fatigue” (attacker spams requests until user clicks yes).
- SMS/Email: The fallback of last resort.
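To see why app-based TOTP is secure but still phishable, it helps to look at how the code is computed. This is a minimal sketch of RFC 6238 with the usual defaults (HMAC-SHA1, 30-second steps), using only the standard library; the function name is an assumption.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    # The moving factor is the number of `step`-second intervals since the Unix epoch.
    counter = int((time.time() if at is None else at) // step)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte picks a 4-byte window.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)
```

The code depends only on the shared secret and the clock. Nothing ties it to the site the user is typing it into, which is exactly the gap a fake login page exploits and that WebAuthn closes.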
Implementation Detail: When designing the backend, use a standardized protocol like WebAuthn. The browser handles the cryptography with the authenticator (TouchID, FaceID, USB Key). Your server stores the public key. You send a “challenge” (random nonce), the device signs it with the private key, and you verify the signature.
➤ Handle account recovery securely at scale
“I forgot my password and lost my phone.” This is the hardest problem in Identity.
The “Anti-Pattern”: Security Questions. “What is your mother’s maiden name?” This is public information found on Facebook. Do not use this.
The Secure Pattern:
- Multi-Channel Verification: Verify email AND SMS (if available).
- Social Vouching: (Advanced) Ask 3 trusted friends (pre-selected by the user) to send a recovery code.
- Risk-Based Delays: “We have received your recovery request. For security, we will process this in 48 hours.” This gives the legitimate user time to see the notification and cancel the attack.
- Device History: “You are trying to recover from a device we have never seen before, located in a country you have never visited. Request Denied.”
2. Network Security and Traffic Protection
The pipes that connect the world. If the network is flooded or snooped, the application is useless.
➤ Design a DDoS detection and mitigation pipeline
Types of Attacks:
- Volumetric (L3/L4): UDP floods, SYN floods. Aim is to fill the pipe.
- Application (L7): HTTP floods. “Search for ‘a’” 10,000 times a second. Aim is to exhaust CPU/DB.
Mitigation Architecture:
- Edge Network (Anycast): Advertise your IP address from 50 global locations via BGP. Attack traffic is split across the globe, diluting its impact.
- Scrubbing Centers: Specialized hardware (FPGA-based) that inspects packets. It drops bad traffic (malformed packets, known botnet IPs) and passes clean traffic to your origin.
- SYN Cookies: When the SYN queue fills up, the server responds with a crafted sequence number (cookie) instead of allocating memory. Only if the client responds with the correct ACK does the server allocate resources.
- Rate Limiting (Token Bucket): Give every IP a “bucket” of tokens (e.g., 10 requests/sec). If the bucket is empty, drop the packet.
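The token bucket from the last bullet can be sketched as follows. The class name and the injectable `now` parameter (used to make the behavior testable) are assumptions; in practice there would be one bucket per client IP, kept in a shared store.

```python
import time
from typing import Optional


class TokenBucket:
    def __init__(self, rate: float, capacity: float, now: Optional[float] = None) -> None:
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start full
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill lazily based on elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # bucket empty: drop or reject the request
```

Refilling lazily on each call means no background timer is needed, which matters when you are tracking millions of buckets.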

➤ Build secure service-to-service communication (Service Mesh)
In a microservices world, “Trust the LAN” is dead. Service A should not trust Service B just because they are in the same Kubernetes cluster.
Solution: Mutual TLS (mTLS)
- Normal TLS: Client verifies Server.
- mTLS: Client verifies Server AND Server verifies Client.
- Implementation: Use a sidecar proxy (like Envoy).
- App A talks to localhost (Envoy A).
- Envoy A talks to Envoy B over mTLS.
- Envoy B talks to App B on localhost.
- SPIFFE/SPIRE: These are standards for giving every workload an identity. “I am the Payment Service, here is my signed cryptographic document proving it.”
➤ Design TLS termination and certificate rotation for large fleets
Managing 10,000 servers means managing 10,000 certificates. Manual renewal = outage.
The Automated Pipeline:
- ACME Protocol: Use the protocol popularized by Let’s Encrypt.
- Short-Lived Certs: Issue certificates valid for only 7 days. Why? If a private key is stolen, it is useless next week.
- Rotation Agent: A sidecar process on every web server checks the cert expiry. If < 24 hours remain, it contacts the internal CA (Certificate Authority), gets a new cert, and gracefully reloads Nginx/Envoy without dropping connections.
➤ Build an internal VPN replacement for employees
See “Zero Trust” in Section 1. The goal is to move from Network-Level Access (you are on the VPN, you can ping everything) to Application-Level Access (you are authenticated, you can access only the Jira web app).
Tooling:
- WireGuard: If you must use a VPN, use WireGuard. It is modern, lean (roughly 4k lines of code, versus hundreds of thousands for OpenVPN plus OpenSSL), and uses state-of-the-art crypto (ChaCha20-Poly1305).
➤ Detect internal port scans or lateral movement attempts
Attackers usually land on a low-value server (e.g., a dev box) and scan the network to find the “Crown Jewels” (Production Database).
Detection:
- Honey Tokens: Place fake AWS keys in environment variables on servers. If anyone tries to use them, an alert fires immediately.
- Honey Ports: Open port 8080 on a server that serves nothing. If anyone connects to it, they are scanning. Busted.
- VPC Flow Logs: Analyze the metadata of network traffic. “Why is the Dev Cluster talking to the Production Payment Gateway?” -> Alert.
➤ Create a network segmentation strategy
Blast Radius Reduction:
- Tiered Zones:
- Zone 1 (Public): Load Balancers.
- Zone 2 (App): Web Servers. Can only talk to Zone 1 and Zone 3.
- Zone 3 (Data): Databases. Can only be talked to by Zone 2. NO internet access.
- Micro-segmentation: In Kubernetes, use NetworkPolicies. “The ‘Frontend’ pod can talk to ‘Backend’ pod on port 80. Everything else is Deny-All.”
3. Application and API Security
The code logic. Where business rules meet hacker ingenuity.
➤ Design a secure API gateway with rate limits and token validation
The Gateway’s Job: It is the bouncer at the club.
- Authentication Offload: The Gateway validates the JWT signature. The backend services don’t need to know about public keys or crypto. They just receive a validated header X-User-ID: 123.
- Input Validation: Reject JSON bodies that are too large (prevent DoS) or contain malicious patterns (SQLi/XSS filters, though WAF is better for this).
- Quota Management: “User X is on the Free Tier. They get 1000 calls/day.”
➤ Build a system that auto-detects broken access control (IDOR)
The Vulnerability: Insecure Direct Object Reference (IDOR).
- User A logs in. URL is app.com/receipts?id=100.
- User A changes URL to app.com/receipts?id=101.
- User A sees User B’s receipt.
Automated Detection:
- Traffic Replay: Take a captured session of User A. Replay the requests but swap the session token with User B’s token.
- Assertion: If the server responds with 200 OK and the response length matches User A’s original response, you likely have an IDOR vulnerability.
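The replay-and-compare check can be sketched against any function that takes a session token and a path. Everything below the detector (the toy receipts, sessions, and the two "apps") is a hypothetical stand-in so the example runs without a network.

```python
from typing import Callable, Iterable

Response = tuple[int, str]              # (HTTP status, response body)
Fetch = Callable[[str, str], Response]  # (session token, path) -> Response


def find_idor_candidates(fetch: Fetch, token_a: str, token_b: str,
                         paths: Iterable[str]) -> list[str]:
    """Replay User A's captured paths with User B's token and flag look-alike responses."""
    candidates = []
    for path in paths:
        status_a, body_a = fetch(token_a, path)
        status_b, body_b = fetch(token_b, path)
        if status_a == 200 and status_b == 200 and len(body_b) == len(body_a):
            candidates.append(path)
    return candidates


# Hypothetical in-memory "apps" used only to exercise the check.
RECEIPTS = {"100": ("alice", "receipt #100: $42.00")}
SESSIONS = {"token-alice": "alice", "token-bob": "bob"}


def vulnerable_app(token: str, path: str) -> Response:
    receipt_id = path.rsplit("=", 1)[-1]
    if token not in SESSIONS or receipt_id not in RECEIPTS:
        return 404, ""
    return 200, RECEIPTS[receipt_id][1]  # bug: never checks who owns the receipt


def fixed_app(token: str, path: str) -> Response:
    receipt_id = path.rsplit("=", 1)[-1]
    owner, body = RECEIPTS.get(receipt_id, (None, ""))
    if owner is None or SESSIONS.get(token) != owner:
        return 403, ""
    return 200, body
```

A length match is only a heuristic. Pages that are legitimately identical for every user will show up as false positives, so the output is a list of candidates to review rather than confirmed bugs.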
➤ Create an automated static analysis (SAST) pipeline
Shift Left: Catch bugs before they are merged.
- The Tool: Semgrep or CodeQL. These aren’t just regex (grep). They understand the syntax tree of the code.
- The Rule: danger-mode: true.
- Bad Rule: “Search for ‘password’”. (Too many false positives).
- Good Rule: “Find any function call to exec() where the argument comes from an HTTP request parameter.” (Command Injection).
- The Workflow:
- Developer opens Pull Request.
- GitHub Action runs Semgrep.
- If High Severity issue found -> Block Merge.
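To make the "syntax tree, not regex" point concrete, here is a minimal sketch using Python's own `ast` module rather than Semgrep or CodeQL. It flags calls to a few dangerous functions whose arguments are not constants. This is a much cruder check than the taint tracking those tools do, and the function list is an assumption for the example.

```python
import ast

# Illustrative list of sinks; real rule sets are far larger.
DANGEROUS_CALLS = {"exec", "eval", "os.system"}


def _call_name(node: ast.Call) -> str:
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
        return f"{func.value.id}.{func.attr}"
    return ""


def find_dangerous_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, call name) for dangerous calls with non-constant arguments."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and _call_name(node) in DANGEROUS_CALLS:
            # A literal argument cannot be attacker-controlled; anything else gets flagged.
            if not all(isinstance(arg, ast.Constant) for arg in node.args):
                findings.append((node.lineno, _call_name(node)))
    return sorted(findings)
```

A grep for "system" would flag the harmless `os.system("ls")` too. Working on the tree lets the rule skip literals and focus on calls that take a variable.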
➤ Detect business logic abuse
These bugs aren’t syntax errors. The code works “as designed,” but the design is flawed.
- Example: Use a coupon code “WELCOME20”. Apply it. Remove it. Apply it again.
- Detection: You need stateful monitoring. “Alert if a single UserID has redeemed the same CouponID > 1 time in 1 minute.”
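The stateful rule in the Detection bullet can be sketched as a sliding window keyed by user and coupon. The class name and defaults are assumptions; in production this state would sit in a stream processor or Redis rather than process memory.

```python
from collections import defaultdict, deque


class RedemptionMonitor:
    def __init__(self, window_s: float = 60.0, max_redemptions: int = 1) -> None:
        self.window_s = window_s
        self.max_redemptions = max_redemptions
        self._events = defaultdict(deque)  # (user_id, coupon_id) -> recent timestamps

    def record(self, user_id: str, coupon_id: str, ts: float) -> bool:
        """Record a redemption; return True if it should raise an alert."""
        recent = self._events[(user_id, coupon_id)]
        # Forget redemptions that have fallen out of the window.
        while recent and ts - recent[0] >= self.window_s:
            recent.popleft()
        recent.append(ts)
        return len(recent) > self.max_redemptions
```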
➤ Build a safe secrets storage and rotation service
The Golden Rule: No secrets in Git. Not even in private repos.
The Architecture:
- Vault (HashiCorp): The central secret store.
- Secret Injection: When an app starts up (e.g., in Kubernetes), an Init Container authenticates with Vault, fetches the database password, and writes it to a RAM disk (tmpfs) at /etc/secrets/db_pass.
- The App: Reads the file at startup.
- Rotation: When the password rotates, a sidecar agent (e.g., Vault Agent) rewrites the file and sends a signal (SIGHUP) to the app to reload the config. No downtime.
➤ Design a system that stops replay attacks
Scenario: An attacker intercepts a request: POST /transfer {amount: 100, to: Bob}. They re-send it 10 times. Bob gets an extra $1000.
Defense:
- Nonces: A random string used only once. The server tracks used nonces.
- Timestamps: Include sent_at: 12:00:00. The server rejects anything older than 5 minutes. This limits the window of attack.
- Idempotency Keys: The client generates a UUID transaction_id and sends it. The server checks its DB: “Have I processed transaction uuid-123 already? Yes? Return the previous success response. Do not transfer money again.”
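The timestamp and idempotency-key checks combine naturally in one handler. A minimal in-memory sketch, where the class, field names, and response shape are hypothetical:

```python
import time
from collections import defaultdict
from typing import Optional


class TransferService:
    MAX_AGE_S = 300  # reject requests older than 5 minutes

    def __init__(self) -> None:
        self.balances = defaultdict(int)
        self._results = {}  # transaction_id -> response already returned

    def transfer(self, transaction_id: str, to: str, amount: int,
                 sent_at: float, now: Optional[float] = None) -> dict:
        now = time.time() if now is None else now
        if now - sent_at > self.MAX_AGE_S:
            raise ValueError("stale request")
        if transaction_id in self._results:
            # Replay or client retry: return the earlier response, move no money.
            return self._results[transaction_id]
        self.balances[to] += amount
        result = {"status": "ok", "transaction_id": transaction_id}
        self._results[transaction_id] = result
        return result
```

In a real service the "have I seen this key" check and the balance update need to happen in one database transaction, otherwise two concurrent replays can both pass the check.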
4. Data Protection and Encryption
Protecting the Crown Jewels. Even if the server is hacked, the data should be useless.
➤ Build a client-side encryption flow for personal data
Use Case: A health app storing patient records. Even the database admins shouldn’t see the data.
The Flow:
- Client: Generates a random Data Encryption Key (DEK).
- Encryption: Client encrypts the medical record with the DEK.
- Wrapping: Client encrypts the DEK with the User’s Public Key (or a KEK from KMS).
- Storage: Client sends EncryptedData + WrappedDEK to the server.
- Decryption: The server cannot decrypt the data because it doesn’t have the User’s Private Key.
➤ Design a key management service (KMS) with HSM
HSM (Hardware Security Module): A physical, tamper-resistant computer that stores keys. It performs crypto operations inside the hardware. The key never leaves the box.
Envelope Encryption:
- Master Key (CMK): Stored inside the HSM. Never leaves.
- Data Keys (DEK): Generated by the HSM, returned to the app in two forms: Plaintext (for immediate use) and Encrypted (to store in the DB).
- Why? The HSM is slow. You can’t send 1TB of data to the HSM to encrypt. You encrypt the data locally with the DEK, and use the HSM only to encrypt the tiny DEK.
➤ Create a secure deletion pipeline (Crypto-Shredding)
The Problem: “Delete my data.” You delete the row from Postgres. But the data is still in:
- Backups (Tape/S3 Glacier)
- Transaction Logs (WAL)
- Replicas
- CDN Caches
The Solution: Crypto-Shredding
- Encrypt every user’s data with a unique key per user.
- When the user requests deletion, delete the key.
- The data is now mathematical garbage. It doesn’t matter if it exists in backups for 7 years; it can never be read again.
➤ Design end-to-end encrypted messaging (Signal Protocol)
Concepts:
- Double Ratchet Algorithm: Every message changes the encryption keys.
- Forward Secrecy: If I hack your phone today, I can’t read your past messages.
- Post-Compromise Security: If I hack your phone today, but you later update your keys, I can’t read your future messages.
- X3DH (Extended Triple Diffie-Hellman): How two users agree on a key asynchronously (when one user is offline).
➤ Build a solution to track data lineage
The Goal: GDPR Compliance. “Where is User 123’s data?”
The Tech:
- Taint Tracking: Tag data at the ingress point.
- Propagation: If Service A reads User 123’s data and writes to Service B, the tag propagates.
- Graph Database: Store the flow in a Neo4j graph: Ingress -> Kafka -> Processor -> S3. You can now query the graph to find all locations of the data.
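The "query the graph" step is a reachability search. Here is a minimal sketch with a plain dictionary standing in for Neo4j, using the Ingress -> Kafka -> Processor -> S3 flow from the text; the function name is an assumption.

```python
from collections import deque

# Adjacency list for the example flow; a plain dict stands in for the graph database.
LINEAGE = {
    "Ingress": ["Kafka"],
    "Kafka": ["Processor"],
    "Processor": ["S3"],
    "S3": [],
}


def downstream(graph: dict, start: str) -> list[str]:
    """Breadth-first search: every system that data entering at `start` can reach."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order
```

For a deletion request, the result is the checklist of systems that must confirm the user's data is gone.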
5. Monitoring, Detection, and Incident Response
The Watchtower. Assume breach. Find it fast.
➤ Build a real-time threat detection system (SIEM)
The Pipeline:
- Collection: Fluentd/Logstash sidecars on every container collect logs.
- Normalization: Convert unstructured logs (“Error: user fail”) into JSON ({"event": "login_fail", "user": "bob"}).
- Stream Processing: Kafka -> Flink/Spark Streaming. Real-time analysis.
- Detection Logic: “If login_fail count > 50 within 1 minute for a single IP, emit alert.”
- Storage: Hot storage (OpenSearch/Elasticsearch) for recent logs. Cold storage (S3) for long-term retention.
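The normalization and detection stages can be sketched in plain Python. The raw log format matched by `LINE_RE` is invented for the example, and in the pipeline described above this logic would run inside Flink or Spark rather than a single process.

```python
import re
from collections import defaultdict, deque
from typing import Optional

# Hypothetical raw log format; real sources each need their own parser.
LINE_RE = re.compile(r"login failed for user (?P<user>\S+) from (?P<ip>\S+)")


def normalize(line: str) -> Optional[dict]:
    """Turn an unstructured log line into a structured event, or None if unrecognized."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    return {"event": "login_fail", "user": match["user"], "ip": match["ip"]}


class FailedLoginDetector:
    def __init__(self, threshold: int = 50, window_s: float = 60.0) -> None:
        self.threshold = threshold
        self.window_s = window_s
        self._by_ip = defaultdict(deque)  # ip -> timestamps of recent failures

    def observe(self, event: dict, ts: float) -> bool:
        """Return True when this event pushes an IP over the threshold."""
        if event.get("event") != "login_fail":
            return False
        recent = self._by_ip[event["ip"]]
        recent.append(ts)
        while recent and ts - recent[0] > self.window_s:
            recent.popleft()
        return len(recent) > self.threshold
```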
➤ Create an automated anomaly detection pipeline (UBA)
User Behavior Analytics (UBA):
- Baseline: For 30 days, learn that Bob usually logs in from London between 9 AM and 6 PM, accessing the HR portal.
- Anomaly: Bob logs in from North Korea at 3 AM and downloads 5GB of data from the Code Repository.
- Math: Use standard deviation or Isolation Forests (Machine Learning) to score the “weirdness” of an event.
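The standard-deviation approach amounts to a z-score against the user's baseline. A minimal sketch using the standard library, where the baseline values (daily download volume in MB) are made up for illustration:

```python
import statistics


def zscore(history: list[float], value: float) -> float:
    """How many standard deviations `value` sits from the user's baseline."""
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        # A perfectly flat baseline: any deviation at all is maximally unusual.
        return 0.0 if value == mean else float("inf")
    return (value - mean) / stdev


# Hypothetical 5-day baseline of Bob's daily downloads, in MB.
BOB_DAILY_MB = [100, 120, 110, 90, 105]
```

With this baseline, a 5 GB download scores in the hundreds while a normal day scores near zero. Isolation Forests extend the same idea to many features at once (time, location, resource, volume).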
➤ Build a SOC alert prioritization model
Alert Fatigue: If you send 1000 alerts a day, the analyst ignores all of them.
Risk Scoring:
- Contextual Enrichment: When an alert fires, auto-query the CMDB. Is this a Test server or the Prod Payment DB?
- Scoring:
- Port Scan on Test Server = Info (Score 1).
- Port Scan on Prod DB = Critical (Score 10).
- Only page the human for Score > 8.
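The enrichment and scoring steps can be sketched with a dictionary standing in for the CMDB. The IP addresses, multipliers, and the choice to treat unknown assets as production are all assumptions for the example.

```python
# Hypothetical CMDB extract: asset IP -> metadata.
CMDB = {
    "10.0.1.5": {"name": "test-server", "env": "test"},
    "10.0.9.9": {"name": "prod-payment-db", "env": "prod"},
}
BASE_SEVERITY = {"port_scan": 1}
ENV_MULTIPLIER = {"test": 1, "prod": 10}
PAGE_THRESHOLD = 8


def score_alert(alert_type: str, host_ip: str) -> int:
    # Assets missing from the CMDB are treated as prod so unknowns fail loud, not quiet.
    asset = CMDB.get(host_ip, {"env": "prod"})
    return min(10, BASE_SEVERITY.get(alert_type, 1) * ENV_MULTIPLIER[asset["env"]])


def should_page(score: int) -> bool:
    return score > PAGE_THRESHOLD
```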
➤ Create an incident response workflow (SOAR)
Security Orchestration, Automation, and Response (SOAR):
- Playbook: “Phishing Reported.”
- Automation Steps:
- Parse email headers.
- Check URL against VirusTotal API.
- If malicious:
- Use API to delete email from all employee inboxes (Exchange/Gmail API).
- Block domain at the Firewall.
- Reset password of the user who clicked.
- Only then notify the analyst “Phishing attack neutralized.”
6. Infrastructure and Cloud Security
The Platform. Securing the ground the applications stand on.
➤ Design a secure CI/CD pipeline (Supply Chain Security)
The Risk: Attackers inject code into your build pipeline (SolarWinds style).
SLSA (Supply-chain Levels for Software Artifacts):
- Signed Commits: Developers must sign git commits with GPG/SSH keys.
- Ephemeral Build Agents: The build server is created from scratch for every build and destroyed afterwards. No persistent malware.
- Hermetic Builds: The build process has no internet access. All dependencies must be pre-fetched and verified against checksums (lockfiles).
- Provenance: The build server signs the final binary. “I, the Build Server, certify that this binary came from Git Commit abc1234.”
➤ Build policy enforcement for Kubernetes (OPA)
Open Policy Agent (OPA): Policy as Code.
- The Admission Controller: Before a Pod is deployed to the cluster, Kubernetes asks OPA: “Is this allowed?”
- Policies:
- deny if image comes from docker.io (must use internal registry).
- deny if privileged: true.
- deny if no resource limits are set (DoS protection).
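OPA policies are written in Rego, but the three checks above are easy to reason about as plain code. The sketch below expresses the same admission logic in Python against a Pod manifest loaded as a dictionary. The registry hostname is hypothetical, and the image rule is written as an allow-list because an image with no registry prefix also resolves to Docker Hub.

```python
def admission_violations(pod: dict,
                         internal_registry: str = "registry.internal.example/") -> list[str]:
    """Return a list of policy violations; an empty list means the Pod is admitted."""
    violations = []
    for container in pod.get("spec", {}).get("containers", []):
        name = container.get("name", "<unnamed>")
        # Allow-list the internal registry instead of deny-listing docker.io.
        if not container.get("image", "").startswith(internal_registry):
            violations.append(f"{name}: image must come from {internal_registry}")
        if container.get("securityContext", {}).get("privileged") is True:
            violations.append(f"{name}: privileged containers are not allowed")
        if not container.get("resources", {}).get("limits"):
            violations.append(f"{name}: resource limits are required")
    return violations
```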

➤ Design image integrity checks (Container Security)
The Problem: FROM node:latest. What is in that image? Mining malware?
The Solution:
- Distroless Images: Use base images that contain only your app and its runtime dependencies. No shell (/bin/bash), no package manager (apt). If an attacker gets RCE, they have no tools to use.
- Image Scanning: Clair/Trivy scans the image layers for known CVEs daily.
- Image Signing (Cosign): Sign your images with your private key. Configure the Kubernetes cluster to run only images whose signatures verify against your public key.
➤ Create automated guardrails for Cloud (AWS/GCP)
Guardrails > Gates. A gate stops you. A guardrail keeps you safe while moving fast.
- Service Control Policies (SCPs): AWS Org-level rules. “No one, not even root, can disable CloudTrail logs.”
- Event-Driven Remediation:
- Event: CreateBucket (Public).
- Lambda: Detects event. Immediately calls PutBucketAcl (Private). Notifies user: “I fixed your insecure bucket.”
➤ Design a secure boot flow
Chain of Trust:
- Hardware Root of Trust: The CPU checks the signature of the UEFI firmware.
- Bootloader: Firmware checks signature of the Bootloader (GRUB).
- Kernel: Bootloader checks signature of the Linux Kernel.
- Drivers: Kernel checks signatures of kernel modules.
- TPM (Trusted Platform Module): A chip on the motherboard. It records a hash of every component loaded (measured boot). If a rootkit modifies the kernel, the hash changes, and the TPM refuses to release the decryption keys for the hard drive. The disk stays locked until someone supplies a recovery key.
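The reason a modified kernel changes the final hash is the TPM's "extend" operation: a register (PCR) can never be set directly, only updated to the hash of its old value plus the new measurement. A minimal sketch of that hash chain, simplified to a single SHA-256 register with made-up component contents:

```python
import hashlib


def extend(pcr: bytes, component: bytes) -> bytes:
    # new PCR = SHA-256(old PCR || SHA-256(component)); there is no way to "set" a PCR.
    measurement = hashlib.sha256(component).digest()
    return hashlib.sha256(pcr + measurement).digest()


def measure_boot(components: list[bytes]) -> bytes:
    pcr = bytes(32)  # the register starts at all zeros on reset
    for component in components:
        pcr = extend(pcr, component)
    return pcr
```

The disk key is sealed to the expected final value. Because each step folds in everything before it, tampering with any one stage, or loading stages in a different order, yields a different value and the key stays sealed.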
7. Product and Consumer Security
Protecting the user from themselves and others.
➤ Build an anti-fraud system for payments
The Signal:
- Velocity: $5000 transaction? Maybe okay. 50 transactions of $100 in 1 minute? Fraud.
- Geography: Card issued in USA, transaction from Nigeria.
- Device: “This device has been associated with 50 different credit cards.”
➤ Design phishing-resistant user onboarding
The Problem: Users choose “Password123”.
The Fix:
- NIST Guidelines: Stop enforcing complex rules (Must have !@#$). They just make users write passwords on sticky notes. Instead, enforce Length (min 12 chars) and check against Breached Password Lists (HaveIBeenPwned API).
- Passkeys First: Prompt the user to create a Passkey (Biometric) during signup. Make the password the fallback.
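The breached-password check does not require sending the password anywhere. The Pwned Passwords range API takes the first five hex characters of the SHA-1 hash and returns matching suffixes with counts, one `SUFFIX:COUNT` pair per line. The sketch below covers the local half only: it takes the response body as a string and makes no network call. The function names are assumptions.

```python
import hashlib

MIN_LENGTH = 12


def sha1_prefix_suffix(password: str) -> tuple[str, str]:
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    # Only the 5-character prefix is ever sent to the API; the suffix stays local.
    return digest[:5], digest[5:]


def breach_count(password: str, range_response: str) -> int:
    """Look up our suffix in a 'SUFFIX:COUNT' response body for our prefix."""
    _, suffix = sha1_prefix_suffix(password)
    for line in range_response.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


def is_acceptable(password: str, range_response: str) -> bool:
    # Length plus breach check, with no composition rules.
    return len(password) >= MIN_LENGTH and breach_count(password, range_response) == 0
```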
➤ Create a real-time abuse prevention system
Scenario: A bot is posting spam comments on your platform.
Defense:
- Shadow Banning: Do not tell the bot they are banned. Let them post. Show the post only to them. To everyone else, it doesn’t exist. This wastes the attacker’s time trying to figure out if their script is working.
- Proof of Work (PoW): If a user posts too fast, send a JavaScript challenge to their browser. “Calculate this heavy hash.” It takes the browser 5 seconds. It slows the bot down to a crawl but is invisible to humans.
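The Proof of Work challenge is asymmetric by design: solving takes many hashes, verifying takes one. A minimal hashcash-style sketch in Python (the real challenge would run as JavaScript in the browser), with hypothetical function names and a difficulty expressed in leading zero bits:

```python
import hashlib
from itertools import count


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    # One hash: the digest, read as a 256-bit integer, must fall below the target.
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))


def solve(challenge: str, difficulty_bits: int) -> int:
    # Brute force: expected work is about 2**difficulty_bits hashes.
    for nonce in count():
        if verify(challenge, nonce, difficulty_bits):
            return nonce
```

Each extra bit of difficulty doubles the client's expected work while the server's cost stays at one hash, so the difficulty can be raised only for clients that are posting too fast.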
➤ Design security review tooling (Threat Modeling)
You cannot review every line of code manually.
Self-Service Threat Modeling:
- Create a questionnaire for developers:
- “Does this service handle PII?”
- “Is it exposed to the internet?”
- “Does it parse XML?”
- Based on answers, generate a risk score. High risk = Security Team Review required. Low risk = Automated checks only.
Closing Thoughts
Security Engineering is not about running tools. Anyone can run nmap. Security Engineering is about Architecture. It is about understanding constraints.
- How do I secure this without slowing down developers?
- How do I encrypt this without adding 500ms latency?
- How do I block attackers without blocking valid users?
If you are preparing for interviews in 2025, pick one of the problems above. Draw the diagram on a whiteboard. Write the pseudo-code. Explain why you chose Redis over Postgres for rate limiting. Explain why you chose Argon2 over SHA-256.
That is the job.
If you found this guide helpful, share it with someone trying to break into security engineering. The industry needs more engineers who understand systems, not just scripts.








