Database Internals

20 deep questions — indexes, locking, transactions, scaling, real-world patterns

1 of 20
What are the four ACID properties?
A. Authentication, Caching, Indexing, Distribution
B. Atomicity, Caching, Isolation, Distribution
C. Atomicity, Consistency, Isolation, Durability
D. Availability, Consistency, Isolation, Durability
Hint

A = all or nothing. C = valid state. I = no interference. D = survives crash.

Detailed explanation

Atomicity — all or nothing: a transaction either fully completes or fully rolls back. If a transfer debits account A but crashes before crediting account B, the entire transaction rolls back — A's money is restored. No partial state. Implementation: PostgreSQL uses WAL (write-ahead log) — changes logged before applying, enabling rollback on failure.

Consistency — valid state transitions: the database moves from one valid state to another. Constraints (UNIQUE, CHECK, FK) are enforced. A transaction that violates a constraint is rejected. Example: you can't insert a negative balance if CHECK (balance >= 0) exists.

Isolation — transactions don't interfere: concurrent transactions behave as if they ran sequentially. Without isolation, one transaction could read another's uncommitted changes (dirty read). PostgreSQL offers four isolation levels with increasing strictness:

READ UNCOMMITTED → dirty reads possible (PostgreSQL treats as READ COMMITTED)
READ COMMITTED   → each statement sees latest committed data (PG default)
REPEATABLE READ  → snapshot from transaction start, no phantom reads
SERIALIZABLE     → strictest, as if transactions ran one by one

Durability — committed = permanent: once a transaction is committed, it survives power failure, OS crash, hardware failure. PostgreSQL achieves this by writing to WAL on disk before returning "COMMIT OK." Even if the server crashes immediately after, WAL replay recovers the data.

Why this matters for fintech: financial systems REQUIRE full ACID. A transfer that debits but doesn't credit = lost money. A balance read during a concurrent write = potential double-spend. ACID guarantees prevent these scenarios. NoSQL databases often trade ACID for performance — unacceptable for a ledger.

2 of 20
What is the difference between READ COMMITTED and REPEATABLE READ?
A. READ COMMITTED is faster — no other difference
B. READ COMMITTED: each query sees latest committed data (may change between queries). REPEATABLE READ: sees snapshot from transaction start
C. READ COMMITTED allows writes, REPEATABLE READ is read-only
D. They are identical in PostgreSQL
Hint

Two identical SELECTs in one transaction — can they return different results?

Detailed explanation

The scenario that shows the difference:

-- Transaction A:                    -- Transaction B:
BEGIN;                                
SELECT balance FROM accounts          
  WHERE id = 'A1';                   
-- returns 1000                      
                                     BEGIN;
                                     UPDATE accounts SET balance = 500
                                       WHERE id = 'A1';
                                     COMMIT;
                                     
SELECT balance FROM accounts          
  WHERE id = 'A1';                   
-- READ COMMITTED:  returns 500  (sees B's commit!)
-- REPEATABLE READ: returns 1000 (still sees original snapshot!)
COMMIT;

READ COMMITTED (PostgreSQL default): each statement gets a fresh snapshot. If another transaction commits between your two SELECTs, you see the change. This is usually fine — most business logic doesn't depend on reading the same value twice within one transaction.

REPEATABLE READ: the entire transaction uses a snapshot taken at the start. You see a consistent view of the data throughout. No "non-repeatable reads" or "phantom reads." But: if you try to UPDATE a row that another transaction already modified → serialization error → your transaction must retry.

SERIALIZABLE: strictest level. Guarantees that the result is equivalent to running transactions one after another. Detects write skew anomalies. Most serialization errors → most retries needed. Rarely used outside of very critical financial calculations.

In practice for banking:

• Balance check + debit: use SELECT FOR UPDATE (pessimistic lock) at READ COMMITTED level — lock the row, no need for REPEATABLE READ.

• Reports/aggregations: REPEATABLE READ gives consistent snapshot across multiple queries — "total assets" doesn't change mid-report.

• Most CRUD: READ COMMITTED is sufficient and has the least overhead.

3 of 20
What types of indexes exist in PostgreSQL and when to use each?
A. Only BTree — the universal index type
B. BTree and Hash — cover all cases
C. BTree, Hash, and Full-text only
D. BTree (general), BRIN (append-only time data), Hash (equality only), GIN (full-text/JSON), GiST (spatial)
Hint

Different data patterns need different index structures. One size doesn't fit all.

Detailed explanation

BTree (default): balanced tree structure. Supports =, <, >, <=, >=, BETWEEN, ORDER BY, and LIKE 'prefix%' (prefix matching needs the C collation or the text_pattern_ops operator class). Universal workhorse. One entry per row → large for big tables.

CREATE INDEX idx_account ON transfers (account_id);
-- Good for: WHERE account_id = 'A1'
-- Good for: ORDER BY account_id
-- Size: proportional to number of rows

BRIN (Block Range Index): stores min/max per block of pages. Often 100x smaller than BTree. Only effective when data is physically ordered on disk (append-only tables where new rows have increasing values).

CREATE INDEX idx_created ON journal USING BRIN (created_at);
-- Perfect for: WHERE created_at BETWEEN '2026-01-01' AND '2026-03-31'
-- On append-only ledger: data naturally time-ordered
-- 100M rows: BTree ~3GB, BRIN ~30MB
-- Poor for point lookups: WHERE created_at = specific_timestamp (lossy: every candidate block must be rechecked row by row)
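A toy model of the block-range idea (illustrative only, not PostgreSQL's on-disk format): the index keeps one min/max summary per block of pages, and a range query scans only the blocks whose summary overlaps the requested range.

```java
import java.util.ArrayList;
import java.util.List;

class BrinSketch {
    // One summary per block of table pages (toy model).
    record BlockRange(int blockId, long min, long max) {}

    // Blocks whose [min, max] overlaps [from, to]: only these need scanning.
    static List<Integer> candidateBlocks(List<BlockRange> index, long from, long to) {
        List<Integer> hits = new ArrayList<>();
        for (BlockRange b : index) {
            if (b.max() >= from && b.min() <= to) hits.add(b.blockId());
        }
        return hits;
    }
}
```

On append-only, time-ordered data the per-block ranges are narrow and non-overlapping, so most blocks are pruned; with random insert order every block's range is wide and pruning degenerates to a full scan.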

Hash: hash table. Only supports exact equality (=). Slightly faster than BTree for equality, but can't do range queries, can't do ORDER BY. Rarely used — BTree handles equality well enough.

CREATE INDEX idx_key ON cache USING HASH (key);
-- Only for: WHERE key = 'exact-value'
-- Cannot do: WHERE key > 'something'

GIN (Generalized Inverted Index): maps values to rows that contain them. For full-text search, JSONB fields, arrays.

CREATE INDEX idx_search ON articles USING GIN (to_tsvector('english', body));
CREATE INDEX idx_tags ON products USING GIN (tags);  -- array column
CREATE INDEX idx_meta ON events USING GIN (metadata);  -- JSONB column

GiST (Generalized Search Tree): for spatial data (PostGIS geometry), range types, nearest-neighbor queries.

For a financial service — recommended indexes:

-- Account lookup (BTree):
CREATE INDEX idx_transfers_account ON transfers (account_id);

-- Time range on append-only ledger (BRIN — 100x smaller):
CREATE INDEX idx_transfers_time ON transfers USING BRIN (created_at);

-- Combined: "recent transfers for account" (composite BTree):
CREATE INDEX idx_transfers_acct_time ON transfers (account_id, created_at DESC);

-- Idempotency deduplication (UNIQUE BTree):
CREATE UNIQUE INDEX idx_transfers_idemp ON transfers (idempotency_key);

-- Partial: only active/pending transfers (smaller, faster):
CREATE INDEX idx_pending ON transfers (id) WHERE status = 'PENDING';

4 of 20
Pessimistic vs optimistic locking — when to use each?
A. Pessimistic (SELECT FOR UPDATE) for financial transfers. Optimistic (version column) for profile updates. Atomic UPDATE WHERE for single-row operations
B. Always use optimistic — better performance
C. Always use pessimistic — safer
D. Use neither — eventual consistency is sufficient
Hint

Money transfer: correctness over throughput. Profile edit: rare conflicts, retry is cheap.

Detailed explanation

Pessimistic locking — "lock first, ask questions later":

BEGIN;
SELECT balance FROM accounts WHERE id = 'A1' FOR UPDATE;
-- Row is LOCKED. Other transactions trying to read FOR UPDATE will WAIT.
-- balance = 1000

-- Application logic:
-- 1000 >= 800? Yes → proceed

UPDATE accounts SET balance = 200 WHERE id = 'A1';
COMMIT;
-- Lock released. Next waiting transaction proceeds.
-- Say it wants to debit 500: it reads balance = 200 → 200 >= 500? No → REJECT

When to use: high contention (many concurrent writes to same row), correctness is critical (money, inventory), cost of conflict is high (complex business logic already executed).

Optimistic locking — "hope for the best, check at write time":

-- Table has a version column:
SELECT balance, version FROM accounts WHERE id = 'A1';
-- balance = 1000, version = 5

-- Application logic (no lock held):
-- 1000 >= 800? Yes → proceed

UPDATE accounts SET balance = 200, version = 6 
WHERE id = 'A1' AND version = 5;
-- affected_rows = 1 → success (nobody changed it)
-- affected_rows = 0 → someone else modified → RETRY entire operation

When to use: low contention (profile updates, comments), reads vastly outnumber writes, retries are cheap (simple operations).
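The application-side retry loop can be sketched like this, with an in-memory versioned row standing in for the table (a Java sketch; in real code the conditional UPDATE runs against PostgreSQL and "affected rows" replaces the compare-and-set):

```java
import java.util.concurrent.atomic.AtomicReference;

class OptimisticDebit {
    record Row(long balance, long version) {}

    // In-memory stand-in for the accounts row (balance = 1000, version = 5).
    final AtomicReference<Row> row = new AtomicReference<>(new Row(1000, 5));

    // Mimics: UPDATE ... SET balance = ?, version = version + 1
    //         WHERE id = ? AND version = ?   (affected_rows is 0 or 1)
    boolean tryUpdate(Row expected, long newBalance) {
        return row.compareAndSet(expected, new Row(newBalance, expected.version() + 1));
    }

    // Read, check, conditional write; on conflict restart from the read.
    boolean debit(long amount, int maxRetries) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            Row current = row.get();                       // SELECT balance, version
            if (current.balance() < amount) return false;  // insufficient funds
            if (tryUpdate(current, current.balance() - amount)) return true;
            // version changed under us: someone else won, re-read and retry
        }
        throw new IllegalStateException("too much contention, giving up");
    }
}
```

Bounding the retries matters: under heavy contention optimistic locking livelocks, which is exactly why the text recommends it only for low-contention rows.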

Atomic UPDATE WHERE — the third option:

UPDATE accounts SET balance = balance - 800 
WHERE id = 'A1' AND balance >= 800;
-- One statement. Atomic. No separate SELECT needed.
-- affected_rows = 1 → success
-- affected_rows = 0 → insufficient funds

-- PostgreSQL bonus:
UPDATE accounts SET balance = balance - 800 
WHERE id = 'A1' AND balance >= 800
RETURNING balance;  -- returns new balance in same round trip

When to use: single-row operations, simple conditions, no complex business logic between read and write. The database does check + update atomically. Simplest and often the best choice.

Decision matrix:

Scenario                    Best approach
Simple debit (one row)      Atomic UPDATE WHERE
Transfer between accounts   Pessimistic (SELECT FOR UPDATE)
Profile update              Optimistic (version column)
Last-item inventory         Pessimistic
Wiki page edit              Optimistic
Batch processing            Optimistic (retry on conflict)
5 of 20
What is the N+1 query problem?
A. A query that returns N+1 rows
B. A deadlock between N+1 transactions
C. An index that needs N+1 lookups
D. 1 query fetches N parents, then N separate queries fetch each parent's children. Fix: JOIN or batch fetch
Hint

Load 100 users → for each user load their orders = 1 + 100 queries. Should be 1 JOIN.

Detailed explanation

The problem — hidden in ORM lazy loading:

// Kotlin + Hibernate:
val users = userRepository.findAll()          // Query 1: SELECT * FROM users
for (user in users) {                         // 100 users
    println(user.orders.size)                 // Query 2-101: SELECT * FROM orders WHERE user_id = ?
}
// Total: 101 queries! Each round-trip: ~1-5ms
// 101 * 3ms = 303ms. Should be ~5ms with one JOIN.

Why it happens: Hibernate's lazy loading fetches related entities only when accessed. Looks elegant in code (user.orders) but generates a separate SQL query for EACH parent entity. The ORM hides the cost.

Solutions:

-- Solution 1: JOIN FETCH (eager in one query)
SELECT u.*, o.* FROM users u LEFT JOIN orders o ON o.user_id = u.id
-- One query, all data. But: large result set if many orders per user.

-- Solution 2: Batch fetching (Hibernate @BatchSize)
@BatchSize(size = 25)
-- Loads orders in batches: SELECT * FROM orders WHERE user_id IN (?, ?, ..., ?)
-- 100 users / 25 per batch = 4 order queries (5 total) instead of 101

-- Solution 3: Entity graph (JPA)
@EntityGraph(attributePaths = ["orders"])
fun findAll(): List<User>

-- Solution 4: jOOQ / Exposed (manual control)
dsl.select(USERS.NAME, DSL.count(ORDERS.ID))
   .from(USERS)
   .leftJoin(ORDERS).on(ORDERS.USER_ID.eq(USERS.ID))
   .groupBy(USERS.NAME)
   .fetch()
-- You write the SQL. You SEE the SQL. No surprises.

Why fintech prefers jOOQ/Exposed over Hibernate: N+1 is invisible with Hibernate. With jOOQ you write every query explicitly — impossible to accidentally generate 100 queries. In financial systems, every millisecond and every query matters. Full SQL control = no surprises.

How to detect: enable SQL logging (spring.jpa.show-sql=true), use P6Spy or datasource-proxy to count queries per request. If you see the same query repeated 100 times with different parameters → N+1.
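The counting idea behind those proxies can be illustrated with a toy repository whose "queries" just increment a counter (all class and method names here are hypothetical, not P6Spy's API):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class QueryCounter {
    int queries = 0;

    List<String> findUsers() { queries++; return List.of("u1", "u2", "u3"); }

    // One query PER parent: the N+1 trap.
    List<String> findOrdersFor(String userId) { queries++; return List.of(userId + "-o1"); }

    // One query for ALL parents: SELECT ... WHERE user_id IN (?, ?, ?)
    Map<String, List<String>> findOrdersForAll(List<String> ids) {
        queries++;
        return ids.stream().collect(Collectors.toMap(id -> id, id -> List.of(id + "-o1")));
    }

    int lazyLoad() {                 // 1 parent query + N child queries
        for (String u : findUsers()) findOrdersFor(u);
        return queries;
    }

    int batchLoad() {                // 1 parent query + 1 IN-query
        findOrdersForAll(findUsers());
        return queries;
    }
}
```

The per-request query count is the metric to alert on: it should stay constant as N grows, not scale with it.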

6 of 20
What is a composite index and when is column order important?
A. Column order doesn't matter — the optimizer handles it
B. A composite index is an index on a computed column
C. Index on multiple columns. (A, B) supports queries on A alone or A+B, but NOT B alone — leftmost prefix rule
D. An index that spans multiple tables
Hint

Phone book sorted by (last_name, first_name). Find all Smiths? Easy. Find all Johns? Must scan everything.

Detailed explanation

How a composite index is structured:

CREATE INDEX idx ON transfers (account_id, created_at DESC);

-- Internally sorted like:
-- account_id | created_at
-- A1         | 2026-04-24
-- A1         | 2026-04-23
-- A1         | 2026-04-22
-- A2         | 2026-04-24
-- A2         | 2026-04-20
-- B1         | 2026-04-24

-- First sorted by account_id, then by created_at within each account

Leftmost prefix rule — which queries can use this index:

✅ WHERE account_id = 'A1'
   → Jump to A1 section → scan. Uses index.

✅ WHERE account_id = 'A1' AND created_at > '2026-04-01'
   → Jump to A1 section → scan from date. Uses index perfectly.

✅ WHERE account_id = 'A1' ORDER BY created_at DESC
   → Jump to A1 section → already sorted. Index covers both filter AND sort.

❌ WHERE created_at > '2026-04-01'  (without account_id)
   → created_at is the SECOND column. Data is NOT sorted by created_at globally.
   → Must scan entire index. Practically useless.

❌ ORDER BY created_at  (without account_id filter)
   → Same problem. Global order is by account_id, not created_at.

Column order strategy:

1. Put the equality filter column first (account_id = 'A1')

2. Put the range/sort column second (created_at DESC)

3. More selective columns first (fewer matching rows = faster)

For banking: (account_id, created_at DESC) is the #1 index — supports "show me recent transactions for this account" which is the most common query in any banking app.

7 of 20
What is a partial index?
A. An index that only covers some columns
B. An index with a WHERE clause — only indexes rows matching the condition. Smaller and faster
C. An index partially loaded into memory
D. An incomplete index still being built
Hint

CREATE INDEX ... WHERE status = 'PENDING' — only index the rows you actually query.

Detailed explanation

The problem: you have 100M transfers. 99.9% are COMPLETED. You frequently query pending transfers. A full index on status wastes space indexing 99.9M completed rows you never query by status.

-- Full index (wasteful):
CREATE INDEX idx_status ON transfers (status);
-- Indexes ALL 100M rows. 99.9% of entries are useless.

-- Partial index (efficient):
CREATE INDEX idx_pending ON transfers (id, created_at) 
WHERE status = 'PENDING';
-- Only indexes ~100K pending rows. 1000x smaller.
-- Queries: SELECT * FROM transfers WHERE status = 'PENDING' → uses tiny index

More real-world examples:

-- Only unprocessed messages in outbox:
CREATE INDEX idx_outbox_unpublished ON outbox (created_at)
WHERE published = false;
-- Outbox relay only queries unpublished rows. Index is tiny.

-- Only active users (skip deleted/suspended):
CREATE INDEX idx_active_email ON users (email)
WHERE status = 'ACTIVE';

-- Only failed transfers (for a retry queue). Note: partial-index
-- predicates must use immutable expressions, so NOW() is not allowed.
-- For a sliding window, use a literal cutoff and recreate periodically:
CREATE INDEX idx_recent_failed ON transfers (created_at)
WHERE status = 'FAILED' AND created_at > '2026-04-17';

Benefits: smaller index = less disk, fits in memory, faster updates (fewer entries to maintain on INSERT/UPDATE), faster queries (scan less data).

Limitation: the query's WHERE clause must imply the index's WHERE clause for PostgreSQL to use it (the planner can prove implication only for simple predicates). WHERE status = 'COMPLETED' won't use the partial index on WHERE status = 'PENDING'.

8 of 20
What is database sharding?
A. Creating read replicas for load balancing
B. Splitting data across multiple database instances by a shard key. Each shard holds a subset of data
C. Compressing old data to save disk space
D. Encrypting data at the column level
Hint

One DB can't handle 100M accounts. Split: accounts A-M on DB1, N-Z on DB2.

Detailed explanation

Sharding vs replication vs partitioning:

Replication:   Same data on multiple servers (for reads and HA)
               Primary   → Replica 1, Replica 2
               All have ALL data. Writes go to primary only.
Partitioning:  Split data within ONE server
               transfers_2024_q1, transfers_2024_q2, ...
               Same server, smaller tables.
Sharding:      Split data across MULTIPLE servers
               Shard 1 (accounts A-M) on Server 1
               Shard 2 (accounts N-Z) on Server 2
               Different servers, each has SUBSET of data.

How sharding works:

Shard key: account_id
Hash function: hash(account_id) % num_shards

account_id = "A1" → hash("A1") % 4 = 2 → Shard 2
account_id = "B5" → hash("B5") % 4 = 0 → Shard 0

Application or proxy layer routes queries to correct shard:
  SELECT * FROM transfers WHERE account_id = 'A1'
  → Route to Shard 2 → fast, small dataset
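The routing function itself is small; here is a sketch of the naive modulo scheme described above (production systems usually prefer consistent hashing so that adding a shard moves less data):

```java
class ShardRouter {
    // hash(shard_key) % num_shards, as in the example above.
    static int shardFor(String accountId, int numShards) {
        // floorMod guards against negative hashCode values
        return Math.floorMod(accountId.hashCode(), numShards);
    }
}
```

Note the operational consequence: every query must carry the shard key. A lookup without account_id cannot be routed and has to fan out to all shards.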

Challenges — this is why you avoid sharding until necessary:

Cross-shard queries: "total balance across all accounts" requires querying ALL shards and aggregating. Slow.

Cross-shard transactions: transfer from A1 (Shard 2) to B5 (Shard 0) — can't use a single DB transaction. Need distributed saga.

Rebalancing: adding a new shard means moving data. Consistent hashing helps but adds complexity.

JOINs: can't JOIN across shards efficiently. Need to denormalize data.

Alternatives to manual sharding: CockroachDB, Vitess (YouTube's MySQL sharding layer), Citus (PostgreSQL extension). These provide sharding-like scaling without you managing it. (Amazon Aurora scales storage and read replicas automatically, but does not shard writes.)

Rule of thumb: scale vertically first (bigger machine). Add read replicas. Use table partitioning. Shard only when you hit the ceiling of a single PostgreSQL instance (~10TB, ~100K transactions/second). Most companies never need sharding.

9 of 20
What is WAL (Write-Ahead Log)?
A. A log of user access for security auditing
B. A query execution log for debugging
C. Changes written to a log BEFORE applying to data files. Ensures durability — after crash, replay WAL to recover
D. A cache layer in front of the database
Hint

Power failure during write. How does PostgreSQL know what was committed and what wasn't?

Detailed explanation

The problem without WAL: you UPDATE a row. PostgreSQL modifies the data file on disk. Power fails mid-write. The file is now corrupted — half-written data. Database is inconsistent.

How WAL solves it:

Step 1: Write the CHANGE to the WAL file and fsync it (sequential write — fast)
        WAL entry: "Set row 42 balance from 1000 to 700"
Step 2: Return "COMMIT OK" to client
        (data file NOT yet updated!)
Step 3: Later (checkpoint): apply WAL changes to actual data files
        (async, in background)

Power fails after Step 2, before Step 3:
  → Data file still has old value (1000)
  → WAL has the change recorded
  → On restart: PostgreSQL replays WAL → applies change → data file updated
  → No data lost! ACID Durability guaranteed.

Why WAL is fast: WAL writes are sequential (append-only to end of file). Data file writes are random (row could be anywhere on disk). Sequential I/O is 100x faster than random I/O on HDD, 10x on SSD. So: fast sequential WAL write → respond to client → slow random data write happens later in background.

WAL is used for MORE than just crash recovery:

Streaming replication: primary sends WAL records to replicas → replicas apply them → stay in sync. This is how PostgreSQL read replicas work.

Point-in-time recovery (PITR): take a base backup + archive WAL files. To restore to "yesterday at 3pm": restore base backup → replay WAL up to 3pm.

Change Data Capture (CDC): tools like Debezium read WAL to capture changes and publish to Kafka. This is the basis of the outbox pattern without an actual outbox table.

Logical replication: replicate specific tables to another PostgreSQL instance by streaming decoded WAL.

Analogy: WAL is like a journal/diary. Before doing anything, you write down what you're going to do. If you're interrupted, you can read the journal and finish the work. The actual work (data files) can happen whenever — the journal ensures nothing is forgotten.
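The journal idea as a toy sketch (illustrative only; real WAL records are binary page-level changes, not key/value pairs, and the log itself must be fsynced to disk):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ToyWal {
    record Entry(String key, long newValue) {}

    final List<Entry> log = new ArrayList<>();          // "durable", append-only
    final Map<String, Long> dataFile = new HashMap<>(); // updated lazily

    void commit(String key, long newValue) {
        log.add(new Entry(key, newValue));  // Step 1: WAL append, then "COMMIT OK"
        // Step 2 (checkpoint) may happen much later, or never if we crash first.
    }

    // Crash recovery: replay every logged change onto the data file.
    Map<String, Long> recover() {
        for (Entry e : log) dataFile.put(e.key(), e.newValue());
        return dataFile;
    }
}
```

The same replay loop is what a replica runs continuously, which is why streaming replication and PITR fall out of WAL "for free".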

10 of 20
What is connection pooling and why is it critical?
A. Caching query results across connections
B. Sharing data between database instances
C. Reusing pre-established DB connections instead of creating new ones per request. Opening a connection costs 5-50ms
D. Pooling queries to execute in batch
Hint

TCP handshake + TLS negotiation + PostgreSQL auth = 10-50ms. At 10,000 requests/sec that's a problem.

Detailed explanation

Cost of opening a connection:

1. TCP handshake              ~1ms
2. TLS negotiation            ~5-10ms
3. PostgreSQL authentication  ~2-5ms  
4. Connection setup           ~1-2ms
Total:                        ~10-20ms per connection

At 10,000 requests/second:
  Without pool: 10,000 × 15ms = 150 seconds of connection overhead!
  With pool:    10,000 × 0ms = connections already open, just grab one

How connection pooling works:

Application starts:
  Pool creates 10 connections to PostgreSQL → keeps them open

Request arrives:
  1. Get connection from pool     (~0.01ms — just grab from list)
  2. Execute query                (~5ms)
  3. Return connection to pool    (~0.01ms — put back in list)
  Connection stays open for next request

Pool configuration:
  minimumIdle = 5     (always keep 5 connections ready)
  maximumPoolSize = 20 (never open more than 20)
  connectionTimeout = 30s (wait max 30s for a free connection)
  idleTimeout = 10min (close idle connections after 10 minutes)

HikariCP is the standard for JVM (default in Spring Boot). Fastest connection pool, battle-tested. Configuration:

spring:
  datasource:
    hikari:
      maximum-pool-size: 20    # max connections
      minimum-idle: 5          # min idle connections  
      connection-timeout: 30000  # 30s wait for connection
      idle-timeout: 600000       # 10min before closing idle

Common mistake — too many connections: PostgreSQL handles ~100-300 concurrent connections well. 10 app instances × 20 pool size = 200 connections. More than ~300 → PostgreSQL degrades (memory per connection, context switching). Solution: use PgBouncer (external connection pooler) between app and PostgreSQL to multiplex many app connections onto fewer DB connections.

With coroutines: Kotlin coroutines share a small thread pool. But each coroutine doing DB work still needs a connection from the pool. Semaphore can limit concurrent DB calls: val dbSemaphore = Semaphore(20) matching pool size.
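A minimal sketch of that bounding idea with java.util.concurrent.Semaphore (the kotlinx.coroutines Semaphore is used the same way around suspending calls; the pool-size number here is an assumption matching the config above):

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

class BoundedDbAccess {
    private final Semaphore permits;

    BoundedDbAccess(int poolSize) { this.permits = new Semaphore(poolSize); }

    // At most poolSize callers hold a permit at once, so the code can never
    // ask the connection pool for more connections than it actually has.
    <T> T withConnection(Supplier<T> dbCall) {
        try {
            permits.acquire();   // blocks if poolSize calls are already in flight
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted waiting for a permit", e);
        }
        try {
            return dbCall.get();
        } finally {
            permits.release();
        }
    }
}
```

This turns "wait 30s for a connection, then time out" into explicit backpressure at the application boundary.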

11 of 20
What is a deadlock in a database?
A. Two transactions each waiting for a lock held by the other. Neither can proceed. DB detects and kills one
B. A query that runs forever
C. A table that can't be written to
D. A connection pool exhausted
Hint

TX1 locks row A, waits for B. TX2 locks row B, waits for A. Both stuck forever.

Detailed explanation

How it happens:

TX1: SELECT * FROM accounts WHERE id = 'A1' FOR UPDATE;  -- locks A1
TX2: SELECT * FROM accounts WHERE id = 'A2' FOR UPDATE;  -- locks A2
TX1: SELECT * FROM accounts WHERE id = 'A2' FOR UPDATE;  -- WAITS for TX2 to release A2
TX2: SELECT * FROM accounts WHERE id = 'A1' FOR UPDATE;  -- WAITS for TX1 to release A1

TX1 waits for TX2. TX2 waits for TX1. DEADLOCK!

PostgreSQL detects and resolves: the deadlock check runs once a lock wait exceeds deadlock_timeout (default 1 second). When detected, PostgreSQL kills one transaction with: ERROR: deadlock detected. The victim transaction is rolled back. The other proceeds.

Prevention — lock ordering:

-- ALWAYS lock in consistent order (e.g., by account ID):
-- Both TX1 and TX2 lock A1 first, then A2:
TX1: SELECT ... WHERE id = 'A1' FOR UPDATE;  -- locks A1
TX2: SELECT ... WHERE id = 'A1' FOR UPDATE;  -- WAITS for TX1 (not deadlock — just waiting)
TX1: SELECT ... WHERE id = 'A2' FOR UPDATE;  -- locks A2, proceeds
TX1: COMMIT;                                  -- releases both
TX2: locks A1, then A2, proceeds             -- no deadlock!

This is exactly why our Kotlin banking tutorial sorts accounts by ID before locking:

// lower id is locked first, so every transaction acquires locks in the same global order
val (first, second) = if (from.id < to.id) from to to else to to from
first.mutex.withLock { second.mutex.withLock { ... } }

Other prevention strategies: lock timeout (SET lock_timeout = '5s' — give up after 5s instead of waiting forever), reduce transaction duration (hold locks for shortest time), use atomic UPDATE WHERE (no explicit locks).
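A runnable Java version of the same ordering rule, with in-memory locks standing in for row locks (illustrative only; field and class names are made up):

```java
import java.util.concurrent.locks.ReentrantLock;

class Account {
    final String id;
    final ReentrantLock lock = new ReentrantLock();
    long balance;

    Account(String id, long balance) { this.id = id; this.balance = balance; }

    static void transfer(Account from, Account to, long amount) {
        // Sort by id: both A→B and B→A transfers lock A first, then B,
        // so a lock cycle (and hence a deadlock) can never form.
        Account first = from.id.compareTo(to.id) < 0 ? from : to;
        Account second = first == from ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                from.balance -= amount;
                to.balance += amount;
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }
}
```

The SQL equivalent is the same discipline: always SELECT ... FOR UPDATE the lower account id first, regardless of transfer direction.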

12 of 20
What does EXPLAIN ANALYZE do?
A. Analyzes table statistics for the optimizer
B. Shows the table schema
C. Explains SQL syntax errors
D. Actually executes the query and shows the real execution plan with timings, row counts, and index usage
Hint

"Why is this query slow?" → EXPLAIN ANALYZE shows exactly what happened.

Detailed explanation

EXPLAIN vs EXPLAIN ANALYZE:

EXPLAIN SELECT * FROM transfers WHERE account_id = 'A1';
-- Shows PLANNED execution (estimated rows, estimated cost)
-- Does NOT actually run the query

EXPLAIN ANALYZE SELECT * FROM transfers WHERE account_id = 'A1';
-- ACTUALLY RUNS the query
-- Shows REAL timings, REAL row counts, REAL execution plan
-- Much more useful for debugging

Reading the output — what to look for:

EXPLAIN ANALYZE SELECT * FROM transfers WHERE account_id = 'A1' 
AND created_at > '2026-01-01';

Index Scan using idx_transfers_acct_time on transfers
  (cost=0.43..156.32 rows=127 width=96)
  (actual time=0.028..0.156 rows=142 loops=1)
  Index Cond: (account_id = 'A1' AND created_at > '2026-01-01')
Planning Time: 0.152 ms
Execution Time: 0.198 ms

Good signs: "Index Scan" or "Index Only Scan" — index is being used. Low actual time. Row estimate close to actual rows.

Bad signs:

Seq Scan on transfers  (cost=0.00..1856432.00 rows=100000000)
  (actual time=0.024..45230.156 rows=142 loops=1)
  Filter: (account_id = 'A1')
  Rows Removed by Filter: 99999858
-- Seq Scan = sequential scan = NO INDEX USED
-- Scanned 100M rows to find 142. Took 45 seconds.
-- Fix: CREATE INDEX idx ON transfers (account_id)

Other bad signs: "Nested Loop" on large tables (N×M row comparisons — a quadratic blow-up), "Sort" with "external merge Disk" (data too large for memory sort), "Hash Join" with very uneven bucket sizes.

Pro tip: use EXPLAIN (ANALYZE, BUFFERS, FORMAT YAML) for detailed buffer hit/miss information — shows if data came from memory cache or disk.

13 of 20
What is table partitioning?
A. Same as sharding — splitting across servers
B. Splitting one table into smaller physical tables by a partition key (e.g. month). Same server, transparent to queries
C. Splitting columns into multiple tables
D. Creating multiple indexes on a table
Hint

transfers_2024_01, transfers_2024_02, ... managed automatically by PostgreSQL.

Detailed explanation

The problem: a journal_entries table with 500M rows. Queries slow down, indexes are huge, VACUUM takes hours, backups grow. But you can't drop old data — regulatory requirement to keep 7 years.

Table partitioning — one logical table, many physical tables:

-- Create partitioned table:
CREATE TABLE journal_entries (
    id BIGINT,
    account_id TEXT,
    amount DECIMAL,
    created_at TIMESTAMP
) PARTITION BY RANGE (created_at);

-- Create partitions:
CREATE TABLE journal_2024_q1 PARTITION OF journal_entries
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
CREATE TABLE journal_2024_q2 PARTITION OF journal_entries
    FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');
-- ... etc

-- Queries work transparently:
SELECT * FROM journal_entries WHERE created_at > '2024-06-01';
-- PostgreSQL automatically scans only Q2 and later partitions
-- Skips Q1 entirely → "partition pruning"

Benefits:

Query performance: scan only relevant partitions. Query for April data skips 11 other months.

Index size: each partition has its own index. Smaller indexes = fit in memory, faster lookups.

Maintenance: VACUUM one partition at a time. Reindex one partition. Less impact on running queries.

Archival: ALTER TABLE journal_entries DETACH PARTITION journal_2020_q1 — instant. Move to cold storage. DROP TABLE journal_2020_q1 — instant (vs DELETE 100M rows = hours).

Partition types: RANGE (by date — most common), LIST (by country, status), HASH (even distribution). For financial data: RANGE by created_at (monthly or quarterly).

Partitioning vs sharding: partitioning = same server, automatic routing. Sharding = different servers, needs application-level routing. Start with partitioning. Shard only when one server isn't enough.

14 of 20
What is a materialized view?
A. A precomputed query result stored on disk. Fast reads, but must be refreshed when source data changes
B. A view that prevents data modification
C. A temporary table for the session
D. A view that pulls from external systems
Hint

Regular view: re-runs query every time. Materialized view: caches the result on disk, reads instantly.

Detailed explanation

Regular view vs materialized view:

-- Regular view: just a saved query. Runs EVERY time you SELECT from it.
CREATE VIEW account_balances AS
SELECT account_id, SUM(amount) as balance FROM journal_entries GROUP BY account_id;
-- SELECT * FROM account_balances → runs SUM on 500M rows every time = 30 seconds

-- Materialized view: stores the RESULT on disk.
CREATE MATERIALIZED VIEW account_balances AS
SELECT account_id, SUM(amount) as balance FROM journal_entries GROUP BY account_id;
-- First creation: 30 seconds (computes once)
-- SELECT * FROM account_balances → reads from disk = 5ms
-- But: data is STALE until you refresh

Refreshing:

-- Full refresh (locks view during refresh):
REFRESH MATERIALIZED VIEW account_balances;

-- Concurrent refresh (no lock, but needs UNIQUE index):
CREATE UNIQUE INDEX idx_mv_account ON account_balances (account_id);
REFRESH MATERIALIZED VIEW CONCURRENTLY account_balances;
-- Users can read during refresh. New data appears atomically.

Refresh strategies:

Scheduled: cron job every 5 minutes. Simple, predictable staleness.

Event-driven: after INSERT to journal_entries, trigger refresh. Fresher but more load.

On-demand: application calls REFRESH when it knows data changed. Most control.

This IS CQRS at the database level: journal_entries = write model (source of truth). account_balances materialized view = read model (optimized for fast reads, slightly stale). The same pattern we discussed in distributed systems, but within a single PostgreSQL instance.

Alternative to materialized views: Redis cache (faster, more flexible, but separate system to manage), read replicas (eventual consistency, no aggregation), application-level cache (in-memory, fastest, lost on restart).
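The read-model/write-model split can be sketched in miniature, with a Java map standing in for the materialized view and refresh() playing the role of REFRESH MATERIALIZED VIEW (names are illustrative):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BalanceView {
    record JournalEntry(String accountId, long amount) {}

    final List<JournalEntry> journal;                   // write model (source of truth)
    Map<String, Long> materialized = new HashMap<>();   // read model (may be stale)

    BalanceView(List<JournalEntry> journal) { this.journal = journal; refresh(); }

    // "REFRESH MATERIALIZED VIEW": recompute the aggregate in full, then
    // swap in the new map atomically, like a CONCURRENTLY refresh.
    void refresh() {
        Map<String, Long> fresh = new HashMap<>();
        for (JournalEntry e : journal) fresh.merge(e.accountId(), e.amount(), Long::sum);
        materialized = fresh;
    }

    // Reads never touch the journal: cheap, but stale until the next refresh.
    long balance(String accountId) { return materialized.getOrDefault(accountId, 0L); }
}
```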

15 of 20
What is the atomic UPDATE with a WHERE condition?
A. UPDATE SET balance = balance - 800 WHERE id = 'A1' AND balance >= 800 — check + update in one atomic operation
B. An UPDATE that can't be rolled back
C. An UPDATE that runs on multiple rows simultaneously
D. An UPDATE with automatic retry on failure
Hint

For single-row operations: no SELECT FOR UPDATE needed. Database handles atomicity.

Detailed explanation

Three approaches to "debit if sufficient funds" — from complex to simple:

-- Approach 1: Pessimistic (2 statements, explicit lock):
BEGIN;
SELECT balance FROM accounts WHERE id = 'A1' FOR UPDATE;  -- lock + read
-- app checks: 1000 >= 800? yes
UPDATE accounts SET balance = 200 WHERE id = 'A1';         -- write
COMMIT;
-- 2 round trips. Explicit lock management. Needed for multi-row operations.

-- Approach 2: Optimistic (2 statements, version check):
SELECT balance, version FROM accounts WHERE id = 'A1';     -- read (no lock)
-- app checks: 1000 >= 800? yes
UPDATE accounts SET balance = 200, version = 6 WHERE id = 'A1' AND version = 5;
-- affected_rows = 0? retry entire operation
-- 2 round trips. Retry logic needed.

-- Approach 3: Atomic UPDATE (1 statement, simplest):
UPDATE accounts SET balance = balance - 800 
WHERE id = 'A1' AND balance >= 800;
-- 1 round trip. Atomic. No explicit lock. No retry.
-- affected_rows = 1 → success
-- affected_rows = 0 → insufficient funds

Why this works: PostgreSQL automatically acquires a row-level lock during UPDATE. The WHERE condition (balance >= 800) is checked atomically with the update. If two concurrent UPDATEs hit the same row, one waits for the other to commit, then re-evaluates the condition against the updated balance.
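
The single-statement pattern translates directly to driver code: check the affected-row count instead of reading the balance first. A sketch using SQLite via Python's built-in sqlite3 (the conditional-UPDATE semantics are the same; the debit helper is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('A1', 1000)")

def debit(account_id: str, amount: int) -> bool:
    # Check + update in one atomic statement; rowcount reveals the outcome
    cur = conn.execute(
        "UPDATE accounts SET balance = balance - ? WHERE id = ? AND balance >= ?",
        (amount, account_id, amount),
    )
    return cur.rowcount == 1  # 1 = debited, 0 = insufficient funds

print(debit("A1", 800))  # True  (1000 -> 200)
print(debit("A1", 800))  # False (200 < 800, row untouched)
```

No lock management, no retry loop: the database serializes the two UPDATEs, and the second one evaluates the WHERE against the already-debited balance.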

PostgreSQL RETURNING bonus:

UPDATE accounts SET balance = balance - 800 
WHERE id = 'A1' AND balance >= 800
RETURNING id, balance as new_balance;
-- Returns: (A1, 200) if succeeded
-- Returns: empty set if insufficient funds
-- Combined UPDATE + SELECT in one round trip!

When atomic UPDATE is NOT enough: multi-row operations (transfer between two accounts — need to debit one AND credit another atomically). For those, use SELECT FOR UPDATE within a transaction. Atomic UPDATE only works for single-row, self-contained operations.

16 of 20
What is the difference between DELETE and TRUNCATE?
A. They are identical
B. DELETE: row-by-row, logged, triggers fire, can have WHERE. TRUNCATE: deallocates pages instantly, minimal logging, no triggers
C. DELETE removes data, TRUNCATE removes the table structure
D. TRUNCATE is slower but safer
Hint

DELETE 100M rows: 30 minutes. TRUNCATE 100M rows: instant. Why?

Detailed explanation
DELETE FROM transfers;
-- For EACH of 100M rows:
--   1. Write row deletion to WAL
--   2. Mark row as dead (for VACUUM later)  
--   3. Fire DELETE triggers (if any)
--   4. Update indexes (remove index entries)
-- Time: 30+ minutes for 100M rows
-- Advantage: can have WHERE clause, fires triggers, row-level control

TRUNCATE TABLE transfers;
-- Deallocates ALL data pages at once (like dropping and recreating)
-- No per-row processing
-- No triggers fired
-- Can reset sequences/auto-increment via RESTART IDENTITY (default CONTINUE IDENTITY keeps them)
-- Time: instant (milliseconds), regardless of table size
-- Advantage: fast. Disadvantage: all-or-nothing, no WHERE

PostgreSQL specifics: TRUNCATE IS transactional in PostgreSQL (can be rolled back!). This is different from MySQL where TRUNCATE is NOT transactional. TRUNCATE also acquires an ACCESS EXCLUSIVE lock — blocks all concurrent access.

When to use each:

DELETE: selective removal (WHERE status = 'EXPIRED'), when triggers must fire, within application logic, audit trail needed.

TRUNCATE: wipe entire table (test cleanup, dev reset), drop old partition data, when speed matters and you want ALL rows gone.

Better alternative for old data: table partitioning. DROP TABLE journal_2020_q1 is instant like TRUNCATE, but only removes one quarter of data. No need to TRUNCATE the entire table.

17 of 20
What is the difference between a primary key and a unique constraint?
A. They are the same
B. Primary key is faster
C. Only primary key creates an index
D. PK: unique + NOT NULL + one per table + defines row identity. UNIQUE: allows NULL + multiple per table
Hint

A table has ONE identity (PK). But can have many uniqueness rules (email, phone, idempotency_key).

Detailed explanation
CREATE TABLE transfers (
    id UUID PRIMARY KEY,                    -- ONE primary key
    idempotency_key TEXT UNIQUE,            -- unique, allows multiple NULLs
    account_id TEXT NOT NULL,               -- not unique
    email TEXT UNIQUE                       -- another unique constraint
);

-- PRIMARY KEY = UNIQUE + NOT NULL + row identity
-- Both PK and UNIQUE create a BTree index automatically
-- PK: only one per table. UNIQUE: as many as needed.

NULL behavior in UNIQUE:

-- UNIQUE allows NULL values:
INSERT INTO transfers (id, idempotency_key) VALUES ('1', NULL);  -- OK
INSERT INTO transfers (id, idempotency_key) VALUES ('2', NULL);  -- OK!
-- Two NULLs allowed: NULL is never equal to NULL in SQL
-- (PostgreSQL 15+ can opt out via UNIQUE NULLS NOT DISTINCT)

-- PRIMARY KEY does NOT allow NULL:
INSERT INTO transfers (id) VALUES (NULL);  -- ERROR!
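
The UNIQUE-allows-multiple-NULLs behavior is easy to verify. A quick check using SQLite via Python's built-in sqlite3 (SQLite shares the NULLs-are-distinct rule for UNIQUE; this sketch skips the PK case, where SQLite's NULL handling differs from PostgreSQL's):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transfers (id TEXT PRIMARY KEY, idempotency_key TEXT UNIQUE)"
)

# Two NULL idempotency keys coexist: UNIQUE treats NULLs as distinct
conn.execute("INSERT INTO transfers VALUES ('1', NULL)")
conn.execute("INSERT INTO transfers VALUES ('2', NULL)")

# A duplicate non-NULL key is rejected
conn.execute("INSERT INTO transfers VALUES ('3', 'k-1')")
try:
    conn.execute("INSERT INTO transfers VALUES ('4', 'k-1')")
except sqlite3.IntegrityError as e:
    print(e)  # UNIQUE constraint failed: transfers.idempotency_key
```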

In financial systems — both are essential:

Primary key (id): row identity. Used for foreign keys, JOIN operations, API references (GET /transfers/{id}).

UNIQUE (idempotency_key): deduplication. Prevents processing the same transfer twice. Different purpose than PK.

UNIQUE (account_number): business rule enforcement. No two accounts with same number.

UUID vs BIGSERIAL as PK: UUID is better for distributed systems (no coordination needed to generate unique ID). BIGSERIAL is better for single-database (smaller, faster, naturally ordered for BTree). For microservices: UUID. For monolith: BIGSERIAL.
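
The "no coordination" claim for UUIDs is easy to see: independent processes can mint IDs with essentially zero collision risk. A tiny illustration in Python (the counts are arbitrary):

```python
import uuid

# Two "services" mint IDs independently; no shared sequence, no coordination
service_a = [uuid.uuid4() for _ in range(5_000)]
service_b = [uuid.uuid4() for _ in range(5_000)]

# 122 random bits per UUIDv4 make collisions astronomically unlikely
unique_ids = set(service_a) | set(service_b)
print(len(unique_ids))  # 10000
```

A BIGSERIAL would instead force both writers through the same database sequence, which is exactly the single point of coordination a distributed system wants to avoid.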

18 of 20
When might you NOT use foreign keys?
A. Always use foreign keys — no exceptions
B. Microservices (each service owns its DB — can't FK across databases), write-heavy append-only tables (FK validation overhead), partitioned tables (FK limitations)
C. Never use foreign keys — they are legacy
D. Foreign keys only work in Oracle
Hint

Each microservice has its own database. FK can't point to a table in another database.

Detailed explanation

What FK does: ensures referential integrity. journal_entries.account_id REFERENCES accounts.id — can't insert a journal entry for a non-existent account. Database enforces this automatically.
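
That enforcement is easy to see in miniature. A sketch using SQLite via Python's built-in sqlite3 (note that SQLite requires opting in to FK checks per connection, whereas PostgreSQL enforces them by default):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite: FK checks are opt-in
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY)")
conn.execute("""
    CREATE TABLE journal_entries (
        id INTEGER PRIMARY KEY,
        account_id TEXT NOT NULL REFERENCES accounts(id)
    )
""")

conn.execute("INSERT INTO accounts VALUES ('A1')")
conn.execute("INSERT INTO journal_entries (account_id) VALUES ('A1')")  # OK
try:
    # Orphaned entry: no such account exists
    conn.execute("INSERT INTO journal_entries (account_id) VALUES ('GHOST')")
except sqlite3.IntegrityError as e:
    print(e)  # FOREIGN KEY constraint failed
```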

When FK is problematic:

1. Microservices architecture: Account Service has accounts table in DB1. Transfer Service has transfers table in DB2. FK from DB2.transfers.account_id to DB1.accounts.id is impossible — different databases, possibly different servers. Integrity enforced at application level or via API calls.

2. High-throughput append-only tables: every INSERT into journal_entries must check if account_id exists in accounts table. At 100K inserts/second, that's 100K lookups/second just for FK validation. Some teams drop FK for performance and enforce integrity via application code + periodic consistency checks.

3. Partitioned tables: PostgreSQL has limitations on FK with partitioned tables. An FK referencing a partitioned table requires PostgreSQL 12+, and the referenced unique constraint must include the partition key, which complicates the schema.

4. Schema evolution: FK couples tables tightly. Renaming or restructuring the referenced table requires updating all FK constraints. In rapidly evolving schemas, this friction slows development.

The trade-off:

WITH FK:     database enforces integrity → safe, automatic
             write overhead (validation check on every INSERT)
             tight coupling between tables

WITHOUT FK:  application enforces integrity → requires discipline
             faster writes (no validation check)
             loose coupling
             risk: orphaned records if application has bugs

Practical approach: use FK within a monolith or within a single microservice's database. Don't use FK across service boundaries. For append-only high-throughput tables, consider dropping FK and adding periodic integrity checks (batch job that finds orphaned records).

19 of 20
What is PostgreSQL's RETURNING clause?
A. Returns the execution plan
B. Returns the number of affected rows
C. Returns data from modified rows in the same statement — combines UPDATE/INSERT with SELECT in one round trip
D. Returns the query text for debugging
Hint

INSERT ... RETURNING id — get the generated ID without a separate SELECT.

Detailed explanation

Without RETURNING — two round trips:

-- Round trip 1: insert
INSERT INTO transfers (account_id, amount) VALUES ('A1', 100);
-- Round trip 2: get generated ID
SELECT id, created_at FROM transfers WHERE ... -- how to find it?

With RETURNING — one round trip:

INSERT INTO transfers (account_id, amount) VALUES ('A1', 100)
RETURNING id, created_at;
-- Returns: (550e8400-..., 2026-04-24 12:00:00)
-- One statement, one round trip. ID generated by DB returned immediately.

Works with UPDATE too:

-- Atomic debit with result:
UPDATE accounts SET balance = balance - 800 
WHERE id = 'A1' AND balance >= 800
RETURNING id, balance as new_balance;
-- Returns: (A1, 200) → success, new balance is 200
-- Returns: empty set → insufficient funds (no rows matched WHERE)

-- No separate SELECT needed. One round trip.

Works with DELETE:

DELETE FROM sessions WHERE expires_at < NOW()
RETURNING user_id, session_id;
-- Returns all deleted sessions — useful for cleanup logging

In Kotlin/jOOQ:

val result = dsl.update(ACCOUNTS)
    .set(ACCOUNTS.BALANCE, ACCOUNTS.BALANCE.minus(amount))
    .where(ACCOUNTS.ID.eq(accountId))
    .and(ACCOUNTS.BALANCE.ge(amount))
    .returning(ACCOUNTS.BALANCE)
    .fetchOne()

if (result == null) throw InsufficientFundsException()
val newBalance = result[ACCOUNTS.BALANCE]

Note: RETURNING is a PostgreSQL extension — not standard SQL. MySQL doesn't have it (need LAST_INSERT_ID() instead), though MariaDB and SQLite 3.35+ have since adopted it. This is one of many reasons PostgreSQL is preferred for fintech.

20 of 20
SQL vs NoSQL — when to use which?
A. NoSQL is always faster
B. SQL is always better
C. Use NoSQL for everything in 2026
D. SQL for ACID transactions, complex queries, relationships. NoSQL for high write throughput, flexible schema, horizontal scaling
Hint

Bank ledger with JOINs → SQL. Billions of sensor readings → NoSQL. Many systems use both.

Detailed explanation

SQL databases (PostgreSQL, MySQL, Oracle):

• ACID transactions — essential for financial data

• Complex JOINs — "all transfers for accounts owned by users in London"

• Strong schema — data integrity enforced by constraints

• Mature tooling — 40+ years of optimization, EXPLAIN ANALYZE, indexes

• Vertical scaling primarily (but partitioning and read replicas help)

NoSQL databases — different categories:

Document store (MongoDB, CouchDB):
  Flexible schema, JSON documents, nested data
  Use for: content management, user profiles, catalog

Key-Value store (Redis, DynamoDB):
  Fastest reads/writes, simple get/put
  Use for: caching, sessions, rate limiting, feature flags

Column-family (Cassandra, HBase):
  Massive write throughput, horizontal scaling
  Use for: time-series data, event logs, IoT sensor data

Graph (Neo4j, Amazon Neptune):
  Relationships as first-class citizens
  Use for: social networks, fraud detection, recommendation engines

In a typical banking system — BOTH:

PostgreSQL:    Ledger (source of truth, ACID, JOINs)
               Account data (constraints, FK)
               
Redis:         Balance cache (fast reads, CQRS read model)
               Session store
               Rate limiting counters
               
Kafka:         Event stream (durable event log)
               
Elasticsearch: Transaction search (full-text, aggregations)

Each technology for what it does best.

Common mistake: choosing NoSQL because it's "modern" or "scalable" for data that needs ACID transactions. Financial data almost always belongs in PostgreSQL. NoSQL is for auxiliary concerns: caching, event storage, search, analytics.

For the interview: "I'd use PostgreSQL for the core ledger — ACID is non-negotiable for financial data. Redis for balance cache and rate limiting. Kafka for event streaming between services. Each tool for its strength."