Coding Tasks

12 practical problems for a Synthesized.io interview, with full solutions

Task 1

Topological Sort of FK Graph

Graph / BFS · Database · 25 min

Problem: Given tables and foreign key relationships, return tables in loading order (parents first). Detect cycles.

data class FK(val child: String, val parent: String)

// Input:
//   tables = ["order_items", "orders", "users", "products", "departments"]
//   fks = [FK("users","departments"), FK("orders","users"),
//          FK("order_items","orders"), FK("order_items","products")]
// Output (any valid loading order), e.g.: ["departments", "products", "users", "orders", "order_items"]

fun topologicalSort(tables: List<String>, fks: List<FK>): List<String> {
    TODO()
}

Solution:

fun topologicalSort(tables: List<String>, fks: List<FK>): List<String> {
    val inDegree = tables.associateWith { 0 }.toMutableMap()
    val children = mutableMapOf<String, MutableList<String>>()

    for (fk in fks) {
        inDegree[fk.child] = (inDegree[fk.child] ?: 0) + 1
        children.getOrPut(fk.parent) { mutableListOf() }.add(fk.child)
    }

    val queue = ArrayDeque<String>()
    for ((table, deg) in inDegree) {
        if (deg == 0) queue.add(table)
    }

    val result = mutableListOf<String>()
    while (queue.isNotEmpty()) {
        val table = queue.removeFirst()
        result.add(table)
        for (child in children[table].orEmpty()) {
            inDegree[child] = inDegree[child]!! - 1
            if (inDegree[child] == 0) queue.add(child)
        }
    }

    if (result.size != tables.size) {
        throw IllegalStateException("Cycle: ${tables - result.toSet()}")
    }
    return result
}

Time: O(V + E) — V tables, E foreign keys. Space: O(V + E). Interview: mention level-based parallelism — tables at the same topo level can be processed in parallel.
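Kahn's algorithm returns one valid order, and which one depends on queue order: for the Task 1 input the solution above emits products before departments. If reproducible runs matter, the FIFO queue can be swapped for a min-heap so that ties always break alphabetically. A minimal sketch (`topologicalSortStable` is a hypothetical name):

```kotlin
import java.util.PriorityQueue

data class FK(val child: String, val parent: String)

// Kahn's algorithm, but "ready" tables wait in a min-heap instead of a FIFO queue,
// so ties are broken alphabetically and the output is reproducible.
fun topologicalSortStable(tables: List<String>, fks: List<FK>): List<String> {
    val inDegree = tables.associateWith { 0 }.toMutableMap()
    val children = mutableMapOf<String, MutableList<String>>()
    for (fk in fks) {
        inDegree[fk.child] = (inDegree[fk.child] ?: 0) + 1
        children.getOrPut(fk.parent) { mutableListOf() }.add(fk.child)
    }

    val ready = PriorityQueue<String>()
    for ((table, deg) in inDegree) if (deg == 0) ready.add(table)

    val result = mutableListOf<String>()
    while (ready.isNotEmpty()) {
        val table = ready.poll()
        result.add(table)
        for (child in children[table].orEmpty()) {
            val remaining = inDegree.getValue(child) - 1
            inDegree[child] = remaining
            if (remaining == 0) ready.add(child)
        }
    }
    check(result.size == tables.size) { "Cycle: ${tables - result.toSet()}" }
    return result
}
```

The heap raises the cost to O((V + E) log V); on the Task 1 input it yields exactly departments, products, users, orders, order_items.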

Task 2

Deterministic Email Masking

Masking · Hashing · 15 min

Problem: Mask emails deterministically. Same input = same output (for cross-table consistency). Irreversible. Output must look like a valid email.

// mask("alex@gmail.com")  → "a7f3b2c1e9d04f28@masked.io"
// mask("alex@gmail.com")  → "a7f3b2c1e9d04f28@masked.io"  (same!)
// mask("maria@yahoo.com") → "different_hash@masked.io"
// mask(null)              → null

fun maskEmail(email: String?, salt: String = "s3cret"): String? {
    TODO()
}

Solution:

import java.security.MessageDigest

fun maskEmail(email: String?, salt: String = "s3cret"): String? {
    if (email == null) return null

    // Salted SHA-256: the same input and salt always give the same digest
    val bytes = MessageDigest.getInstance("SHA-256")
        .digest("$salt:$email".toByteArray())

    // First 8 bytes → 16 hex chars for the local part
    val local = bytes.take(8).joinToString("") { "%02x".format(it) }
}

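Time: O(n), where n is the email length (SHA-256 processes each input byte once). Collision risk: 16 hex chars = 64 bits, so by the birthday bound collisions become likely around 2^32 ≈ 4 billion items; for 100M emails the chance of any collision is roughly 0.03%, which is safe for test data. Interview: explain why the salt is needed: without it, anyone could hash a list of known emails and look up the results (rainbow-table / dictionary attack). That protection only holds while the salt stays secret, so in practice it acts as a key.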
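The collision figures can be checked with the standard birthday-bound approximation p ≈ 1 - exp(-n(n-1) / 2^(bits+1)). A small sketch; `collisionProbability` is our own helper name, not a library function:

```kotlin
import kotlin.math.expm1
import kotlin.math.pow

/**
 * Birthday-bound approximation: probability that at least two of [n] values collide
 * when each is drawn uniformly from 2^[bits] possibilities.
 * expm1 keeps precision when the probability is tiny.
 */
fun collisionProbability(n: Double, bits: Int): Double {
    val space = 2.0.pow(bits)
    return -expm1(-n * (n - 1) / (2 * space))
}
```

For 64-bit hashes, `collisionProbability(1e8, 64)` comes out at about 2.7e-4, and at 2^32 items it is about 0.39, so "~4 billion" is where collisions become likely rather than certain.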

Task 3

Consistent Database Subset (BFS through FK graph)

Graph / BFS · Database · 35 min

Problem: Given seed rows from one table, find ALL rows across ALL tables needed for referential integrity. Follow FKs in both directions: children that reference seed rows, and parents that seed rows reference.

data class FK(val child: String, val childCol: String,
              val parent: String, val parentCol: String)

// Seed: users ids {1, 2}
// Expected: users:{1,2}, orders:{10,11,12}, order_items:{100..108},
//           products:{5,7,12}, departments:{1}

fun collectSubset(
    seedTable: String,
    seedIds: Set<Long>,
    fks: List<FK>,
    // Simulates a DB query: "SELECT col FROM table WHERE filterCol IN (filterIds)"
    fetchIds: (table: String, col: String, filterCol: String, filterIds: Set<Long>) -> Set<Long>
): Map<String, Set<Long>> {
    TODO()
}

Solution:

fun collectSubset(
    seedTable: String,
    seedIds: Set<Long>,
    fks: List<FK>,
    fetchIds: (table: String, col: String, filterCol: String, filterIds: Set<Long>) -> Set<Long>
): Map<String, Set<Long>> {
    // Assumes every FK references the parent's "id" column (fk.parentCol == "id")
    val collected = mutableMapOf<String, MutableSet<Long>>()
    // Queue: (table name, column to filter by, IDs to look for)
    val queue = ArrayDeque<Triple<String, String, Set<Long>>>()

    collected[seedTable] = seedIds.toMutableSet()

    // Forward: seed table is a parent — find children
    for (fk in fks.filter { it.parent == seedTable }) {
        queue.add(Triple(fk.child, fk.childCol, seedIds))
    }
    // Backward: seed table is a child — find parents
    for (fk in fks.filter { it.child == seedTable }) {
        val parentIds = fetchIds(seedTable, fk.childCol, "id", seedIds)
        if (parentIds.isNotEmpty()) {
            queue.add(Triple(fk.parent, "id", parentIds))
        }
    }

    while (queue.isNotEmpty()) {
        val (table, col, filterIds) = queue.removeFirst()

        // Parent lookups already carry primary keys; child lookups query by the FK column
        val ids = if (col == "id") filterIds
                  else fetchIds(table, "id", col, filterIds)

        // Skip if nothing new
        val existing = collected.getOrPut(table) { mutableSetOf() }
        val newIds = ids - existing
        if (newIds.isEmpty()) continue
        existing.addAll(newIds)

        // Forward: find children of newly collected rows
        for (fk in fks.filter { it.parent == table }) {
            queue.add(Triple(fk.child, fk.childCol, newIds))
        }
        // Backward: find parents of newly collected rows
        for (fk in fks.filter { it.child == table }) {
            val parentIds = fetchIds(table, fk.childCol, "id", newIds)
            val newParents = parentIds - (collected[fk.parent] ?: emptySet())
            if (newParents.isNotEmpty()) {
                queue.add(Triple(fk.parent, "id", newParents))
            }
        }
    }
    return collected
}

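Cost: dominated by database round trips, roughly one fetchIds call per FK edge for every batch of newly collected IDs. The `newIds` check expands each row at most once, so the loop terminates even when FKs form cycles. Caveat: following child FKs from rows that were only reached through a parent lookup (e.g. from departments:{1} back down to every user in department 1, or from products:{5,7,12} to every order item that uses those products) can pull in most of the database; the expected output above only holds if no other rows share those parents. A common remedy is to follow child FKs only downward from the seed and parent FKs only upward. Interview: discuss size explosion (1000 users → 10M order_items), a max subset size limit, and temp tables for large IN clauses.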
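A two-phase variant that follows child FKs only downward from the seed, then parent FKs only upward from everything collected, keeps the subset from spreading through shared parents such as departments or products. A minimal sketch, not Synthesized's implementation: `collectSubsetDownThenUp`, `FetchIds` and the in-memory `inMemoryFetch` are hypothetical, and like `collectSubset` it assumes every FK references the parent's `id` column.

```kotlin
data class FK(val child: String, val childCol: String, val parent: String, val parentCol: String)

// Same contract as collectSubset's fetchIds: SELECT col FROM table WHERE filterCol IN (filterIds)
typealias FetchIds = (table: String, col: String, filterCol: String, filterIds: Set<Long>) -> Set<Long>

fun collectSubsetDownThenUp(
    seedTable: String, seedIds: Set<Long>, fks: List<FK>, fetchIds: FetchIds
): Map<String, Set<Long>> {
    val collected = mutableMapOf<String, MutableSet<Long>>()

    // Adds ids to a table's set and returns only the ones that were not there yet.
    fun addNew(table: String, ids: Set<Long>): Set<Long> {
        val existing = collected.getOrPut(table) { mutableSetOf() }
        return ids.filter { existing.add(it) }.toSet()
    }

    // Phase 1 (down): rows that depend on the seed, following child FKs only.
    val down = ArrayDeque<Pair<String, Set<Long>>>()
    down.add(seedTable to addNew(seedTable, seedIds))
    while (down.isNotEmpty()) {
        val (table, ids) = down.removeFirst()
        for (fk in fks.filter { it.parent == table }) {
            val childIds = addNew(fk.child, fetchIds(fk.child, "id", fk.childCol, ids))
            if (childIds.isNotEmpty()) down.add(fk.child to childIds)
        }
    }

    // Phase 2 (up): rows referenced by anything collected so far, following parent FKs only.
    val up = ArrayDeque<Pair<String, Set<Long>>>()
    collected.forEach { (table, ids) -> up.add(table to ids.toSet()) }
    while (up.isNotEmpty()) {
        val (table, ids) = up.removeFirst()
        for (fk in fks.filter { it.child == table }) {
            val parentIds = addNew(fk.parent, fetchIds(table, fk.childCol, "id", ids))
            if (parentIds.isNotEmpty()) up.add(fk.parent to parentIds)
        }
    }
    return collected
}

// Hypothetical in-memory stand-in for the database: table -> rows of (column -> value).
fun inMemoryFetch(db: Map<String, List<Map<String, Long?>>>): FetchIds = { table, col, filterCol, filterIds ->
    db[table].orEmpty()
        .filter { row -> row[filterCol]?.let { it in filterIds } == true }
        .mapNotNull { it[col] }
        .toSet()
}
```

With three users in department 1 and a product shared by two orders, seeding users {1, 2} returns only their orders and items plus the referenced department and product; user 3 and order 12 stay out, whereas `collectSubset` would reach them through departments and products.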

Task 4

Cycle Detection in FK Graph

Graph / DFS · Database · 20 min

Problem: Find ALL cycles in a foreign key graph. Return the tables involved in each cycle. Self-references count as cycles.

// Input: employees.manager_id -> employees.id (self-ref)
//        table_a.b_id -> table_b.id, table_b.a_id -> table_a.id (mutual)
// Output: [["employees"], ["table_a", "table_b"]]

fun findCycles(tables: List<String>, fks: List<FK>): List<List<String>> {
    TODO()
}

Solution:

enum class Color { WHITE, GRAY, BLACK }

fun findCycles(tables: List<String>, fks: List<FK>): List<List<String>> {
    val adj = mutableMapOf<String, MutableList<String>>()
    for (fk in fks) {
        adj.getOrPut(fk.child) { mutableListOf() }.add(fk.parent)
    }

    val color = tables.associateWith { Color.WHITE }.toMutableMap()
    val path = mutableListOf<String>()
    val cycles = mutableListOf<List<String>>()

    fun dfs(node: String) {
        color[node] = Color.GRAY
        path.add(node)

        for (neighbor in adj[node].orEmpty()) {
            when (color[neighbor]) {
                Color.GRAY -> {
                    // Found cycle: extract from 'neighbor' to end of path
                    val cycleStart = path.indexOf(neighbor)
                    cycles.add(path.subList(cycleStart, path.size).toList())
                }
                Color.WHITE -> dfs(neighbor)
                Color.BLACK -> { /* already fully processed */ }
                null -> { /* neighbor is not in `tables`: ignore */ }
            }
        }

        path.removeAt(path.size - 1)
        color[node] = Color.BLACK
    }

    for (table in tables) {
        if (color[table] == Color.WHITE) dfs(table)
    }
    return cycles
}

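Time: O(V + E) for the traversal, plus O(V) per reported cycle to copy it out of the path. Note: the three-color DFS reports one cycle per back edge. That is enough to decide whether the graph is acyclic, but it does not enumerate every elementary cycle, and a table whose only cycles close through an already finished (BLACK) node is never reported. To get every table that sits on some cycle, compute strongly connected components (e.g. Tarjan's algorithm); to list every elementary cycle, use Johnson's algorithm. Interview: explain CycleResolutionStrategy: after detecting a cycle, break it by making one FK nullable, insert those rows with NULL, and UPDATE the column after all tables are loaded.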
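A complementary approach is to compute strongly connected components with Tarjan's algorithm: every table that lies on any FK cycle ends up in a component with more than one table, or in a single-table component with a self-reference. A minimal sketch, reusing the two-field FK from Task 1 (`cyclicComponents` is a hypothetical name); recursion depth grows with the longest FK chain, which is fine for schema-sized graphs.

```kotlin
data class FK(val child: String, val parent: String)

/** Strongly connected components that contain a cycle: size > 1, or a single self-referencing table. */
fun cyclicComponents(tables: List<String>, fks: List<FK>): List<Set<String>> {
    val adj = fks.groupBy({ it.child }, { it.parent })  // edge child -> parent, as in findCycles
    val index = mutableMapOf<String, Int>()    // DFS discovery order
    val lowLink = mutableMapOf<String, Int>()  // smallest index reachable while still on the stack
    val onStack = mutableSetOf<String>()
    val stack = ArrayDeque<String>()
    val result = mutableListOf<Set<String>>()
    var counter = 0

    fun strongConnect(v: String) {
        index[v] = counter
        lowLink[v] = counter
        counter++
        stack.addLast(v)
        onStack.add(v)
        for (w in adj[v].orEmpty()) {
            if (w !in index) {
                strongConnect(w)
                lowLink[v] = minOf(lowLink.getValue(v), lowLink.getValue(w))
            } else if (w in onStack) {
                lowLink[v] = minOf(lowLink.getValue(v), index.getValue(w))
            }
        }
        if (lowLink[v] == index[v]) {  // v is the root of a component: pop the whole component
            val component = mutableSetOf<String>()
            do {
                val w = stack.removeLast()
                onStack.remove(w)
                component.add(w)
            } while (w != v)
            if (component.size > 1 || v in adj[v].orEmpty()) result.add(component)
        }
    }

    for (t in tables) if (t !in index) strongConnect(t)
    return result
}
```

For FKs r→v, v→r, r→x, x→v it returns the single component {r, v, x}. A DFS that only reports back edges and visits v before x finds [r, v] and never reports x, even though x→v→r→x is a cycle.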

Task 5

Chunked IN-clause Query Builder

Database · Performance · 15 min

Problem: PostgreSQL query planner degrades with large IN clauses. Split 50,000 IDs into chunks of 500, execute each chunk, merge results. Handle the edge case where the last chunk (or the whole set) has fewer IDs than the chunk size.

// fetchByIds("users", "id", setOf(1,2,...,50000), chunkSize=500)
// → executes 100 queries, each with IN(500 ids), merges results

fun <T> fetchByIds(
    table: String,
    column: String,
    ids: Set<Long>,
    chunkSize: Int = 500,
    conn: Connection,
    rowMapper: (ResultSet) -> T
): List<T> {
    TODO()
}

Solution:

import java.sql.Connection
import java.sql.ResultSet

fun <T> fetchByIds(
    table: String,
    column: String,
    ids: Set<Long>,
    chunkSize: Int = 500,
    conn: Connection,
    rowMapper: (ResultSet) -> T
): List<T> {
    if (ids.isEmpty()) return emptyList()
    val results = mutableListOf<T>()

    // chunked() makes the last chunk shorter when ids.size isn't a multiple of chunkSize
    for (chunk in ids.chunked(chunkSize)) {
        val placeholders = chunk.joinToString(",") { "?" }
        // table/column are interpolated, so they must come from trusted metadata, not user input
        val sql = "SELECT * FROM $table WHERE $column IN ($placeholders)"

        // use {} closes the statement and result set even if rowMapper throws
        conn.prepareStatement(sql).use { ps ->
            chunk.forEachIndexed { i, id -> ps.setLong(i + 1, id) }
            ps.executeQuery().use { rs ->
                while (rs.next()) {
                    results.add(rowMapper(rs))
                }
            }
        }
    }
    return results
}

Alternative for large sets: JOIN against a temp table (usually faster):

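import org.postgresql.copy.CopyManager
import org.postgresql.core.BaseConnection

// The caller reads the ResultSet, then commits; the commit drops the temp table.
fun fetchByIdsViaTempTable(ids: Set<Long>, conn: Connection): ResultSet {
    // ON COMMIT DROP needs an explicit transaction: under autocommit the
    // temp table would be dropped right after the CREATE statement.
    conn.autoCommit = false
    conn.createStatement().use {
        it.execute("CREATE TEMP TABLE _seed_ids (id BIGINT PRIMARY KEY) ON COMMIT DROP")
    }
    // Load all IDs in one round trip with COPY (text format: one value per line)
    val copy = CopyManager(conn.unwrap(BaseConnection::class.java))
    copy.copyIn("COPY _seed_ids FROM STDIN", ids.joinToString("\n").reader())

    return conn.createStatement().executeQuery(
        "SELECT t.* FROM users t JOIN _seed_ids s ON t.id = s.id"
    )
}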

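Chunked: N/chunkSize round trips, each paying its own parse/plan/execute cost; typically fine below ~10K IDs. Temp table: one COPY plus one JOIN, O(N + join_time); usually better above ~10K IDs. Interview: explain that a JOIN against a temp table gives the PostgreSQL planner a real relation to plan against (it can choose a hash, merge, or index nested-loop join) instead of a huge literal list, so the plan stays predictable.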
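The chunking logic is easy to unit-test once SQL construction is separated from execution. A minimal sketch; `ChunkQuery` and `planChunkedQueries` are hypothetical names that mirror the loop in `fetchByIds` without needing a connection:

```kotlin
/** One planned query: SQL text plus the IDs to bind to its placeholders, in order. */
data class ChunkQuery(val sql: String, val params: List<Long>)

fun planChunkedQueries(
    table: String,
    column: String,
    ids: Set<Long>,
    chunkSize: Int = 500
): List<ChunkQuery> {
    require(chunkSize > 0) { "chunkSize must be positive" }
    // The last chunk simply holds whatever is left, so no special case is needed
    return ids.chunked(chunkSize).map { chunk ->
        val placeholders = chunk.joinToString(",") { "?" }
        ChunkQuery("SELECT * FROM $table WHERE $column IN ($placeholders)", chunk)
    }
}
```

50,000 IDs with the default chunk size plan to 100 queries of 500 IDs each; 1,001 IDs plan to three queries, the last with a single placeholder.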

Task 6

Format-Preserving Masking (Phone Numbers)

Masking · Hashing · 20 min

Problem: Mask phone numbers preserving format. "+7 702 365 6813" → "+7 XXX XXX XXXX" where X are deterministic digits. Same input = same output. Preserve country code. Output must pass regex validation for phone numbers.

// mask("+7 702 365 6813")  → "+7 481 927 3054"
// mask("+7 702 365 6813")  → "+7 481 927 3054"  (same!)
// mask("+44 20 7946 0958") → "+44 XX XXXX XXXX" (country code and spacing preserved)
// mask(null)               → null

fun maskPhone(phone: String?, salt: String = "s3cret"): String? {
    TODO()
}

Solution:

import java.security.MessageDigest

fun maskPhone(phone: String?, salt: String = "s3cret"): String? {
    if (phone == null) return null

    val hash = MessageDigest.getInstance("SHA-256")
        .digest("$salt:$phone".toByteArray())

    // One pseudo-random digit per hash byte: 32 digits, more than any E.164 number needs.
    // (byte % 10 is very slightly non-uniform since 256 isn't a multiple of 10; fine for test data.)
    val digits = hash.map { '0' + (it.toInt() and 0xFF) % 10 }

    var digitIndex = 0
    val result = StringBuilder()

    // Number of leading digits to keep as the country code
    var preserveDigits = when {
        phone.startsWith("+7") -> 1     // +7 → keep "7"
        phone.startsWith("+44") -> 2    // +44 → keep "44"
        phone.startsWith("+1") -> 1     // +1 → keep "1"
        else -> 0
    }

    for (ch in phone) {
        when {
            !ch.isDigit() -> result.append(ch)  // keep +, spaces, dashes
            preserveDigits > 0 -> {
                result.append(ch)                 // keep country code digits
                preserveDigits--
            }
            else -> {
                result.append(digits[digitIndex % digits.size])  // wraps only past 32 digits
                digitIndex++
            }
        }
    }
    return result.toString()
}

Interview: mention Format-Preserving Encryption (FPE: FF1, FF3-1) as the industry standard. Our solution is simpler (hash-based, not encryption-based) but sufficient for test data. FPE is reversible with the key; a hash is not.
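The hard-coded `when` only knows +7, +44 and +1. Since E.164 country codes are one to three digits long and no code is a prefix of another, the number of digits to keep can come from a lookup table instead. A minimal sketch; `COUNTRY_CODES` is a deliberately tiny, hypothetical excerpt rather than the full ITU list:

```kotlin
// Hypothetical excerpt of ITU-T E.164 country codes; a real table would list all of them.
val COUNTRY_CODES = setOf("1", "7", "44", "49", "380")

/** Digits to keep as the country code: a known code right after '+', else 0. */
fun countryCodeLength(phone: String): Int {
    if (!phone.startsWith("+")) return 0
    val leading = phone.drop(1).takeWhile { it.isDigit() }
    // Codes are 1 to 3 digits and prefix-free, so at most one length can match
    return (1..3).firstOrNull { n -> leading.length >= n && leading.take(n) in COUNTRY_CODES } ?: 0
}
```

In `maskPhone`, `var preserveDigits = countryCodeLength(phone)` would replace the `when`; numbers without a leading '+' keep no digits, as before.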

Task 7

Producer-Consumer with Backpressure

Concurrency · Performance · 25 min

Problem: One reader reads rows from source. N workers transform rows. One writer batches writes to target. Backpressure: if workers are slow, reader pauses. If writer is slow, workers pause. Use BlockingQueue (Java) or Channel (Kotlin).

// Kotlin, using Java's BlockingQueue:
fun processTable(
    source: Connection,
    target: Connection,
    table: String,
    workers: Int = 4,
    batchSize: Int = 5000,
    transform: (Map<String, Any?>) -> Map<String, Any?>
) {
    TODO()
}

Solution:

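import java.sql.Connection
import java.util.concurrent.ArrayBlockingQueue
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger
import java.util.concurrent.atomic.AtomicReference

// Unique sentinel, compared with ===. (emptyMap() returns a shared singleton,
// so a transform that returned emptyMap() would be mistaken for the pill.)
val POISON: Map<String, Any?> = HashMap()

fun processTable(
    source: Connection, target: Connection,
    table: String, workers: Int = 4, batchSize: Int = 5000,
    transform: (Map<String, Any?>) -> Map<String, Any?>
) {
    val inputQ = ArrayBlockingQueue<Map<String, Any?>>(10_000)
    val outputQ = ArrayBlockingQueue<Map<String, Any?>>(10_000)
    val executor = Executors.newFixedThreadPool(workers)
    val activeWorkers = AtomicInteger(workers)
    val failure = AtomicReference<Throwable?>(null)  // first error seen by any stage

    // READER: single thread, streams from source
    val reader = Thread {
        try {
            source.autoCommit = false  // PostgreSQL only honors fetchSize inside a transaction
            source.createStatement().use { stmt ->
                stmt.fetchSize = 10_000
                stmt.executeQuery("SELECT * FROM $table").use { rs ->
                    val meta = rs.metaData
                    while (rs.next()) {
                        val row = (1..meta.columnCount).associate {
                            meta.getColumnName(it) to rs.getObject(it)
                        }
                        inputQ.put(row)  // blocks if queue full: backpressure!
                    }
                }
            }
        } catch (e: Throwable) {
            failure.compareAndSet(null, e)
        } finally {
            repeat(workers) { inputQ.put(POISON) }  // signal each worker, even on failure
        }
    }

    // WORKERS: N threads, transform rows
    repeat(workers) {
        executor.execute {
            try {
                while (true) {
                    val row = inputQ.take()  // blocks if queue empty
                    if (row === POISON) break
                    outputQ.put(transform(row))  // blocks if output queue full
                }
            } catch (e: Throwable) {
                failure.compareAndSet(null, e)
                while (inputQ.take() !== POISON) { }  // keep draining so the reader never blocks
            } finally {
                if (activeWorkers.decrementAndGet() == 0) {
                    outputQ.put(POISON)  // last worker signals writer
                }
            }
        }
    }

    // WRITER: single thread, batches to target (writeBatch is assumed to exist)
    val writer = Thread {
        val batch = mutableListOf<Map<String, Any?>>()
        var sawPoison = false
        try {
            while (true) {
                val row = outputQ.take()
                if (row === POISON) { sawPoison = true; break }
                batch.add(row)
                if (batch.size >= batchSize) {
                    writeBatch(target, table, batch)
                    batch.clear()
                }
            }
            if (batch.isNotEmpty()) writeBatch(target, table, batch)
        } catch (e: Throwable) {
            failure.compareAndSet(null, e)
            if (!sawPoison) while (outputQ.take() !== POISON) { }  // unblock the workers
        }
    }

    reader.start(); writer.start()
    reader.join(); executor.shutdown(); executor.awaitTermination(1, TimeUnit.HOURS)
    writer.join()
    failure.get()?.let { throw it }  // surface the first error to the caller
}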

Throughput: limited by slowest stage. Reader at 500K/sec, workers at 200K/sec total, writer at 300K/sec → bottleneck = workers at 200K/sec. Interview: explain POISON_PILL pattern, bounded queue = backpressure, why reader needs own thread (JDBC is blocking).
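Because the version above needs two live databases, here is the same reader / workers / writer topology over in-memory data, which makes the backpressure and end-of-stream handling easy to try out. A minimal sketch; `pipeline` and `EndOfStream` are hypothetical names, and `writeBatch` is just a callback here:

```kotlin
import java.util.concurrent.ArrayBlockingQueue
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger

// Unique end-of-stream marker, compared with ===, so no real item can be mistaken for it.
private object EndOfStream

fun <I : Any, O : Any> pipeline(
    source: List<I>,
    workers: Int,
    capacity: Int,
    batchSize: Int,
    transform: (I) -> O,
    writeBatch: (List<O>) -> Unit,
) {
    val inputQ = ArrayBlockingQueue<Any>(capacity)   // bounded: put() blocks when full
    val outputQ = ArrayBlockingQueue<Any>(capacity)
    val remaining = AtomicInteger(workers)
    val pool = Executors.newFixedThreadPool(workers)

    val reader = Thread {
        try {
            source.forEach { inputQ.put(it) }           // waits whenever workers fall behind
        } finally {
            repeat(workers) { inputQ.put(EndOfStream) } // one marker per worker
        }
    }
    repeat(workers) {
        pool.execute {
            try {
                while (true) {
                    val item = inputQ.take()
                    if (item === EndOfStream) break
                    @Suppress("UNCHECKED_CAST")
                    val input = item as I
                    outputQ.put(transform(input))       // waits whenever the writer falls behind
                }
            } finally {
                // The last worker to finish tells the writer there is nothing more to come
                if (remaining.decrementAndGet() == 0) outputQ.put(EndOfStream)
            }
        }
    }
    val writer = Thread {
        val batch = mutableListOf<O>()
        while (true) {
            val item = outputQ.take()
            if (item === EndOfStream) break
            @Suppress("UNCHECKED_CAST")
            val output = item as O
            batch.add(output)
            if (batch.size >= batchSize) { writeBatch(batch.toList()); batch.clear() }
        }
        if (batch.isNotEmpty()) writeBatch(batch.toList())
    }

    reader.start(); writer.start()
    reader.join()
    pool.shutdown(); pool.awaitTermination(1, TimeUnit.MINUTES)
    writer.join()
}
```

With `capacity = 100`, at most 100 items wait in each queue at any moment, which is the bounded buffering that produces backpressure.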

Task 8

Transformer Registry (Strategy Pattern)

Design · Masking · 20 min

Problem: User defines masking rules in YAML config. Build a registry that maps column → transformer. Apply the right transformer to each row. Must be thread-safe (called from multiple workers).

// Config: users.email → EmailMasker, users.phone → PhoneMasker
// registry.transform("users", "email", "alex@test.com", salt) → "a7f3…@masked.io" (16 hex chars)
// registry.transform("users", "name", "Alex", salt) → "Alex" (no rule = passthrough)

interface Transformer {
    fun transform(value: Any?, salt: String): Any?
}

class TransformerRegistry {
    fun register(table: String, column: String, transformer: Transformer) { TODO() }
    fun transform(table: String, column: String, value: Any?, salt: String): Any? { TODO() }
}

Solution:

import java.security.MessageDigest
import java.util.concurrent.ConcurrentHashMap

// Hex-encoded SHA-256; the transformers below rely on a helper like this.
fun sha256(input: String): String =
    MessageDigest.getInstance("SHA-256")
        .digest(input.toByteArray())
        .joinToString("") { "%02x".format(it) }

class TransformerRegistry {
    // ConcurrentHashMap: thread-safe, called from multiple workers
    private val registry = ConcurrentHashMap<String, Transformer>()

    fun register(table: String, column: String, transformer: Transformer) {
        registry["$table.$column"] = transformer
    }

    fun transform(table: String, column: String, value: Any?, salt: String): Any? {
        val transformer = registry["$table.$column"]
            ?: return value  // no rule = passthrough
        return transformer.transform(value, salt)
    }
}

// Transformers: each is a pure function (thread-safe by nature):
class EmailMasker : Transformer {
    override fun transform(value: Any?, salt: String): Any? {
        val email = value as? String ?: return null
        val hash = sha256("$salt:$email").take(16)
        return "$hash@masked.io"
    }
}

class PhoneMasker : Transformer {
    override fun transform(value: Any?, salt: String): Any? {
        val phone = value as? String ?: return null
        // preserve format, replace digits (maskPhone from Task 6)
        return maskPhone(phone, salt)
    }
}

class NumericNoise(private val variance: Double = 0.1) : Transformer {
    override fun transform(value: Any?, salt: String): Any? {
        val num = (value as? Number)?.toDouble() ?: return null
        // Deterministic noise: first 4 hex chars of the hash (0..65535) scaled to [-1.0, 1.0)
        val unit = sha256("$salt:$num").take(4).toInt(16) / 32768.0 - 1.0
        // Result stays within ±variance of the input (±10% by default); it is always a Double
        return num * (1.0 + unit * variance)
    }
}

object Passthrough : Transformer {
    override fun transform(value: Any?, salt: String) = value
}

Interview: explain how this maps to Synthesized's YAML config. Each "transformer" in YAML = one registered Transformer instance. ConcurrentHashMap is thread-safe for reads (no locking needed). Transformers are pure functions — no shared mutable state.
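The YAML side can stay thin: once the config is parsed into plain "table.column → transformer name" pairs, a small factory resolves the names against a catalog of transformer instances. A minimal sketch; the config shape, `registryFromConfig`, and the two trivial transformers are hypothetical, and YAML parsing itself is left out:

```kotlin
import java.util.concurrent.ConcurrentHashMap

interface Transformer {
    fun transform(value: Any?, salt: String): Any?
}

class TransformerRegistry {
    private val registry = ConcurrentHashMap<String, Transformer>()

    fun register(table: String, column: String, transformer: Transformer) {
        registry["$table.$column"] = transformer
    }

    fun transform(table: String, column: String, value: Any?, salt: String): Any? {
        val transformer = registry["$table.$column"] ?: return value  // no rule = passthrough
        return transformer.transform(value, salt)
    }
}

// Hypothetical stand-ins for EmailMasker etc., kept trivial so the wiring is the focus.
object Redact : Transformer {
    override fun transform(value: Any?, salt: String): Any? = value?.let { "***" }
}
object Uppercase : Transformer {
    override fun transform(value: Any?, salt: String): Any? = (value as? String)?.uppercase()
}

/** Unknown names or malformed keys fail at startup, not halfway through a long run. */
fun registryFromConfig(rules: Map<String, String>, catalog: Map<String, Transformer>): TransformerRegistry {
    val registry = TransformerRegistry()
    for ((key, name) in rules) {
        val parts = key.split(".", limit = 2)
        require(parts.size == 2) { "Expected 'table.column', got '$key'" }
        val transformer = requireNotNull(catalog[name]) { "Unknown transformer '$name' for $key" }
        registry.register(parts[0], parts[1], transformer)
    }
    return registry
}
```

Validating the whole config before any worker starts means a typo in a transformer name surfaces immediately rather than as unmasked data.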

Task 9

Parallel Table Processing with Level-Based Execution

Concurrency · Graph · 25 min

Problem: Combine topo sort with parallel execution. Tables at the same topological level are independent and can be processed in parallel. Respect a max parallelism limit (connection budget).

// Level 0: [departments, products]  → parallel
// Level 1: [users]                  → after level 0
// Level 2: [orders]                 → after level 1
// Level 3: [order_items]            → after level 2

fun processInOrder(
    tables: List<String>, fks: List<FK>,
    maxParallel: Int = 4,
    process: (String) -> Unit
) {
    TODO()
}

Solution:

import java.util.concurrent.Executors

fun processInOrder(
    tables: List<String>, fks: List<FK>,
    maxParallel: Int = 4,
    process: (String) -> Unit
) {
    // Step 1: Build levels via modified topo sort (Kahn's algorithm, one level at a time)
    val inDegree = tables.associateWith { 0 }.toMutableMap()
    val children = mutableMapOf<String, MutableList<String>>()
    for (fk in fks) {
        inDegree[fk.child] = (inDegree[fk.child] ?: 0) + 1
        children.getOrPut(fk.parent) { mutableListOf() }.add(fk.child)
    }

    val levels = mutableListOf<List<String>>()
    var currentLevel = tables.filter { (inDegree[it] ?: 0) == 0 }

    while (currentLevel.isNotEmpty()) {
        levels.add(currentLevel)
        val nextLevel = mutableListOf<String>()
        for (table in currentLevel) {
            for (child in children[table].orEmpty()) {
                inDegree[child] = inDegree[child]!! - 1
                if (inDegree[child] == 0) nextLevel.add(child)
            }
        }
        currentLevel = nextLevel
    }

    // Tables on a cycle never reach in-degree 0; fail instead of silently skipping them
    check(levels.sumOf { it.size } == tables.size) {
        "Cycle among: ${tables - levels.flatten().toSet()}"
    }

    // Step 2: Process each level with bounded parallelism
    val executor = Executors.newFixedThreadPool(maxParallel)
    try {
        for (level in levels) {
            val futures = level.map { table ->
                executor.submit { process(table) }
            }
            futures.forEach { it.get() }  // wait for entire level; rethrows a table's failure
        }
    } finally {
        executor.shutdown()  // also runs when a table fails
    }
}

// Resulting schedule for the Task 1 example (processInOrder itself prints nothing):
// Processing level 0: [departments, products] in parallel (2 threads)
// Processing level 1: [users] (1 thread)
// Processing level 2: [orders] (1 thread)
// Processing level 3: [order_items] (1 thread)

Time: O(V + E) for sort + processing time. Parallelism bounded by min(level_size, maxParallel). Interview: explain that maxParallel should match connection pool size. Level 0 with 50 tables but only 8 connections → process 8 at a time.
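Level barriers are simple, but a table in level 2 still waits for every level-1 table, including ones it does not depend on. A variant is to start each table as soon as its own parents finish, chaining CompletableFutures on the same bounded pool. A minimal sketch; `processAsReady` is a hypothetical name, and it assumes self-referencing FKs are handled inside a table's own load rather than by the scheduler:

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.Executors

data class FK(val child: String, val parent: String)

fun processAsReady(
    tables: List<String>, fks: List<FK>,
    maxParallel: Int = 4,
    process: (String) -> Unit
) {
    val known = tables.toSet()
    // Assumption: self-references are not scheduling edges; parents outside `tables` are ignored
    val parents = fks.filter { it.child != it.parent && it.parent in known }
        .groupBy({ it.child }, { it.parent })
    val executor = Executors.newFixedThreadPool(maxParallel)
    val futures = mutableMapOf<String, CompletableFuture<Void>>()
    val visiting = mutableSetOf<String>()

    // Depth-first: build the parents' futures first, then chain this table after all of them.
    fun futureFor(table: String): CompletableFuture<Void> = futures[table] ?: run {
        check(visiting.add(table)) { "Cycle through $table" }
        val deps = parents[table].orEmpty().map { futureFor(it) }
        val future = CompletableFuture.allOf(*deps.toTypedArray())
            .thenRunAsync({ process(table) }, executor)
        visiting.remove(table)
        futures[table] = future
        future
    }

    try {
        tables.forEach { futureFor(it) }
        // Waits for everything; throws CompletionException if any table failed
        CompletableFuture.allOf(*futures.values.toTypedArray()).join()
    } finally {
        executor.shutdown()
    }
}
```

The fixed pool still caps concurrency at `maxParallel`, so the connection-budget argument is unchanged; only the unnecessary waiting between levels goes away.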

Task 10

Checkpoint & Resume for Long-Running Processing

Reliability · Database · 20 min

Problem: Processing 500 tables takes 6 hours. If it crashes at table #347, it should resume from #347, not start over. Track progress. Make each table's processing idempotent.

enum class Status { PENDING, IN_PROGRESS, COMPLETED, FAILED }
data class Progress(val table: String, val status: Status, val error: String? = null)

class CheckpointManager(private val conn: Connection) {
    fun initProgress(tables: List<String>) { TODO() }
    fun getStatus(table: String): Status { TODO() }
    fun markInProgress(table: String) { TODO() }
    fun markCompleted(table: String) { TODO() }
    fun markFailed(table: String, error: String) { TODO() }
    fun getPendingTables(): List<String> { TODO() }
}

fun processWithCheckpoint(tables: List<String>, mgr: CheckpointManager) {
    TODO()
}

Solution:

import java.sql.Connection

// Assumes conn is in autocommit mode, so every status change is durable immediately.
class CheckpointManager(private val conn: Connection) {
    init {
        conn.createStatement().use {
            it.execute("""
                CREATE TABLE IF NOT EXISTS _synth_progress (
                    table_name VARCHAR(255) PRIMARY KEY,
                    status VARCHAR(20) NOT NULL DEFAULT 'PENDING',
                    started_at TIMESTAMP,
                    completed_at TIMESTAMP,
                    error_message TEXT
                )
            """)
        }
    }

    fun initProgress(tables: List<String>) {
        // ON CONFLICT DO NOTHING (PostgreSQL) keeps the statuses left by a previous run
        conn.prepareStatement(
            "INSERT INTO _synth_progress (table_name) VALUES (?) ON CONFLICT DO NOTHING"
        ).use { ps ->
            for (t in tables) { ps.setString(1, t); ps.addBatch() }
            ps.executeBatch()
        }
    }

    fun getStatus(table: String): Status =
        conn.prepareStatement("SELECT status FROM _synth_progress WHERE table_name = ?").use { ps ->
            ps.setString(1, table)
            ps.executeQuery().use { rs ->
                if (rs.next()) Status.valueOf(rs.getString(1)) else Status.PENDING
            }
        }

    fun markInProgress(table: String) =
        update(table, Status.IN_PROGRESS, "started_at = NOW(), error_message = NULL")
    fun markCompleted(table: String) = update(table, Status.COMPLETED, "completed_at = NOW()")
    fun markFailed(table: String, error: String) {
        conn.prepareStatement(
            "UPDATE _synth_progress SET status = ?, error_message = ? WHERE table_name = ?"
        ).use { ps ->
            ps.setString(1, Status.FAILED.name); ps.setString(2, error); ps.setString(3, table)
            ps.executeUpdate()
        }
    }

    // Everything not COMPLETED: PENDING, FAILED, and IN_PROGRESS (left behind by a crash)
    fun getPendingTables(): List<String> =
        conn.prepareStatement("SELECT table_name FROM _synth_progress WHERE status <> ?").use { ps ->
            ps.setString(1, Status.COMPLETED.name)
            ps.executeQuery().use { rs ->
                generateSequence { if (rs.next()) rs.getString(1) else null }.toList()
            }
        }

    private fun update(table: String, status: Status, extraSet: String) {
        conn.prepareStatement(
            "UPDATE _synth_progress SET status = ?, $extraSet WHERE table_name = ?"
        ).use { ps ->
            ps.setString(1, status.name); ps.setString(2, table)
            ps.executeUpdate()
        }
    }
}

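// targetConn, processTable and logger are assumed to be defined elsewhere.
fun processWithCheckpoint(tables: List<String>, mgr: CheckpointManager) {
    mgr.initProgress(tables)
    val pending = mgr.getPendingTables().toSet()
    // Keep the caller's (parent-first) order; the progress table has no ordering of its own
    for (table in tables.filter { it in pending }) {
        mgr.markInProgress(table)
        try {
            // IDEMPOTENT: empty the target table before re-processing
            targetConn.createStatement().use { it.execute("TRUNCATE TABLE $table") }
            processTable(table)
            mgr.markCompleted(table)
        } catch (e: Exception) {
            mgr.markFailed(table, e.message ?: "unknown")
            logger.error("Failed: $table", e)
        }
    }
}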

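Interview: explain why IN_PROGRESS counts as pending: if the process crashed mid-table, that table is still IN_PROGRESS and must be retried. TRUNCATE before the retry makes it idempotent. Caveats: in PostgreSQL, TRUNCATE fails on a table that other tables reference via FK unless you add CASCADE (which would also empty already-completed child tables), so either load targets without FK constraints enforced or re-process the affected children too; and retries must follow the same parent-first order as the original run.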
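The resume logic can be exercised without a database by keeping the same status transitions in memory. A minimal sketch; `InMemoryProgress` and `runWithCheckpoints` are hypothetical, and a thrown `Error` stands in for the process dying mid-table (it escapes the `catch (e: Exception)` just as a real crash skips it):

```kotlin
enum class Status { PENDING, IN_PROGRESS, COMPLETED, FAILED }

// Hypothetical in-memory stand-in for the _synth_progress table.
class InMemoryProgress {
    private val status = LinkedHashMap<String, Status>()
    fun initialize(tables: List<String>) = tables.forEach { status.putIfAbsent(it, Status.PENDING) }
    fun mark(table: String, s: Status) { status[table] = s }
    fun statusOf(table: String): Status = status.getValue(table)
    fun pending(): Set<String> = status.filterValues { it != Status.COMPLETED }.keys
}

/** Runs every table that is not COMPLETED, in the caller's order; returns the tables attempted. */
fun runWithCheckpoints(tables: List<String>, progress: InMemoryProgress, process: (String) -> Unit): List<String> {
    progress.initialize(tables)
    val pending = progress.pending()
    val attempted = mutableListOf<String>()
    for (table in tables.filter { it in pending }) {
        attempted += table
        progress.mark(table, Status.IN_PROGRESS)
        try {
            process(table)
            progress.mark(table, Status.COMPLETED)
        } catch (e: Exception) {
            progress.mark(table, Status.FAILED)  // an Error is not caught and leaves IN_PROGRESS
        }
    }
    return attempted
}
```

After a crash at t3, a second run attempts only t3 and t4: the IN_PROGRESS table is retried and the completed ones are skipped.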

Task 11

JSON/JSONB Deep Masking

Masking · Tree Traversal · 25 min

Problem: A JSONB column contains nested objects with PII fields. Mask specified paths inside the JSON without destroying the structure. Paths defined as dot-notation: "user.email", "contacts[*].phone".

// Input JSON:  {"user":{"email":"alex@test.com","age":30},"notes":"call me"}
// Mask paths:  ["user.email"]
// Output JSON: {"user":{"email":"a7f3b2@masked.io","age":30},"notes":"call me"}

fun maskJson(json: String, paths: List<String>, salt: String): String {
    TODO()
}

Solution:

import com.fasterxml.jackson.databind.JsonNode
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.databind.node.ArrayNode
import com.fasterxml.jackson.databind.node.ObjectNode
import com.fasterxml.jackson.databind.node.TextNode

val mapper = ObjectMapper()

fun maskJson(json: String, paths: List<String>, salt: String): String {
    val root = mapper.readTree(json)
    for (path in paths) {
        // "contacts[*].phone" → ["contacts", "[*]", "phone"]
        val segments = path.replace("[*]", ".[*]").split(".").filter { it.isNotEmpty() }
        maskPath(root, segments, 0, salt)
    }
    return mapper.writeValueAsString(root)
}

fun maskPath(node: JsonNode, segments: List<String>, depth: Int, salt: String) {
    if (depth >= segments.size) return
    val segment = segments[depth]
    val isLeaf = depth == segments.size - 1

    if (segment == "[*]" && node is ArrayNode) {
        // Apply to every element in array
        for (i in 0 until node.size()) {
            val element = node[i]
            if (!isLeaf) {
                maskPath(element, segments, depth + 1, salt)
            } else if (!element.isNull) {
                // Leaf: mask the value (JSON nulls are left as null)
                node.set(i, TextNode(maskValue(element.asText(), salt)))
            }
        }
    } else if (node is ObjectNode) {
        val child = node.get(segment) ?: return
        if (!isLeaf) {
            maskPath(child, segments, depth + 1, salt)
        } else if (!child.isNull) {
            // Leaf: mask the value (JSON nulls are left as null)
            node.set<JsonNode>(segment, TextNode(maskValue(child.asText(), salt)))
        }
    }
}

// maskEmail (Task 2), maskPhone (Task 6) and a hex-string sha256 helper are assumed to be in scope
fun maskValue(value: String, salt: String): String {
    if (value.contains("@")) return maskEmail(value, salt)!!
    if (value.matches(Regex("\\+?\\d[\\d\\s-]{8,}"))) return maskPhone(value, salt)!!
    val hash = sha256("$salt:$value").take(10)
    return "masked_$hash"
}

Time: O(J + P*D) — J = JSON size, P = paths count, D = depth. Interview: mention streaming JSON parsers (Jackson Streaming API) for huge JSON documents that don't fit in memory. Tree approach is fine for typical JSONB columns (<1MB).
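The traversal itself does not depend on Jackson. The same recursion over plain Kotlin maps and lists (the shape `ObjectMapper.readValue(json, Map::class.java)` returns) can be unit-tested without any library. A minimal sketch; `maskTree` is a hypothetical name, the path is passed already split into segments, and it builds a masked copy instead of mutating in place:

```kotlin
/**
 * Returns a copy of [node] in which the value at [segments] is replaced by mask(value).
 * Objects are Maps, arrays are Lists; "[*]" fans out over every list element.
 * Non-matching paths leave the subtree unchanged; JSON nulls stay null;
 * any other leaf is masked via its toString().
 */
fun maskTree(node: Any?, segments: List<String>, mask: (String) -> String): Any? {
    if (segments.isEmpty()) return node?.let { mask(it.toString()) }
    val head = segments.first()
    val rest = segments.drop(1)
    return when {
        head == "[*]" && node is List<*> -> node.map { maskTree(it, rest, mask) }
        node is Map<*, *> && head in node.keys ->
            node.entries.associate { (k, v) -> k to (if (k == head) maskTree(v, rest, mask) else v) }
        else -> node
    }
}
```

Returning a copy keeps the function pure, which suits the thread-safety argument from Task 8; the Jackson version mutates in place to avoid copying large documents.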

Task 12

Database Dialect Abstraction

Design · Database · 25 min

Problem: Design a DatabaseDialect interface that abstracts differences between PostgreSQL and MySQL. Implement both dialects. Include: identifier quoting, streaming configuration, bulk load strategy, type mapping.

// dialect.quoteIdentifier("users") → PostgreSQL: "users", MySQL: `users`
// dialect.configureStreaming(stmt) → PostgreSQL: fetchSize=10000, MySQL: MIN_VALUE
// dialect.mapType("int4", 10) → CanonicalType.INTEGER

interface DatabaseDialect {
    fun quoteIdentifier(name: String): String
    fun configureStreaming(conn: Connection, stmt: Statement)
    fun mapType(typeName: String, size: Int): CanonicalType
    fun bulkLoadSql(table: String, columns: List<String>): String
}

sealed class CanonicalType { /* define types */ }

Solution:

sealed class CanonicalType {
    object INTEGER : CanonicalType()
    object LONG : CanonicalType()
    object BOOLEAN : CanonicalType()
    object DOUBLE : CanonicalType()
    object TEXT : CanonicalType()
    data class VARCHAR(val size: Int) : CanonicalType()
    object TIMESTAMP : CanonicalType()
    object UUID : CanonicalType()
    object JSON : CanonicalType()
    object BINARY : CanonicalType()
    data class UNKNOWN(val native: String) : CanonicalType()
}

class PostgresDialect : DatabaseDialect {
    override fun quoteIdentifier(name: String) = "\"$name\""

    override fun configureStreaming(conn: Connection, stmt: Statement) {
        conn.autoCommit = false  // required for PG server-side cursors
        stmt.fetchSize = 10_000
    }

    override fun mapType(typeName: String, size: Int) = when (typeName) {
        "int4", "serial"    -> CanonicalType.INTEGER
        "int8", "bigserial" -> CanonicalType.LONG
        "bool"              -> CanonicalType.BOOLEAN
        "float8"            -> CanonicalType.DOUBLE
        "text"              -> CanonicalType.TEXT
        "varchar"           -> CanonicalType.VARCHAR(size)
        "timestamptz", "timestamp" -> CanonicalType.TIMESTAMP
        "uuid"              -> CanonicalType.UUID
        "jsonb", "json"     -> CanonicalType.JSON
        "bytea"             -> CanonicalType.BINARY
        else                -> CanonicalType.UNKNOWN(typeName)
    }

    override fun bulkLoadSql(table: String, columns: List<String>): String {
        val cols = columns.joinToString(",") { quoteIdentifier(it) }
        return "COPY ${quoteIdentifier(table)} ($cols) FROM STDIN WITH CSV"
    }
}

class MySqlDialect : DatabaseDialect {
    override fun quoteIdentifier(name: String) = "`$name`"

    override fun configureStreaming(conn: Connection, stmt: Statement) {
        // MySQL: magic fetchSize value enables streaming
        stmt.fetchSize = Int.MIN_VALUE
    }

    override fun mapType(typeName: String, size: Int) = when (typeName) {
        "INT", "MEDIUMINT"        -> CanonicalType.INTEGER
        "BIGINT"                   -> CanonicalType.LONG
        "TINYINT" -> if (size == 1) CanonicalType.BOOLEAN else CanonicalType.INTEGER
        "DOUBLE", "FLOAT"          -> CanonicalType.DOUBLE
        "TEXT", "LONGTEXT"         -> CanonicalType.TEXT
        "VARCHAR"                  -> CanonicalType.VARCHAR(size)
        "DATETIME", "TIMESTAMP"    -> CanonicalType.TIMESTAMP
        "CHAR" -> if (size == 36) CanonicalType.UUID else CanonicalType.VARCHAR(size)
        "JSON"                     -> CanonicalType.JSON
        "LONGBLOB", "BLOB"         -> CanonicalType.BINARY
        else                       -> CanonicalType.UNKNOWN(typeName)
    }

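    override fun bulkLoadSql(table: String, columns: List<String>): String {
        val cols = columns.joinToString(",") { quoteIdentifier(it) }
        // 'stream' is only a placeholder file name: with Connector/J the rows are supplied via
        // JdbcStatement.setLocalInfileInputStream(...), and the connection needs
        // allowLoadLocalInfile=true. Default format is tab-separated, one row per line.
        return "LOAD DATA LOCAL INFILE 'stream' INTO TABLE ${quoteIdentifier(table)} ($cols)"
    }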
}

// Factory:
fun dialectFor(url: String): DatabaseDialect = when {
    "postgresql" in url -> PostgresDialect()
    "mysql" in url      -> MySqlDialect()
    else -> throw UnsupportedOperationException("Unknown DB: $url")
}

Interview: this is the Strategy pattern. Each new database = one new class. No if/else in the engine code. Adding Oracle support = implement OracleDialect. Mention CanonicalType as sealed class with exhaustive when — compiler catches missing types.
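The exhaustiveness point is easiest to see in a function that switches over every CanonicalType with no `else` branch, for example when generating target DDL. A minimal sketch; `postgresDdlType` and the chosen PostgreSQL type names are one reasonable mapping, not the only one:

```kotlin
sealed class CanonicalType {
    object INTEGER : CanonicalType()
    object LONG : CanonicalType()
    object BOOLEAN : CanonicalType()
    object DOUBLE : CanonicalType()
    object TEXT : CanonicalType()
    data class VARCHAR(val size: Int) : CanonicalType()
    object TIMESTAMP : CanonicalType()
    object UUID : CanonicalType()
    object JSON : CanonicalType()
    object BINARY : CanonicalType()
    data class UNKNOWN(val native: String) : CanonicalType()
}

// No `else` branch: adding a new CanonicalType subclass makes this fail to compile until handled.
fun postgresDdlType(type: CanonicalType): String = when (type) {
    CanonicalType.INTEGER -> "INTEGER"
    CanonicalType.LONG -> "BIGINT"
    CanonicalType.BOOLEAN -> "BOOLEAN"
    CanonicalType.DOUBLE -> "DOUBLE PRECISION"
    CanonicalType.TEXT -> "TEXT"
    is CanonicalType.VARCHAR -> "VARCHAR(${type.size})"
    CanonicalType.TIMESTAMP -> "TIMESTAMP"
    CanonicalType.UUID -> "UUID"
    CanonicalType.JSON -> "JSONB"
    CanonicalType.BINARY -> "BYTEA"
    is CanonicalType.UNKNOWN -> type.native  // pass unrecognized native types through unchanged
}
```

Smart casts in the `is` branches give direct access to `size` and `native` without extra casts.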

Practice order

Day 1 (basics): Tasks 1, 2, 6 — topo sort, email masking, phone masking. Core algorithms, small scope, build confidence.
Day 2 (hard): Tasks 3, 4, 5 — FK graph BFS, cycle detection, chunked queries. Graph problems + database knowledge.
Day 3 (system): Tasks 7, 8, 9 — producer-consumer, transformer registry, parallel levels. Concurrency + design patterns.
Day 4 (polish): Tasks 10, 11, 12 — checkpoint, JSON masking, dialect abstraction. Reliability + real Synthesized patterns.

For each task: first try to solve it WITHOUT looking at the solution (set a timer). Then compare. Focus on explaining your approach out loud in English; this is what the interviewer evaluates.