12 practical problems for a Synthesized.io interview, with full solutions
Problem 1: Given tables and foreign key relationships, return tables in loading order (parents first). Detect cycles.
data class FK(val child: String, val parent: String)
// Input:
// tables = ["order_items", "orders", "users", "products", "departments"]
// fks = [FK("users","departments"), FK("orders","users"),
// FK("order_items","orders"), FK("order_items","products")]
// Output: ["departments", "products", "users", "orders", "order_items"]
fun topologicalSort(tables: List<String>, fks: List<FK>): List<String> {
TODO()
}
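One possible solution is Kahn's algorithm: count the unloaded parents of each table and repeatedly emit a table whose count has reached zero. The sketch below is one way to do it, not the only one. The alphabetical tie-breaking via PriorityQueue is an assumption, added so that the result is deterministic and matches the expected output in the example.

```kotlin
import java.util.PriorityQueue

data class FK(val child: String, val parent: String)

// Kahn's algorithm: emit a table only after all of its parents have been emitted.
fun topologicalSort(tables: List<String>, fks: List<FK>): List<String> {
    val remainingParents = tables.associateWith { 0 }.toMutableMap()
    val children = HashMap<String, MutableList<String>>()
    for (fk in fks) {
        require(fk.child in remainingParents && fk.parent in remainingParents) {
            "FK references a table that is not in the list: $fk"
        }
        remainingParents[fk.child] = remainingParents.getValue(fk.child) + 1
        children.getOrPut(fk.parent) { mutableListOf() }.add(fk.child)
    }

    // Tables with no parents are ready immediately; ties are broken alphabetically.
    val ready = PriorityQueue<String>()
    ready.addAll(remainingParents.filterValues { it == 0 }.keys)

    val order = ArrayList<String>(tables.size)
    while (ready.isNotEmpty()) {
        val table = ready.poll()
        order.add(table)
        for (child in children[table].orEmpty()) {
            val left = remainingParents.getValue(child) - 1
            remainingParents[child] = left
            if (left == 0) ready.add(child)
        }
    }

    // Any table never emitted sits on a cycle or depends on one (self-references included).
    check(order.size == tables.size) {
        "Cycle detected among: ${(tables - order.toSet()).sorted()}"
    }
    return order
}
```

With the input from the problem statement this returns departments, products, users, orders, order_items. A mutual reference between two tables makes it throw IllegalStateException.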
Problem 2: Mask emails deterministically. The same input must always give the same output (for cross-table consistency), the mapping must be irreversible, and the output must look like a valid email.
// mask("alex@gmail.com") → "a7f3b2c1e9d04f28@masked.io"
// mask("alex@gmail.com") → "a7f3b2c1e9d04f28@masked.io" (same!)
// mask("maria@yahoo.com") → "different_hash@masked.io"
// mask(null) → null
fun maskEmail(email: String?, salt: String = "s3cret"): String? {
TODO()
}
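A keyed hash is one way to get all three properties at once. The sketch below uses HMAC-SHA256 from the JDK with the salt as the key and keeps the first 8 bytes as 16 hex characters. Trimming and lowercasing the input before hashing is an assumption (so that "Alex@Gmail.com" and "alex@gmail.com" mask to the same value); drop it if case must be significant.

```kotlin
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

fun maskEmail(email: String?, salt: String = "s3cret"): String? {
    if (email == null) return null

    // HMAC-SHA256 keyed with the salt: deterministic, and not reversible without the salt.
    val mac = Mac.getInstance("HmacSHA256")
    mac.init(SecretKeySpec(salt.toByteArray(Charsets.UTF_8), "HmacSHA256"))

    // Assumption: emails are compared case-insensitively, so normalise before hashing.
    val digest = mac.doFinal(email.trim().lowercase().toByteArray(Charsets.UTF_8))

    // 8 bytes -> 16 hex characters, which is always a valid local part.
    val local = digest.take(8).joinToString("") { "%02x".format(it) }
    return "$local@masked.io"
}
```

Truncating to 64 bits keeps the output short at the cost of a small collision probability on very large tables; keeping more bytes of the digest reduces it.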
Problem 3: Given seed rows from one table, find ALL rows across ALL tables needed for referential integrity. Follow FKs in both directions: children that reference seed rows, and parents that seed rows reference.
data class FK(val child: String, val childCol: String,
val parent: String, val parentCol: String)
// Seed: users ids {1, 2}
// Expected: users:{1,2}, orders:{10,11,12}, order_items:{100..108},
// products:{5,7,12}, departments:{1}
fun collectSubset(
seedTable: String,
seedIds: Set<Long>,
fks: List<FK>,
// Simulates DB query: "SELECT col FROM table WHERE filterCol IN (filterIds)"
fetchIds: (table: String, col: String, filterCol: String, filterIds: Set<Long>) -> Set<Long>
): Map<String, Set<Long>> {
TODO()
}
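A breadth-first traversal over the FK graph is one way to approach this. The sketch below rests on two assumptions that the problem does not state: every table has a primary key column named "id" (the hypothetical constant PK), and every FK references that column. It also assumes that rows reached by walking up to a parent are not expanded back down to their other children, which is what the expected output implies (departments:{1} does not pull in every user of that department).

```kotlin
data class FK(val child: String, val childCol: String,
              val parent: String, val parentCol: String)

// Assumption: every table's primary key is "id" and all FKs point at it.
private const val PK = "id"

fun collectSubset(
    seedTable: String,
    seedIds: Set<Long>,
    fks: List<FK>,
    fetchIds: (table: String, col: String, filterCol: String, filterIds: Set<Long>) -> Set<Long>
): Map<String, Set<Long>> {
    data class Work(val table: String, val ids: Set<Long>, val followChildren: Boolean)

    val result = HashMap<String, MutableSet<Long>>()
    val expandedUp = HashMap<String, MutableSet<Long>>()    // rows whose parents were fetched
    val expandedDown = HashMap<String, MutableSet<Long>>()  // rows whose children were fetched
    val queue = ArrayDeque<Work>()
    queue.addLast(Work(seedTable, seedIds, followChildren = true))

    while (queue.isNotEmpty()) {
        val (table, ids, followChildren) = queue.removeFirst()
        result.getOrPut(table) { mutableSetOf() }.addAll(ids)

        // Upward: every collected row needs the parents it references.
        val up = ids - expandedUp.getOrPut(table) { mutableSetOf() }
        if (up.isNotEmpty()) {
            expandedUp.getValue(table).addAll(up)
            for (fk in fks.filter { it.child == table }) {
                val parentIds = fetchIds(table, fk.childCol, PK, up)
                if (parentIds.isNotEmpty()) queue.addLast(Work(fk.parent, parentIds, false))
            }
        }

        // Downward: only seed rows and their descendants pull in children.
        if (followChildren) {
            val down = ids - expandedDown.getOrPut(table) { mutableSetOf() }
            if (down.isNotEmpty()) {
                expandedDown.getValue(table).addAll(down)
                for (fk in fks.filter { it.parent == table }) {
                    val childIds = fetchIds(fk.child, PK, fk.childCol, down)
                    if (childIds.isNotEmpty()) queue.addLast(Work(fk.child, childIds, true))
                }
            }
        }
    }
    return result
}
```

The two "expanded" maps guarantee termination even with self-references or cycles, because each row is expanded at most once per direction.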
Problem 4: Find ALL cycles in a foreign key graph. Return the tables involved in each cycle. Self-references count as cycles.
// Input: employees.manager_id -> employees.id (self-ref)
// table_a.b_id -> table_b.id, table_b.a_id -> table_a.id (mutual)
// Output: [["employees"], ["table_a", "table_b"]]
fun findCycles(tables: List<String>, fks: List<FK>): List<List<String>> {
TODO()
}
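One way to report the tables involved in cycles is Tarjan's strongly connected components algorithm: every component with more than one table, or with a self-referencing FK, contains at least one cycle. Note that this groups tables by component rather than enumerating every elementary cycle, which matches the expected output shape above. The sketch assumes the two-field FK(child, parent) form.

```kotlin
data class FK(val child: String, val parent: String)

// Tarjan's SCC algorithm; a component is reported if it has 2+ tables or a self-reference.
fun findCycles(tables: List<String>, fks: List<FK>): List<List<String>> {
    val edges = fks.groupBy({ it.child }, { it.parent })
    val index = HashMap<String, Int>()   // discovery order of each table
    val low = HashMap<String, Int>()     // lowest index reachable from the table
    val stack = ArrayDeque<String>()
    val onStack = HashSet<String>()
    val cycles = ArrayList<List<String>>()
    var next = 0

    fun visit(v: String) {
        index[v] = next
        low[v] = next
        next++
        stack.addLast(v)
        onStack.add(v)
        for (w in edges[v].orEmpty()) {
            if (w !in index) {
                visit(w)
                low[v] = minOf(low.getValue(v), low.getValue(w))
            } else if (w in onStack) {
                low[v] = minOf(low.getValue(v), index.getValue(w))
            }
        }
        if (low[v] == index[v]) {
            // v is the root of a component: pop everything above it on the stack.
            val component = ArrayList<String>()
            do {
                val w = stack.removeLast()
                onStack.remove(w)
                component.add(w)
            } while (w != v)
            val selfReference = v in edges[v].orEmpty()
            if (component.size > 1 || selfReference) cycles.add(component.sorted())
        }
    }

    for (t in tables) if (t !in index) visit(t)
    return cycles.sortedBy { it.first() }
}
```

The recursion depth equals the longest FK chain, which is fine for typical schemas; an explicit stack would be needed for extremely deep graphs.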
Problem 5: The PostgreSQL query planner degrades with very large IN clauses. Split 50,000 IDs into chunks of 500, execute one query per chunk, and merge the results. Handle the edge case where the last chunk holds fewer IDs than the chunk size.
// fetchByIds("users", "id", setOf(1,2,...,50000), chunkSize=500)
// → executes 100 queries, each with IN(500 ids), merges results
fun <T> fetchByIds(
table: String,
column: String,
ids: Set<Long>,
chunkSize: Int = 500,
conn: Connection,
rowMapper: (ResultSet) -> T
): List<T> {
TODO()
}
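A minimal sketch using plain JDBC follows. Kotlin's chunked already yields a shorter final chunk, so the edge case reduces to building the placeholder list from the actual chunk size. The helper inClauseSql is a hypothetical name, and the use of SELECT * with unquoted identifiers assumes that table and column come from trusted metadata, not user input.

```kotlin
import java.sql.Connection
import java.sql.ResultSet

// Hypothetical helper: one "?" placeholder per ID in the chunk.
fun inClauseSql(table: String, column: String, count: Int): String =
    "SELECT * FROM $table WHERE $column IN (${List(count) { "?" }.joinToString(", ")})"

fun <T> fetchByIds(
    table: String,
    column: String,
    ids: Set<Long>,
    chunkSize: Int = 500,
    conn: Connection,
    rowMapper: (ResultSet) -> T
): List<T> {
    require(chunkSize > 0) { "chunkSize must be positive" }
    val results = ArrayList<T>(ids.size)
    // chunked() makes the last chunk smaller when ids.size is not a multiple of chunkSize.
    for (chunk in ids.chunked(chunkSize)) {
        conn.prepareStatement(inClauseSql(table, column, chunk.size)).use { ps ->
            chunk.forEachIndexed { i, id -> ps.setLong(i + 1, id) }  // JDBC parameters are 1-based
            ps.executeQuery().use { rs ->
                while (rs.next()) results.add(rowMapper(rs))
            }
        }
    }
    return results
}
```

Because chunkSize has a default and precedes conn, callers pass the trailing arguments by name, for example fetchByIds("users", "id", ids, conn = conn, rowMapper = { it.getLong("id") }).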
Problem 6: Mask phone numbers while preserving their format: "+7 702 365 6813" → "+7 XXX XXX XXXX", where each X is a deterministic digit. The same input must always give the same output, the country code must be preserved, and the output must pass regex validation for phone numbers.
// mask("+7 702 365 6813") → "+7 481 927 3054"
// mask("+7 702 365 6813") → "+7 481 927 3054" (same!)
// mask("+44 20 7946 0958") → "+44 XX XXXX XXXX" (preserve format!)
// mask(null) → null
fun maskPhone(phone: String?, salt: String = "s3cret"): String? {
TODO()
}
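One way to preserve the format is to walk the string character by character and replace only digits, taking the replacement digits from an HMAC of the input. The sketch below assumes the country code is the first group of a number that starts with "+" and contains a separator; without a separator the country code cannot be told apart, so every digit is masked. All digits after the country code are replaced, including area codes.

```kotlin
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

fun maskPhone(phone: String?, salt: String = "s3cret"): String? {
    if (phone == null) return null

    // Keyed hash of the whole input: same phone + same salt -> same digit stream.
    val mac = Mac.getInstance("HmacSHA256")
    mac.init(SecretKeySpec(salt.toByteArray(Charsets.UTF_8), "HmacSHA256"))
    val digest = mac.doFinal(phone.toByteArray(Charsets.UTF_8))

    // Assumption: in "+7 702 ..." the country code is everything before the first separator.
    val firstSeparator = phone.indexOfFirst { it != '+' && it !in '0'..'9' }
    val keep = if (phone.startsWith("+") && firstSeparator > 0) firstSeparator else 0

    val out = StringBuilder(phone.length)
    var used = 0
    phone.forEachIndexed { i, c ->
        if (i >= keep && c in '0'..'9') {
            // Map one digest byte to one decimal digit (slightly biased, fine for masking).
            out.append((digest[used % digest.size].toInt() and 0xff) % 10)
            used++
        } else {
            out.append(c)  // separators, "+" and the country code pass through unchanged
        }
    }
    return out.toString()
}
```

A masked group may start with 0, so a strict national-format validator could still reject the result; a generic pattern such as "+7" followed by groups of 3, 3 and 4 digits accepts it.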
Problem 7: One reader reads rows from the source, N workers transform them, and one writer batches writes to the target. Apply backpressure: if the workers are slow, the reader pauses; if the writer is slow, the workers pause. Use BlockingQueue (Java) or Channel (Kotlin).
// Signature for the BlockingQueue (java.util.concurrent) version:
fun processTable(
source: Connection,
target: Connection,
table: String,
workers: Int = 4,
batchSize: Int = 5000,
transform: (Map<String, Any?>) -> Map<String, Any?>
) {
TODO()
}
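The backpressure part can be shown without a database. The sketch below is a simplified variant with a hypothetical signature (runPipeline, an Iterator as the source, a writeBatch callback as the target) built on two bounded ArrayBlockingQueue instances: put blocks when a queue is full, which is exactly what pauses the faster stage. Shutdown uses a sentinel ("poison pill") row. Error handling is deliberately omitted: if transform or writeBatch throws, the other threads would block forever, so a real implementation needs cancellation.

```kotlin
import java.util.concurrent.ArrayBlockingQueue
import java.util.concurrent.BlockingQueue
import kotlin.concurrent.thread

typealias Row = Map<String, Any?>

// Sentinel compared by identity (===), so a genuinely empty row is never mistaken for it.
private val POISON: Row = HashMap()

fun runPipeline(
    source: Iterator<Row>,
    workers: Int = 4,
    batchSize: Int = 5000,
    queueCapacity: Int = 1000,
    transform: (Row) -> Row,
    writeBatch: (List<Row>) -> Unit
) {
    val input: BlockingQueue<Row> = ArrayBlockingQueue(queueCapacity)
    val output: BlockingQueue<Row> = ArrayBlockingQueue(queueCapacity)

    val reader = thread(name = "reader") {
        for (row in source) input.put(row)       // blocks while workers are behind
        repeat(workers) { input.put(POISON) }    // one stop signal per worker
    }

    val pool = List(workers) { i ->
        thread(name = "worker-$i") {
            while (true) {
                val row = input.take()
                if (row === POISON) break
                output.put(transform(row))       // blocks while the writer is behind
            }
            output.put(POISON)                   // tell the writer this worker is done
        }
    }

    val writer = thread(name = "writer") {
        val batch = ArrayList<Row>(batchSize)
        var finishedWorkers = 0
        while (finishedWorkers < workers) {
            val row = output.take()
            if (row === POISON) { finishedWorkers++; continue }
            batch.add(row)
            if (batch.size >= batchSize) { writeBatch(batch.toList()); batch.clear() }
        }
        if (batch.isNotEmpty()) writeBatch(batch.toList())  // flush the final partial batch
    }

    reader.join()
    pool.forEach { it.join() }
    writer.join()
}
```

With several workers the output order is not the input order; if order matters, the writer has to reorder rows or the table has to be partitioned by key.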
Problem 8: The user defines masking rules in a YAML config. Build a registry that maps each (table, column) pair to a transformer and applies the right transformer to each value. It must be thread-safe, because multiple workers call it concurrently.
// Config: users.email → EmailMasker, users.phone → PhoneMasker
// registry.transform("users", "email", "alex@test.com", salt) → "a7f3@masked.io"
// registry.transform("users", "name", "Alex", salt) → "Alex" (no rule = passthrough)
interface Transformer {
fun transform(value: Any?, salt: String): Any?
}
class TransformerRegistry {
fun register(table: String, column: String, transformer: Transformer) { TODO() }
fun transform(table: String, column: String, value: Any?, salt: String): Any? { TODO() }
}
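A minimal sketch of the registry, assuming a ConcurrentHashMap keyed by (table, column) is enough for thread safety and that the registered transformers are themselves stateless. YAML loading is left out; the config loader would simply call register for each rule.

```kotlin
import java.util.concurrent.ConcurrentHashMap

interface Transformer {
    fun transform(value: Any?, salt: String): Any?
}

class TransformerRegistry {
    // ConcurrentHashMap allows lock-free reads from many workers while rules are registered.
    private val rules = ConcurrentHashMap<Pair<String, String>, Transformer>()

    fun register(table: String, column: String, transformer: Transformer) {
        rules[table to column] = transformer
    }

    fun transform(table: String, column: String, value: Any?, salt: String): Any? {
        // No rule means passthrough. Do not write "rule?.transform(...) ?: value":
        // that would leak the original value whenever a transformer returns null.
        val transformer = rules[table to column] ?: return value
        return transformer.transform(value, salt)
    }
}
```

Keys are case-sensitive here; lowercasing table and column names in both methods is a reasonable extension when the source database folds identifier case.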
Problem 9: Combine topo sort with parallel execution. Tables at the same topological level are independent and can be processed in parallel. Respect a max parallelism limit (connection budget).
// Level 0: [departments, products] → parallel
// Level 1: [users] → after level 0
// Level 2: [orders] → after level 1
// Level 3: [order_items] → after level 2
fun processInOrder(
tables: List<String>, fks: List<FK>,
maxParallel: Int = 4,
process: (String) -> Unit
) {
TODO()
}
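One straightforward approach is to compute the levels first and then run each level on a fixed-size thread pool, waiting for the whole level before starting the next. The helper topoLevels is a hypothetical name. The sketch assumes every FK parent appears in the tables list; a parent outside the list would be reported as a cycle.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.Executors

data class FK(val child: String, val parent: String)

// Level 0 = tables without parents; level n = tables whose parents are all in levels < n.
fun topoLevels(tables: List<String>, fks: List<FK>): List<List<String>> {
    val all = tables.distinct()
    val parents = all.associateWith { t -> fks.filter { it.child == t }.map { it.parent }.toSet() }
    val done = HashSet<String>()
    val levels = ArrayList<List<String>>()
    while (done.size < all.size) {
        val level = all.filter { it !in done && done.containsAll(parents.getValue(it)) }
        check(level.isNotEmpty()) { "Cycle detected among: ${all - done}" }
        levels.add(level.sorted())
        done.addAll(level)
    }
    return levels
}

fun processInOrder(
    tables: List<String>, fks: List<FK>,
    maxParallel: Int = 4,
    process: (String) -> Unit
) {
    // The pool size is the connection budget: at most maxParallel tables run at once.
    val pool = Executors.newFixedThreadPool(maxParallel)
    try {
        for (level in topoLevels(tables, fks)) {
            // invokeAll blocks until every table in the level has finished.
            val futures = pool.invokeAll(level.map { table -> Callable { process(table) } })
            futures.forEach { it.get() }  // rethrows the first failure as ExecutionException
        }
    } finally {
        pool.shutdown()
    }
}
```

The level barrier is stricter than necessary: a table could start as soon as its own parents finish. Scheduling per table with a dependency counter removes that idle time at the cost of more bookkeeping.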
Problem 10: Processing 500 tables takes 6 hours. If the run crashes at table #347, it should resume from #347 rather than start over. Track progress and make each table's processing idempotent.
enum class Status { PENDING, IN_PROGRESS, COMPLETED, FAILED }
data class Progress(val table: String, val status: Status, val error: String? = null)
class CheckpointManager(private val conn: Connection) {
fun initProgress(tables: List<String>) { TODO() }
fun getStatus(table: String): Status { TODO() }
fun markInProgress(table: String) { TODO() }
fun markCompleted(table: String) { TODO() }
fun markFailed(table: String, error: String) { TODO() }
fun getPendingTables(): List<String> { TODO() }
}
fun processWithCheckpoint(tables: List<String>, mgr: CheckpointManager) {
TODO()
}
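The resume logic itself does not depend on where progress is stored. The sketch below isolates it behind a hypothetical ProgressStore interface with an in-memory implementation; a JDBC-backed CheckpointManager would persist the same state in a progress table. The extra process parameter is also an assumption, since the stub above does not say how a table gets processed.

```kotlin
enum class Status { PENDING, IN_PROGRESS, COMPLETED, FAILED }

// Hypothetical storage abstraction; a real one would write to a progress table.
interface ProgressStore {
    fun status(table: String): Status
    fun update(table: String, status: Status, error: String? = null)
}

class InMemoryProgressStore : ProgressStore {
    private val statuses = HashMap<String, Status>()
    val errors = HashMap<String, String?>()

    override fun status(table: String): Status = statuses[table] ?: Status.PENDING

    override fun update(table: String, status: Status, error: String?) {
        statuses[table] = status
        errors[table] = error
    }
}

fun processWithCheckpoint(
    tables: List<String>,
    store: ProgressStore,
    process: (String) -> Unit
) {
    for (table in tables) {
        // COMPLETED tables are skipped; PENDING, FAILED and IN_PROGRESS are (re)processed.
        // IN_PROGRESS at startup means the previous run died in the middle of this table.
        if (store.status(table) == Status.COMPLETED) continue

        store.update(table, Status.IN_PROGRESS)
        try {
            // process must be idempotent, e.g. clear the target table before loading it.
            process(table)
            store.update(table, Status.COMPLETED)
        } catch (e: Exception) {
            store.update(table, Status.FAILED, e.message)
            throw e
        }
    }
}
```

Calling the function again after a failure picks up at the first table that is not COMPLETED, so no separate "resume" entry point is needed.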
Problem 11: A JSONB column contains nested objects with PII fields. Mask the specified paths inside the JSON without destroying its structure. Paths are given in dot notation: "user.email", "contacts[*].phone".
// Input JSON: {"user":{"email":"alex@test.com","age":30},"notes":"call me"}
// Mask paths: ["user.email"]
// Output JSON: {"user":{"email":"a7f3b2@masked.io","age":30},"notes":"call me"}
fun maskJson(json: String, paths: List<String>, salt: String): String {
TODO()
}
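The path-walking part can be sketched independently of any JSON library. The code below assumes the JSON text has already been parsed into plain Kotlin Map, List and scalar values by a library of your choice, and that the result is serialised back afterwards. maskPaths is a hypothetical helper name. Only the two path forms from the problem are supported: dot-separated keys and the "[*]" wildcard.

```kotlin
// Splits "contacts[*].phone" into ["contacts", "*", "phone"].
private fun parsePath(path: String): List<String> =
    path.split(".").flatMap { segment ->
        if (segment.endsWith("[*]")) {
            listOf(segment.removeSuffix("[*]"), "*").filter { it.isNotEmpty() }
        } else {
            listOf(segment)
        }
    }

// Returns a copy of the tree with the value at the given path masked.
private fun maskAt(node: Any?, segments: List<String>, mask: (String) -> String): Any? {
    // End of path: mask string leaves only; other types are left untouched in this sketch.
    if (segments.isEmpty()) return if (node is String) mask(node) else node

    val head = segments.first()
    val rest = segments.drop(1)
    return when {
        head == "*" && node is List<*> ->
            node.map { maskAt(it, rest, mask) }
        head != "*" && node is Map<*, *> ->
            node.mapValues { (key, value) -> if (key == head) maskAt(value, rest, mask) else value }
        else -> node  // path does not match this document's shape: leave it as is
    }
}

fun maskPaths(root: Any?, paths: List<String>, mask: (String) -> String): Any? =
    paths.fold(root) { node, path -> maskAt(node, parsePath(path), mask) }
```

Because the tree is rebuilt rather than edited as text, keys, nesting and untouched values survive unchanged. Keys that contain a dot and numeric indexes such as [0] are outside what this sketch handles.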
Problem 12: Design a DatabaseDialect interface that abstracts the differences between PostgreSQL and MySQL, and implement both dialects. Include identifier quoting, streaming configuration, bulk load strategy, and type mapping.
// dialect.quoteIdentifier("users") → PostgreSQL: "users", MySQL: `users`
// dialect.configureStreaming(conn, stmt) → PostgreSQL: autoCommit=false and fetchSize=10000, MySQL: fetchSize=Integer.MIN_VALUE
// dialect.mapType("int4", 4) → CanonicalType.INTEGER
interface DatabaseDialect {
fun quoteIdentifier(name: String): String
fun configureStreaming(conn: Connection, stmt: Statement)
fun mapType(typeName: String, size: Int): CanonicalType
fun bulkLoadSql(table: String, columns: List<String>): String
}
sealed class CanonicalType { /* define types */ }
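A compact sketch of both dialects follows. The set of canonical types and the type-name mappings are illustrative, not complete. For bulk loading it assumes COPY ... FROM STDIN for PostgreSQL and a parameterised INSERT template for MySQL (to be used with JDBC batching), because the interface gives bulkLoadSql no file name to build a LOAD DATA statement from.

```kotlin
import java.sql.Connection
import java.sql.Statement

sealed class CanonicalType {
    object INTEGER : CanonicalType()
    object BIGINT : CanonicalType()
    object TEXT : CanonicalType()
    object TIMESTAMP : CanonicalType()
    data class Other(val raw: String) : CanonicalType()  // fallback for unmapped types
}

interface DatabaseDialect {
    fun quoteIdentifier(name: String): String
    fun configureStreaming(conn: Connection, stmt: Statement)
    fun mapType(typeName: String, size: Int): CanonicalType
    fun bulkLoadSql(table: String, columns: List<String>): String
}

object PostgresDialect : DatabaseDialect {
    override fun quoteIdentifier(name: String) = "\"" + name.replace("\"", "\"\"") + "\""

    override fun configureStreaming(conn: Connection, stmt: Statement) {
        conn.autoCommit = false  // the PostgreSQL driver only uses a cursor outside autocommit
        stmt.fetchSize = 10_000
    }

    override fun mapType(typeName: String, size: Int): CanonicalType = when (typeName.lowercase()) {
        "int4", "integer", "serial" -> CanonicalType.INTEGER
        "int8", "bigint", "bigserial" -> CanonicalType.BIGINT
        "text", "varchar", "bpchar" -> CanonicalType.TEXT
        "timestamp", "timestamptz" -> CanonicalType.TIMESTAMP
        else -> CanonicalType.Other(typeName)
    }

    override fun bulkLoadSql(table: String, columns: List<String>) =
        "COPY ${quoteIdentifier(table)} (${columns.joinToString(", ") { quoteIdentifier(it) }}) " +
            "FROM STDIN WITH (FORMAT csv)"
}

object MySqlDialect : DatabaseDialect {
    override fun quoteIdentifier(name: String) = "`" + name.replace("`", "``") + "`"

    override fun configureStreaming(conn: Connection, stmt: Statement) {
        stmt.fetchSize = Int.MIN_VALUE  // Connector/J streams row by row with this value
    }

    override fun mapType(typeName: String, size: Int): CanonicalType = when (typeName.lowercase()) {
        "int", "integer", "mediumint" -> CanonicalType.INTEGER
        "bigint" -> CanonicalType.BIGINT
        "char", "varchar", "text" -> CanonicalType.TEXT
        "datetime", "timestamp" -> CanonicalType.TIMESTAMP
        else -> CanonicalType.Other(typeName)
    }

    // Insert template for addBatch/executeBatch; rewriteBatchedStatements=true speeds it up.
    override fun bulkLoadSql(table: String, columns: List<String>) =
        "INSERT INTO ${quoteIdentifier(table)} (${columns.joinToString(", ") { quoteIdentifier(it) }}) " +
            "VALUES (${columns.joinToString(", ") { "?" }})"
}
```

Both quoting functions double the quote character inside the name, so an identifier that itself contains a quote or backtick cannot break out of the quoting.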
Day 1 (basics): Tasks 1, 2, 6 — topo sort, email masking, phone masking. Core algorithms, small scope, build confidence.
Day 2 (hard): Tasks 3, 4, 5 — FK graph BFS, cycle detection, chunked queries. Graph problems + database knowledge.
Day 3 (system): Tasks 7, 8, 9 — producer-consumer, transformer registry, parallel levels. Concurrency + design patterns.
Day 4 (polish): Tasks 10, 11, 12 — checkpoint, JSON masking, dialect abstraction. Reliability + real Synthesized patterns.
For each task, first try to solve it WITHOUT looking at the solution (set a timer), then compare. Focus on explaining your approach out loud in English: this is what the interviewer evaluates.