Understanding SQL MERGE: When and Why to Use It

    08/01/2026

    Picture this: you're building an ETL pipeline that syncs customer data from a staging table to production. The old approach involves three separate queries—check if exists, update if found, insert if not. Sounds simple, right? Until you hit a race condition where between your SELECT and INSERT, another process inserts the same record. Now you've got duplicate key violations or inconsistent data.

    The SQL MERGE statement solves this by wrapping all three operations into a single atomic transaction. It's been around in SQL Server since 2008 and PostgreSQL since version 15, but I still see developers writing those three-statement patterns that are just waiting to break under concurrency.

    Here's when MERGE actually makes sense, when it doesn't, and why it's become my default for data synchronization tasks.

    What Is MERGE?

    The MERGE statement lets you perform INSERT, UPDATE, and DELETE operations in a single atomic statement based on matching conditions between a source and target table. According to the PostgreSQL documentation, it was added in PostgreSQL 15 to match SQL:2003 standard behavior.

    The elegant part is that the database engine handles the logic atomically—you don't have to worry about what happens between checking for existence and performing the operation.

    Basic Syntax

    The syntax varies slightly between database vendors, but the core structure is consistent:

    MERGE INTO target_table AS target
    USING source_table AS source
    ON target.key_column = source.key_column
    WHEN MATCHED THEN
        UPDATE SET target.column1 = source.column1
    WHEN NOT MATCHED THEN
        INSERT (column1, column2)
        VALUES (source.column1, source.column2);

    PostgreSQL and SQL Server use similar syntax. Oracle has some differences, and MySQL doesn't support MERGE directly (you'd use INSERT ... ON DUPLICATE KEY UPDATE instead).

    Important gotcha: SQL Server requires a semicolon (;) before the MERGE statement if it's not the first statement in the batch. PostgreSQL doesn't have this requirement, but it's good practice to use semicolons anyway.
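
    A minimal sketch of what that looks like in SQL Server (the DECLARE is just a stand-in for any preceding statement in the batch):

    -- SQL Server: the statement before MERGE must be terminated
    DECLARE @loaded_at DATETIME = GETDATE();  -- this semicolon is required

    MERGE INTO products AS p
    USING staging_updates AS s
    ON p.product_id = s.product_id
    WHEN MATCHED THEN
        UPDATE SET price = s.price;  -- MERGE itself must also end with a semicolon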

    Real-World Example: Synchronizing Inventory

    Let's say you're managing inventory. Every night, a batch process updates your product catalog from an external system. Some products are new, some have price changes, and discontinued items should be marked inactive.

    Target Table (products):

    CREATE TABLE products (
        product_id   INTEGER PRIMARY KEY,
        product_name VARCHAR(100),
        price        DECIMAL(10, 2),
        status       VARCHAR(20) DEFAULT 'ACTIVE',
        last_updated TIMESTAMP
    );

    Source Table (staging_updates):

    CREATE TABLE staging_updates (
        product_id      INTEGER,
        product_name    VARCHAR(100),
        price           DECIMAL(10, 2),
        is_discontinued BOOLEAN
    );

    Here's the MERGE statement that handles all three scenarios:

    MERGE INTO products AS p
    USING staging_updates AS s
    ON p.product_id = s.product_id
    WHEN MATCHED AND s.is_discontinued = true THEN
        UPDATE SET status = 'DISCONTINUED',
                   last_updated = CURRENT_TIMESTAMP
    WHEN MATCHED AND s.is_discontinued = false THEN
        UPDATE SET product_name = s.product_name,
                   price = s.price,
                   status = 'ACTIVE',
                   last_updated = CURRENT_TIMESTAMP
    WHEN NOT MATCHED THEN
        INSERT (product_id, product_name, price, status, last_updated)
        VALUES (s.product_id, s.product_name, s.price, 'ACTIVE', CURRENT_TIMESTAMP);

    The trick here is using multiple WHEN MATCHED clauses with conditions. PostgreSQL and SQL Server both support this. You can have conditional updates based on different criteria—something you can't easily do with separate UPDATE statements without risking race conditions.

    Why Prefer MERGE Over Separate Statements?

    Atomicity

    The biggest win is atomicity. With separate statements:

    -- Problematic approach
    UPDATE products SET price = 100 WHERE product_id = 123;
    -- What if another transaction inserts product_id = 123 here?
    INSERT INTO products (product_id, price) VALUES (123, 100);

    If your application logic decides to update first, then insert on failure, you've got a race condition. Another process could insert the row between your UPDATE and INSERT, causing a duplicate key error or inconsistent state.

    MERGE handles this as a single atomic operation. The database engine evaluates all matches first, then applies changes in one transaction.

    Performance

    Performance is where MERGE really shines at scale. Consider this alternative:

    -- Slow approach: multiple scans
    UPDATE products p
    SET price = s.price
    FROM staging_updates s
    WHERE p.product_id = s.product_id;

    INSERT INTO products (product_id, product_name, price)
    SELECT product_id, product_name, price
    FROM staging_updates s
    WHERE NOT EXISTS (
        SELECT 1 FROM products p WHERE p.product_id = s.product_id
    );

    This scans both tables twice. With MERGE, the database engine can optimize to read each table once. The SQL Server documentation specifically notes that MERGE can be more efficient than equivalent separate statements because the optimizer sees the full operation.

    Conditional Logic

    With MERGE, you can have multiple conditional branches:

    MERGE INTO customer_orders AS co
    USING order_updates AS ou
    ON co.order_id = ou.order_id
    WHEN MATCHED AND ou.status = 'CANCELLED' THEN
        DELETE
    WHEN MATCHED AND ou.amount > co.amount THEN
        UPDATE SET amount = ou.amount,
                   notes = 'Price increased after initial order'
    WHEN MATCHED THEN
        UPDATE SET status = ou.status
    WHEN NOT MATCHED THEN
        INSERT (order_id, customer_id, amount, status)
        VALUES (ou.order_id, ou.customer_id, ou.amount, ou.status);

    Try doing that with separate UPDATE and INSERT statements while maintaining the same atomicity guarantees. You'd need complex transaction logic and still risk race conditions.

    The order of WHEN clauses matters in some databases. SQL Server evaluates them top-to-bottom and stops at the first matching condition. PostgreSQL follows the same behavior. Always put more specific conditions before general ones.

    When MERGE Is Overkill

    Not every data sync needs MERGE. Here are scenarios where simpler approaches are better:

    Simple inserts only: If you're just adding new records and duplicates are impossible, use INSERT:

    -- Don't use MERGE for this
    INSERT INTO audit_log (event_type, event_data)
    SELECT event_type, event_data FROM staging_events;

    Single-row updates: For application-level updates where you're working with one row at a time, use standard UPDATE:

    -- Better than MERGE for single-row operations
    UPDATE users SET last_login = NOW() WHERE user_id = 123;

    Database-specific alternatives: MySQL's INSERT ... ON DUPLICATE KEY UPDATE is simpler and more idiomatic for that database:

    -- MySQL: simpler than MERGE
    INSERT INTO products (product_id, price)
    VALUES (123, 99.99)
    ON DUPLICATE KEY UPDATE price = 99.99;

    WHEN NOT MATCHED BY SOURCE: Deleting Records

    One powerful but dangerous feature is WHEN NOT MATCHED BY SOURCE, which deletes rows in the target that don't exist in the source. This is supported in SQL Server and PostgreSQL 17+ (not available in PostgreSQL 15-16):

    -- SQL Server / PostgreSQL 17+
    MERGE INTO products AS p
    USING staging_updates AS s
    ON p.product_id = s.product_id
    WHEN MATCHED THEN
        UPDATE SET price = s.price
    WHEN NOT MATCHED THEN
        INSERT (product_id, product_name, price)
        VALUES (s.product_id, s.product_name, s.price)
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE;  -- Deletes products not in staging_updates

    The danger: If staging_updates only contains a subset of products (say, just today's updates), this would delete all other products from the target table. Always scope your source:

    -- Safer approach: only sync products that should exist
    MERGE INTO products AS p
    USING (
        SELECT product_id, product_name, price
        FROM staging_updates
        WHERE sync_date = CURRENT_DATE  -- Only sync today's updates
    ) AS s
    ON p.product_id = s.product_id
    WHEN MATCHED THEN
        UPDATE SET price = s.price
    WHEN NOT MATCHED THEN
        INSERT (product_id, product_name, price)
        VALUES (s.product_id, s.product_name, s.price);
    -- Intentionally omitting WHEN NOT MATCHED BY SOURCE

    Use WHEN NOT MATCHED BY SOURCE only when you're doing a full sync where the source represents the complete desired state.

    Advanced Patterns

    Outputting Affected Rows

    SQL Server supports returning information about what was changed using the OUTPUT clause:

    -- SQL Server OUTPUT clause
    MERGE products AS p
    USING staging_updates AS s
    ON p.product_id = s.product_id
    WHEN MATCHED THEN
        UPDATE SET price = s.price
    WHEN NOT MATCHED THEN
        INSERT (product_id, price)
        VALUES (s.product_id, s.price)
    OUTPUT $action,
           inserted.product_id,
           deleted.price AS old_price,
           inserted.price AS new_price;

    The $action column returns 'INSERT', 'UPDATE', or 'DELETE' depending on what operation was performed on each row.

    PostgreSQL 17+ added a RETURNING clause for MERGE:

    -- PostgreSQL 17+ RETURNING clause
    MERGE INTO products AS p
    USING staging_updates AS s
    ON p.product_id = s.product_id
    WHEN MATCHED THEN
        UPDATE SET price = s.price
    WHEN NOT MATCHED THEN
        INSERT (product_id, price)
        VALUES (s.product_id, s.price)
    RETURNING merge_action(), product_id, price;

    Note: The RETURNING clause for MERGE is only available in PostgreSQL 17 and later. PostgreSQL 15-16 don't support it.

    This is useful for audit logging or triggering downstream processes based on what actually changed.
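
    If you want a persistent audit trail, SQL Server's OUTPUT ... INTO variant can route those same columns into a table. A sketch, where merge_audit is a hypothetical table you'd create with matching columns:

    -- SQL Server: capture MERGE changes in a (hypothetical) merge_audit table
    MERGE products AS p
    USING staging_updates AS s
    ON p.product_id = s.product_id
    WHEN MATCHED THEN
        UPDATE SET price = s.price
    WHEN NOT MATCHED THEN
        INSERT (product_id, price)
        VALUES (s.product_id, s.price)
    OUTPUT $action, inserted.product_id, deleted.price, inserted.price
    INTO merge_audit (action_taken, product_id, old_price, new_price);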

    Handling NULLs in Source

    One gotcha: if your source has NULL values and you want to preserve existing target values:

    MERGE INTO products AS p
    USING staging_updates AS s
    ON p.product_id = s.product_id
    WHEN MATCHED THEN
        UPDATE SET
            -- Only update if source has a value
            product_name = COALESCE(s.product_name, p.product_name),
            price = COALESCE(s.price, p.price)
    WHEN NOT MATCHED THEN
        INSERT (product_id, product_name, price)
        VALUES (s.product_id, s.product_name, s.price);

    Without COALESCE, NULL values in the source would overwrite existing data in the target.

    Performance Considerations

    The performance of MERGE depends heavily on proper indexing. The columns used in the ON clause should be indexed:

    -- Essential indexes for MERGE performance
    CREATE INDEX idx_products_product_id ON products(product_id);
    CREATE INDEX idx_staging_product_id ON staging_updates(product_id);

    Without indexes, MERGE degrades to full table scans. I've seen production queries that worked fine with 1,000 rows but crawled with 1 million—missing indexes on the join columns.
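
    A quick way to check is to look at the query plan before running at scale; in PostgreSQL, for example (plan output varies with data size and statistics):

    -- PostgreSQL: verify the join uses an index
    EXPLAIN
    MERGE INTO products AS p
    USING staging_updates AS s
    ON p.product_id = s.product_id
    WHEN MATCHED THEN
        UPDATE SET price = s.price;
    -- Look for Index Scan nodes on the join columns rather than Seq Scan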

    For large batch operations, consider filtering the source table:

    -- Better: filter source to only changed rows
    MERGE INTO products AS p
    USING (
        SELECT * FROM staging_updates
        WHERE last_modified > CURRENT_DATE - INTERVAL '1 day'
    ) AS s
    ON p.product_id = s.product_id
    WHEN MATCHED THEN
        UPDATE SET price = s.price
    WHEN NOT MATCHED THEN
        INSERT (product_id, price)
        VALUES (s.product_id, s.price);

    Duplicate Matches in Source

    If your ON clause matches multiple source rows to a single target row, you'll get an error. This commonly happens when the join condition isn't unique enough:

    -- Problem: multiple source rows match one target
    MERGE INTO products AS p
    USING staging_updates AS s
    ON p.product_name = s.product_name  -- Not unique!
    WHEN MATCHED THEN
        UPDATE SET price = s.price;
    -- Error: MERGE statement attempted to UPDATE or DELETE the same row more than once

    The fix: ensure your ON clause uses a unique key or add conditions to filter duplicates:

    -- PostgreSQL: use DISTINCT ON
    MERGE INTO products AS p
    USING (
        SELECT DISTINCT ON (product_id) product_id, product_name, price
        FROM staging_updates
        ORDER BY product_id, last_modified DESC  -- Take latest if duplicates
    ) AS s
    ON p.product_id = s.product_id
    WHEN MATCHED THEN
        UPDATE SET price = s.price;

    -- SQL Server: use ROW_NUMBER()
    MERGE INTO products AS p
    USING (
        SELECT product_id, product_name, price
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (PARTITION BY product_id
                                      ORDER BY last_modified DESC) AS rn
            FROM staging_updates
        ) ranked
        WHERE rn = 1
    ) AS s
    ON p.product_id = s.product_id
    WHEN MATCHED THEN
        UPDATE SET price = s.price;

    Concurrency and Deadlocks

    MERGE can cause deadlocks in high-concurrency scenarios, especially when multiple transactions modify the same rows. SQL Server is particularly prone to this because MERGE acquires locks on both source and target.

    If you're seeing deadlocks, consider:

    1. Using appropriate isolation levels: READ COMMITTED (default) is usually fine, but SERIALIZABLE prevents phantom reads at the cost of more blocking.

    2. Error handling: Wrap MERGE in TRY-CATCH (SQL Server) or transaction blocks with error handling:

    -- SQL Server error handling
    BEGIN TRY
        BEGIN TRANSACTION;

        MERGE products AS p
        USING staging_updates AS s
        ON p.product_id = s.product_id
        WHEN MATCHED THEN
            UPDATE SET price = s.price
        WHEN NOT MATCHED THEN
            INSERT (product_id, price)
            VALUES (s.product_id, s.price);

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION;
        -- Handle error (log, retry, etc.)
        THROW;
    END CATCH;

    3. Batch processing: Process data in smaller batches to reduce lock contention.

    4. Avoiding hot rows: If multiple processes update the same row frequently, consider batching or queuing updates instead of processing them immediately.
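
    As a sketch of batching in SQL Server, assuming you add a processed flag column to staging_updates and that product_id is unique there (both assumptions, not part of the schema above):

    -- SQL Server: claim a chunk of staging rows, then MERGE just that chunk
    CREATE TABLE #batch (product_id INT PRIMARY KEY);

    WHILE 1 = 1
    BEGIN
        TRUNCATE TABLE #batch;

        -- Mark up to 5000 unprocessed rows and remember which ones we took
        UPDATE TOP (5000) staging_updates
        SET processed = 1
        OUTPUT inserted.product_id INTO #batch (product_id)
        WHERE processed = 0;

        IF @@ROWCOUNT = 0 BREAK;  -- nothing left to do

        MERGE products AS p
        USING (
            SELECT s.product_id, s.price
            FROM staging_updates AS s
            JOIN #batch AS b ON b.product_id = s.product_id
        ) AS src
        ON p.product_id = src.product_id
        WHEN MATCHED THEN
            UPDATE SET price = src.price
        WHEN NOT MATCHED THEN
            INSERT (product_id, price)
            VALUES (src.product_id, src.price);
    END;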

    Database Compatibility

    Support varies by database:

    • PostgreSQL: Basic MERGE supported since version 15 (released 2022). PostgreSQL 17 added the RETURNING clause and WHEN NOT MATCHED BY SOURCE/BY TARGET syntax. Before version 15, use INSERT ... ON CONFLICT as an alternative for upsert operations.
    • SQL Server: Supported since SQL Server 2008. Full support for the OUTPUT clause and BY SOURCE/BY TARGET syntax. Requires a semicolon before MERGE if it's not the first statement in the batch.
    • Oracle: Supported with slightly different syntax. More flexible with conditional clauses.
    • MySQL: Not supported. Use INSERT ... ON DUPLICATE KEY UPDATE instead (similar functionality, different syntax)
    • SQLite: Not supported. Use INSERT OR REPLACE or application logic with separate statements
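
    One caveat on SQLite's INSERT OR REPLACE: it deletes the conflicting row and inserts a fresh one, so any columns you don't supply are reset to their defaults (or NULL). Modern SQLite (3.24+) also supports an ON CONFLICT upsert that behaves like a real update; a sketch against the products table from earlier:

    -- SQLite: INSERT OR REPLACE resets unspecified columns (product_name, last_updated)
    INSERT OR REPLACE INTO products (product_id, price) VALUES (123, 99.99);

    -- SQLite 3.24+: ON CONFLICT upsert preserves the other columns
    INSERT INTO products (product_id, price) VALUES (123, 99.99)
    ON CONFLICT (product_id) DO UPDATE SET price = excluded.price;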

    PostgreSQL alternative: If you're on PostgreSQL < 15, INSERT ... ON CONFLICT covers most upsert use cases:

    -- PostgreSQL alternative to MERGE (upsert only)
    INSERT INTO products (product_id, product_name, price)
    SELECT product_id, product_name, price
    FROM staging_updates
    ON CONFLICT (product_id) DO UPDATE
    SET product_name = EXCLUDED.product_name,
        price = EXCLUDED.price;

    Slowly Changing Dimensions (SCD)

    In data warehousing, MERGE is essential for handling Slowly Changing Dimensions (SCD). You're tracking historical changes while keeping current records accessible:

    -- SCD Type 2: Track history with effective dates
    MERGE INTO dim_customer AS target
    USING staging_customer AS source
    ON target.customer_id = source.customer_id
       AND target.current_flag = 'Y'  -- Only match current record
    WHEN MATCHED AND target.email <> source.email THEN
        -- Close the current record; the new version is inserted in the second pass
        UPDATE SET current_flag = 'N', end_date = CURRENT_DATE
    WHEN NOT MATCHED THEN
        INSERT (customer_id, email, start_date, current_flag)
        VALUES (source.customer_id, source.email, CURRENT_DATE, 'Y');

    -- Second pass: insert the new current version for customers whose
    -- old record was just closed
    INSERT INTO dim_customer (customer_id, email, start_date, current_flag)
    SELECT customer_id, email, CURRENT_DATE, 'Y'
    FROM staging_customer s
    WHERE EXISTS (
        SELECT 1 FROM dim_customer d
        WHERE d.customer_id = s.customer_id
          AND d.current_flag = 'N'
          AND d.end_date = CURRENT_DATE
    );

    This is a simplified example—real SCD implementations can get complex, but MERGE makes the logic manageable.

    The Bottom Line

    Use MERGE when you need to synchronize data between tables with conditional logic and atomicity guarantees. It's not always the simplest solution, but for ETL pipelines, data warehousing (especially slowly changing dimensions), and any scenario where you're syncing data between systems, it's usually the right choice.

    The pattern I see working best: start with MERGE for any multi-operation sync, then simplify only if profiling shows it's a bottleneck. Watch out for deadlocks in high-concurrency scenarios and always scope your WHEN NOT MATCHED BY SOURCE clauses carefully. Most of the time, the atomicity and clarity are worth it.

