Picture this: you're building an ETL pipeline that syncs customer data from a staging table to production. The old approach involves three separate queries: check if the row exists, update if found, insert if not. Sounds simple, right? Until another process inserts the same record between your SELECT and your INSERT, and suddenly you've got duplicate key violations or inconsistent data.
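Here's the racy three-statement pattern in sketch form (table and column names are illustrative):

```sql
-- Check-then-write: looks fine, breaks under concurrency
SELECT 1 FROM customers WHERE customer_id = 42;

-- If the SELECT found a row:
UPDATE customers SET email = 'new@example.com' WHERE customer_id = 42;

-- If it didn't (but another process may have inserted customer_id = 42 by now):
INSERT INTO customers (customer_id, email) VALUES (42, 'new@example.com');
```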
The SQL MERGE statement solves this by combining all three operations into a single atomic statement. It's been available in SQL Server since 2008 and in PostgreSQL since version 15, but I still see developers writing three-statement patterns that are just waiting to break under concurrency.
Here's when MERGE actually makes sense, when it doesn't, and why it's become my default for data synchronization tasks.
The MERGE statement lets you perform INSERT, UPDATE, and DELETE operations in a single atomic statement based on matching conditions between a source and target table. According to the PostgreSQL documentation, it was added in PostgreSQL 15 to match SQL:2003 standard behavior.
The elegant part is that the database engine handles the logic atomically—you don't have to worry about what happens between checking for existence and performing the operation.
The syntax varies slightly between database vendors, but the core structure is consistent:
```sql
MERGE INTO target_table AS target
USING source_table AS source
    ON target.key_column = source.key_column
WHEN MATCHED THEN
    -- Left side of SET stays unqualified; PostgreSQL rejects target.column1 here
    UPDATE SET column1 = source.column1
WHEN NOT MATCHED THEN
    INSERT (column1, column2)
    VALUES (source.column1, source.column2);
```
PostgreSQL and SQL Server use similar syntax. Oracle has some differences, and MySQL doesn't support MERGE directly (you'd use INSERT ... ON DUPLICATE KEY UPDATE instead).
Important gotcha: SQL Server requires the MERGE statement itself to be terminated with a semicolon (;), unlike most other T-SQL statements where the terminator is optional. PostgreSQL doesn't treat MERGE specially here, but it's good practice to terminate statements consistently anyway.
Let's say you're managing inventory. Every night, a batch process updates your product catalog from an external system. Some products are new, some have price changes, and discontinued items should be marked inactive.
Target Table (products):
```sql
CREATE TABLE products (
    product_id   INTEGER PRIMARY KEY,
    product_name VARCHAR(100),
    price        DECIMAL(10, 2),
    status       VARCHAR(20) DEFAULT 'ACTIVE',
    last_updated TIMESTAMP
);
```
Source Table (staging_updates):
```sql
CREATE TABLE staging_updates (
    product_id      INTEGER,
    product_name    VARCHAR(100),
    price           DECIMAL(10, 2),
    is_discontinued BOOLEAN
);
```
Here's the MERGE statement that handles all three scenarios:
```sql
MERGE INTO products AS p
USING staging_updates AS s
    ON p.product_id = s.product_id
WHEN MATCHED AND s.is_discontinued = true THEN
    UPDATE SET status = 'DISCONTINUED',
               last_updated = CURRENT_TIMESTAMP
WHEN MATCHED AND s.is_discontinued = false THEN
    UPDATE SET product_name = s.product_name,
               price = s.price,
               status = 'ACTIVE',
               last_updated = CURRENT_TIMESTAMP
WHEN NOT MATCHED THEN
    INSERT (product_id, product_name, price, status, last_updated)
    VALUES (s.product_id, s.product_name, s.price, 'ACTIVE', CURRENT_TIMESTAMP);
```
The trick here is using multiple WHEN MATCHED clauses with conditions. PostgreSQL and SQL Server both support this. You can have conditional updates based on different criteria—something you can't easily do with separate UPDATE statements without risking race conditions.
The biggest win is atomicity. With separate statements:
```sql
-- Problematic approach
UPDATE products SET price = 100 WHERE product_id = 123;
-- What if another transaction inserts product_id = 123 here?
INSERT INTO products (product_id, price) VALUES (123, 100);
```
If your application logic decides to update first, then insert on failure, you've got a race condition. Another process could insert the row between your UPDATE and INSERT, causing a duplicate key error or inconsistent state.
MERGE handles this as a single atomic operation. The database engine evaluates all matches first, then applies changes in one transaction.
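As a minimal sketch, here's the MERGE equivalent of that single-row upsert, reusing the illustrative products table; the inline VALUES row stands in for whatever your real source is:

```sql
-- Atomic upsert: no gap between the existence check and the write
MERGE INTO products AS p
USING (VALUES (123, 100.00)) AS s (product_id, price)
    ON p.product_id = s.product_id
WHEN MATCHED THEN
    UPDATE SET price = s.price
WHEN NOT MATCHED THEN
    INSERT (product_id, price)
    VALUES (s.product_id, s.price);
```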
Performance is where MERGE really shines at scale. Consider this alternative:
```sql
-- Slow approach: multiple scans
UPDATE products p
SET price = s.price
FROM staging_updates s
WHERE p.product_id = s.product_id;

INSERT INTO products (product_id, product_name, price)
SELECT product_id, product_name, price
FROM staging_updates s
WHERE NOT EXISTS (
    SELECT 1 FROM products p WHERE p.product_id = s.product_id
);
```
This scans both tables twice. With MERGE, the database engine can read each table once, because the optimizer sees the full operation. Don't take that as a blanket win, though: Microsoft's documentation recommends benchmarking MERGE against the equivalent separate statements, and for simple update-or-insert workloads the separate statements sometimes come out ahead.
With MERGE, you can have multiple conditional branches:
```sql
MERGE INTO customer_orders AS co
USING order_updates AS ou
    ON co.order_id = ou.order_id
WHEN MATCHED AND ou.status = 'CANCELLED' THEN
    DELETE
WHEN MATCHED AND ou.amount > co.amount THEN
    UPDATE SET amount = ou.amount,
               notes = 'Price increased after initial order'
WHEN MATCHED THEN
    UPDATE SET status = ou.status
WHEN NOT MATCHED THEN
    INSERT (order_id, customer_id, amount, status)
    VALUES (ou.order_id, ou.customer_id, ou.amount, ou.status);
```
Try doing that with separate UPDATE and INSERT statements while maintaining the same atomicity guarantees. You'd need complex transaction logic and still risk race conditions.
The order of WHEN clauses matters. SQL Server evaluates them top to bottom and applies the first clause whose condition matches; PostgreSQL behaves the same way. Always put more specific conditions before general ones. One portability caveat: SQL Server allows at most two WHEN MATCHED clauses, so a multi-branch statement like the one above needs restructuring before it will run there.
Not every data sync needs MERGE. Here are scenarios where simpler approaches are better:
Simple inserts only: If you're just adding new records and duplicates are impossible, use INSERT:
```sql
-- Don't use MERGE for this
INSERT INTO audit_log (event_type, event_data)
SELECT event_type, event_data
FROM staging_events;
```
Single-row updates: For application-level updates where you're working with one row at a time, use standard UPDATE:
```sql
-- Better than MERGE for single-row operations
UPDATE users SET last_login = NOW() WHERE user_id = 123;
```
Database-specific alternatives: MySQL's INSERT ... ON DUPLICATE KEY UPDATE is simpler and more idiomatic for that database:
```sql
-- MySQL: simpler than MERGE
INSERT INTO products (product_id, price)
VALUES (123, 99.99)
ON DUPLICATE KEY UPDATE price = 99.99;
```
One powerful but dangerous feature is WHEN NOT MATCHED BY SOURCE, which deletes rows in the target that don't exist in the source. This is supported in SQL Server and PostgreSQL 17+ (not available in PostgreSQL 15-16):
```sql
-- SQL Server / PostgreSQL 17+
MERGE INTO products AS p
USING staging_updates AS s
    ON p.product_id = s.product_id
WHEN MATCHED THEN
    UPDATE SET price = s.price
WHEN NOT MATCHED THEN
    INSERT (product_id, product_name, price)
    VALUES (s.product_id, s.product_name, s.price)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;  -- Deletes products not in staging_updates
```
The danger: If staging_updates only contains a subset of products (say, just today's updates), this would delete all other products from the target table. Always scope your source:
```sql
-- Safer approach: only sync products that should exist
MERGE INTO products AS p
USING (
    SELECT product_id, product_name, price
    FROM staging_updates
    WHERE sync_date = CURRENT_DATE  -- Only sync today's updates
) AS s
    ON p.product_id = s.product_id
WHEN MATCHED THEN
    UPDATE SET price = s.price
WHEN NOT MATCHED THEN
    INSERT (product_id, product_name, price)
    VALUES (s.product_id, s.product_name, s.price);
-- Intentionally omitting WHEN NOT MATCHED BY SOURCE
```
Use WHEN NOT MATCHED BY SOURCE only when you're doing a full sync where the source represents the complete desired state.
SQL Server supports returning information about what was changed using the OUTPUT clause:
```sql
-- SQL Server OUTPUT clause
MERGE products AS p
USING staging_updates AS s
    ON p.product_id = s.product_id
WHEN MATCHED THEN
    UPDATE SET price = s.price
WHEN NOT MATCHED THEN
    INSERT (product_id, price)
    VALUES (s.product_id, s.price)
OUTPUT $action,
       inserted.product_id,
       deleted.price AS old_price,
       inserted.price AS new_price;
```
The $action column returns 'INSERT', 'UPDATE', or 'DELETE' depending on what operation was performed on each row.
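You can also capture that output for audit logging with OUTPUT ... INTO. A minimal sketch; the merge_log table and its columns are hypothetical:

```sql
-- merge_log is a hypothetical audit table
CREATE TABLE merge_log (
    action     NVARCHAR(10),
    product_id INT,
    old_price  DECIMAL(10, 2),
    new_price  DECIMAL(10, 2)
);

MERGE products AS p
USING staging_updates AS s
    ON p.product_id = s.product_id
WHEN MATCHED THEN
    UPDATE SET price = s.price
WHEN NOT MATCHED THEN
    INSERT (product_id, price)
    VALUES (s.product_id, s.price)
OUTPUT $action, inserted.product_id, deleted.price, inserted.price
INTO merge_log (action, product_id, old_price, new_price);
```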
PostgreSQL 17+ added a RETURNING clause for MERGE:
```sql
-- PostgreSQL 17+ RETURNING clause
MERGE INTO products AS p
USING staging_updates AS s
    ON p.product_id = s.product_id
WHEN MATCHED THEN
    UPDATE SET price = s.price
WHEN NOT MATCHED THEN
    INSERT (product_id, price)
    VALUES (s.product_id, s.price)
RETURNING merge_action(), product_id, price;
```
Note: The RETURNING clause for MERGE is only available in PostgreSQL 17 and later. PostgreSQL 15-16 don't support it.
This is useful for audit logging or triggering downstream processes based on what actually changed.
One gotcha: if your source has NULL values and you want to preserve existing target values:
```sql
MERGE INTO products AS p
USING staging_updates AS s
    ON p.product_id = s.product_id
WHEN MATCHED THEN
    UPDATE SET
        -- Only update if source has a value
        product_name = COALESCE(s.product_name, p.product_name),
        price = COALESCE(s.price, p.price)
WHEN NOT MATCHED THEN
    INSERT (product_id, product_name, price)
    VALUES (s.product_id, s.product_name, s.price);
```
Without COALESCE, NULL values in the source would overwrite existing data in the target.
The performance of MERGE depends heavily on proper indexing. The columns used in the ON clause should be indexed:
```sql
-- Essential index for MERGE performance
-- (products.product_id is already covered by its PRIMARY KEY,
-- so only the staging side needs an explicit index here)
CREATE INDEX idx_staging_product_id ON staging_updates(product_id);
```
Without indexes, MERGE degrades to full table scans. I've seen production queries that worked fine with 1,000 rows but crawled with 1 million—missing indexes on the join columns.
For large batch operations, consider filtering the source table:
```sql
-- Better: filter source to only changed rows
MERGE INTO products AS p
USING (
    SELECT *
    FROM staging_updates
    WHERE last_modified > CURRENT_DATE - INTERVAL '1 day'
) AS s
    ON p.product_id = s.product_id
WHEN MATCHED THEN
    UPDATE SET price = s.price
WHEN NOT MATCHED THEN
    INSERT (product_id, price)
    VALUES (s.product_id, s.price);
```
If your ON clause matches multiple source rows to a single target row, you'll get an error. This commonly happens when the join condition isn't unique enough:
```sql
-- Problem: multiple source rows match one target
MERGE INTO products AS p
USING staging_updates AS s
    ON p.product_name = s.product_name  -- Not unique!
WHEN MATCHED THEN
    UPDATE SET price = s.price;
-- Error: MERGE statement attempted to UPDATE or DELETE
-- the same row more than once
```
The fix: ensure your ON clause uses a unique key or add conditions to filter duplicates:
```sql
-- PostgreSQL: use DISTINCT ON
MERGE INTO products AS p
USING (
    SELECT DISTINCT ON (product_id) product_id, product_name, price
    FROM staging_updates
    ORDER BY product_id, last_modified DESC  -- Take latest if duplicates
) AS s
    ON p.product_id = s.product_id
WHEN MATCHED THEN
    UPDATE SET price = s.price;

-- SQL Server: use ROW_NUMBER()
MERGE INTO products AS p
USING (
    SELECT product_id, product_name, price
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY product_id
                                  ORDER BY last_modified DESC) AS rn
        FROM staging_updates
    ) ranked
    WHERE rn = 1
) AS s
    ON p.product_id = s.product_id
WHEN MATCHED THEN
    UPDATE SET price = s.price;
```
MERGE can cause deadlocks in high-concurrency scenarios, especially when multiple transactions modify the same rows. SQL Server is particularly prone to this because MERGE acquires locks on both source and target.
If you're seeing deadlocks, consider:
Using appropriate isolation levels: READ COMMITTED (the default) is usually fine, but SERIALIZABLE prevents phantom reads at the cost of more blocking. In SQL Server specifically, concurrent MERGE upserts can still hit duplicate key errors under READ COMMITTED; a common mitigation is the WITH (HOLDLOCK) hint on the target table.
Error handling: Wrap MERGE in TRY-CATCH (SQL Server) or an equivalent transaction block so failures roll back cleanly:
```sql
-- SQL Server error handling
BEGIN TRY
    BEGIN TRANSACTION;

    MERGE products AS p
    USING staging_updates AS s
        ON p.product_id = s.product_id
    WHEN MATCHED THEN
        UPDATE SET price = s.price
    WHEN NOT MATCHED THEN
        INSERT (product_id, price)
        VALUES (s.product_id, s.price);

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    -- Handle error (log, retry, etc.)
    THROW;
END CATCH;
```
Batch processing: Process data in smaller batches to reduce lock contention (see the sketch after this list).
Avoiding hot rows: If multiple processes update the same row frequently, consider batching or queuing updates instead of processing them immediately.
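To make the batching idea concrete, here's a minimal T-SQL sketch; the processed flag column and the 5,000-row chunk size are assumptions for illustration:

```sql
-- Batched MERGE: claim a chunk of staging rows, merge it, repeat.
-- Assumes product_id is unique within staging_updates.
DECLARE @claimed TABLE (product_id INT PRIMARY KEY, price DECIMAL(10, 2));

WHILE 1 = 1
BEGIN
    DELETE FROM @claimed;

    -- Mark the next chunk as processed and capture it
    UPDATE TOP (5000) staging_updates
    SET processed = 1
    OUTPUT inserted.product_id, inserted.price
    INTO @claimed (product_id, price)
    WHERE processed = 0;

    IF @@ROWCOUNT = 0 BREAK;

    MERGE products AS p
    USING @claimed AS s
        ON p.product_id = s.product_id
    WHEN MATCHED THEN
        UPDATE SET price = s.price
    WHEN NOT MATCHED THEN
        INSERT (product_id, price)
        VALUES (s.product_id, s.price);
END;
```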
Support varies by database:
PostgreSQL: MERGE supported since version 15 (released 2022). PostgreSQL 17 added the RETURNING clause and the WHEN NOT MATCHED BY SOURCE/BY TARGET syntax. Before version 15, use INSERT ... ON CONFLICT for upsert operations.
SQL Server: Supported since SQL Server 2008, including the OUTPUT clause and BY SOURCE/BY TARGET syntax. Remember that the statement must be terminated with a semicolon.
MySQL: No MERGE support; use INSERT ... ON DUPLICATE KEY UPDATE instead (similar functionality, different syntax).
SQLite: No MERGE support; use INSERT OR REPLACE, upsert via ON CONFLICT, or application logic with separate statements.
PostgreSQL alternative: If you're on PostgreSQL < 15, INSERT ... ON CONFLICT covers most upsert use cases:
```sql
-- PostgreSQL alternative to MERGE (upsert only)
INSERT INTO products (product_id, product_name, price)
SELECT product_id, product_name, price
FROM staging_updates
ON CONFLICT (product_id) DO UPDATE
SET product_name = EXCLUDED.product_name,
    price = EXCLUDED.price;
```
In data warehousing, MERGE is essential for handling Slowly Changing Dimensions (SCD). You're tracking historical changes while keeping current records accessible:
```sql
-- SCD Type 2: Track history with effective dates
MERGE INTO dim_customer AS target
USING staging_customer AS source
    ON target.customer_id = source.customer_id
   AND target.current_flag = 'Y'  -- Only match current record
WHEN MATCHED AND target.email <> source.email THEN
    -- Close the current record; a new version is inserted below
    UPDATE SET current_flag = 'N',
               end_date = CURRENT_DATE
WHEN NOT MATCHED THEN
    INSERT (customer_id, email, start_date, current_flag)
    VALUES (source.customer_id, source.email, CURRENT_DATE, 'Y');

-- Then insert the new current version for customers whose
-- old record was just closed
INSERT INTO dim_customer (customer_id, email, start_date, current_flag)
SELECT customer_id, email, CURRENT_DATE, 'Y'
FROM staging_customer s
WHERE EXISTS (
    SELECT 1 FROM dim_customer d
    WHERE d.customer_id = s.customer_id
      AND d.current_flag = 'N'
      AND d.end_date = CURRENT_DATE
);
```
This is a simplified example—real SCD implementations can get complex, but MERGE makes the logic manageable.
Use MERGE when you need to synchronize data between tables with conditional logic and atomicity guarantees. It's not always the simplest solution, but for ETL pipelines, data warehousing (especially slowly changing dimensions), and any scenario where you're syncing data between systems, it's usually the right choice.
The pattern I see working best: start with MERGE for any multi-operation sync, then simplify only if profiling shows it's a bottleneck. Watch out for deadlocks in high-concurrency scenarios and always scope your WHEN NOT MATCHED BY SOURCE clauses carefully. Most of the time, the atomicity and clarity are worth it.