Update: PostgreSQL 9.5 has been released with INSERT .. ON CONFLICT UPDATE support.
PostgreSQL has no “upsert” or “replace” or “insert .. on duplicate key update” or “merge into” construct to conditionally either insert a new row, or if a row with the key already exists, to either update the existing row with the new values, or first delete the old row and then insert a new one. This has been discussed many times in the posgresql developers mailing lists, and plans to implement the SQL standard MERGE operation have been devised. But there is still no such functionality in PostgreSQL as of version 9.3. Here are some examples on how to implement the functionality in different ways, using either a function, a trigger function or a rule. None of them are perfect. There are trade-offs to be made.
To implement similar functionality, there are a couple of ways. One is to first explicitly check if a row already exists, and then, depending on which semantics, merge or replace, you want, issue either INSERT or UPDATE query (for merge), or either INSERT or DELETE/INSERT queries (for replace).
Another way is to first try to INSERT the new value, and then check if an exception is raised. If not, the insert worked. If an exception is raised, the row exists and must be either UPDATEd (for merge) or DELETEd and a new one INSERTed (for replace).
Doing it for one row is fine, but if you need to do it for thousands of rows in one shot, you will run into either your whole transaction aborting after the first duplicate key violation, or the fact that you must issue selects and inserts/updates/deletes according to application logic over the network for each row. And doing each operation in a separate transaction will be slow. Also, having multiple database sessions doing merges on a table simultaneously may result in problems.
For a very good in-depth discussion on the topic, see this article: Why is UPSERT so complicated?
Here’s an example table we’ll be working on:
1 2 3 4 |
CREATE TABLE test ( id INTEGER PRIMARY KEY, value VARCHAR ); |
Let’s start merging values. The first two methods need application logic every step of the way to work.
Using SELECT
Conceptually it looks like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
BEGIN; SELECT id FROM test WHERE id = 1; -- If found: UPDATE test SET value = 'a' WHERE id = 1; -- Else: INSERTO INTO test (id, value) VALUES (1, 'a'); -- End if. SELECT id FROM test WHERE id = 2; -- If found: UPDATE test SET value = 'b' WHERE id = 2; -- Else: INSERTO INTO test (id, value) VALUES (2, 'b'); -- End if. . . . COMMIT; |
Here, the application must check after each SELECT if the row exists, and issue either an UPDATE or an INSERT. That means a lot of traffic between client and server.
Using Savepoints
Conceptually it looks like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
BEGIN; SAVEPOINT a; INSERT INTO test (id, value) VALUES (1, 'a'); -- If exception: ROLLBACK TO SAVEPOINT a; UPDATE test SET value = 'a' WHERE id = 1; -- end exception. SAVEPOINT b; INSERT INTO test (id, value) VALUES (2, 'b'); -- If exception: ROLLBACK TO SAVEPOINT b; UPDATE test SET value = 'b' WHERE id = 2; -- end exception. . . . COMMIT; |
Your code will have to do the exception checking after each INSERT and roll back to the previous savepoint. And your application must do the checking every step of the way.
Using a Function
We can move the checking part of the job to the database server, reducing the needed network round-trips between the database server and the application.
This function will first try to insert, and failing that, will update an existing row. You can use this without fearing that your transaction gets aborted due to unique constraint violations on the way.
1 2 3 4 5 6 7 8 9 |
CREATE FUNCTION merge_test (the_id INTEGER, the_value VARCHAR) RETURNS VOID AS $$ BEGIN BEGIN INSERT INTO test (id, value) VALUES (the_id, the_value); EXCEPTION WHEN unique_violation THEN UPDATE test SET value = the_value WHERE id = the_id; END; END; $$ LANGUAGE plpgsql; |
This makes it possible to issue all the queries in one shot to the server, without needing to use application logic between the queries.
Merge some rows:
1 2 3 4 |
BEGIN; SELECT merge_test(1, 'b'); SELECT merge_test(2, 'c'); COMMIT; |
Or even simply:
1 |
SELECT merge_test(1, 'b'), merge_test(2, 'c'); |
This works if you are content with the possibility that some other transaction might delete the row you have just merged before your transaction returns and you think you have an existing row in the database. If you know nobody does deletes on the table, you’re fine. Another version which loops until a row exists and is up-to-date can work better. This one is adapted straight from the PostgreSQL documentation:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
CREATE FUNCTION loop_merge_test (the_id INTEGER, the_value VARCHAR) RETURNS VOID AS $$ BEGIN LOOP UPDATE test SET value = the_value WHERE id = the_id; IF FOUND THEN RETURN; END IF; -- Not found, try insert and check exceptions in case someone inserted -- the same key concurrently as we speak BEGIN INSERT INTO test (id, value) VALUES (the_id, the_value); RETURN; EXCEPTION WHEN unique_violation THEN -- do nothing; just loop back to the UPDATE END; END LOOP; END; $$ LANGUAGE plpgsql; |
This one will loop until it succeeds in either operation. Using this you can be a bit more sure that after running the merge a row will exist in the table. But, with this one, if there are other unique constraints on the table, we might be in trouble because we did not check which column generated the unique violation. See the documentation for a discussion on this.
Using a Trigger
If the semantics regarding your particular table are such that insert operations should never fail, but should always work as merge operations instead, you can write a trigger function to do it. Let’s make a second table for this test first with:
1 |
CREATE TABLE test2 (id INTEGER PRIMARY KEY, value VARCHAR); |
Then the trigger:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
CREATE OR REPLACE FUNCTION before_insert_test2() RETURNS TRIGGER AS $$ BEGIN IF (SELECT COUNT(id) FROM test2 WHERE id = NEW.id) = 1 THEN -- found; update, and return null to prevent the insert UPDATE test2 SET value = NEW.value WHERE id = NEW.id; RETURN NULL; END IF; -- not found; just return row to insert RETURN NEW; END; $$ LANGUAGE plpgsql; CREATE TRIGGER test2_merge BEFORE INSERT ON test2 FOR EACH ROW EXECUTE PROCEDURE before_insert_test2(); |
There’s a caveat, though. If the id you are inserting already exists, your insert operation will return with an inserted row count of 0. That is kind of both correct and incorrect. Correct when you think of it in insert terms, but incorrect when you think of it in merge terms. Just don’t test your success by looking at the row count.
1 2 3 4 |
test=# insert into test2 values (1, 'a'); INSERT 0 1 test=# insert into test2 values (1, 'b'); INSERT 0 0 |
You can also write your merge operation into one insert query:
1 2 |
test=# insert into test2 values (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e'); INSERT 0 4 |
So now one row was updated and four were inserted. And all the dirty stuff is handled server side.
But this approach has the possibility of running into an exception if there are multiple sessions working concurrently with the data. So you must re-try the whole operation if it fails.
Using a Rule
Similar functionality as described above using a trigger can be written using the rule system. Let’s make a third test table:
1 |
CREATE TABLE test3 (id INTEGER PRIMARY KEY, value VARCHAR); |
This rule will test whether a row with the given id exists, and if it does, it will issue an update query instead:
1 2 3 4 5 |
CREATE OR REPLACE RULE test3_merge AS ON INSERT TO test3 WHERE (EXISTS (SELECT 1 FROM test3 WHERE id = NEW.id)) DO INSTEAD UPDATE test3 SET value = NEW.value WHERE id = NEW.id; |
This has the same caveat regarding the reported row count as the trigger way. Your row count will be incremented by 0 if the row already existed and was merely updated.
1 2 3 4 |
test=# insert into test3 values (1, 'a'); INSERT 0 1 test=# insert into test3 values (1, 'A'), (2, 'B'), (3, 'c'); INSERT 0 2 |
There is an extra SELECT query here, but the PostgreSQL rule system is clever, so it will actually only issue one INSERT operation and one UPDATE operation on the table for the whole data (use EXPLAIN to see the query plan yourself). Triggers are always fired for every row, so they are slower than rules.
The rule method seems like the fastest way of doing this particular merge. Not by any measured benchmark, but just my personal gut feeling. But this method has the possibility of exceptions in concurrent situations, so you must be ready to handle them in your application code, and re-try when needed.
Thanks for this. It well saved me time.
Thanks a lot — great post with helpful examples.
I use :
CREATE OR REPLACE RULE test3_merge AS
ON INSERT TO test3
WHERE (EXISTS (SELECT 1 FROM test3 WHERE id = NEW.id))
DO INSTEAD
UPDATE test3 SET value = value + NEW.value WHERE id = NEW.id;
and every insert with same key, value is increment. its ok, but why
on table is blank, first insert, the value increment too, for example I’m insert value=1, the result is 2, why
Thanks for this. It well saved me time.
Note that PostgreSQL 9.5 has upsert.
https://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.5#INSERT_…_ON_CONFLICT_DO_NOTHING.2FUPDATE_.28.22UPSERT.22.29