Upsert Methods for PostgreSQL

3.62 avg. rating (73% score) - 8 votes

Update: PostgreSQL 9.5 has been released with INSERT .. ON CONFLICT UPDATE support.

PostgreSQL has no “upsert” or “replace” or “insert .. on duplicate key update” or “merge into” construct to conditionally either insert a new row, or if a row with the key already exists, to either update the existing row with the new values, or first delete the old row and then insert a new one. This has been discussed many times in the posgresql developers mailing lists, and plans to implement the SQL standard MERGE operation have been devised. But there is still no such functionality in PostgreSQL as of version 9.3. Here are some examples on how to implement the functionality in different ways, using either a function, a trigger function or a rule. None of them are perfect. There are trade-offs to be made.

To implement similar functionality, there are a couple of ways. One is to first explicitly check if a row already exists, and then, depending on which semantics, merge or replace, you want, issue either INSERT or UPDATE query (for merge), or either INSERT or DELETE/INSERT queries (for replace).

Another way is to first try to INSERT the new value, and then check if an exception is raised. If not, the insert worked. If an exception is raised, the row exists and must be either UPDATEd (for merge) or DELETEd and a new one INSERTed (for replace).

Doing it for one row is fine, but if you need to do it for thousands of rows in one shot, you will run into either your whole transaction aborting after the first duplicate key violation, or the fact that you must issue selects and inserts/updates/deletes according to application logic over the network for each row. And doing each operation in a separate transaction will be slow. Also, having multiple database sessions doing merges on a table simultaneously may result in problems.

For a very good in-depth discussion on the topic, see this article: Why is UPSERT so complicated?

Here’s an example table we’ll be working on:

CREATE TABLE test (
  id INTEGER PRIMARY KEY,
  value VARCHAR
);

CREATE TABLE test (

id INTEGER PRIMARY KEY,

value VARCHAR

);

Let’s start merging values. The first two methods need application logic every step of the way to work.

Table of Contents

1 Using SELECT
2 Using Savepoints
3 Using a Function
4 Using a Trigger
5 Using a Rule

Using SELECT

Conceptually it looks like this:

BEGIN;
SELECT id FROM test WHERE id = 1;
-- If found:
UPDATE test SET value = 'a' WHERE id = 1;
-- Else:
INSERTO INTO test (id, value) VALUES (1, 'a');
-- End if.
SELECT id FROM test WHERE id = 2;
-- If found:
UPDATE test SET value = 'b' WHERE id = 2;
-- Else:
INSERTO INTO test (id, value) VALUES (2, 'b');
-- End if.
.
.
.
COMMIT;

BEGIN;

SELECT id FROM test WHERE id = 1;

-- If found:

UPDATE test SET value = 'a' WHERE id = 1;

-- Else:

INSERTO INTO test (id, value) VALUES (1, 'a');

-- End if.

SELECT id FROM test WHERE id = 2;

-- If found:

UPDATE test SET value = 'b' WHERE id = 2;

-- Else:

INSERTO INTO test (id, value) VALUES (2, 'b');

-- End if.

COMMIT;

Here, the application must check after each SELECT if the row exists, and issue either an UPDATE or an INSERT. That means a lot of traffic between client and server.

Using Savepoints

Conceptually it looks like this:

BEGIN;
SAVEPOINT a;
INSERT INTO test (id, value) VALUES (1, 'a');
-- If exception:
ROLLBACK TO SAVEPOINT a;
UPDATE test SET value = 'a' WHERE id = 1;
-- end exception.
SAVEPOINT b;
INSERT INTO test (id, value) VALUES (2, 'b');
-- If exception:
ROLLBACK TO SAVEPOINT b;
UPDATE test SET value = 'b' WHERE id = 2;
-- end exception.
.
.
.
COMMIT;

BEGIN;

SAVEPOINT a;

INSERT INTO test (id, value) VALUES (1, 'a');

-- If exception:

ROLLBACK TO SAVEPOINT a;

UPDATE test SET value = 'a' WHERE id = 1;

-- end exception.

SAVEPOINT b;

INSERT INTO test (id, value) VALUES (2, 'b');

-- If exception:

ROLLBACK TO SAVEPOINT b;

UPDATE test SET value = 'b' WHERE id = 2;

-- end exception.

COMMIT;

Your code will have to do the exception checking after each INSERT and roll back to the previous savepoint. And your application must do the checking every step of the way.

Using a Function

We can move the checking part of the job to the database server, reducing the needed network round-trips between the database server and the application.

This function will first try to insert, and failing that, will update an existing row. You can use this without fearing that your transaction gets aborted due to unique constraint violations on the way.

CREATE FUNCTION merge_test (the_id INTEGER, the_value VARCHAR) RETURNS VOID AS $$
BEGIN
  BEGIN
    INSERT INTO test (id, value) VALUES (the_id, the_value);
  EXCEPTION WHEN unique_violation THEN
    UPDATE test SET value = the_value WHERE id = the_id;
  END;
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION merge_test (the_id INTEGER, the_value VARCHAR) RETURNS VOID AS $$

BEGIN

INSERT INTO test (id, value) VALUES (the_id, the_value);

EXCEPTION WHEN unique_violation THEN

UPDATE test SET value = the_value WHERE id = the_id;

END;

$$ LANGUAGE plpgsql;

This makes it possible to issue all the queries in one shot to the server, without needing to use application logic between the queries.

Merge some rows:

BEGIN;
SELECT merge_test(1, 'b');
SELECT merge_test(2, 'c');
COMMIT;

BEGIN;

SELECT merge_test(1, 'b');

SELECT merge_test(2, 'c');

COMMIT;

Or even simply:

SELECT merge_test(1, 'b'), merge_test(2, 'c');

1	SELECT merge_test(1, 'b'), merge_test(2, 'c');

This works if you are content with the possibility that some other transaction might delete the row you have just merged before your transaction returns and you think you have an existing row in the database. If you know nobody does deletes on the table, you’re fine. Another version which loops until a row exists and is up-to-date can work better. This one is adapted straight from the PostgreSQL documentation:

CREATE FUNCTION loop_merge_test (the_id INTEGER, the_value VARCHAR) RETURNS VOID AS $$
BEGIN
  LOOP
    UPDATE test SET value = the_value WHERE id = the_id;
    IF FOUND THEN
      RETURN;
    END IF;
    -- Not found, try insert and check exceptions in case someone inserted
    -- the same key concurrently as we speak
    BEGIN
      INSERT INTO test (id, value) VALUES (the_id, the_value);
      RETURN;
    EXCEPTION WHEN unique_violation THEN
      -- do nothing; just loop back to the UPDATE
    END;
  END LOOP;
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION loop_merge_test (the_id INTEGER, the_value VARCHAR) RETURNS VOID AS $$

BEGIN

LOOP

UPDATE test SET value = the_value WHERE id = the_id;

IF FOUND THEN

RETURN;

END IF;

-- Not found, try insert and check exceptions in case someone inserted

-- the same key concurrently as we speak

BEGIN

INSERT INTO test (id, value) VALUES (the_id, the_value);

RETURN;

EXCEPTION WHEN unique_violation THEN

-- do nothing; just loop back to the UPDATE

END;

END LOOP;

END;

$$ LANGUAGE plpgsql;

This one will loop until it succeeds in either operation. Using this you can be a bit more sure that after running the merge a row will exist in the table. But, with this one, if there are other unique constraints on the table, we might be in trouble because we did not check which column generated the unique violation. See the documentation for a discussion on this.

Using a Trigger

If the semantics regarding your particular table are such that insert operations should never fail, but should always work as merge operations instead, you can write a trigger function to do it. Let’s make a second table for this test first with:

CREATE TABLE test2 (id INTEGER PRIMARY KEY, value VARCHAR);

1	CREATE TABLE test2 (id INTEGER PRIMARY KEY, value VARCHAR);

Then the trigger:

CREATE OR REPLACE FUNCTION before_insert_test2() RETURNS TRIGGER AS $$
BEGIN
  IF (SELECT COUNT(id) FROM test2 WHERE id = NEW.id) = 1 THEN
    -- found; update, and return null to prevent the insert
    UPDATE test2 SET value = NEW.value WHERE id = NEW.id;
    RETURN NULL;
  END IF;

  -- not found; just return row to insert
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER test2_merge BEFORE INSERT ON test2 FOR EACH ROW EXECUTE PROCEDURE before_insert_test2();

CREATE OR REPLACE FUNCTION before_insert_test2() RETURNS TRIGGER AS $$

BEGIN

IF (SELECT COUNT(id) FROM test2 WHERE id = NEW.id) = 1 THEN

-- found; update, and return null to prevent the insert

UPDATE test2 SET value = NEW.value WHERE id = NEW.id;

RETURN NULL;

END IF;

-- not found; just return row to insert

RETURN NEW;

END;

$$ LANGUAGE plpgsql;

CREATE TRIGGER test2_merge BEFORE INSERT ON test2 FOR EACH ROW EXECUTE PROCEDURE before_insert_test2();

There’s a caveat, though. If the id you are inserting already exists, your insert operation will return with an inserted row count of 0. That is kind of both correct and incorrect. Correct when you think of it in insert terms, but incorrect when you think of it in merge terms. Just don’t test your success by looking at the row count.

test=# insert into test2 values (1, 'a');
INSERT 0 1
test=# insert into test2 values (1, 'b');
INSERT 0 0

test=# insert into test2 values (1, 'a');

INSERT 0 1

test=# insert into test2 values (1, 'b');

INSERT 0 0

You can also write your merge operation into one insert query:

test=# insert into test2 values (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e');
INSERT 0 4

1 2	test=# insert into test2 values (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e'); INSERT 0 4

So now one row was updated and four were inserted. And all the dirty stuff is handled server side.

But this approach has the possibility of running into an exception if there are multiple sessions working concurrently with the data. So you must re-try the whole operation if it fails.

Using a Rule

Similar functionality as described above using a trigger can be written using the rule system. Let’s make a third test table:

CREATE TABLE test3 (id INTEGER PRIMARY KEY, value VARCHAR);

1	CREATE TABLE test3 (id INTEGER PRIMARY KEY, value VARCHAR);

This rule will test whether a row with the given id exists, and if it does, it will issue an update query instead:

CREATE OR REPLACE RULE test3_merge AS
  ON INSERT TO test3
  WHERE (EXISTS (SELECT 1 FROM test3 WHERE id = NEW.id))
  DO INSTEAD
    UPDATE test3 SET value = NEW.value WHERE id = NEW.id;

CREATE OR REPLACE RULE test3_merge AS

ON INSERT TO test3

WHERE (EXISTS (SELECT 1 FROM test3 WHERE id = NEW.id))

DO INSTEAD

UPDATE test3 SET value = NEW.value WHERE id = NEW.id;

This has the same caveat regarding the reported row count as the trigger way. Your row count will be incremented by 0 if the row already existed and was merely updated.

test=# insert into test3 values (1, 'a');
INSERT 0 1
test=# insert into test3 values (1, 'A'), (2, 'B'), (3, 'c');
INSERT 0 2

test=# insert into test3 values (1, 'a');

INSERT 0 1

test=# insert into test3 values (1, 'A'), (2, 'B'), (3, 'c');

INSERT 0 2

There is an extra SELECT query here, but the PostgreSQL rule system is clever, so it will actually only issue one INSERT operation and one UPDATE operation on the table for the whole data (use EXPLAIN to see the query plan yourself). Triggers are always fired for every row, so they are slower than rules.

The rule method seems like the fastest way of doing this particular merge. Not by any measured benchmark, but just my personal gut feeling. But this method has the possibility of exceptions in concurrent situations, so you must be ready to handle them in your application code, and re-try when needed.

5 thoughts on “Upsert Methods for PostgreSQL”

Golly says:

2014-03-20 at 03:47

Thanks for this. It well saved me time.

yonbeastie says:

2014-04-17 at 20:40

Thanks a lot — great post with helpful examples.

sheillendra says:

2014-06-05 at 09:39

I use :

CREATE OR REPLACE RULE test3_merge AS
ON INSERT TO test3
WHERE (EXISTS (SELECT 1 FROM test3 WHERE id = NEW.id))
DO INSTEAD
UPDATE test3 SET value = value + NEW.value WHERE id = NEW.id;

and every insert with same key, value is increment. its ok, but why
on table is blank, first insert, the value increment too, for example I’m insert value=1, the result is 2, why

toplu sms says:

2014-07-21 at 12:33

Thanks for this. It well saved me time.

Craig Ringer says:

2015-08-19 at 08:19

Note that PostgreSQL 9.5 has upsert.

https://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.5#INSERT_…_ON_CONFLICT_DO_NOTHING.2FUPDATE_.28.22UPSERT.22.29