Upsert Methods for PostgreSQL

Upsert Methods for PostgreSQL
  • 0.00 / 5 5
0 votes, 0.00 avg. rating (0% score)

PostgreSQL has no “upsert” or “replace” or “insert .. on duplicate key update” or “merge into” construct to conditionally either insert a new row, or if a row with the key already exists, to either update the existing row with the new values, or first delete the old row and then insert a new one. This has been discussed many times in the posgresql developers mailing lists, and plans to implement the SQL standard MERGE operation have been devised. But there is still no such functionalily in PostgreSQL as of version 9.3. Here are some examples on how to implement the functionality in different ways, using either a function, a trigger function or a rule. None of them are perfect. There are trade-offs to be made.

To implement similar functionality, there are a couple of ways. One is to first explicitly check if a row already exists, and then, depending on which semantics, merge or replace, you want, issue either INSERT or UPDATE query (for merge), or either INSERT or DELETE/INSERT queries (for replace).

Another way is to first try to INSERT the new value, and then check if an exception is raised. If not, the insert worked. If an exception is raised, the row exists and must be either UPDATEd (for merge) or DELETEd and a new one INSERTed (for replace).

Doing it for one row is fine, but if you need to do it for thousands of rows in one shot, you will run into either your whole transaction aborting after the first duplicate key violation, or the fact that you must issue selects and inserts/updates/deletes according to application logic over the network for each row. And doing each operation in a separate transaction will be slow. Also, having multiple database sessions doing merges on a table simultaneously may result in problems.

For a very good in-depth discussion on the topic, see this article: Why is UPSERT so complicated?

Here’s an example table we’ll be working on:

Let’s start merging values. The first two methods need application logic every step of the way to work.

Using SELECT

Conceptually it looks like this:

Here, the application must check after each SELECT if the row exists, and issue either an UPDATE or an INSERT. That means a lot of traffic between client and server.

Using Savepoints

Conceptually it looks like this:

Your code will have to do the exception checking after each INSERT and roll back to the previous savepoint. And your application must do the checking every step of the way.

Using a Function

We can move the checking part of the job to the database server, reducing the needed network round-trips between the database server and the application.

This function will first try to insert, and failing that, will update an existing row. You can use this without fearing that your transaction gets aborted due to unique constraint violations on the way.

This makes it possible to issue all the queries in one shot to the server, without needing to use application logic between the queries.

Merge some rows:

Or even simply:

This works if you are content with the possibility that some other transaction might delete the row you have just merged before your transaction returns and you think you have an existing row in the database. If you know nobody does deletes on the table, you’re fine. Another version which loops until a row exists and is up-to-date can work better. This one is adapted straight from the PostgreSQL documentation:

This one will loop until it succeeds in either operation. Using this you can be a bit more sure that after running the merge a row will exist in the table. But, with this one, if there are other unique constraints on the table, we might be in trouble because we did not check which column generated the unique violation. See the documentation for a discussion on this.

Using a Trigger

If the semantics regarding your particular table are such that insert operations should never fail, but should always work as merge operations instead, you can write a trigger function to do it. Let’s make a second table for this test first with:

Then the trigger:

There’s a caveat, though. If the id you are inserting already exists, your insert operation will return with an inserted row count of 0. That is kind of both correct and incorrect. Correct when you think of it in insert terms, but incorrect when you think of it in merge terms. Just don’t test your success by looking at the row count.

You can also write your merge operation into one insert query:

So now one row was updated and four were inserted. And all the dirty stuff is handled server side.

But this approach has the possibility of running into an exception if there are multiple sessions working concurrently with the data. So you must re-try the whole operation if it fails.

Using a Rule

Similar functionality as described above using a trigger can be written using the rule system. Let’s make a third test table:

This rule will test whether a row with the given id exists, and if it does, it will issue an update query instead:

This has the same caveat regarding the reported row count as the trigger way. Your row count will be incremented by 0 if the row already existed and was merely updated.

There is an extra SELECT query here, but the PostgreSQL rule system is clever, so it will actually only issue one INSERT operation and one UPDATE operation on the table for the whole data (use EXPLAIN to see the query plan yourself). Triggers are always fired for every row, so they are slower than rules.

The rule method seems like the fastest way of doing this particular merge. Not by any measured benchmark, but just my personal gut feeling. But this method has the possibility of exceptions in concurrent situations, so you must be ready to handle them in your application code, and re-try when needed.

4 thoughts on “Upsert Methods for PostgreSQL”

  1. I use :

    CREATE OR REPLACE RULE test3_merge AS
    ON INSERT TO test3
    WHERE (EXISTS (SELECT 1 FROM test3 WHERE id = NEW.id))
    DO INSTEAD
    UPDATE test3 SET value = value + NEW.value WHERE id = NEW.id;

    and every insert with same key, value is increment. its ok, but why
    on table is blank, first insert, the value increment too, for example I’m insert value=1, the result is 2, why

Leave a Reply