Get size of a table

SELECT pg_size_pretty(pg_total_relation_size('schemaname.tablename'));
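
A related query (a sketch; the schema filter is an assumption, adjust as needed) lists the largest ordinary tables in the current database:

SELECT c.oid::regclass AS table_name,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
WHERE c.relkind = 'r'
  AND c.relnamespace::regnamespace::text NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 10;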

Select unique combination of fields / tuples

SELECT DISTINCT ON(field1, field2) field1, field2
FROM thetable
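
DISTINCT ON becomes really useful together with ORDER BY to pick exactly one row per group, e.g. the most recent one. A sketch (the column updated_at is hypothetical):

SELECT DISTINCT ON (field1) field1, field2, updated_at
FROM thetable
ORDER BY field1, updated_at DESC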

Select rows where a combination of fields is not unique

SELECT columnA, columnB, count(*) AS count
FROM thetable
GROUP BY columnA, columnB
HAVING count(*) > 1

Search for rows with array containing value

Assuming the field appointments has the type date[]

SELECT *
FROM mtable
WHERE appointments @> ARRAY['2023-09-19'::date]

also valid is this query:

SELECT *
FROM mtable
WHERE ARRAY['2023-09-19'::date] <@ appointments
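
Containment checks like these can be sped up with a GIN index on the array column (a sketch, reusing the table and column from the example above):

CREATE INDEX idx_mtable_appointments ON mtable USING gin (appointments);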

Max Connections allowed

Show current value and source of configuration

SELECT setting, source, sourcefile, sourceline
FROM pg_settings
WHERE name = 'max_connections';

Set Max Connections

ALTER SYSTEM writes the setting to postgresql.auto.conf; max_connections only takes effect after a server restart.

ALTER SYSTEM SET max_connections = 250;

Kill Postgres Connections

SELECT pg_terminate_backend(pid)
FROM pg_catalog.pg_stat_activity
-- we don't want to kill our own connection
WHERE pid != pg_backend_pid()
-- we don't want to kill connections to other databases
AND datname = 'MYDATABASE'
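
Before terminating anything, it can help to inspect the current connections first (a sketch; pick the columns you care about):

SELECT pid, usename, datname, state, query_start, query
FROM pg_catalog.pg_stat_activity
ORDER BY query_start;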

Get configured Server Timezone

SELECT current_setting('timezone') AS configured_timezone;

Kubernetes Setup via Postgres Helm Chart

Install Helm Chart

We’re installing it as a release named my-postgres

helm install my-postgres oci://registry-1.docker.io/bitnamicharts/postgresql

Get the password as ENV variable

export POSTGRES_PASSWORD=$(kubectl get secret --namespace testns my-postgres-postgresql -o jsonpath="{.data.postgres-password}" | base64 -d)

Port-forward and connect to the database

kubectl port-forward --namespace testns svc/my-postgres-postgresql 5432:5432 &
PGPASSWORD="$POSTGRES_PASSWORD" psql --host 127.0.0.1 -U postgres -d postgres -p 5432

More information

helm inspect all oci://registry-1.docker.io/bitnamicharts/postgresql

JSON Datatype, Inserting JSON, Querying JSON

Creating a table with the JSON datatype

Table create
CREATE TABLE documents
(
  title varchar NOT NULL CONSTRAINT documents_pk PRIMARY KEY,
  "metadata" json NOT NULL
);

The JSON to be inserted

{
  "title": "Some document",
  "author": {
    "firstName": "Fanny",
    "lastName": "Fisher"
  }
}

Insert Query

INSERT INTO documents VALUES('Some document', to_json('{"title": "Some document", "author": {"firstName": "Fanny","lastName": "Fisher"}}'::json));

or as inline query

select *
from (
 values
 ('Some document',to_json('{"title": "Some document", "author": {"firstName": "Fanny","lastName": "Fisher"}}'::json))
 ) as data(title,jsondata)

or as common table expression (CTE)

with data(title,jsondata) as (
 values
 ('Some document',to_json('{"title": "Some document", "author": {"firstName": "Fanny","lastName": "Fisher"}}'::json))
)
select d.jsondata #>> '{author, firstName}'
from data d

Query for JSON Part

e.g. we query for the document’s author’s first name:

JSON Query
SELECT d.metadata -> 'author' ->> 'firstName' AS firstName
FROM documents d
-> returns the node as json, ->> returns a text value

A more elegant way for deeply nested JSON structures is to use the path operator:

SELECT d.metadata #>> '{author, firstName}' AS firstName
FROM documents d
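
The same operators work in a WHERE clause, e.g. to fetch documents by a nested value (a sketch based on the documents table above):

SELECT *
FROM documents d
WHERE d.metadata -> 'author' ->> 'lastName' = 'Fisher'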


Postgres Inline Data or Table Value Constructor

Input data without using a table

select *
from (
 values
 ('A', 1),
 ('A', 2),
 ('A', 3),
 ('A', 4),
 ('B', 3),
 ('B', 4),
 ('B', 5),
 ('B', 6)
) as data(importkey, id)

Query for array row contains valueA or valueB

In the following example, we’re querying for all rows where the date 2024-01-29 or 2024-01-30 is contained in the column dates of type date[]. The && operator checks whether the two arrays overlap, i.e. share at least one element.

select id, dates from (
 values
  (1, array['2024-01-29'::date,'2024-01-31'::date]),
  (2, array['2024-01-28'::date, '2024-01-30'::date]),
  (3, array['2024-01-27'::date, '2024-01-28'::date, '2024-01-31'::date])
) as mydata(id, dates)
where dates && '{2024-01-29,2024-01-30}'

yields the following result:

id|dates                  |
--+-----------------------+
 1|{2024-01-29,2024-01-31}|
 2|{2024-01-28,2024-01-30}|

Postgres Expand Array to Set of Rows

An example:

-- Select UUID and Delivery Date from Dataset
-- Calculate the import date to begin at 00:00:00 and the
-- delivery date to start next day at 12:00:00
select
  (id) as "import_id",
  (deliverydate + '00:00:00'::time) as "date_begin",
  (deliverydate + interval '1 day' + '12:00:00'::time) as "date_deliver"
from unnest(
'{91f8f3e5-9d99-4f2b-be2c-7573262440fd, f3f0b446-3336-4704-9abd-1c8fa05913b4, 866d1b46-545e-4572-ba7c-ab75052b08e5}':: uuid[],
'{2023-1-2, 2023-1-4, 2023-1-9}':: date[]
) as plan(id, deliverydate);

Produces the following result:

import_id                           |date_begin             |date_deliver           |
------------------------------------+-----------------------+-----------------------+
91f8f3e5-9d99-4f2b-be2c-7573262440fd|2023-01-02 00:00:00.000|2023-01-03 12:00:00.000|
f3f0b446-3336-4704-9abd-1c8fa05913b4|2023-01-04 00:00:00.000|2023-01-05 12:00:00.000|
866d1b46-545e-4572-ba7c-ab75052b08e5|2023-01-09 00:00:00.000|2023-01-10 12:00:00.000|

Select all rows of first query that are not part of the second query with EXCEPT

Types and columns of both queries must match!
select id
from (
 values
 ('A', 1),
 ('A', 2),
 ('A', 3),
 ('A', 4),
 ('B', 3),
 ('B', 4),
 ('B', 5),
 ('B', 6)
) as data(importkey, id)
where importkey = 'A'
except
select id
from (
 values
 ('A', 1),
 ('A', 2),
 ('A', 3),
 ('A', 4),
 ('B', 3),
 ('B', 4),
 ('B', 5),
 ('B', 6)
) as data(importkey, id)
where importkey = 'B'

Create new table and copy structure and data of another table

LIKE … INCLUDING ALL only copies the table definition (columns, defaults, constraints, indexes); the data has to be copied in a second step (see below).

CREATE TABLE mynewtable
  (LIKE myoldtable INCLUDING ALL);
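
To also copy the rows, a plain INSERT … SELECT can follow (a sketch using the same table names):

INSERT INTO mynewtable
SELECT * FROM myoldtable;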

Activate extended format for query output

\x on

Connect to database

psql -d DATABASENAME -h HOSTNAME -U USERNAME -p PORT

Explorative commands

  • \?: show commands/help

  • \q: quit

  • \c <DATABASE>: connect to database

  • \d <TABLE>: show table definition, columns, triggers, etc.

  • \d+ <TABLE>: show detailed table information

  • \l: list databases

  • \dt: list tables

  • \di: list indexes

  • \dn: list schemas

  • \du: list users

  • \dy: list event triggers

  • \df: list functions

  • \dT+: List all data types

  • \dv: List views


Clear Screen Workaround

\! clear

Run local Postgres Database with Docker

docker run --name somename -p 5432:5432 -e POSTGRES_USER=mydesiredusername -e POSTGRES_PASSWORD=somefancypassword -d postgres

Generate Lotto Numbers

5 of 1 to 50

SELECT
  -- floor() keeps the result in 1..50; a plain ::int cast rounds and could yield 51
  floor(random() * 50 + 1)::int AS lotto_number
FROM
  generate_series(1, 5)
ORDER BY 1 ASC
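
Note that the query above may return the same number twice. A duplicate-free draw (a sketch) shuffles the full range and takes the first five:

SELECT lotto_number
FROM (
  SELECT n AS lotto_number
  FROM generate_series(1, 50) AS n
  ORDER BY random()
  LIMIT 5
) AS draw
ORDER BY lotto_number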

Table Partitions

Select Tables with partitioning

also filtering out some tables we don’t want to see

SELECT partrelid::regclass::text FROM pg_partitioned_table WHERE partrelid::regclass::text NOT IN ('tablethatdoesnotexist', 'notexistingtoo') order by 1
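
To list the partitions attached to a given parent table, pg_inherits can be queried (a sketch; 'myparenttable' is a placeholder):

SELECT inhrelid::regclass::text AS partition_name
FROM pg_inherits
WHERE inhparent = 'myparenttable'::regclass
ORDER BY 1;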

Upserts

Upsert means to either insert a new record or update an existing one (or do nothing). The conflict target has to be backed by a unique constraint or unique index so that Postgres can detect the conflict.

On table columns

on conflict with the values of col1 and col2, do nothing
INSERT INTO <table>
VALUES('something', 123)
ON CONFLICT (col1, col2)
DO NOTHING

On table constraints

on conflict with constraint unique_cols, update col2
INSERT INTO <table>
VALUES('something', 123)
ON CONFLICT ON CONSTRAINT unique_cols
DO UPDATE SET col2 = 'xoxo'
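
Inside DO UPDATE, the pseudo-table EXCLUDED exposes the row that was proposed for insertion, which is handy when the update should take over the new values (a sketch with the same placeholder columns):

INSERT INTO <table>
VALUES('something', 123)
ON CONFLICT (col1)
DO UPDATE SET col2 = EXCLUDED.col2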


Ranges and Multiranges in Postgres

We’re using `datemultirange`s to model a hotel booking table where duration holds the date ranges for which a customer has booked a room.

-- Sally has booked room 23 from 3rd June (incl.) to 6th June (excl.) and 8th June to 10th June (both incl.)
-- Tim has booked room 12 from 2nd June (incl.) to 5th June (excl.) and 19th June to 22nd June (both incl.)
with bookings(customer_name, room_number, duration) as (
 values
 ('Sally', 23, '{[2024-06-03,2024-06-06), [2024-06-08,2024-06-10]}'::datemultirange),
 ('Tim', 12, '{[2024-06-02,2024-06-05), [2024-06-19,2024-06-22]}'::datemultirange)
)
select *
from bookings
where duration @> '[2024-06-04, 2024-06-05]'::daterange

This produces the following outcome:

customer_name|room_number|duration                                         |
-------------+-----------+-------------------------------------------------+
Sally        |         23|{[2024-06-03,2024-06-06),[2024-06-08,2024-06-11)}|
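
The overlap operator && also works between a multirange and a range, e.g. to find bookings that collide with a requested period (a sketch based on the same CTE; the requested period is hypothetical). With this data it would return only Sally’s booking.

with bookings(customer_name, room_number, duration) as (
 values
 ('Sally', 23, '{[2024-06-03,2024-06-06), [2024-06-08,2024-06-10]}'::datemultirange),
 ('Tim', 12, '{[2024-06-02,2024-06-05), [2024-06-19,2024-06-22]}'::datemultirange)
)
select customer_name, room_number
from bookings
where duration && '[2024-06-09,2024-06-12)'::daterange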


Mapping with JDBC and PGobject

// PGobject comes from the PostgreSQL JDBC driver (org.postgresql.util.PGobject)
PGobject dateRange = new PGobject();
dateRange.setType("daterange");
dateRange.setValue("[2024-06-19, 2024-06-25]");

PreparedStatement stmt = conn.prepareStatement(sql);
// as a query parameter
stmt.setObject(1,dateRange);

// for mapping values from the resultset
PGobject daterangeValue = (PGobject) rs.getObject("daterange_col");

Using Domain Types

e.g. we have a composite key consisting of a user’s id and their locale, separated by a colon, so that it might look like this: 12345:EN

we could create a domain type based on varchar and add a check for this pattern

create domain userkey as varchar check (VALUE ~* '^\d+:[A-Z]+$')

afterwards we may use this type

create table userlogin ("user" userkey, loggedin timestamp); -- "user" is a reserved word and has to be quoted
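
A quick way to try out the check (a sketch; the second insert is expected to be rejected by the domain’s constraint):

-- passes the check
insert into userlogin values ('12345:EN', now());
-- fails: the value does not match '^\d+:[A-Z]+$'
insert into userlogin values ('12345-EN', now());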

Lateral Joins

Allows combining each row of a table with the result of a subquery that can reference columns of that row.

In the following example, we’re joining people’s reading preferences with matching book titles:

WITH persons AS (
    SELECT * FROM (VALUES
                       (1, 'Alice', 'Horror'),
                       (2, 'Bob', 'Poetry'),
                       (3, 'Sally', 'Romance')
                  ) AS person(id, name, preferred_book_genre)
),
     books AS (
         SELECT * FROM (VALUES
                            (101, 'The Call of Cthulhu', 'Horror'),
                            (102, 'Best of Poe', 'Poetry'),
                            (103, 'Sunset of Love', 'Romance'),
                            (104, 'Nyarlathothep', 'Horror'),
                            (105, 'The Raven', 'Poetry')
                       ) AS book(id, title, genre)
     ) -- everything before this marker is just there to provide some sample data
SELECT
    person.id,
    person.name,
    person.preferred_book_genre,
    book.id,
    book.title
FROM
    persons person
        CROSS JOIN LATERAL (
        SELECT
            id,
            title,
            genre
        FROM
            books
        WHERE
            books.genre = person.preferred_book_genre
        ) book;

Produces the following result:

+--+-----+--------------------+---+-------------------+
|id|name |preferred_book_genre|id |title              |
+--+-----+--------------------+---+-------------------+
|1 |Alice|Horror              |101|The Call of Cthulhu|
|2 |Bob  |Poetry              |102|Best of Poe        |
|3 |Sally|Romance             |103|Sunset of Love     |
|1 |Alice|Horror              |104|Nyarlathothep      |
|2 |Bob  |Poetry              |105|The Raven          |
+--+-----+--------------------+---+-------------------+

Another example expands the JSONB Array of hobbies:

with mytable (username, jsondata) as (
  values
    ('Sally', '{"hobbies":["swimming", "reading"]}'::jsonb),
    ('Tim', '{"hobbies":["diving", "coding", "cooking"]}'::jsonb),
    ('Maggie', '{"hobbies":["reading", "photography", "swimming"]}'::jsonb)
)
select t.username as username, hobby
from mytable t
cross join lateral jsonb_array_elements(t.jsondata -> 'hobbies') as hobby

This produces the following output:

username|hobby        |
--------+-------------+
Sally   |"swimming"   |
Sally   |"reading"    |
Tim     |"diving"     |
Tim     |"coding"     |
Tim     |"cooking"    |
Maggie  |"reading"    |
Maggie  |"photography"|
Maggie  |"swimming"   |

Analyze RDS Logs with pgbadger

cat all.log | docker run -i --rm matsuu/pgbadger -f rds - -o - -x html > out.html

Download Postgres Logs from AWS RDS

Get AWS RDS instances
aws rds describe-db-instances
Get available log files
aws rds describe-db-log-files --db-instance-identifier my-dev-db

{
    "DescribeDBLogFiles": [
        {
            "LogFileName": "error/postgres.log",
            "LastWritten": 1720443677790,
            "Size": 963
        },
        {
            "LogFileName": "error/postgresql.log.2024-07-06-06",
            "LastWritten": 1720249200734,
            "Size": 7700843
        },
        {
            "LogFileName": "error/postgresql.log.2024-07-06-07",
            "LastWritten": 1720252800007,
            "Size": 7717431
        },
        {
            "LogFileName": "error/postgresql.log.2024-07-06-08",
            "LastWritten": 1720256399250,
            "Size": 7689466
        },
        {
            "LogFileName": "error/postgresql.log.2024-07-06-09",
            "LastWritten": 1720259998404,
            "Size": 7711202
        },
        {
            "LogFileName": "error/postgresql.log.2024-07-06-10",
            "LastWritten": 1720263600737,
            "Size": 7688732
        }
    ]
}
Filter out just the log file names
aws rds describe-db-log-files --db-instance-identifier my-dev-db | jq '.DescribeDBLogFiles[].LogFileName'
# or better
aws rds describe-db-log-files --db-instance-identifier my-dev-db --query 'DescribeDBLogFiles[].LogFileName' --output text

"error/postgresql.log.2024-07-08-10"
"error/postgresql.log.2024-07-08-11"
"error/postgresql.log.2024-07-08-12"
"error/postgresql.log.2024-07-08-13"
"error/postgresql.log.2024-07-08-14"
"error/postgresql.log.2024-07-08-15"
...
Bash Script to download RDS Logs for a DB instance
#!/bin/bash
set -x

the_region="eu-central-1"
the_db_instance="my-dev-db"

for filename in $( aws rds describe-db-log-files --region $the_region --db-instance-identifier $the_db_instance | jq -r '.DescribeDBLogFiles[] | .LogFileName' )
do
aws rds download-db-log-file-portion --region $the_region --db-instance-identifier $the_db_instance --output text --no-paginate --log-file-name $filename  >> ./postgres_aws_rds_logs_$the_db_instance.log
done

Hikari Connection Pool Sizing

A formula which has held up pretty well across a lot of benchmarks for years is that for optimal throughput the number of active connections should be somewhere near ((core_count * 2) + effective_spindle_count). Core count should not include HT threads, even if hyperthreading is enabled. Effective spindle count is zero if the active data set is fully cached, and approaches the actual number of spindles as the cache hit rate falls. … There hasn’t been any analysis so far regarding how well the formula works with SSDs.

— Hikari Docs
connections = ((core_count * 2) + effective_spindle_count)
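
A worked example (purely illustrative numbers): a dedicated database server with 8 physical cores and one spinning disk would get

connections = (8 * 2) + 1 = 17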