Get size of a table
SELECT pg_size_pretty(pg_total_relation_size('schemaname.tablename'));
Select unique combination of fields / tuples
SELECT DISTINCT ON(field1, field2) field1, field2
FROM thetable
Select rows where a combination of fields is not unique
SELECT columnA, columnB, count(*) AS count
FROM thetable
GROUP BY columnA, columnB
HAVING count(*) > 1
Search for rows with array containing value
Assuming the column appointments has the type date[]:
SELECT *
FROM mtable
WHERE appointments @> ARRAY['2023-09-19'::date]
The following query is also valid:
SELECT *
FROM mtable
WHERE ARRAY['2023-09-19'::date] <@ appointments
Max Connections allowed
Show current value and source of configuration
SELECT setting, source, sourcefile, sourceline
FROM pg_settings
WHERE name = 'max_connections';
Set Max Connections
ALTER SYSTEM SET max_connections = 250;
-- changing max_connections requires a server restart to take effect
Kill Postgres Connections
SELECT pg_terminate_backend(pid)
FROM pg_catalog.pg_stat_activity
-- we don't want to kill our own connection
WHERE pid != pg_backend_pid()
-- we don't want to kill connections to other databases
AND datname = 'MYDATABASE'
Get configured Server Timezone
SELECT current_setting('timezone') AS configured_timezone;
Kubernetes Setup via Postgres Helm Chart
Install Helm Chart
We’re installing it as a release named my-postgres
helm install my-postgres oci://registry-1.docker.io/bitnamicharts/postgresql
Get the password as an environment variable
export POSTGRES_PASSWORD=$(kubectl get secret --namespace testns my-postgres-postgresql -o jsonpath="{.data.postgres-password}" | base64 -d)
Port-forward and connect to the database
kubectl port-forward --namespace testns svc/my-postgres-postgresql 5432:5432 &
PGPASSWORD="$POSTGRES_PASSWORD" psql --host 127.0.0.1 -U postgres -d postgres -p 5432
More information
helm inspect all oci://registry-1.docker.io/bitnamicharts/postgresql
JSON Datatype, Inserting JSON, Querying JSON
Creating a table with the JSON datatype
CREATE TABLE documents
(
title varchar NOT NULL CONSTRAINT documents_pk PRIMARY KEY,
"metadata" json NOT NULL
);
The JSON to be inserted
{
"title": "Some document",
"author": {
"firstName": "Fanny",
"lastName": "Fisher"
}
}
Insert Query
INSERT INTO documents VALUES('Some document', to_json('{"title": "Some document", "author": {"firstName": "Fanny","lastName": "Fisher"}}'::json));
Or as an inline query:
select *
from (
values
('Some document',to_json('{"title": "Some document", "author": {"firstName": "Fanny","lastName": "Fisher"}}'::json))
) as data(title,jsondata)
Or as a common table expression (CTE):
with data(title,jsondata) as (
values
('Some document',to_json('{"title": "Some document", "author": {"firstName": "Fanny","lastName": "Fisher"}}'::json))
)
select d.jsondata #>> '{author, firstName}'
from data d
Query for JSON Part
For example, querying for the document author's first name:
SELECT d.metadata -> 'author' ->> 'firstName' AS firstName
FROM documents d
-> returns the JSON node, ->> returns a text value
A more elegant way for deeply nested JSON structures is the path operator:
SELECT d.metadata #>> '{author, firstName}' AS firstName
FROM documents d
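To see the difference in return types, a quick check (assuming the documents table from above):
SELECT pg_typeof(d.metadata -> 'author')     AS node_type, -- json
       pg_typeof(d.metadata ->> 'firstName') AS text_type  -- text
FROM documents d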
Postgres Inline Data or Table Value Constructor
Input data without using a table
select *
from (
values
('A', 1),
('A', 2),
('A', 3),
('A', 4),
('B', 3),
('B', 4),
('B', 5),
('B', 6)
) as data(importkey, id)
Query for rows whose array contains valueA or valueB
In the following example, we’re querying for all rows where the date 2024-01-29 or 2024-01-30 is contained in the column dates of type date[]:
select id, dates from (
values
(1, array['2024-01-29'::date,'2024-01-31'::date]),
(2, array['2024-01-28'::date, '2024-01-30'::date]),
(3, array['2024-01-27'::date, '2024-01-28'::date, '2024-01-31'::date])
) as mydata(id, dates)
where dates && '{2024-01-29,2024-01-30}'
yields the following result:
id|dates                  |
--+-----------------------+
 1|{2024-01-29,2024-01-31}|
 2|{2024-01-28,2024-01-30}|
Postgres Expand Array to Set of Rows
An example:
-- Select UUID and Delivery Date from Dataset
-- Calculate the import date to begin at 00:00:00 and the
-- delivery date to start next day at 12:00:00
select
(id) as "import_id",
(deliverydate + '00:00:00'::time) as "date_begin",
(deliverydate + interval '1 day' + '12:00:00'::time) as "date_deliver"
from unnest(
'{91f8f3e5-9d99-4f2b-be2c-7573262440fd, f3f0b446-3336-4704-9abd-1c8fa05913b4, 866d1b46-545e-4572-ba7c-ab75052b08e5}':: uuid[],
'{2023-1-2, 2023-1-4, 2023-1-9}':: date[]
) as plan(id, deliverydate);
Produces the following result:
import_id                           |date_begin             |date_deliver           |
------------------------------------+-----------------------+-----------------------+
91f8f3e5-9d99-4f2b-be2c-7573262440fd|2023-01-02 00:00:00.000|2023-01-03 12:00:00.000|
f3f0b446-3336-4704-9abd-1c8fa05913b4|2023-01-04 00:00:00.000|2023-01-05 12:00:00.000|
866d1b46-545e-4572-ba7c-ab75052b08e5|2023-01-09 00:00:00.000|2023-01-10 12:00:00.000|
Select all rows of the first query that are not part of the second query with EXCEPT
The number and types of columns in both queries must match!
select id
from (
values
('A', 1),
('A', 2),
('A', 3),
('A', 4),
('B', 3),
('B', 4),
('B', 5),
('B', 6)
) as data(importkey, id)
where importkey = 'A'
except
select id
from (
values
('A', 1),
('A', 2),
('A', 3),
('A', 4),
('B', 3),
('B', 4),
('B', 5),
('B', 6)
) as data(importkey, id)
where importkey = 'B'
Create new table and copy structure and data of another table
-- LIKE copies the table definition (columns, constraints, indexes, defaults), but no rows
CREATE TABLE mynewtable
(LIKE myoldtable INCLUDING ALL);
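To also copy the data, run a second statement:
INSERT INTO mynewtable
SELECT * FROM myoldtable;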
Activate extended format for query output
\x on
Connect to database
psql -d DATABASENAME -h HOSTNAME -U USERNAME -p PORT
Explorative commands
- \? : show commands/help
- \q : quit
- \c <DATABASE> : connect to database
- \d <TABLE> : show table definition, cols, triggers, …
- \d+ <TABLE> : show detailed table information
- \l : list databases
- \dt : list tables
- \di : list indexes
- \dn : list schemas
- \du : list users
- \dy : list events
- \df : list functions
- \dT+ : list all data types
- \dv : list views
Clear Screen Workaround
\! clear
Run local Postgres Database with Docker
docker run --name somename -p5432:5432 -e POSTGRES_USER=mydesiredusername -e POSTGRES_PASSWORD=somefancypassword -d postgres
Generate Lotto Numbers
5 of 1 to 50
SELECT lotto_number
FROM (
    -- draw 5 distinct numbers between 1 and 50
    SELECT num AS lotto_number
    FROM generate_series(1, 50) AS num
    ORDER BY random()
    LIMIT 5
) AS draw
ORDER BY lotto_number ASC
Table Partitions
Select Tables with partitioning
Also filtering out some tables we don’t want to see:
SELECT partrelid::regclass::text FROM pg_partitioned_table WHERE partrelid::regclass::text NOT IN ('tablethatdoesnotexist', 'notexistingtoo') order by 1
Upserts
Upsert means to either insert a new record or update an existing one (or do nothing). For this to work, the conflict target must be covered by a unique constraint or unique index.
On table columns
INSERT INTO <table>
VALUES('something', 123)
ON CONFLICT (col1, col2)
DO NOTHING
On table constraints
INSERT INTO <table>
VALUES('something', 123)
ON CONFLICT ON CONSTRAINT unique_cols
DO UPDATE SET col2 = 'xoxo'
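A minimal end-to-end sketch, using a hypothetical settings table and reusing the unique_cols constraint name from above as the conflict target:
CREATE TABLE settings (
    name  varchar NOT NULL,
    env   varchar NOT NULL,
    value varchar,
    CONSTRAINT unique_cols UNIQUE (name, env)
);
-- the first run inserts the row, every further run updates it
INSERT INTO settings (name, env, value)
VALUES ('timeout', 'prod', '30')
ON CONFLICT ON CONSTRAINT unique_cols
DO UPDATE SET value = EXCLUDED.value;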
Ranges and Multiranges in Postgres
We’re using `datemultirange` to model a hotel booking table where duration holds the date ranges in which a customer has booked a room.
-- Sally has booked room 23 from June 3rd (incl.) to June 6th (excl.) and June 8th to June 10th (both incl.)
-- Tim has booked room 12 from June 2nd (incl.) to June 5th (excl.) and June 19th to June 22nd (both incl.)
with bookings(customer_name, room_number, duration) as (
values
('Sally', 23, '{[2024-06-03,2024-06-06), [2024-06-08,2024-06-10]}'::datemultirange),
('Tim', 12, '{[2024-06-02,2024-06-05), [2024-06-19,2024-06-22]}'::datemultirange)
)
select *
from bookings
where duration @> '[2024-06-04, 2024-06-05]'::daterange
This produces the following outcome:
customer_name|room_number|duration                                         |
-------------+-----------+-------------------------------------------------+
Sally        |         23|{[2024-06-03,2024-06-06),[2024-06-08,2024-06-11)}|
Mapping with JDBC and PGobject
// PGobject comes from org.postgresql.util (Postgres JDBC driver)
PGobject dateRange = new PGobject();
dateRange.setType("daterange");
dateRange.setValue("[2024-06-19, 2024-06-25]");

PreparedStatement stmt = conn.prepareStatement(sql);
// as a query parameter
stmt.setObject(1, dateRange);

// for mapping values from the result set
PGobject daterangeValue = (PGobject) rs.getObject("daterange_col");
Using Domain Types
E.g. we have a composite key consisting of a user’s id and their locale, separated by a colon, so that it might look like this: 12345:EN
We could create a domain type based on varchar and add a check for this pattern:
create domain userkey as varchar check (VALUE ~* '^\d+:[A-Z]+$');
Afterwards we may use this type:
-- "user" must be quoted, because user is a reserved word in Postgres
create table userlogin ("user" userkey, loggedin timestamp);
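A quick check that the domain actually enforces the pattern; the second insert fails with a check constraint violation:
insert into userlogin values ('12345:EN', now());  -- ok
insert into userlogin values ('12345-EN', now());  -- fails: violates the domain's check constraint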
Lateral Joins
A lateral join combines each row of a table with the result of a subquery that can reference that row’s columns.
In the following example, we’re joining people’s reading preferences with matching book titles:
WITH persons AS (
SELECT * FROM (VALUES
(1, 'Alice', 'Horror'),
(2, 'Bob', 'Poetry'),
(3, 'Sally', 'Romance')
) AS person(id, name, preferred_book_genre)
),
books AS (
SELECT * FROM (VALUES
(101, 'The Call of Cthulhu', 'Horror'),
(102, 'Best of Poe', 'Poetry'),
(103, 'Sunset of Love', 'Romance'),
(104, 'Nyarlathothep', 'Horror'),
(105, 'The Raven', 'Poetry')
) AS book(id, title, genre)
) -- everything before this marker is just there to provide some sample data
SELECT
person.id,
person.name,
person.preferred_book_genre,
book.id,
book.title
FROM
persons person
CROSS JOIN LATERAL (
SELECT
id,
title,
genre
FROM
books
WHERE
books.genre = person.preferred_book_genre
) book;
Produces the following result:
+--+-----+--------------------+---+-------------------+
|id|name |preferred_book_genre|id |title              |
+--+-----+--------------------+---+-------------------+
|1 |Alice|Horror              |101|The Call of Cthulhu|
|2 |Bob  |Poetry              |102|Best of Poe        |
|3 |Sally|Romance             |103|Sunset of Love     |
|1 |Alice|Horror              |104|Nyarlathothep      |
|2 |Bob  |Poetry              |105|The Raven          |
+--+-----+--------------------+---+-------------------+
Another example expands a JSONB array of hobbies into one row per hobby:
with mytable (username, jsondata) as (
values
('Sally', '{"hobbies":["swimming", "reading"]}'::jsonb),
('Tim', '{"hobbies":["diving", "coding", "cooking"]}'::jsonb),
('Maggie', '{"hobbies":["reading", "photography", "swimming"]}'::jsonb)
)
select t.username as username, hobby
from mytable t
cross join lateral jsonb_array_elements(t.jsondata -> 'hobbies') as hobby
This produces the following output:
username|hobby        |
--------+-------------+
Sally   |"swimming"   |
Sally   |"reading"    |
Tim     |"diving"     |
Tim     |"coding"     |
Tim     |"cooking"    |
Maggie  |"reading"    |
Maggie  |"photography"|
Maggie  |"swimming"   |
Analyze RDS Logs with pgbadger
cat all.log | docker run -i --rm matsuu/pgbadger -f rds - -o - -x html > out.html
Download Postgres Logs from AWS RDS
aws rds describe-db-instances
aws rds describe-db-log-files --db-instance-identifier my-dev-db
{
"DescribeDBLogFiles": [
{
"LogFileName": "error/postgres.log",
"LastWritten": 1720443677790,
"Size": 963
},
{
"LogFileName": "error/postgresql.log.2024-07-06-06",
"LastWritten": 1720249200734,
"Size": 7700843
},
{
"LogFileName": "error/postgresql.log.2024-07-06-07",
"LastWritten": 1720252800007,
"Size": 7717431
},
{
"LogFileName": "error/postgresql.log.2024-07-06-08",
"LastWritten": 1720256399250,
"Size": 7689466
},
{
"LogFileName": "error/postgresql.log.2024-07-06-09",
"LastWritten": 1720259998404,
"Size": 7711202
},
{
"LogFileName": "error/postgresql.log.2024-07-06-10",
"LastWritten": 1720263600737,
"Size": 7688732
}
]
}
aws rds describe-db-log-files --db-instance-identifier my-dev-db | jq '.DescribeDBLogFiles[].LogFileName'
# or better
aws rds describe-db-log-files --db-instance-identifier my-dev-db --query 'DescribeDBLogFiles[].LogFileName' --output text
"error/postgresql.log.2024-07-08-10"
"error/postgresql.log.2024-07-08-11"
"error/postgresql.log.2024-07-08-12"
"error/postgresql.log.2024-07-08-13"
"error/postgresql.log.2024-07-08-14"
"error/postgresql.log.2024-07-08-15"
...
#!/bin/bash
set -x
the_region="eu-central-1"
the_db_instance="my-dev-db"
for filename in $( aws rds describe-db-log-files --region $the_region --db-instance-identifier $the_db_instance | jq -r '.DescribeDBLogFiles[] | .LogFileName' )
do
aws rds download-db-log-file-portion --region $the_region --db-instance-identifier $the_db_instance --output text --no-paginate --log-file-name "$filename" >> ./postgres_aws_rds_logs_$the_db_instance.log
done
Hikari Connection Pool Sizing
A formula which has held up pretty well across a lot of benchmarks for years is that for optimal throughput the number of active connections should be somewhere near ((core_count * 2) + effective_spindle_count). Core count should not include HT threads, even if hyperthreading is enabled. Effective spindle count is zero if the active data set is fully cached, and approaches the actual number of spindles as the cache hit rate falls. … There hasn’t been any analysis so far regarding how well the formula works with SSDs.
connections = ((core_count * 2) + effective_spindle_count)
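For example, assuming a host with 8 physical cores and a fully cached active data set (effective spindle count of 0), the formula suggests a pool size of (8 * 2) + 0 = 16 connections.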