`pg_tde` test PR #3

jeltz · 2025-02-24T18:13:24Z

This is a testy PR!

1. TruncateMultiXact() performs the SLRU truncations in a critical section. Deleting the SLRU segments calls ForwardSyncRequest(), which will try to compact the request queue if it's full (CompactCheckpointerRequestQueue()). That in turn allocates memory, which is not allowed in a critical section. Backtrace:

    TRAP: failed Assert("CritSectionCount == 0 || (context)->allowInCritSection"), File: "../src/backend/utils/mmgr/mcxt.c", Line: 1353, PID: 920981
    postgres: autovacuum worker template0(ExceptionalCondition+0x6e)[0x560a501e866e]
    postgres: autovacuum worker template0(+0x5dce3d)[0x560a50217e3d]
    postgres: autovacuum worker template0(ForwardSyncRequest+0x8e)[0x560a4ffec95e]
    postgres: autovacuum worker template0(RegisterSyncRequest+0x2b)[0x560a50091eeb]
    postgres: autovacuum worker template0(+0x187b0a)[0x560a4fdc2b0a]
    postgres: autovacuum worker template0(SlruDeleteSegment+0x101)[0x560a4fdc2ab1]
    postgres: autovacuum worker template0(TruncateMultiXact+0x2fb)[0x560a4fdbde1b]
    postgres: autovacuum worker template0(vac_update_datfrozenxid+0x4b3)[0x560a4febd2f3]
    postgres: autovacuum worker template0(+0x3adf66)[0x560a4ffe8f66]
    postgres: autovacuum worker template0(AutoVacWorkerMain+0x3ed)[0x560a4ffe7c2d]
    postgres: autovacuum worker template0(+0x3b1ead)[0x560a4ffecead]
    postgres: autovacuum worker template0(+0x3b620e)[0x560a4fff120e]
    postgres: autovacuum worker template0(+0x3b3fbb)[0x560a4ffeefbb]
    postgres: autovacuum worker template0(+0x2f724e)[0x560a4ff3224e]
    /lib/x86_64-linux-gnu/libc.so.6(+0x27c8a)[0x7f62cc642c8a]
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f62cc642d45]
    postgres: autovacuum worker template0(_start+0x21)[0x560a4fd16f31]

To fix, bail out in CompactCheckpointerRequestQueue() without doing anything, if it's called in a critical section. That covers the above call path, as well as any other similar cases where RegisterSyncRequest might be called in a critical section.

2. After fixing that, another problem became apparent: Autovacuum process doing that truncation can deadlock with the checkpointer process. TruncateMultiXact() sets "MyProc->delayChkptFlags |= DELAY_CHKPT_START". If the sync request queue is full and cannot be compacted, the process will repeatedly sleep and retry, until there is room in the queue. However, if the checkpointer is trying to start a checkpoint at the same time, and is waiting for the DELAY_CHKPT_START processes to finish, the queue will never shrink.

More concretely, the autovacuum process is stuck here:

    #0 0x00007fc934926dc3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
    #1 0x000056220b24348b in WaitEventSetWaitBlock (set=0x56220c2e4b50, occurred_events=0x7ffe7856d040, nevents=1, cur_timeout=<optimized out>) at ../src/backend/storage/ipc/latch.c:1570
    #2 WaitEventSetWait (set=0x56220c2e4b50, timeout=timeout@entry=10, occurred_events=<optimized out>, occurred_events@entry=0x7ffe7856d040, nevents=nevents@entry=1, wait_event_info=wait_event_info@entry=150994949) at ../src/backend/storage/ipc/latch.c:1516
    #3 0x000056220b243224 in WaitLatch (latch=<optimized out>, latch@entry=0x0, wakeEvents=wakeEvents@entry=40, timeout=timeout@entry=10, wait_event_info=wait_event_info@entry=150994949) at ../src/backend/storage/ipc/latch.c:538
    #4 0x000056220b26cf46 in RegisterSyncRequest (ftag=ftag@entry=0x7ffe7856d0a0, type=type@entry=SYNC_FORGET_REQUEST, retryOnError=true) at ../src/backend/storage/sync/sync.c:614
    #5 0x000056220af9db0a in SlruInternalDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1495
    #6 0x000056220af9dab1 in SlruDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1566
    #7 0x000056220af98e1b in PerformMembersTruncation (oldestOffset=<optimized out>, newOldestOffset=<optimized out>) at ../src/backend/access/transam/multixact.c:3006
    #8 TruncateMultiXact (newOldestMulti=newOldestMulti@entry=3221225472, newOldestMultiDB=newOldestMultiDB@entry=4) at ../src/backend/access/transam/multixact.c:3201
    #9 0x000056220b098303 in vac_truncate_clog (frozenXID=749, minMulti=<optimized out>, lastSaneFrozenXid=749, lastSaneMinMulti=3221225472) at ../src/backend/commands/vacuum.c:1917
    #10 vac_update_datfrozenxid () at ../src/backend/commands/vacuum.c:1760
    #11 0x000056220b1c3f76 in do_autovacuum () at ../src/backend/postmaster/autovacuum.c:2550
    #12 0x000056220b1c2c3d in AutoVacWorkerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/autovacuum.c:1569

and the checkpointer is stuck here:

    #0 0x00007fc9348ebf93 in clock_nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
    #1 0x00007fc9348fe353 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
    #2 0x000056220b40ecb4 in pg_usleep (microsec=microsec@entry=10000) at ../src/port/pgsleep.c:50
    #3 0x000056220afb43c3 in CreateCheckPoint (flags=flags@entry=108) at ../src/backend/access/transam/xlog.c:7098
    #4 0x000056220b1c6e86 in CheckpointerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/checkpointer.c:464

To fix, add AbsorbSyncRequests() to the loops where the checkpointer waits for DELAY_CHKPT_START or DELAY_CHKPT_COMPLETE operations to finish.

Backpatch to v14. Before that, SLRU deletion didn't call RegisterSyncRequest, which avoided this failure. I'm not sure if there are other similar scenarios on older versions, but we haven't had any such reports.

Discussion: https://www.postgresql.org/message-id/ccc66933-31c1-4f6a-bf4b-45fef0d4f22e@iki.fi
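
The two fixes described in this commit message can be illustrated with a toy model. This is a sketch in Python, not PostgreSQL code: `SyncRequestQueue`, `wait_for_delayed_backend` and the string requests are hypothetical stand-ins, and "compaction" is reduced to dropping duplicate entries. It only shows why skipping compaction in a critical section is safe once the waiting checkpointer keeps absorbing requests.

```python
from collections import deque


class SyncRequestQueue:
    """Toy stand-in for the bounded sync request queue (not PostgreSQL code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.requests = deque()

    def compact(self, in_critical_section):
        """Drop duplicate requests; return True if any space was freed."""
        # Fix 1: compaction allocates memory, so do nothing in a critical section.
        if in_critical_section:
            return False
        before = len(self.requests)
        self.requests = deque(dict.fromkeys(self.requests))
        return len(self.requests) < before

    def forward(self, request, in_critical_section=False):
        """Try to enqueue a request; False means the caller must sleep and retry."""
        if len(self.requests) >= self.capacity:
            if not self.compact(in_critical_section):
                return False
        self.requests.append(request)
        return True

    def absorb(self):
        """Drain the queue, as the checkpointer does when absorbing requests."""
        drained = list(self.requests)
        self.requests.clear()
        return drained


def wait_for_delayed_backend(queue, backend_retry):
    """Checkpointer wait loop with fix 2: absorb requests on every iteration."""
    iterations = 0
    while True:
        iterations += 1
        queue.absorb()
        if backend_retry():  # True once the backend has finished its work
            return iterations
```

Without the `queue.absorb()` call inside the loop, a backend retrying against a full queue from a critical section would never succeed, and the loop would never end, which is the deadlock in miniature.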

These tests are a copy of the original pg_resetwal tests with WAL encryption enabled and flag validation removed, as we are only interested here in proper handling of encrypted WAL.

The code we ran when redoing a smgrcreate() was overly complex and unnecessary. If the file descriptor was open we skipped it, but if not we ran a bit of pointless code, since the key creation is handled by its own WAL record, which always comes before the SMGR creation. Also improve some outdated comments.

This is the only field from the InternalKey structure that's actually used, and other "raw" crypto functions don't use these structures.

This function probably belongs elsewhere than in the key file code, but that's where it currently resides so expose it so it can also be used elsewhere.

We want to add timeline information to the wal keys and cannot easily do so without affecting existing clusters' relation key files. This commit does the bare minimum to separate the two completely and as such contains a fair bit of duplicated code. The file format for the WAL key file is exactly the same before and after this commit. There is _a lot_ of cleanup that will have to be done on both sides of this separation, but this is a bit of "it gets worse before it gets better".

This is to prevent this file from ever getting mixed up with a relation key file as these might no longer be of the same format.

Instead of 1664_keys it's now called wal_encryption_keys. This lets us use a constant name for it instead of generating it from an Oid pretending it's a relation key file. Also remove some now unused Oid parameters to functions.

It happened to work by coincidence since the two structs had the same shape but is a bug waiting to happen.

Since the RelFileLocator has never actually been used for WAL keys we can remove all traces of it from the new file and from the code.

The other keys are stored in <oid>_keys so wal_keys fits better into that pattern than the more redundant wal_encryption_keys where "encryption" does not add any information but just makes the path longer.

Some definitions should be in the .c files rather than in the header files since they are just used in one file.

Add them as unused fields in the TDEMapEntry structure however, so we do not affect existing key files.

Before this commit, WAL keys ignored the timeline ID (TLI) entirely. But after pg_rewind, for example, pg_wal/ may contain segments from two timelines, and the WAL reader may pick the wrong key because LSNs of different TLIs may overlap. There was also another bug: suppose there is a key with the start LSN 0/30000 in TLI 1, and after the start in TLI 2 the WAL writer creates a new key with the same start LSN 0/30000, but in TLI 2. The reader wouldn't fetch the latest key because, without the TLI, the two are indistinguishable. This commit adds the TLI to the internal keys and uses it along with the LSN when comparing keys.
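
A minimal sketch of the lookup this implies, assuming keys are ordered by the pair (TLI, start LSN) and the reader wants the latest key that starts at or before its position. `WalKey`, `find_key` and the `name` label are hypothetical and are not the pg_tde structures or functions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WalKey:
    tli: int        # timeline the key was created on
    start_lsn: int  # first LSN the key applies to
    name: str       # hypothetical label standing in for the key material


def find_key(keys, tli, lsn):
    """Return the latest key starting at or before (tli, lsn), or None."""
    # Comparing (tli, start_lsn) tuples keeps equal LSNs on different
    # timelines distinct, which an LSN-only comparison cannot do.
    usable = [k for k in keys if (k.tli, k.start_lsn) <= (tli, lsn)]
    return max(usable, key=lambda k: (k.tli, k.start_lsn), default=None)
```

With one key at (TLI 1, 0/30000) and another at (TLI 2, 0/30000), a reader positioned on TLI 2 gets the second key, while a reader on TLI 1 still gets the first.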

Since we never delete WAL keys, this logic only confuses the reader of the code. Plus, we can optimize the insertion of a new WAL key by using seek().
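
As a rough illustration of the optimization: in an append-only file of fixed-size entries there is no need to scan for an empty slot, because the next free position is always the end of the file. The record layout below (`ENTRY`) and the function name are assumptions made for the sketch; the real WAL key file format differs and has a header.

```python
import io
import struct

# Hypothetical fixed-size entry: timeline (uint32) and start LSN (uint64).
ENTRY = struct.Struct("<IQ")


def append_entry(f, tli, start_lsn):
    """Append one entry at the end of the file and return its index."""
    end = f.seek(0, io.SEEK_END)  # jump straight to the end, no scanning
    f.write(ENTRY.pack(tli, start_lsn))
    return end // ENTRY.size
```

The same function works on a real file opened in `r+b` mode or on an in-memory `io.BytesIO()` buffer.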

Let's stop pretending that we support more than two statuses: empty, or that there is a SMGR key.

Also rename enum variants for consistency plus renumber the types for the WAL keys which is fine since this file is newly introduced which makes breaking backwards compatibility not an issue.

When WAL is streamed during the backup (default mode), it comes in unencrypted. But we need keys to encrypt it. For now, we expect that the user would put `pg_tde` dir containing the `1664_key` and `1664_providers` into the destination directory before starting the backup. We encrypt the streamed WAL according to internal keys. No `pg_tde` dir means no streamed WAL encryption.

Add missing key validation test to meson build configuration.

Commit e2d4ef8 (the fix for CVE-2017-7484) added security checks to the selectivity estimation functions to prevent them from running user-supplied operators on data obtained from pg_statistic if the user lacks privileges to select from the underlying table.

In cases involving inheritance/partitioning, those checks were originally performed against the child RTE (which for plain inheritance might actually refer to the parent table). Commit 553d2ec then extended that to also check the parent RTE, allowing access if the user had permissions on either the parent or the child. It turns out, however, that doing any checks using the child RTE is incorrect, since securityQuals is set to NULL when creating an RTE for an inheritance child (whether it refers to the parent table or the child table), and therefore such checks do not correctly account for any RLS policies or security barrier views.

Therefore, do the security checks using only the parent RTE. This is consistent with how RLS policies are applied, and the executor's ACL checks, both of which use only the parent table's permissions/policies. Similar checks are performed in the extended stats code, so update that in the same way, centralizing all the checks in a new function.

In addition, note that these checks by themselves are insufficient to ensure that the user has access to the table's data because, in a query that goes via a view, they only check that the view owner has permissions on the underlying table, not that the current user has permissions on the view itself. In the selectivity estimation functions, there is no easy way to navigate from underlying tables to views, so add permissions checks for all views mentioned in the query to the planner startup code. If the user lacks permissions on a view, a permissions error will now be reported at planner-startup, and the selectivity estimation functions will not be run.

Checking view permissions at planner-startup in this way is a little ugly, since the same checks will be repeated at executor-startup. Longer-term, it might be better to move all the permissions checks from the executor to the planner so that permissions errors can be reported sooner, instead of creating a plan that won't ever be run. However, such a change seems too far-reaching to be back-patched.

Back-patch to all supported versions. In v13, there is the added complication that UPDATEs and DELETEs on inherited target tables are planned using inheritance_planner(), which plans each inheritance child table separately, so that the selectivity estimation functions do not know that they are dealing with a child table accessed via its parent. Handle that by checking access permissions on the top parent table at planner-startup, in the same way as we do for views. Any securityQuals on the top parent table are moved down to the child tables by inheritance_planner(), so they continue to be checked by the selectivity estimation functions.

Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
Backpatch-through: 13
Security: CVE-2025-8713

Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: 4f9af069289c30fc32337b844fb1db25d7b11e9b

Maliciously-crafted object names could achieve SQL injection during restore. CVE-2012-0868 fixed this class of problem at the time, but later work reintroduced three cases. Commit bc8cd50 (back-patched to v11+ in 2023-05 releases) introduced the pg_dump case. Commit 6cbdbd9 (v12+) introduced the two pg_dumpall cases.

Move sanitize_line(), unchanged, to dumputils.c so pg_dumpall has access to it in all supported versions. Back-patch to v13 (all supported versions).

Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Backpatch-through: 13
Security: CVE-2025-8715
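
To illustrate the class of problem: if an object name containing a newline is written into a `--` comment in a plain-text dump, everything after the newline is no longer part of the comment. The Python function below is only a sketch of the idea behind sanitize_line(), namely replacing line breaks with spaces; the exact behaviour of the C function, including the `want_hyphen` handling shown here, is recalled from memory and should be treated as an assumption.

```python
def sanitize_line(text, want_hyphen=False):
    """Make a string safe to embed in a single-line SQL comment."""
    if not text:
        # Assumed behaviour for empty input: optionally emit a placeholder.
        return "-" if want_hyphen else ""
    # Line breaks would terminate the comment, so flatten them to spaces.
    return text.replace("\n", " ").replace("\r", " ")


# Hypothetical malicious object name and the comment line built from it.
evil_name = "t\nDROP TABLE important;"
comment = "-- Name: " + sanitize_line(evil_name)
```

After sanitizing, the injected statement stays inside the comment line instead of starting a new line of SQL.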

A malicious server could inject psql meta-commands into plain-text dump output (i.e., scripts created with pg_dump --format=plain, pg_dumpall, or pg_restore --file) that are run at restore time on the machine running psql. To fix, introduce a new "restricted" mode in psql that blocks all meta-commands (except for \unrestrict to exit the mode), and teach pg_dump, pg_dumpall, and pg_restore to use this mode in plain-text dumps.

While at it, encourage users to only restore dumps generated from trusted servers or to inspect it beforehand, since restoring causes the destination to execute arbitrary code of the source superusers' choice. However, the client running the dump and restore needn't trust the source or destination superusers.

Reported-by: Martin Rakhmanov
Reported-by: Matthieu Denais <litezeraw@gmail.com>
Reported-by: RyotaK <ryotak.mail@gmail.com>
Suggested-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Security: CVE-2025-8714
Backpatch-through: 13
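
The restricted mode can be pictured as a small state machine over the meta-commands in a script. The sketch below is a toy model in Python, not psql's implementation. It assumes the mode is entered with a `\restrict <key>` command and left with `\unrestrict <key>` using the same key; only `\unrestrict` is named in the text above, so the entry command and the key matching are assumptions here. Real psql reports an error for a blocked meta-command, whereas this model just records it.

```python
def filter_meta_commands(lines):
    """Split a script's meta-commands into (executed, blocked) lists."""
    restrict_key = None  # None means psql is not in restricted mode
    executed, blocked = [], []
    for line in lines:
        if not line.startswith("\\"):
            continue  # plain SQL is not modelled here
        parts = line.split()
        cmd, args = parts[0], parts[1:]
        if restrict_key is not None:
            # Restricted: only \unrestrict with the matching key gets through.
            if cmd == "\\unrestrict" and args == [restrict_key]:
                restrict_key = None
            else:
                blocked.append(line)
        elif cmd == "\\restrict" and len(args) == 1:
            restrict_key = args[0]
        else:
            executed.append(line)
    return executed, blocked
```

A dump wrapped in such a pair therefore cannot smuggle in a shell escape such as `\!`, and it cannot leave the mode early without knowing the key.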

Security: CVE-2025-8713, CVE-2025-8714, CVE-2025-8715

After sync with upstream PostgreSQL version 17.6 we have to update Percona Server for PostgreSQL to 17.6.1

The output of a few tests changed after the sync with upstream PostgreSQL 17.6, so we have to update the TDE-related tests with the same changes.

New version of Codechecker resolved issues with python and other problems that we faced. So it's time to update.

Percona version change was missed in some configuration files.

This is an attempt at making the file more similar to the other contrib extensions' .gitignore files.

- Remove ignore of editor specific files
- Sort ignores into groups
- Remove things already ignored by the root .gitignore
- Remove legacy autoconf ignores

Merge back release 17.6.1

- remove RC to GA warning

- reorganized with a full step-by-step procedure to use a local key to encrypt data
- improved mentions at the end to take user to other topics
- added example output and extra tips
- using _database_ functions instead of _global_ for testing purposes

Warnings from libkmip were previously only silenced in Meson builds.

There is no reason to use override and appending for everything. PostgreSQL itself only uses it for things like CFLAGS, where the user is expected to specify their own stuff.

This mirrors how it is done in other parts of PostgreSQL, where meson uses the .o object directly while Make uses libpgfeutils.a.

The output spam from make makes it hard to find errors when the build fails.

When linking the define is ignored anyway.

…#585)

Also move the first item to a separate line. This improves readability of the Makefile while also making it more like PG's own makefiles.

Writing the whole compiler command from scratch only makes it more risky that we miss something.

Using many different variables only makes things more confusing and when we start needing to use $(libpq_pgport) this will be cleaner anyway.

There've been a few complaints that it can be overly difficult to figure out why the planner picked a Memoize plan. To help address that, here we adjust the EXPLAIN output to display the following additional details:

1) The estimated number of cache entries that can be stored at once
2) The estimated number of unique lookup keys that we expect to see
3) The number of lookups we expect
4) The estimated hit ratio

Technically #4 can be calculated using #1, #2 and #3, but it's not a particularly obvious calculation, so we opt to display it explicitly. The original patch by Lukas Fittl only displayed the hit ratio, but there was a fear that might lead to more questions about how that was calculated. The idea with displaying all 4 is to be transparent which may allow queries to be tuned more easily. For example, if #2 isn't correct then maybe extended statistics or a manual n_distinct estimate can be used to help fix poor plan choices.

Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CAP53Pky29GWAVVk3oBgKBDqhND0BRBN6yTPeguV_qSivFL5N_g%40mail.gmail.com
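
For readers curious how the fourth value relates to the first three, here is a small sketch. The formula is the one recalled from the planner's Memoize costing (cost_memoize_rescan() in costsize.c) and is an assumption to verify against the source tree; the function and parameter names are illustrative only.

```python
def estimated_hit_ratio(calls, ndistinct, est_cache_entries):
    """Estimate the Memoize cache hit ratio from the three other estimates."""
    if calls <= 0 or est_cache_entries <= 0:
        return 0.0
    # Share of lookups that repeat an earlier key ...
    repeat_fraction = (calls - ndistinct) / calls
    # ... scaled by the share of distinct keys that fit in the cache at once.
    fit_fraction = est_cache_entries / max(ndistinct, est_cache_entries)
    # Clamp so that estimation noise cannot yield a ratio outside [0, 1].
    return min(max(repeat_fraction * fit_fraction, 0.0), 1.0)
```

For example, 1000 expected lookups over 100 distinct keys give a ratio of 0.9 when all 100 keys fit in the cache, and half of that when only 50 entries fit.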

jeltz changed the base branch from master to TDE_REL_17_STABLE February 24, 2025 18:14

jeltz force-pushed the tde/test branch from 6c74368 to 4d5fc5b Compare February 24, 2025 18:16

jeltz force-pushed the tde/test branch 4 times, most recently from 3361b41 to ff5a1b6 Compare April 11, 2025 13:55

jeltz force-pushed the tde/test branch 3 times, most recently from d3147df to ba8d862 Compare April 28, 2025 08:17

artemgavrilov and others added 20 commits August 1, 2025 14:07

PG-1411 Add pg_resetwal tap tests for TDE setup

342122c

These tests are a copy of the original pg_resetwal tests with WAL encryption enabled and flag validation removed, as we are only interested here in proper handling of encrypted WAL.

Add cards in the index page of the website (#488)

efe9f04

Fix lsan issues in pg_resetwal

0319cb3

Use raw key data for stream crypto

7459951

This is the only field from the InternalKey structure that's actually used, and other "raw" crypto functions don't use these structures.

Expose function to sign principal key data

a90de98

This function probably belongs elsewhere than in the key file code, but that's where it currently resides so expose it so it can also be used elsewhere.

PG-1813 Change file magic for wal keyfile

436d2a6

This is to prevent this file from ever getting mixed up with a relation key file as these might no longer be of the same format.

PG-1813 Rename wal key file

583f8ef

Instead of 1664_keys it's now called wal_encryption_keys. This lets us use a constant name for it instead of generating it from an Oid pretending it's a relation key file. Also remove some now unused Oid parameters to functions.

Use correct struct when calculating offset

ac0c58d

It happened to work by coincidence since the two structs had the same shape but is a bug waiting to happen.

Remove RelFileLocator from the WAL key file

91997fb

Since the RelFileLocator has never actually been used for WAL keys we can remove all traces of it from the new file and from the code.

Change the file name of the wal keys to match better

8c40309

The other keys are stored in <oid>_keys so wal_keys fits better into that pattern than the more redundant wal_encryption_keys where "encryption" does not add any information but just makes the path longer.

Move things out of header files after key file split

8d7192c

Some definitions should be in the .c files rather than in the header files since they are just used in one file.

Remove some unused fields from InternalKey

87c55e6

Add them as unused fields in the TDEMapEntry structure however, so we do not affect existing key files.

Remove checks for empty entries in WAL key file

d7b42c1

Since we never delete WAL keys, this logic only confuses the reader of the code. Plus, we can optimize the insertion of a new WAL key by using seek().

Clean up type code for the key map file

588938d

Let's stop pretending that we support more than two statuses: empty, or that there is a SMGR key.

Split key type enum into two to make code less confusing

602cd73

Also rename enum variants for consistency plus renumber the types for the WAL keys which is fine since this file is newly introduced which makes breaking backwards compatibility not an issue.

Add missing test to meson build

458e6ed

Add missing key validation test to meson build configuration.

tglsfdc and others added 22 commits September 3, 2025 12:25

Release notes for 17.6, 16.10, 15.14, 14.19, 13.22.

e819d61

Translation updates

b0c0089

Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: 4f9af069289c30fc32337b844fb1db25d7b11e9b

Last-minute updates for release notes.

20ff7f9

Security: CVE-2025-8713, CVE-2025-8714, CVE-2025-8715

Stamp 17.6.

69524ca

Updated expected Percona Server version to 17.6.1

c4ec594

After sync with upstream PostgreSQL version 17.6 we have to update Percona Server for PostgreSQL to 17.6.1

Update tests after sync with upstream version 17.6

19101d3

The output of a few tests changed after the sync with upstream PostgreSQL 17.6, so we have to update the TDE-related tests with the same changes.

PG-1889 Add Barman as unsupported tool (#574)

c03a7f9

Update codechecker to 6.26.1

e24e386

New version of Codechecker resolved issues with python and other problems that we faced. So it's time to update.

Fix percona server version

39601ca

Percona version change was missed in some configuration files.

Merge pull request #580 from percona/release-17.6.1

f89f408

Merge back release 17.6.1

Update landing page with PG17 link (#577)

146bdea

- remove RC to GA warning

PG-1879 Silence warnings from libkmip in make builds

e467d1f

Warnings from libkmip were previously only silenced in Meson builds.

PG-1879 Simplify Makefile logic

1f7208b

There is no reason to use override and appending for everything. PostgreSQL itself only uses it for things like CFLAGS, where the user is expected to specify their own stuff.

PG-1879 Use libpgfeutils.a when building with make

dc3fd0e

This mirrors how it is done in other parts of PostgreSQL, where meson uses the .o object directly while Make uses libpgfeutils.a.

PG-1870 Silence make output in the CI

d33cea5

The output spam from make makes it hard to find errors when the build fails.

Remove pointless define when linking executables

bc9eef5

When linking the define is ignored anyway.

PG-1927 Adding missing perl modules to Makefile for devel packages. (…

c7c9b73

…#585)

jeltz force-pushed the tde/test branch from ba8d862 to 8df725a Compare September 24, 2025 08:29

jeltz added 5 commits September 25, 2025 12:03

PG-1879 Indent multiline assignments in Makefile

4c4e3d0

Also move the first item to a separate line. This improves readability of the Makefile while also making it more like PG's own makefiles.

PG-1879 Remove unnecessary quotes in Makefile

1107aa4

PG-1879 Just add extra CFLAG when building xlogreader.c

44141cd

Writing the whole compiler command from scratch only makes it more risky that we miss something.

PG-1879 Use only LDFLAGS_EX in Makefile

cdfaf2d

Using many different variables only makes things more confusing and when we start needing to use $(libpq_pgport) this will be cleaner anyway.

WIP

eb031ea

jeltz force-pushed the tde/test branch from 8df725a to eb031ea Compare October 8, 2025 16:45


20 participants
