forked from postgres/postgres
-
Notifications
You must be signed in to change notification settings - Fork 0
pg_tde test PR
#3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
jeltz
wants to merge
799
commits into
TDE_REL_17_STABLE
Choose a base branch
from
tde/test
base: TDE_REL_17_STABLE
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
jeltz
pushed a commit
that referenced
this pull request
Mar 5, 2025
1. TruncateMultiXact() performs the SLRU truncations in a critical
section. Deleting the SLRU segments calls ForwardSyncRequest(), which
will try to compact the request queue if it's full
(CompactCheckpointerRequestQueue()). That in turn allocates memory,
which is not allowed in a critical section. Backtrace:
TRAP: failed Assert("CritSectionCount == 0 || (context)->allowInCritSection"), File: "../src/backend/utils/mmgr/mcxt.c", Line: 1353, PID: 920981
postgres: autovacuum worker template0(ExceptionalCondition+0x6e)[0x560a501e866e]
postgres: autovacuum worker template0(+0x5dce3d)[0x560a50217e3d]
postgres: autovacuum worker template0(ForwardSyncRequest+0x8e)[0x560a4ffec95e]
postgres: autovacuum worker template0(RegisterSyncRequest+0x2b)[0x560a50091eeb]
postgres: autovacuum worker template0(+0x187b0a)[0x560a4fdc2b0a]
postgres: autovacuum worker template0(SlruDeleteSegment+0x101)[0x560a4fdc2ab1]
postgres: autovacuum worker template0(TruncateMultiXact+0x2fb)[0x560a4fdbde1b]
postgres: autovacuum worker template0(vac_update_datfrozenxid+0x4b3)[0x560a4febd2f3]
postgres: autovacuum worker template0(+0x3adf66)[0x560a4ffe8f66]
postgres: autovacuum worker template0(AutoVacWorkerMain+0x3ed)[0x560a4ffe7c2d]
postgres: autovacuum worker template0(+0x3b1ead)[0x560a4ffecead]
postgres: autovacuum worker template0(+0x3b620e)[0x560a4fff120e]
postgres: autovacuum worker template0(+0x3b3fbb)[0x560a4ffeefbb]
postgres: autovacuum worker template0(+0x2f724e)[0x560a4ff3224e]
/lib/x86_64-linux-gnu/libc.so.6(+0x27c8a)[0x7f62cc642c8a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f62cc642d45]
postgres: autovacuum worker template0(_start+0x21)[0x560a4fd16f31]
To fix, bail out in CompactCheckpointerRequestQueue() without doing
anything, if it's called in a critical section. That covers the above
call path, as well as any other similar cases where
RegisterSyncRequest might be called in a critical section.
2. After fixing that, another problem became apparent: Autovacuum
process doing that truncation can deadlock with the checkpointer
process. TruncateMultiXact() sets "MyProc->delayChkptFlags |=
DELAY_CHKPT_START". If the sync request queue is full and cannot be
compacted, the process will repeatedly sleep and retry, until there is
room in the queue. However, if the checkpointer is trying to start a
checkpoint at the same time, and is waiting for the DELAY_CHKPT_START
processes to finish, the queue will never shrink.
More concretely, the autovacuum process is stuck here:
#0 0x00007fc934926dc3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x000056220b24348b in WaitEventSetWaitBlock (set=0x56220c2e4b50, occurred_events=0x7ffe7856d040, nevents=1, cur_timeout=<optimized out>) at ../src/backend/storage/ipc/latch.c:1570
#2 WaitEventSetWait (set=0x56220c2e4b50, timeout=timeout@entry=10, occurred_events=<optimized out>, occurred_events@entry=0x7ffe7856d040, nevents=nevents@entry=1,
wait_event_info=wait_event_info@entry=150994949) at ../src/backend/storage/ipc/latch.c:1516
#3 0x000056220b243224 in WaitLatch (latch=<optimized out>, latch@entry=0x0, wakeEvents=wakeEvents@entry=40, timeout=timeout@entry=10, wait_event_info=wait_event_info@entry=150994949)
at ../src/backend/storage/ipc/latch.c:538
postgres#4 0x000056220b26cf46 in RegisterSyncRequest (ftag=ftag@entry=0x7ffe7856d0a0, type=type@entry=SYNC_FORGET_REQUEST, retryOnError=true) at ../src/backend/storage/sync/sync.c:614
postgres#5 0x000056220af9db0a in SlruInternalDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1495
postgres#6 0x000056220af9dab1 in SlruDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1566
postgres#7 0x000056220af98e1b in PerformMembersTruncation (oldestOffset=<optimized out>, newOldestOffset=<optimized out>) at ../src/backend/access/transam/multixact.c:3006
postgres#8 TruncateMultiXact (newOldestMulti=newOldestMulti@entry=3221225472, newOldestMultiDB=newOldestMultiDB@entry=4) at ../src/backend/access/transam/multixact.c:3201
postgres#9 0x000056220b098303 in vac_truncate_clog (frozenXID=749, minMulti=<optimized out>, lastSaneFrozenXid=749, lastSaneMinMulti=3221225472) at ../src/backend/commands/vacuum.c:1917
postgres#10 vac_update_datfrozenxid () at ../src/backend/commands/vacuum.c:1760
postgres#11 0x000056220b1c3f76 in do_autovacuum () at ../src/backend/postmaster/autovacuum.c:2550
postgres#12 0x000056220b1c2c3d in AutoVacWorkerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/autovacuum.c:1569
and the checkpointer is stuck here:
#0 0x00007fc9348ebf93 in clock_nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007fc9348fe353 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x000056220b40ecb4 in pg_usleep (microsec=microsec@entry=10000) at ../src/port/pgsleep.c:50
#3 0x000056220afb43c3 in CreateCheckPoint (flags=flags@entry=108) at ../src/backend/access/transam/xlog.c:7098
postgres#4 0x000056220b1c6e86 in CheckpointerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/checkpointer.c:464
To fix, add AbsorbSyncRequests() to the loops where the checkpointer
waits for DELAY_CHKPT_START or DELAY_CHKPT_COMPLETE operations to
finish.
Backpatch to v14. Before that, SLRU deletion didn't call
RegisterSyncRequest, which avoided this failure. I'm not sure if there
are other similar scenarios on older versions, but we haven't had
any such reports.
Discussion: https://www.postgresql.org/message-id/ccc66933-31c1-4f6a-bf4b-45fef0d4f22e@iki.fi
3361b41 to
ff5a1b6
Compare
d3147df to
ba8d862
Compare
These tests are copy of original pg_resetwal tests with enalbed WAL encryption and removed flags validation as we interested here only in proper enrypted WAL handling.
The code we ran when redoing a smgrcreate() was overly complex and not necessary to run. If the file descriptor was open we skipped it but if not we ran a bit of pointless code since the key creation is handled by it is own WAL record which is always before the SMGR creation. Also improve some outdated comments.
This is the only field from the InternalKey structure that's actually used, and other "raw" crypto functions doesn't use these structures.
This function probably belongs elsewhere than in the key file code, but that's where it currently resides so expose it so it can also be used elsewhere.
We want to add timeline information to the wal keys and cannot easily do so without affecting existing clusters' relation key files. This commit does the bare minimum to separate the two completely and as such contains a fair bit of duplicated code. The file format for the WAL key file is exactly the same before and after this commit. There is _a lot_ of cleanup that will have to be done on both sides of this separation, but this is a bit of "it gets worse before it gets better".
This is to prevent this file from ever getting mixed up with a relation key file as these might no longer be of the same format.
Instead of 1664_keys it's now called wal_encryption_keys. This lets us use a constant name for it instead of generating it from an Oid pretending it's a relation key file. Also remove some now unused Oid parameters to functions.
It happened to work by coincidence since the two structs had the same shape but is a bug waiting to happen.
Since the RelFileLocator has never actually been used for WAL keys we can remove all traces of it from the new file and from the code.
The other keys are stored in <oid>_keys so wal_keys fits better into that pattern than the more redundant wal_encryption_keys where "encryption" does not add any information but just makes the path longer.
Some definitions should be in the .c files rather than in the header files since they are just used in one file.
Add them as unused fields in the TDEMapEntry structure however, so we do not affect existing key files.
Before this commit, WAL keys didn't mind TLI at all. But after pg_rewind, for example, pg_wal/ may contain segments from two timelines. And the wal reader choosing the key may pick the wrong one because LSNs of different TLIs may overlap. There was also another bug: There is a key with the start LSN 0/30000 in TLI 1. And after the start in TLI 2, the wal writer creates a new key with the SN 0/30000, but in TLI 2. But the reader wouldn't fetch the latest key because w/o TLI, these are the same. This commit adds TLI to the Internal keys and makes use of it along with LSN for key compares.
Sincw we never delete WAL keys this logic only confuses the reader of the code. Plus we can optimize the insertion of a new WAL key by using seek().
Let's stop pretending that we support more than two status: empty or that there is a SMGR key.
Also rename enum variants for consistency plus renumber the types for the WAL keys which is fine since this file is newly introduced which makes breaking backwards compatibility not an issue.
When WAL is streamed during the backup (default mode), it comes in unencrypted. But we need keys to encrypt it. For now, we expect that the user would put `pg_tde` dir containing the `1664_key` and `1664_providers` into the destination directory before starting the backup. We encrypt the streamed WAL according to internal keys. No `pg_tde` dir means no streamed WAL encryption.
Add missing key vaidation test to meson build configuration.
Commit e2d4ef8 (the fix for CVE-2017-7484) added security checks to the selectivity estimation functions to prevent them from running user-supplied operators on data obtained from pg_statistic if the user lacks privileges to select from the underlying table. In cases involving inheritance/partitioning, those checks were originally performed against the child RTE (which for plain inheritance might actually refer to the parent table). Commit 553d2ec then extended that to also check the parent RTE, allowing access if the user had permissions on either the parent or the child. It turns out, however, that doing any checks using the child RTE is incorrect, since securityQuals is set to NULL when creating an RTE for an inheritance child (whether it refers to the parent table or the child table), and therefore such checks do not correctly account for any RLS policies or security barrier views. Therefore, do the security checks using only the parent RTE. This is consistent with how RLS policies are applied, and the executor's ACL checks, both of which use only the parent table's permissions/policies. Similar checks are performed in the extended stats code, so update that in the same way, centralizing all the checks in a new function. In addition, note that these checks by themselves are insufficient to ensure that the user has access to the table's data because, in a query that goes via a view, they only check that the view owner has permissions on the underlying table, not that the current user has permissions on the view itself. In the selectivity estimation functions, there is no easy way to navigate from underlying tables to views, so add permissions checks for all views mentioned in the query to the planner startup code. If the user lacks permissions on a view, a permissions error will now be reported at planner-startup, and the selectivity estimation functions will not be run. Checking view permissions at planner-startup in this way is a little ugly, since the same checks will be repeated at executor-startup. Longer-term, it might be better to move all the permissions checks from the executor to the planner so that permissions errors can be reported sooner, instead of creating a plan that won't ever be run. However, such a change seems too far-reaching to be back-patched. Back-patch to all supported versions. In v13, there is the added complication that UPDATEs and DELETEs on inherited target tables are planned using inheritance_planner(), which plans each inheritance child table separately, so that the selectivity estimation functions do not know that they are dealing with a child table accessed via its parent. Handle that by checking access permissions on the top parent table at planner-startup, in the same way as we do for views. Any securityQuals on the top parent table are moved down to the child tables by inheritance_planner(), so they continue to be checked by the selectivity estimation functions. Author: Dean Rasheed <dean.a.rasheed@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Noah Misch <noah@leadboat.com> Backpatch-through: 13 Security: CVE-2025-8713
Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git Source-Git-Hash: 4f9af069289c30fc32337b844fb1db25d7b11e9b
Maliciously-crafted object names could achieve SQL injection during restore. CVE-2012-0868 fixed this class of problem at the time, but later work reintroduced three cases. Commit bc8cd50 (back-patched to v11+ in 2023-05 releases) introduced the pg_dump case. Commit 6cbdbd9 (v12+) introduced the two pg_dumpall cases. Move sanitize_line(), unchanged, to dumputils.c so pg_dumpall has access to it in all supported versions. Back-patch to v13 (all supported versions). Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Backpatch-through: 13 Security: CVE-2025-8715
A malicious server could inject psql meta-commands into plain-text dump output (i.e., scripts created with pg_dump --format=plain, pg_dumpall, or pg_restore --file) that are run at restore time on the machine running psql. To fix, introduce a new "restricted" mode in psql that blocks all meta-commands (except for \unrestrict to exit the mode), and teach pg_dump, pg_dumpall, and pg_restore to use this mode in plain-text dumps. While at it, encourage users to only restore dumps generated from trusted servers or to inspect it beforehand, since restoring causes the destination to execute arbitrary code of the source superusers' choice. However, the client running the dump and restore needn't trust the source or destination superusers. Reported-by: Martin Rakhmanov Reported-by: Matthieu Denais <litezeraw@gmail.com> Reported-by: RyotaK <ryotak.mail@gmail.com> Suggested-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Noah Misch <noah@leadboat.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Security: CVE-2025-8714 Backpatch-through: 13
After sync with upstream PostgreSQL version 17.6 we have to update Percona Server for PostgreSQL to 17.6.1
A few tests output changed after sync with upsteam PostgreSQL 17.6, we have to update TDE realted tests with the same changes.
New version of Codechecker resolved issues with python and other problems that we faced. So it's time to update.
Percona version change was missed in some configuration files.
This is an attempt at making the file more similar to the other contrib extensions' .gitignore files. - Remove ignore of editor specific files - Sort ignores into groups - Remove things already ignore by the root .gitignore - Remove legacy autoconf ignores
Merge back release 17.6.1
- remove RC to GA warning
- reorganized with a full step-by-step procedure to use a local key to encrypt data - improved mentions at the end to take user to other topics - added example output and extra tips - using _database_ functions instead of _global_ for testing purposes
Warnings from libkmip were previously only silenced in Meson builds.
There is no reaosn to user override and appending for everything. PostgreSQL themselves only use it for things like CFLAGS where the user is expected to specify their own stuff.
This mirrors how it is done in other parts of PostgreSQL where meson uses the .o object directly while Make uses libpgutils.a.
The output spam from make makes it hard to find errors when the build fails.
When linking the define is ignored anyway.
Also move the first item to a separate line. This improves readability of the Makefile while also making it more like PG's own makefiles.
Writing the whole compiler command from scratch only makes it more risky that we miss something.
Using many different variables only makes things more confusing and when we start needing to use $(libpq_pgport) this will be cleaner anyway.
jeltz
pushed a commit
that referenced
this pull request
Nov 21, 2025
There've been a few complaints that it can be overly difficult to figure out why the planner picked a Memoize plan. To help address that, here we adjust the EXPLAIN output to display the following additional details: 1) The estimated number of cache entries that can be stored at once 2) The estimated number of unique lookup keys that we expect to see 3) The number of lookups we expect 4) The estimated hit ratio Technically postgres#4 can be calculated using #1, #2 and #3, but it's not a particularly obvious calculation, so we opt to display it explicitly. The original patch by Lukas Fittl only displayed the hit ratio, but there was a fear that might lead to more questions about how that was calculated. The idea with displaying all 4 is to be transparent which may allow queries to be tuned more easily. For example, if #2 isn't correct then maybe extended statistics or a manual n_distinct estimate can be used to help fix poor plan choices. Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> Author: Lukas Fittl <lukas@fittl.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAP53Pky29GWAVVk3oBgKBDqhND0BRBN6yTPeguV_qSivFL5N_g%40mail.gmail.com
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This is a testy PR!