Commit a6e4dd8

Merge pull request #34 from simosund/pping_better_map_cleaning
PPing - Improve map cleanup
2 parents 27861e3 + e7201c9

File tree

6 files changed: +514 -181 lines

pping/Makefile

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ USER_TARGETS := pping
 BPF_TARGETS := pping_kern
 
 LDLIBS += -pthread
-EXTRA_DEPS += pping.h
+EXTRA_DEPS += pping.h pping_debug_cleanup.h
 
 LIB_DIR = ../lib

pping/TODO.md

Lines changed: 64 additions & 9 deletions
@@ -35,11 +35,6 @@
 - Original pping checks if flow is bi-directional before adding
   timestamps, but this could miss shorter flows
 - [ ] Dynamically grow the maps if they are starting to get full
-- [ ] Improve map cleaning: Use a dynamic time to live for map entries
-      based on flow's RTT, instead of static 10s limit
-  - Keeping entries around for a long time allows the map to grow
-    unnecessarily large, which slows down the cleaning and may block
-    new entries
 - [ ] Use libxdp to load XDP program
 
 ## Done
@@ -63,6 +58,15 @@
 - [x] Add timestamps to output (as original pping)
 - [x] Add support for other hooks
   - TC-BPF on ingress instead of XDP
+- [x] Improve map cleaning:
+  - [x] Use BPF iter to clean up maps with BPF programs instead of
+        looping through entries in userspace.
+  - [x] Use a dynamic time to live for map entries based on flow's RTT
+        instead of static 10s limit
+    - Keeping entries around for a long time allows the map to grow
+      unnecessarily large, which slows down the cleaning and may block
+      new entries
+
 
 # Potential issues
 ## Limited information in different output formats
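
The "BPF iter" item above refers to iterator programs that run over a
map's elements inside the kernel. A minimal sketch of what such a
cleanup program can look like is shown below; the map layout and names
(packet_ts, struct packet_id, the static 10 s lifetime) are
illustrative stand-ins rather than ePPing's exact definitions:

// SPDX-License-Identifier: GPL-2.0
/* Sketch of a BPF map-element iterator that deletes expired entries.
 * Names and the fixed lifetime are illustrative assumptions. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

#define TS_LIFETIME_NS (10ULL * 1000000000) /* static 10 s, for simplicity */

struct packet_id {
	__u64 flow;       /* placeholder flow identifier */
	__u32 identifier; /* e.g. the TSval the timestamp was taken for */
};

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 16384);
	__type(key, struct packet_id);
	__type(value, __u64); /* packet timestamp, ns */
} packet_ts SEC(".maps");

SEC("iter/bpf_map_elem")
int tsmap_cleanup(struct bpf_iter__bpf_map_elem *ctx)
{
	struct packet_id local_pid;
	struct packet_id *pid = ctx->key;
	__u64 *timestamp = ctx->value;
	__u64 now = bpf_ktime_get_ns();

	if (!pid || !timestamp)
		return 0; /* NULL key/value marks the end of the iteration */

	if (now > *timestamp && now - *timestamp > TS_LIFETIME_NS) {
		/* Map helpers need a stable key copy on the stack */
		__builtin_memcpy(&local_pid, pid, sizeof(local_pid));
		bpf_map_delete_elem(&packet_ts, &local_pid);
	}
	return 0;
}

char _license[] SEC("license") = "GPL";

Userspace attaches such a program to the map and triggers a cleanup
pass by creating and reading an iterator file descriptor; a sketch of
that trigger loop appears further down.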
@@ -90,10 +94,10 @@ delete its flow-state entry (and send a timely flow closing event).
 
 A consequence of this is that the ICMP flow entries will stick around
 and occupy a space in the flow state map until they are cleaned out by
-the periodic cleanup process. The current timeout for inactive flows
-is a very generous 5 minutes, which means a lot of short ICMP flows
-could quickly fill up the flow map and potentially block other flows
-for a long while.
+the periodic cleanup process. The current timeout for ICMP flows is 30
+seconds, which means a lot of short ICMP flows could quickly fill up
+the flow map and potentially block other flows for a considerable
+time.
 
 ## RTT-based sampling
 The RTT-based sampling feature means that timestamp entries may only
@@ -111,6 +115,57 @@ growing faster than it shrinks, as if it starts with a low RTT it will
 quickly update it to higher RTTs, but with high RTTs it will take
 longer for it to decrease to a lower RTT again.
 
+## Losing debug/warning information
+The "map full" and "map cleaning" events are pushed through the same
+perf buffer as the RTT and flow events. Just like RTT events may be
+lost if too many of them are pushed, these debug/warning messages may
+also be lost if there are a lot of other events being pushed. If one
+considers these debug/warning messages more critical than the normal
+RTT reports, it may be worth pushing them through a separate channel
+to make it less likely that they are lost.
+
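
As a sketch of what such a separate channel could look like, the
warning events could go through a dedicated perf-event array of their
own; the names here (warn_events, struct map_full_event,
report_map_full) are hypothetical, not ePPing's actual definitions:

// SPDX-License-Identifier: GPL-2.0
/* Hypothetical sketch: a second PERF_EVENT_ARRAY so "map full" events
 * don't contend with high-volume RTT reports for buffer space. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct map_full_event {
	__u64 time; /* bpf_ktime_get_ns() at the failed insert */
	__u32 map;  /* which map was full (illustrative id) */
};

struct {
	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
	__uint(key_size, sizeof(__u32));
	__uint(value_size, sizeof(__u32));
} warn_events SEC(".maps");

static __always_inline void report_map_full(void *ctx, __u32 map_id)
{
	struct map_full_event e = {
		.time = bpf_ktime_get_ns(),
		.map = map_id,
	};

	bpf_perf_event_output(ctx, &warn_events, BPF_F_CURRENT_CPU,
			      &e, sizeof(e));
}

Userspace would then poll this map with its own perf buffer instance,
so a burst of RTT events cannot crowd out pending warnings.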
+## RTT-based cleanup of timestamp entries
+For most flows the RTT will likely be well below one second, which
+allows for removing timestamp entries much earlier than the 10s static
+limit. This helps keep the timestamp map size down, thereby decreasing
+the risk of there being no room for new entries, as well as reducing
+the number of entries the cleanup process must go through. However, if
+the RTT of a flow grows quickly (due to e.g. bufferbloat), the actual
+RTT may increase to beyond 8 times the initial RTT before ePPing has
+collected enough RTT samples to raise the sRTT to a similar level.
+This may cause timestamps to be deleted too early, before they have
+had time to match against a reply, in which case one also loses the
+RTT sample that would have updated the sRTT, causing the sRTT to
+remain too low.
+
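
To illustrate the trade-off, a dynamic lifetime derived from the
smoothed RTT might look roughly like the sketch below; the multiplier
of 8 and the clamping bounds are assumptions chosen to match the "8
times the initial RTT" observation above, not necessarily the values
this commit uses:

/* Illustrative only: multiplier and bounds are assumptions */
#include <linux/types.h>

#define NS_PER_MS 1000000ULL
#define TS_LIFETIME_MIN (10 * NS_PER_MS)    /* floor: 10 ms */
#define TS_LIFETIME_MAX (10000 * NS_PER_MS) /* ceiling: the old 10 s limit */

static __u64 timestamp_lifetime(__u64 srtt_ns)
{
	/* 8x sRTT leaves headroom for the RTT to grow (e.g. bufferbloat)
	 * before a still-unmatched timestamp is considered expired */
	__u64 ttl = srtt_ns * 8;

	if (ttl < TS_LIFETIME_MIN)
		return TS_LIFETIME_MIN;
	if (ttl > TS_LIFETIME_MAX)
		return TS_LIFETIME_MAX;
	return ttl;
}

Under such a scheme a flow with a 20 ms sRTT would keep timestamps for
160 ms instead of 10 s; the failure mode described above occurs when
the real RTT jumps past that 8x headroom before enough samples arrive
to pull the sRTT up.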
+In practice this issue is limited by the fact that the cleanup process
+only runs periodically, so even for flows with a very low RTT the
+average time to delete a timestamp entry would still be 500 ms (at the
+default rate of one pass per second, an expired entry waits anywhere
+from 0 to 1 s for the next pass). But this also limits the usefulness
+of trying to delete entries earlier.
+
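
For reference, a 1 Hz trigger loop for a map-element iterator could
look like the sketch below. The libbpf calls (bpf_program__attach_iter,
bpf_iter_create) are real; the program and map handles passed in are
hypothetical:

// SPDX-License-Identifier: GPL-2.0
/* Sketch: run an attached map-element iterator once per second */
#include <unistd.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>

static int run_cleanup_loop(struct bpf_program *cleanup_prog, int map_fd)
{
	union bpf_iter_link_info linfo = { .map = { .map_fd = map_fd } };
	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts,
			    .link_info = &linfo,
			    .link_info_len = sizeof(linfo));
	struct bpf_link *link;
	char buf[64];
	int iter_fd;

	link = bpf_program__attach_iter(cleanup_prog, &opts);
	if (!link)
		return -1;

	for (;;) {
		/* Creating and draining an iterator fd runs one full pass */
		iter_fd = bpf_iter_create(bpf_link__fd(link));
		if (iter_fd < 0)
			break;
		while (read(iter_fd, buf, sizeof(buf)) > 0)
			; /* discard any text the iterator emits */
		close(iter_fd);

		/* 1 Hz period: an expired entry waits 0-1 s extra,
		 * i.e. ~500 ms on average, before it is deleted */
		sleep(1);
	}

	bpf_link__destroy(link);
	return -1;
}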
+Overall, a better approach in the future might be to get rid of the
+periodic cleanup process entirely, and instead only evict old entries
+when they block a new entry from being inserted. This would be
+similar to using LRU maps, but would need some way to prevent an
+existing entry from being evicted while it is still too young.
+
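
For comparison, the LRU-map variant mentioned above is a one-line
map-type change; what it cannot express is the minimum-age guard
(names illustrative):

// SPDX-License-Identifier: GPL-2.0
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct packet_id { /* illustrative key type */
	__u64 flow;
	__u32 identifier;
};

/* LRU variant of a timestamp map: on insert into a full map the kernel
 * evicts an (approximately) least-recently-used entry automatically.
 * The missing piece relative to the idea above is a hook to refuse
 * evicting an entry that is still younger than some minimum age. */
struct {
	__uint(type, BPF_MAP_TYPE_LRU_HASH);
	__uint(max_entries, 16384);
	__type(key, struct packet_id);
	__type(value, __u64); /* timestamp in ns */
} packet_ts_lru SEC(".maps");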
+## Periodical map cleanup may miss entries on delete
+Due to how the hash map traversal is implemented, not all entries may
+be traversed each time a map is iterated through. Specifically, if an
+entry is deleted and there are other entries remaining in the same
+bucket, those remaining entries will not be traversed. As the
+periodical cleanup may delete entries while it is traversing the map,
+this may result in some of the entries sharing a bucket with deleted
+entries not always being traversed.
+
+In general this should not cause a large problem, as those entries
+will simply be traversed the next time the map is iterated over
+instead. However, it may cause certain entries to remain in the hash
+map a bit longer than expected (and, if the map is full, subsequently
+block new entries from being created for that duration).
+
+
 ## Concurrency issues
 
 The program uses "global" (not `PERCPU`) hash maps to keep state. As
