  - Original pping checks if flow is bi-directional before adding
    timestamps, but this could miss shorter flows
- [ ] Dynamically grow the maps if they are starting to get full
- [ ] Use libxdp to load XDP program

## Done
- [x] Add timestamps to output (as original pping)
- [x] Add support for other hooks
  - TC-BPF on ingress instead of XDP
- [x] Improve map cleaning:
  - [x] Use BPF iter to clean up maps with BPF programs instead of
    looping through entries in userspace.
  - [x] Use a dynamic time to live for map entries based on flow's RTT
    instead of a static 10s limit (see the sketch below)
    - Keeping entries around for a long time allows the map to grow
      unnecessarily large, which slows down the cleaning and may block
      new entries
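
A minimal sketch of how such an RTT-based time to live could be
computed, assuming the flow state tracks a smoothed RTT in
nanoseconds; the helper name and the 8x factor (borrowed from the
"RTT-based cleanup of timestamp entries" discussion below) are
illustrative, not ePPing's actual code:

```c
#include <linux/types.h>

#define NSEC_PER_SEC 1000000000ULL
#define TS_TTL_MAX_NS (10 * NSEC_PER_SEC) /* old static 10s limit as a cap */

/* Hypothetical helper: derive a timestamp-entry TTL from the flow's
 * smoothed RTT, capped so an inflated sRTT cannot keep entries alive
 * longer than the old static limit. */
static __u64 timestamp_ttl_ns(__u64 srtt_ns)
{
    __u64 ttl = 8 * srtt_ns; /* several RTTs of slack for late replies */

    return ttl < TS_TTL_MAX_NS ? ttl : TS_TTL_MAX_NS;
}
```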

# Potential issues
## Limited information in different output formats

## ICMP flow state
ePPing has no reliable way of telling when an ICMP "flow" has ended,
so it cannot delete its flow-state entry (and send a timely flow
closing event).

A consequence of this is that the ICMP flow entries will stick around
and occupy a space in the flow state map until they are cleaned out by
the periodic cleanup process. The current timeout for ICMP flows is 30
seconds, which means a lot of short ICMP flows could quickly fill up
the flow map and potentially block other flows for a considerable
time.
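
As a rough, made-up illustration of the scale: with a hypothetical
flow map of 16384 entries and the 30 second timeout, a sustained rate
of around 550 new ICMP flows per second (16384/30) would be enough to
keep the map permanently full and block other flows.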

## RTT-based sampling
The RTT-based sampling feature means that timestamp entries may only
be created at a limited rate based on the flow's RTT. A side effect of
this is that the sRTT estimate can end up growing faster than it
shrinks: if it starts with a low RTT it will quickly update to higher
RTTs, but with high RTTs it will take longer to decrease to a lower
RTT again.

## Losing debug/warning information
The "map full" and "map cleaning" events are pushed through the same
perf buffer as the RTT and flow events. Just like RTT events may be
lost if too many of them are pushed, these debug/warning messages may
also be lost if there are a lot of other events being pushed. If these
debug/warning messages are considered more critical than the normal
RTT reports, it may be worth pushing them through a separate channel
to make it less likely that they are lost.
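
If such a separate channel were added, it could look roughly like the
sketch below, which declares a second perf-event array reserved for
warnings; the map and struct names are invented for illustration and
are not ePPing's actual ones:

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Hypothetical event struct carrying only debug/warning data. */
struct warn_event {
    __u32 type; /* e.g. map-full vs map-cleaning */
    __u64 time;
};

/* A second perf-event array so warnings are not dropped together
 * with high-volume RTT events. */
struct {
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(key_size, sizeof(__u32));
    __uint(value_size, sizeof(__u32));
} warn_events SEC(".maps");

/* In a BPF program, warnings would then go to their own buffer:
 *
 *   bpf_perf_event_output(ctx, &warn_events, BPF_F_CURRENT_CPU,
 *                         &evt, sizeof(evt));
 *
 * while RTT/flow events keep using the existing events map. */
```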

## RTT-based cleanup of timestamp entries
For most flows the RTT will likely be well below one second, which
allows for removing timestamp entries much earlier than the static 10s
limit. This helps keep the timestamp map size down, thereby decreasing
the risk of there being no room for new entries, as well as reducing
the number of entries the cleanup process must go through. However, if
the RTT of a flow grows quickly (due to e.g. bufferbloat) then the
actual RTT of the flow may increase to beyond 8 times the initial RTT
before ePPing has collected enough RTT samples to increase the sRTT to
a similar level. This may cause timestamp entries to be deleted too
early, before they have had time to match against a reply, in which
case the RTT sample that would have been used to update the sRTT is
also lost, causing the sRTT to remain too low.
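
For a concrete (made-up) illustration, assume timestamp entries live
for 8 times the sRTT: a flow starting with an sRTT of 10 ms gets its
entries evicted after 80 ms, so if bufferbloat pushes the actual RTT
to 100 ms, every reply arrives after its timestamp is already gone,
and no RTT samples come back to pull the sRTT up.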

In practice this issue is limited by the fact that the cleanup process
only runs periodically, so even for flows with a very low RTT the
average time to delete a timestamp entry would still be 500 ms (at the
default cleanup rate of 1 Hz). But this also limits the usefulness of
trying to delete entries earlier.

Overall, a better approach in the future might be to get rid of the
periodic cleanup process entirely, and instead only evict old entries
if they're blocking a new entry from being inserted. This would be
similar to using LRU maps, but would need some way to prevent an
existing entry from being removed in case it's too young.
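
For reference, the kernel's LRU hash map already provides this
evict-on-insert behaviour, just without the minimum-age guard
described above; the declaration below is only a sketch with
illustrative names and sizes:

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Hypothetical timestamp key; ePPing's real key combines a flow
 * tuple with a packet identifier. */
struct ts_key {
    __u64 flow_hash;
    __u32 identifier;
};

/* BPF_MAP_TYPE_LRU_HASH evicts the least recently used entry when an
 * insert finds the map full - but it may just as well evict an entry
 * that is only microseconds old, which is the gap a minimum-age
 * check would have to fill. */
struct {
    __uint(type, BPF_MAP_TYPE_LRU_HASH);
    __uint(max_entries, 16384);
    __type(key, struct ts_key);
    __type(value, __u64); /* timestamp in ns */
} packet_timestamps SEC(".maps");
```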

## Periodic map cleanup may miss entries on delete
Due to how the hash map traversal is implemented, not all entries may
be traversed each time a map is iterated through. Specifically, if an
entry is deleted and there are other entries remaining in the same
bucket, those remaining entries will not be traversed. As the periodic
cleanup may delete entries as it traverses the map, some of the
entries that share a bucket with deleted entries may not always be
traversed.

In general this should not cause a large problem, as those entries
will simply be traversed the next time the map is iterated over
instead. However, it may cause certain entries to remain in the hash
map a bit longer than expected (and, if the map is full, subsequently
block new entries from being created for that duration).
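
The general shape of such an iterator-based cleanup pass is sketched
below, using a simplified timestamp map; the names and the static TTL
check are illustrative, not ePPing's exact code:

```c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

#define TS_TTL_NS (10ULL * 1000000000ULL) /* illustrative static TTL */

/* Simplified stand-in for the timestamp map being cleaned. */
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 16384);
    __type(key, __u64);
    __type(value, __u64);
} packet_ts SEC(".maps");

/* Iterator program over the map's elements. Note that deleting the
 * current element may cause the remaining elements in the same hash
 * bucket to be skipped for this pass, as described above. */
SEC("iter/bpf_map_elem")
int tsmap_cleanup(struct bpf_iter__bpf_map_elem *ctx)
{
    __u64 *key = ctx->key;
    __u64 *tstamp = ctx->value;

    if (!key || !tstamp)
        return 0;

    if (bpf_ktime_get_ns() - *tstamp > TS_TTL_NS)
        bpf_map_delete_elem(&packet_ts, key);

    return 0;
}

char _license[] SEC("license") = "GPL";
```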

## Concurrency issues

The program uses "global" (not `PERCPU`) hash maps to keep state. As