Commit 8a8f538

pping: Do both timestamping and matching on ingress and egress

Perform both timestamping and matching on both ingress and egress hooks. This makes it more similar to Kathie's pping, allowing the tool to capture RTTs in both directions when deployed on just a single interface.

Like Kathie's pping, by default filter out RTTs for packets going to the local machine (these will only include local processing delays). This behavior can be disabled by passing the -l/--include-local option.

As packets that are timestamped on ingress and matched on egress will include the local machine's processing delay, add the "match_on_egress" member to the JSON output that can be used to differentiate between RTTs that include the local processing delay, and those which don't.

Finally, report the source and destination addresses from the perspective of the reply packet, rather than the timestamped packet, to be consistent with Kathie's pping.

Overall, refactor large parts of pping_kern to allow both timestamping and matching, as well as updating both the flow and reverse flow and handling flow-events related to them, in one go. Also update README to reflect changes.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
1 parent 928a414 commit 8a8f538

5 files changed: +437 -293 lines changed

pping/README.md

Lines changed: 38 additions & 39 deletions
@@ -13,20 +13,21 @@ spinbit and DNS queries. See the [TODO-list](./TODO.md) for more potential
 features (which may or may not ever get implemented).
 
 The fundamental logic of pping is to timestamp a pseudo-unique identifier for
-outgoing packets, and then look for matches in the incoming packets. If a match
-is found, the RTT is simply calculated as the time difference between the
-current time and the stored timestamp.
+packets, and then look for matches in the reply packets. If a match is found,
+the RTT is simply calculated as the time difference between the current time and
+the stored timestamp.
 
 This tool, just as Kathie's original pping implementation, uses TCP timestamps
-as identifiers for TCP traffic. For outgoing packets, the TSval (which is a
-timestamp in and off itself) is timestamped. Incoming packets are then parsed
-for the TSecr, which are the echoed TSval values from the receiver. The TCP
-timestamps are not necessarily unique for every packet (they have a limited
-update frequency, appears to be 1000 Hz for modern Linux systems), so only the
-first instance of an identifier is timestamped, and matched against the first
-incoming packet with the identifier. The mechanism to ensure only the first
-packet is timestamped and matched differs from the one in Kathie's pping, and is
-further described in [SAMPLING_DESIGN](./SAMPLING_DESIGN.md).
+as identifiers for TCP traffic. The TSval (which is a timestamp in and off
+itself) is used as an identifier and timestamped. Reply packets in the reverse
+flow are then parsed for the TSecr, which are the echoed TSval values from the
+receiver. The TCP timestamps are not necessarily unique for every packet (they
+have a limited update frequency, appears to be 1000 Hz for modern Linux
+systems), so only the first instance of an identifier is timestamped, and
+matched against the first incoming packet with a matching reply identifier. The
+mechanism to ensure only the first packet is timestamped and matched differs
+from the one in Kathie's pping, and is further described in
+[SAMPLING_DESIGN](./SAMPLING_DESIGN.md).
 
 For ICMP echo, it uses the echo identifier as port numbers, and echo sequence
 number as identifer to match against. Linux systems will typically use different
@@ -48,7 +49,7 @@ single line per event.
 
 An example of the format is provided below:
 ```shell
-16:00:46.142279766 TCP 10.11.1.1:5201+10.11.1.2:59528 opening due to SYN-ACK from src
+16:00:46.142279766 TCP 10.11.1.1:5201+10.11.1.2:59528 opening due to SYN-ACK from dest
 16:00:46.147705205 5.425439 ms 5.425439 ms TCP 10.11.1.1:5201+10.11.1.2:59528
 16:00:47.148905125 5.261430 ms 5.261430 ms TCP 10.11.1.1:5201+10.11.1.2:59528
 16:00:48.151666385 5.972284 ms 5.261430 ms TCP 10.11.1.1:5201+10.11.1.2:59528
@@ -96,7 +97,7 @@ An example of a (pretty-printed) flow-event is provided below:
     "protocol": "TCP",
     "flow_event": "opening",
     "reason": "SYN-ACK",
-    "triggered_by": "src"
+    "triggered_by": "dest"
 }
 ```
 
@@ -114,7 +115,8 @@ An example of a (pretty-printed) RTT-even is provided below:
     "sent_packets": 9393,
     "sent_bytes": 492457296,
     "rec_packets": 5922,
-    "rec_bytes": 37
+    "rec_bytes": 37,
+    "match_on_egress": false
 }
 ```
 
@@ -123,36 +125,33 @@ An example of a (pretty-printed) RTT-even is provided below:
 
 ### Files:
 - **pping.c:** Userspace program that loads and attaches the BPF programs, pulls
-  the perf-buffer `rtt_events` to print out RTT messages and periodically cleans
+  the perf-buffer `events` to print out RTT messages and periodically cleans
   up the hash-maps from old entries. Also passes user options to the BPF
   programs by setting a "global variable" (stored in the programs .rodata
   section).
-- **pping_kern.c:** Contains the BPF programs that are loaded on tc (egress) and
-  XDP (ingress), as well as several common functions, a global constant `config`
-  (set from userspace) and map definitions. The tc program `pping_egress()`
-  parses outgoing packets for identifiers. If an identifier is found and the
-  sampling strategy allows it, a timestamp for the packet is created in
-  `packet_ts`. The XDP program `pping_ingress()` parses incomming packets for an
-  identifier. If found, it looks up the `packet_ts` map for a match on the
-  reverse flow (to match source/dest on egress). If there is a match, it
-  calculates the RTT from the stored timestamp and deletes the entry. The
-  calculated RTT (together with the flow-tuple) is pushed to the perf-buffer
-  `events`. Both `pping_egress()` and `pping_ingress` can also push flow-events
-  to the `events` buffer.
+- **pping_kern.c:** Contains the BPF programs that are loaded on egress (tc) and
+  ingress (XDP or tc), as well as several common functions, a global constant
+  `config` (set from userspace) and map definitions. Essentially the same pping
+  program is loaded on both ingress and egress. All packets are parsed for both
+  an identifier that can be used to create a timestamp entry `packet_ts`, and a
+  reply identifier that can be used to match the packet with a previously
+  timestamped one in the reverse flow. If a match is found, an RTT is calculated
+  and an RTT-event is pushed to userspace through the perf-buffer `events`. For
+  each packet with a valid identifier, the program also keeps track of and
+  updates the state flow and reverse flow, stored in the `flow_state` map.
 - **pping.h:** Common header file included by `pping.c` and
   `pping_kern.c`. Contains some common structs used by both (are part of the
   maps).
 
 ### BPF Maps:
 - **flow_state:** A hash-map storing some basic state for each flow, such as the
   last seen identifier for the flow and when the last timestamp entry for the
-  flow was created. Entries are created by `pping_egress()`, and can be updated
-  or deleted by both `pping_egress()` and `pping_ingress()`. Leftover entries
-  are eventually removed by `pping.c`.
+  flow was created. Entries are created, updated and deleted by the BPF pping
+  programs. Leftover entries are eventually removed by userspace (`pping.c`).
 - **packet_ts:** A hash-map storing a timestamp for a specific packet
-  identifier. Entries are created by `pping_egress()` and removed by
-  `pping_ingress()` if a match is found. Leftover entries are eventually removed
-  by `pping.c`.
+  identifier. Entries are created by the BPF pping program if a valid identifier
+  is found, and removed if a match is found. Leftover entries are eventually
+  removed by userspace (`pping.c`).
 - **events:** A perf-buffer used by the BPF programs to push flow or RTT events
   to `pping.c`, which continuously polls the map the prints them out.
 
@@ -222,9 +221,9 @@ additional map space and report some additional RTT(s) more than expected
 (however the reported RTTs should still be correct).
 
 If the packets have the same identifier, they must first have managed to bypass
-the previous check for unique identifiers (see [previous point](#Tracking last
-seen identifier)), and only one of them will be able to successfully store a
-timestamp entry.
+the previous check for unique identifiers (see [previous
+point](#tracking-last-seen-identifier)), and only one of them will be able to
+successfully store a timestamp entry.
 
 #### Matching against stored timestamps
 The XDP/ingress program could potentially match multiple concurrent packets with
@@ -246,8 +245,8 @@ if this is the lowest RTT seen so far for the flow. If multiple RTTs are
 calculated concurrently, then several could pass this check concurrently and
 there may be a lost update. It should only be possible for multiple RTTs to be
 calculated concurrently in case either the [timestamp rate-limit was
-bypassed](#Rate-limiting new timestamps) or [multiple packets managed to match
-against the same timestamp](#Matching against stored timestamps).
+bypassed](#rate-limiting-new-timestamps) or [multiple packets managed to match
+against the same timestamp](#matching-against-stored-timestamps).
 
 It's worth noting that with sampling the reported minimum-RTT is only an
 estimate anyways (may never calculate RTT for packet with the true minimum
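
The timestamp-then-match logic described in the README changes can be illustrated with a minimal userspace sketch. This is not the code from `pping_kern.c`: the `flow` and `ts_entry` structs, the fixed-size `table`, and all function names are hypothetical stand-ins for the BPF maps and programs, and the sampling and rate-limiting steps are left out. It only shows the idea that every packet first gets a chance to create a timestamp entry for its own identifier, and is then checked for a reply identifier against the reverse flow.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the structs from pping.h. */
struct flow {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

struct ts_entry {
	struct flow flow;
	uint32_t id;
	uint64_t ts_ns;
	bool used;
};

#define TABLE_SIZE 64
static struct ts_entry table[TABLE_SIZE]; /* stands in for a hash map */

static struct flow flow_reversed(struct flow f)
{
	struct flow r = { f.daddr, f.saddr, f.dport, f.sport };
	return r;
}

static bool flow_equal(struct flow a, struct flow b)
{
	return a.saddr == b.saddr && a.daddr == b.daddr &&
	       a.sport == b.sport && a.dport == b.dport;
}

static struct ts_entry *find_entry(struct flow f, uint32_t id)
{
	for (size_t i = 0; i < TABLE_SIZE; i++)
		if (table[i].used && table[i].id == id &&
		    flow_equal(table[i].flow, f))
			return &table[i];
	return NULL;
}

/* Timestamp only the first packet seen with a given (flow, identifier). */
static bool timestamp_packet(struct flow f, uint32_t id, uint64_t now_ns)
{
	if (find_entry(f, id))
		return false;
	for (size_t i = 0; i < TABLE_SIZE; i++) {
		if (!table[i].used) {
			table[i] = (struct ts_entry){ f, id, now_ns, true };
			return true;
		}
	}
	return false; /* table full */
}

/* Look up the reply identifier in the reverse flow; delete the entry on a match. */
static bool match_packet(struct flow f, uint32_t reply_id, uint64_t now_ns,
			 uint64_t *rtt_ns)
{
	struct ts_entry *e = find_entry(flow_reversed(f), reply_id);

	if (!e)
		return false;
	*rtt_ns = now_ns - e->ts_ns;
	e->used = false;
	return true;
}

/* Every packet, whether seen on ingress or egress, goes through both steps. */
static bool handle_packet(struct flow f, uint32_t id, uint32_t reply_id,
			  uint64_t now_ns, uint64_t *rtt_ns)
{
	(void)timestamp_packet(f, id, now_ns);
	return match_packet(f, reply_id, now_ns, rtt_ns);
}
```

Because the entry is deleted on the first match, a later packet echoing the same identifier does not produce a second RTT in this sketch.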

pping/eBPF_pping_design.png

-8.57 KB

pping/pping.c

Lines changed: 17 additions & 15 deletions
@@ -15,11 +15,8 @@ static const char *__doc__ =
 #include <unistd.h>
 #include <getopt.h>
 #include <stdbool.h>
-#include <limits.h>
 #include <signal.h> // For detecting Ctrl-C
 #include <sys/resource.h> // For setting rlmit
-#include <sys/wait.h>
-#include <sys/stat.h>
 #include <time.h>
 #include <pthread.h>
 
@@ -108,6 +105,7 @@ static const struct option long_options[] = {
 	{ "ingress-hook", required_argument, NULL, 'I' }, // Use tc or XDP as ingress hook
 	{ "tcp", no_argument, NULL, 'T' }, // Calculate and report RTTs for TCP traffic (with TCP timestamps)
 	{ "icmp", no_argument, NULL, 'C' }, // Calculate and report RTTs for ICMP echo-reply traffic
+	{ "include-local", no_argument, NULL, 'l' }, // Also report "internal" RTTs
 	{ 0, 0, NULL, 0 }
 };
 
@@ -172,11 +170,12 @@ static int parse_arguments(int argc, char *argv[], struct pping_config *config)
 	double rate_limit_ms, cleanup_interval_s, rtt_rate;
 
 	config->ifindex = 0;
+	config->bpf_config.localfilt = true;
 	config->force = false;
 	config->bpf_config.track_tcp = false;
 	config->bpf_config.track_icmp = false;
 
-	while ((opt = getopt_long(argc, argv, "hfTCi:r:R:t:c:F:I:", long_options,
+	while ((opt = getopt_long(argc, argv, "hflTCi:r:R:t:c:F:I:", long_options,
 				  NULL)) != -1) {
 		switch (opt) {
 		case 'i':
@@ -257,6 +256,9 @@ static int parse_arguments(int argc, char *argv[], struct pping_config *config)
 				return -EINVAL;
 			}
 			break;
+		case 'l':
+			config->bpf_config.localfilt = false;
+			break;
 		case 'f':
 			config->force = true;
 			config->xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
@@ -504,9 +506,9 @@ static bool flow_timeout(void *key_ptr, void *val_ptr, __u64 now)
 	if (print_event_func) {
 		fe.event_type = EVENT_TYPE_FLOW;
 		fe.timestamp = now;
-		fe.flow = *(struct network_tuple *)key_ptr;
-		fe.event_info.event = FLOW_EVENT_CLOSING;
-		fe.event_info.reason = EVENT_REASON_FLOW_TIMEOUT;
+		reverse_flow(&fe.flow, key_ptr);
+		fe.flow_event_type = FLOW_EVENT_CLOSING;
+		fe.reason = EVENT_REASON_FLOW_TIMEOUT;
 		fe.source = EVENT_SOURCE_USERSPACE;
 		print_event_func(NULL, 0, &fe, sizeof(fe));
 	}
@@ -657,6 +659,7 @@ static const char *flowevent_to_str(enum flow_event_type fe)
 	case FLOW_EVENT_OPENING:
 		return "opening";
 	case FLOW_EVENT_CLOSING:
+	case FLOW_EVENT_CLOSING_BOTH:
 		return "closing";
 	default:
 		return "unknown";
@@ -674,8 +677,6 @@ static const char *eventreason_to_str(enum flow_event_reason er)
 		return "first observed packet";
 	case EVENT_REASON_FIN:
 		return "FIN";
-	case EVENT_REASON_FIN_ACK:
-		return "FIN-ACK";
 	case EVENT_REASON_RST:
 		return "RST";
 	case EVENT_REASON_FLOW_TIMEOUT:
@@ -688,9 +689,9 @@ static const char *eventreason_to_str(enum flow_event_reason er)
 static const char *eventsource_to_str(enum flow_event_source es)
 {
 	switch (es) {
-	case EVENT_SOURCE_EGRESS:
+	case EVENT_SOURCE_PKT_SRC:
 		return "src";
-	case EVENT_SOURCE_INGRESS:
+	case EVENT_SOURCE_PKT_DEST:
 		return "dest";
 	case EVENT_SOURCE_USERSPACE:
 		return "userspace-cleanup";
@@ -740,8 +741,8 @@ static void print_event_standard(void *ctx, int cpu, void *data,
 		printf(" %s ", proto_to_str(e->rtt_event.flow.proto));
 		print_flow_ppvizformat(stdout, &e->flow_event.flow);
 		printf(" %s due to %s from %s\n",
-		       flowevent_to_str(e->flow_event.event_info.event),
-		       eventreason_to_str(e->flow_event.event_info.reason),
+		       flowevent_to_str(e->flow_event.flow_event_type),
+		       eventreason_to_str(e->flow_event.reason),
 		       eventsource_to_str(e->flow_event.source));
 	}
 }
@@ -790,15 +791,16 @@ static void print_rttevent_fields_json(json_writer_t *ctx,
 	jsonw_u64_field(ctx, "sent_bytes", re->sent_bytes);
 	jsonw_u64_field(ctx, "rec_packets", re->rec_pkts);
 	jsonw_u64_field(ctx, "rec_bytes", re->rec_bytes);
+	jsonw_bool_field(ctx, "match_on_egress", re->match_on_egress);
 }
 
 static void print_flowevent_fields_json(json_writer_t *ctx,
 					const struct flow_event *fe)
 {
 	jsonw_string_field(ctx, "flow_event",
-			   flowevent_to_str(fe->event_info.event));
+			   flowevent_to_str(fe->flow_event_type));
 	jsonw_string_field(ctx, "reason",
-			   eventreason_to_str(fe->event_info.reason));
+			   eventreason_to_str(fe->reason));
 	jsonw_string_field(ctx, "triggered_by", eventsource_to_str(fe->source));
 }
 
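
The option handling added in `pping.c` follows the usual `getopt_long()` pattern: the filter defaults to on, and `-l`/`--include-local` turns it off. A stripped-down sketch of just that flag is given below. `struct demo_config`, `demo_options` and `demo_parse()` are hypothetical names used for illustration only; the real `parse_arguments()` handles many more options and a larger config struct.

```c
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down config; the real struct has more members. */
struct demo_config {
	bool localfilt; /* true: filter out RTTs for packets to the local machine */
};

static const struct option demo_options[] = {
	{ "include-local", no_argument, NULL, 'l' },
	{ 0, 0, NULL, 0 }
};

/* Returns 0 on success, -1 on an unknown option. */
static int demo_parse(int argc, char *argv[], struct demo_config *config)
{
	int opt;

	config->localfilt = true; /* filtering is on by default */
	optind = 1; /* restart parsing so the function can be called repeatedly */

	while ((opt = getopt_long(argc, argv, "l", demo_options, NULL)) != -1) {
		switch (opt) {
		case 'l':
			config->localfilt = false;
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

Setting the default before the loop, as the commit does with `config->bpf_config.localfilt = true`, means the flag only ever needs to flip the value in one direction.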

pping/pping.h

Lines changed: 25 additions & 12 deletions
@@ -18,22 +18,22 @@ typedef __u64 fixpoint64;
 enum __attribute__((__packed__)) flow_event_type {
 	FLOW_EVENT_NONE,
 	FLOW_EVENT_OPENING,
-	FLOW_EVENT_CLOSING
+	FLOW_EVENT_CLOSING,
+	FLOW_EVENT_CLOSING_BOTH
 };
 
 enum __attribute__((__packed__)) flow_event_reason {
 	EVENT_REASON_SYN,
 	EVENT_REASON_SYN_ACK,
 	EVENT_REASON_FIRST_OBS_PCKT,
 	EVENT_REASON_FIN,
-	EVENT_REASON_FIN_ACK,
 	EVENT_REASON_RST,
 	EVENT_REASON_FLOW_TIMEOUT
 };
 
 enum __attribute__((__packed__)) flow_event_source {
-	EVENT_SOURCE_EGRESS,
-	EVENT_SOURCE_INGRESS,
+	EVENT_SOURCE_PKT_SRC,
+	EVENT_SOURCE_PKT_DEST,
 	EVENT_SOURCE_USERSPACE
 };
 
@@ -43,7 +43,8 @@ struct bpf_config {
 	bool use_srtt;
 	bool track_tcp;
 	bool track_icmp;
-	__u8 reserved[5];
+	bool localfilt;
+	__u32 reserved;
 };
 
 /*
@@ -108,12 +109,8 @@ struct rtt_event {
 	__u64 sent_bytes;
 	__u64 rec_pkts;
 	__u64 rec_bytes;
-	__u32 reserved;
-};
-
-struct flow_event_info {
-	enum flow_event_type event;
-	enum flow_event_reason reason;
+	bool match_on_egress;
+	__u8 reserved[7];
 };
 
 /*
@@ -126,7 +123,8 @@ struct flow_event {
 	__u64 event_type;
 	__u64 timestamp;
 	struct network_tuple flow;
-	struct flow_event_info event_info;
+	enum flow_event_type flow_event_type;
+	enum flow_event_reason reason;
 	enum flow_event_source source;
 	__u8 reserved;
 };
@@ -137,4 +135,19 @@ union pping_event {
 	struct flow_event flow_event;
 };
 
+/*
+ * Convenience function for getting the corresponding reverse flow.
+ * PPing needs to keep track of flow in both directions, and sometimes
+ * also needs to reverse the flow to report the "correct" (consistent
+ * with Kathie's PPing) src and dest address.
+ */
+static void reverse_flow(struct network_tuple *dest, struct network_tuple *src)
+{
+	dest->ipv = src->ipv;
+	dest->proto = src->proto;
+	dest->saddr = src->daddr;
+	dest->daddr = src->saddr;
+	dest->reserved = 0;
+}
+
 #endif
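
To see what `reverse_flow()` does to a tuple, here is a small self-contained sketch. The definition of `struct network_tuple` is not part of this diff, so `struct demo_address` and `struct demo_tuple` below are hypothetical, simplified layouts that only carry the members the function touches; `demo_reverse_flow()` mirrors the member-wise swap from the diff.

```c
#include <stdint.h>

/* Hypothetical, simplified layout; the real definitions live in pping.h. */
struct demo_address {
	uint32_t ip;
	uint16_t port;
	uint16_t reserved;
};

struct demo_tuple {
	struct demo_address saddr;
	struct demo_address daddr;
	uint16_t proto;
	uint8_t ipv;
	uint8_t reserved;
};

/* Same member-wise swap as reverse_flow() in the diff. */
static void demo_reverse_flow(struct demo_tuple *dest,
			      const struct demo_tuple *src)
{
	dest->ipv = src->ipv;
	dest->proto = src->proto;
	dest->saddr = src->daddr;
	dest->daddr = src->saddr;
	dest->reserved = 0;
}
```

Note that a swap written this way assumes `dest` and `src` point to distinct objects: if they aliased, `saddr` would be overwritten before it is copied into `daddr`. Applying the function twice, through a temporary, gives back the original addresses.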
