Commit bd6ded5

pping: Add support for ICMP echo messages

Allow pping to passively monitor RTT for ICMP echo request/reply flows. Use the echo identifier as ports, and the echo sequence number as packet identifier.

Additionally, add the protocol to the standard output format in order to be able to distinguish between TCP and ICMP flows. The ppviz format does not include the protocol, making it impossible to distinguish between TCP and ICMP traffic. A warning for when the ppviz format is used together with ICMP traffic will be added in the future.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
1 parent af5e660 commit bd6ded5

4 files changed, 124 additions and 48 deletions


pping/README.md

Lines changed: 29 additions & 22 deletions
@@ -6,27 +6,34 @@ TC-BPF (on egress) for the packet capture logic.
66
## Simple description
77
Passive Ping (PPing) is a simple tool for passively measuring per-flow RTTs. It
88
can be used on endhosts as well as any (BPF-capable Linux) device which can see
9-
both directions of the traffic (ex router or middlebox). Currently it only works
10-
for TCP traffic which uses the TCP timestamp option, but could be extended to
11-
also work with for example TCP seq/ACK numbers, the QUIC spinbit and ICMP
12-
echo-reply messages. See the [TODO-list](./TODO.md) for more potential features
13-
(which may or may not ever get implemented).
9+
both directions of the traffic (ex router or middlebox). Currently it works for
10+
TCP traffic which uses the TCP timestamp option and ICMP echo messages, but
11+
could be extended to also work with for example TCP seq/ACK numbers, the QUIC
12+
spinbit and DNS queries. See the [TODO-list](./TODO.md) for more potential
13+
features (which may or may not ever get implemented).
1414

1515
The fundamental logic of pping is to timestamp a pseudo-unique identifier for
1616
outgoing packets, and then look for matches in the incoming packets. If a match
1717
is found, the RTT is simply calculated as the time difference between the
1818
current time and the stored timestamp.
1919

2020
This tool, just as Kathie's original pping implementation, uses TCP timestamps
21-
as identifiers. For outgoing packets, the TSval (which is a timestamp in and off
22-
itself) is timestamped. Incoming packets are then parsed for the TSecr, which
23-
are the echoed TSval values from the receiver. The TCP timestamps are not
24-
necessarily unique for every packet (they have a limited update frequency,
25-
appears to be 1000 Hz for modern Linux systems), so only the first instance of
26-
an identifier is timestamped, and matched against the first incoming packet with
27-
the identifier. The mechanism to ensure only the first packet is timestamped and
28-
matched differs from the one in Kathie's pping, and is further described in
29-
[SAMPLING_DESIGN](./SAMPLING_DESIGN.md).
21+
as identifiers for TCP traffic. For outgoing packets, the TSval (which is a
22+
timestamp in and of itself) is timestamped. Incoming packets are then parsed
23+
for the TSecr, which are the echoed TSval values from the receiver. The TCP
24+
timestamps are not necessarily unique for every packet (they have a limited
25+
update frequency, appears to be 1000 Hz for modern Linux systems), so only the
26+
first instance of an identifier is timestamped, and matched against the first
27+
incoming packet with the identifier. The mechanism to ensure only the first
28+
packet is timestamped and matched differs from the one in Kathie's pping, and is
29+
further described in [SAMPLING_DESIGN](./SAMPLING_DESIGN.md).
30+
31+
For ICMP echo, it uses the echo identifier as port numbers, and echo sequence
32+
number as identifier to match against. Linux systems will typically use different
33+
echo identifiers for different instances of ping, and thus each ping instance
34+
will be recognized as a separate flow. Windows systems typically use a static
35+
echo identifier, and thus all instances of ping originating from a particular
36+
Windows host and the same target host will be considered a single flow.
3037

3138
## Output formats
3239
pping currently supports 3 different formats, *standard*, *ppviz* and *json*. In
@@ -41,12 +48,12 @@ single line per event.
4148

4249
An example of the format is provided below:
4350
```shell
44-
16:00:46.142279766 10.11.1.1:5201+10.11.1.2:59528 opening due to SYN-ACK from src
45-
16:00:46.147705205 5.425439 ms 5.425439 ms 10.11.1.1:5201+10.11.1.2:59528
46-
16:00:47.148905125 5.261430 ms 5.261430 ms 10.11.1.1:5201+10.11.1.2:59528
47-
16:00:48.151666385 5.972284 ms 5.261430 ms 10.11.1.1:5201+10.11.1.2:59528
48-
16:00:49.152489316 6.017589 ms 5.261430 ms 10.11.1.1:5201+10.11.1.2:59528
49-
16:00:49.878508114 10.11.1.1:5201+10.11.1.2:59528 closing due to RST from dest
51+
16:00:46.142279766 TCP 10.11.1.1:5201+10.11.1.2:59528 opening due to SYN-ACK from src
52+
16:00:46.147705205 5.425439 ms 5.425439 ms TCP 10.11.1.1:5201+10.11.1.2:59528
53+
16:00:47.148905125 5.261430 ms 5.261430 ms TCP 10.11.1.1:5201+10.11.1.2:59528
54+
16:00:48.151666385 5.972284 ms 5.261430 ms TCP 10.11.1.1:5201+10.11.1.2:59528
55+
16:00:49.152489316 6.017589 ms 5.261430 ms TCP 10.11.1.1:5201+10.11.1.2:59528
56+
16:00:49.878508114 TCP 10.11.1.1:5201+10.11.1.2:59528 closing due to RST from dest
5057
```
5158

5259
### ppviz format
@@ -196,8 +203,8 @@ these identifiers.
196203

197204
This issue could be avoided entirely by requiring that new-id > old-id instead
198205
of simply checking that new-id != old-id, as TCP timestamps should monotonically
199-
increase. That may however not be a suitable solution if/when we add support for
200-
other types of identifiers.
206+
increase. That may however not be a suitable solution for other types of
207+
identifiers.
201208

202209
#### Rate-limiting new timestamps
203210
In the tc/egress program packets to timestamp are sampled by using a per-flow
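
The matching logic the README describes for ICMP echo (echo identifier standing in for the ports, echo sequence number as the packet identifier) can be illustrated with a small userspace sketch. This is not the BPF code from this commit: `struct echo_key`, `ts_table`, `on_echo_request` and `on_echo_reply` are hypothetical names, and a fixed-size array replaces the BPF hash map.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key layout for illustration only; pping's real key is built
 * from its own flow tuple plus a 32-bit identifier. */
struct echo_key {
	uint32_t saddr;    /* host that sent the echo request */
	uint32_t daddr;    /* host that will answer it */
	uint16_t echo_id;  /* stands in for both source and destination port */
	uint16_t echo_seq; /* packet identifier */
};

struct ts_entry {
	struct echo_key key;
	uint64_t timestamp_ns;
	int used;
};

#define TABLE_SIZE 64
static struct ts_entry ts_table[TABLE_SIZE];

static int key_equal(const struct echo_key *a, const struct echo_key *b)
{
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->echo_id == b->echo_id && a->echo_seq == b->echo_seq;
}

static struct ts_entry *find_entry(const struct echo_key *key)
{
	for (size_t i = 0; i < TABLE_SIZE; i++)
		if (ts_table[i].used && key_equal(&ts_table[i].key, key))
			return &ts_table[i];
	return NULL;
}

/* Egress side: timestamp only the first packet carrying this identifier.
 * Returns 0 if a timestamp was stored, -1 otherwise. */
static int on_echo_request(const struct echo_key *key, uint64_t now_ns)
{
	if (find_entry(key))
		return -1; /* identifier already timestamped */
	for (size_t i = 0; i < TABLE_SIZE; i++) {
		if (!ts_table[i].used) {
			ts_table[i].key = *key;
			ts_table[i].timestamp_ns = now_ns;
			ts_table[i].used = 1;
			return 0;
		}
	}
	return -1; /* table full */
}

/* Ingress side: the reply travels in the reverse direction, so the addresses
 * are swapped to rebuild the key of the request. On a match the entry is
 * removed, the RTT is written to *rtt_ns and 0 is returned. */
static int on_echo_reply(uint32_t reply_saddr, uint32_t reply_daddr,
			 uint16_t echo_id, uint16_t echo_seq,
			 uint64_t now_ns, uint64_t *rtt_ns)
{
	struct echo_key key = {
		.saddr = reply_daddr,
		.daddr = reply_saddr,
		.echo_id = echo_id,
		.echo_seq = echo_seq,
	};
	struct ts_entry *entry = find_entry(&key);

	if (!entry)
		return -1;
	*rtt_ns = now_ns - entry->timestamp_ns;
	entry->used = 0;
	return 0;
}
```

Because the echo identifier is part of the key, two ping instances that share an identifier and a target (the Windows case mentioned in the README) end up in the same flow in this sketch, while instances with distinct identifiers are tracked separately.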

pping/TODO.md

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -14,14 +14,15 @@
1414
- If one only considers SEQ/ACK (and don't check for SACK
1515
options), could result in ex. delay from retransmission being
1616
included in RTT
17-
- [ ] ICMP (ex Echo/Reply)
17+
- [x] ICMP (ex Echo/Reply)
1818
- [ ] QUIC (based on spinbit)
19+
- [ ] DNS queries
1920

2021
## General pping
2122
- [x] Add sampling so that RTT is not calculated for every packet
2223
(with unique value) for large flows
2324
- [ ] Allow short bursts to bypass sampling in order to handle
24-
delayed ACKs
25+
delayed ACKs, reordered or lost packets etc.
2526
- [x] Keep some per-flow state
2627
- Will likely be needed for the sampling
2728
- [ ] Could potentially include keeping track of average RTT, which

pping/pping.c

Lines changed: 16 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
/* SPDX-License-Identifier: GPL-2.0-or-later */
22
static const char *__doc__ =
3-
"Passive Ping - monitor flow RTT based on TCP timestamps";
3+
"Passive Ping - monitor flow RTT based on header inspection";
44

55
#include <bpf/bpf.h>
66
#include <bpf/libbpf.h>
@@ -51,16 +51,16 @@ enum PPING_OUTPUT_FORMAT {
5151
};
5252

5353
/*
54-
* BPF implementation of pping using libbpf
55-
* Uses TC-BPF for egress and XDP for ingress
56-
* - On egrees, packets are parsed for TCP TSval,
57-
* if found added to hashmap using flow+TSval as key,
58-
* and current time as value
59-
* - On ingress, packets are parsed for TCP TSecr,
60-
* if found looksup hashmap using reverse-flow+TSecr as key,
61-
* and calculates RTT as different between now map value
62-
* - Calculated RTTs are pushed to userspace
63-
* (together with the related flow) and printed out
54+
* BPF implementation of pping using libbpf.
55+
* Uses TC-BPF for egress and XDP for ingress.
56+
* - On egress, packets are parsed for an identifier,
57+
* if found added to hashmap using flow+identifier as key,
58+
* and current time as value.
59+
* - On ingress, packets are parsed for a reply identifier,
60+
* if found looks up hashmap using reverse-flow+identifier as key,
61+
* and calculates RTT as difference between now and stored timestamp.
62+
* - Calculated RTTs are pushed to userspace
63+
* (together with the related flow) and printed out.
6464
*/
6565

6666
// Structure to contain arguments for clean_map (for passing to pthread_create)
@@ -678,16 +678,17 @@ static void print_event_standard(void *ctx, int cpu, void *data,
678678

679679
if (e->event_type == EVENT_TYPE_RTT) {
680680
print_ns_datetime(stdout, e->rtt_event.timestamp);
681-
printf(" %llu.%06llu ms %llu.%06llu ms ",
681+
printf(" %llu.%06llu ms %llu.%06llu ms %s ",
682682
e->rtt_event.rtt / NS_PER_MS,
683683
e->rtt_event.rtt % NS_PER_MS,
684684
e->rtt_event.min_rtt / NS_PER_MS,
685-
e->rtt_event.min_rtt % NS_PER_MS);
685+
e->rtt_event.min_rtt % NS_PER_MS,
686+
proto_to_str(e->rtt_event.flow.proto));
686687
print_flow_ppvizformat(stdout, &e->rtt_event.flow);
687688
printf("\n");
688689
} else if (e->event_type == EVENT_TYPE_FLOW) {
689690
print_ns_datetime(stdout, e->flow_event.timestamp);
690-
printf(" ");
691+
printf(" %s ", proto_to_str(e->flow_event.flow.proto));
691692
print_flow_ppvizformat(stdout, &e->flow_event.flow);
692693
printf(" %s due to %s from %s\n",
693694
flowevent_to_str(e->flow_event.event_info.event),
@@ -701,6 +702,7 @@ static void print_event_ppviz(void *ctx, int cpu, void *data, __u32 data_size)
701702
const struct rtt_event *e = data;
702703
__u64 time = convert_monotonic_to_realtime(e->timestamp);
703704

705+
// ppviz format does not support flow events
704706
if (e->event_type != EVENT_TYPE_RTT)
705707
return;
706708

pping/pping_kern.c

Lines changed: 76 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,8 @@
88
#include <linux/ip.h>
99
#include <linux/ipv6.h>
1010
#include <linux/tcp.h>
11+
#include <linux/icmp.h>
12+
#include <linux/icmpv6.h>
1113
#include <stdbool.h>
1214

1315
// overwrite xdp/parsing_helpers.h value to avoid hitting verifier limit
@@ -182,6 +184,64 @@ static int parse_tcp_identifier(struct parsing_context *ctx, __be16 *sport,
182184
return 0;
183185
}
184186

187+
/*
188+
* Attempts to fetch an identifier for an ICMPv6 header, based on the echo
189+
* request/reply sequence number.
190+
* If successful, identifier will be set to the echo sequence number, both
191+
* sport and dport will be set to the echo identifier, and 0 will be returned.
192+
* On failure, -1 will be returned.
193+
* Note: Will store the 16-bit echo sequence number in network byte order in
194+
* the 32-bit identifier.
195+
*/
196+
static int parse_icmp6_identifier(struct parsing_context *ctx, __u16 *sport,
197+
__u16 *dport, struct flow_event_info *fei,
198+
__u32 *identifier)
199+
{
200+
struct icmp6hdr *icmp6h;
201+
202+
if (parse_icmp6hdr(&ctx->nh, ctx->data_end, &icmp6h) < 0)
203+
return -1;
204+
205+
if (ctx->is_egress && icmp6h->icmp6_type != ICMPV6_ECHO_REQUEST)
206+
return -1;
207+
if (!ctx->is_egress && icmp6h->icmp6_type != ICMPV6_ECHO_REPLY)
208+
return -1;
209+
if (icmp6h->icmp6_code != 0)
210+
return -1;
211+
212+
fei->event = FLOW_EVENT_NONE;
213+
*sport = icmp6h->icmp6_identifier;
214+
*dport = *sport;
215+
*identifier = icmp6h->icmp6_sequence;
216+
return 0;
217+
}
218+
219+
/*
220+
* Same as parse_icmp6_identifier, but for an ICMP(v4) header instead.
221+
*/
222+
static int parse_icmp_identifier(struct parsing_context *ctx, __u16 *sport,
223+
__u16 *dport, struct flow_event_info *fei,
224+
__u32 *identifier)
225+
{
226+
struct icmphdr *icmph;
227+
228+
if (parse_icmphdr(&ctx->nh, ctx->data_end, &icmph) < 0)
229+
return -1;
230+
231+
if (ctx->is_egress && icmph->type != ICMP_ECHO)
232+
return -1;
233+
if (!ctx->is_egress && icmph->type != ICMP_ECHOREPLY)
234+
return -1;
235+
if (icmph->code != 0)
236+
return -1;
237+
238+
fei->event = FLOW_EVENT_NONE;
239+
*sport = icmph->un.echo.id;
240+
*dport = *sport;
241+
*identifier = icmph->un.echo.sequence;
242+
return 0;
243+
}
244+
185245
/*
186246
* Attempts to parse the packet limited by the data and data_end pointers,
187247
* to retrieve a protocol dependent packet identifier. If sucessful, the
@@ -225,15 +285,21 @@ static int parse_packet_identifier(struct parsing_context *ctx,
225285
return -1;
226286
}
227287

228-
// Add new protocols here
229-
if (p_id->flow.proto == IPPROTO_TCP) {
230-
err = parse_tcp_identifier(ctx, &saddr->port, &daddr->port,
231-
fei, &p_id->identifier);
232-
if (err)
233-
return -1;
234-
} else {
235-
return -1;
236-
}
288+
// Parse identifier from suitable protocol
289+
if (p_id->flow.proto == IPPROTO_TCP)
290+
err = parse_tcp_identifier(ctx, &saddr->port, &daddr->port, fei,
291+
&p_id->identifier);
292+
else if (p_id->flow.proto == IPPROTO_ICMPV6 &&
293+
p_id->flow.ipv == AF_INET6)
294+
err = parse_icmp6_identifier(ctx, &saddr->port, &daddr->port,
295+
fei, &p_id->identifier);
296+
else if (p_id->flow.proto == IPPROTO_ICMP && p_id->flow.ipv == AF_INET)
297+
err = parse_icmp_identifier(ctx, &saddr->port, &daddr->port,
298+
fei, &p_id->identifier);
299+
else
300+
return -1; // No matching protocol
301+
if (err)
302+
return -1; // Failed parsing protocol
237303

238304
// Sucessfully parsed packet identifier - fill in IP-addresses and return
239305
if (p_id->flow.ipv == AF_INET) {
@@ -267,7 +333,7 @@ static void fill_flow_event(struct flow_event *fe, __u64 timestamp,
267333
{
268334
fe->event_type = EVENT_TYPE_FLOW;
269335
fe->timestamp = timestamp;
270-
__builtin_memcpy(&fe->flow, flow, sizeof(struct network_tuple));
336+
fe->flow = *flow;
271337
fe->source = source;
272338
fe->reserved = 0; // Make sure it's initialized
273339
}
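
The checks in `parse_icmp_identifier` can be tried outside BPF with a small userspace sketch. It assumes the standard 8-byte ICMPv4 echo header layout and uses a locally defined `struct echo_hdr` instead of `<linux/icmp.h>`; `parse_echo` and the `ECHO_*_V4` constants are hypothetical names for this example, and the bounds check is only a rough analogue of what `parse_icmphdr` does against `data_end`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ECHO_REPLY_V4 0   /* ICMPv4 type for echo reply */
#define ECHO_REQUEST_V4 8 /* ICMPv4 type for echo request */

/* Local copy of the ICMPv4 echo header layout (8 bytes, no padding). */
struct echo_hdr {
	uint8_t type;
	uint8_t code;
	uint16_t checksum;
	uint16_t id;       /* network byte order */
	uint16_t sequence; /* network byte order */
};

/* Accepts echo requests on egress and echo replies on ingress, mirroring the
 * checks in the diff above. On success both ports are set to the echo
 * identifier, the identifier is set to the echo sequence number (still in
 * network byte order, zero-extended to 32 bits) and 0 is returned. */
static int parse_echo(const uint8_t *data, const uint8_t *data_end,
		      int is_egress, uint16_t *sport, uint16_t *dport,
		      uint32_t *identifier)
{
	struct echo_hdr hdr;

	/* Never read past the end of the packet. */
	if (data > data_end || (size_t)(data_end - data) < sizeof(hdr))
		return -1;
	memcpy(&hdr, data, sizeof(hdr));

	if (is_egress && hdr.type != ECHO_REQUEST_V4)
		return -1;
	if (!is_egress && hdr.type != ECHO_REPLY_V4)
		return -1;
	if (hdr.code != 0)
		return -1;

	*sport = hdr.id;
	*dport = *sport;
	*identifier = hdr.sequence;
	return 0;
}
```

Leaving the sequence number in network byte order is harmless for matching, since the request and the reply are parsed the same way and the values are only compared for equality.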
