1- ## adaparse
1+ ## Command line interface (CLI)
22
33The `adaparse` command-line tool takes URL strings (ASCII/UTF-8) and validates, normalizes, and queries them efficiently.
44
@@ -13,56 +13,93 @@ The adaparse command tool takes URL strings (ASCII/UTF-8) and it validates, norm
1313 - `-p`, `--path`: Process all the URLs in a given file
1414 - `-o`, `--output`: Output the results of the parsing to a file
1515
16- ### Usage/Examples:
16+ ### Performance
1717
18- Well-formatted URL:
18+ Our `adaparse` tool may outperform other popular alternatives. We offer a [collection of
19+ sets of URLs](https://github.com/ada-url/url-various-datasets) for benchmarking purposes.
20+ The following results are on a MacBook Air 2022 (M2 processor) using LLVM 14. We
21+ compare against [trurl](https://github.com/curl/trurl) version 0.6 (libcurl/7.87.0).
1922
20- ``` bash
21- adaparse "http://www.google.com"
23+ <details>
24+ <summary>With the wikipedia_100k dataset, adaparse generates normalized URLs about **three times faster than trurl**.</summary>
2225```
23- Output:
26+ time cat url-various-datasets/wikipedia/wikipedia_100k.txt| trurl --url-file - &> /dev/null 1
27+ cat url-various-datasets/wikipedia/wikipedia_100k.txt 0,00s user 0,01s system 3% cpu 0,179 total
28+ trurl --url-file - &> /dev/null 0,14s user 0,03s system 98% cpu 0,180 total
29+
2430
31+ time cat url-various-datasets/wikipedia/wikipedia_100k.txt| ./build/tools/cli/adaparse -g href &> /dev/null
32+ cat url-various-datasets/wikipedia/wikipedia_100k.txt 0,00s user 0,00s system 10% cpu 0,056 total
33+ ./build/tools/cli/adaparse -g href &> /dev/null 0,05s user 0,00s system 93% cpu 0,055 total
2534```
26- http://www.google.com
35+ </details>
36+
37+ <details>
38+ <summary>With the top100 dataset, the adaparse tool is nearly **twice as fast as trurl**.</summary>
39+ ```
40+ time cat url-various-datasets/top100/top100.txt| trurl --url-file - &> /dev/null 1
41+ cat url-various-datasets/top100/top100.txt 0,00s user 0,00s system 4% cpu 0,115 total
42+ trurl --url-file - &> /dev/null 0,09s user 0,02s system 97% cpu 0,113 total
43+
44+ time cat url-various-datasets/top100/top100.txt| ./build/tools/cli/adaparse -g href &> /dev/null
45+ cat url-various-datasets/top100/top100.txt 0,00s user 0,01s system 11% cpu 0,062 total
46+ ./build/tools/cli/adaparse -g href &> /dev/null 0,05s user 0,00s system 94% cpu 0,061 total
2747```
48+ </details>
49+
50+
51+ #### Comparison
2852
29- Ill-formatted URL:
53+ ```
54+ wikipedia 100k
55+ ada ▏ 55 ms ███████▋
56+ trurl ▏ 180 ms █████████████████████████
57+
58+ top100
59+ ada ▏ 61 ms █████████████▍
60+ trurl ▏ 113 ms █████████████████████████
61+ ```
62+
63+ The results will vary depending on your system. We invite you to run your own benchmarks.
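
The headline ratios can be checked directly from the chart. The following is a minimal sketch, not part of the adaparse tooling: the `speedup` helper is a hypothetical name introduced here, and the millisecond figures are copied from the comparison above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of adaparse): ratio of two wall-clock times.
speedup() {
  # $1: slower time in ms (trurl), $2: faster time in ms (adaparse)
  # LC_ALL=C keeps the decimal separator a dot regardless of the user's locale.
  LC_ALL=C awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

# Times in ms taken from the comparison chart.
echo "wikipedia_100k: $(speedup 180 55)x"
echo "top100:         $(speedup 113 61)x"
```

With the figures above this gives roughly 3.27x for wikipedia_100k and 1.85x for top100, which is where the "three times" and "twice" summaries come from.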
64+
65+ ### Usage/Examples
66+
67+ #### Well-formatted URL
3068
3169``` bash
32- adaparse " h^tp:ws:/ www.g00g .com"
70+ adaparse "http://www.google.com"
3371```
3472Output:
3573
3674```
37- Invalid URL: h^tp:ws:/ www.g00g .com
75+ http://www.google.com
3876```
3977
40-
41- Diagram flag:
78+ #### Diagram
4279
4380``` bash
4481adaparse -d http://www.google.com/bal\?a\=\=11\#fddfds
45- ```
82+ ```
4683
4784Output:
4885
49- ```
50- http://www.google.com/bal?a==11#fddfds [38 bytes]
51- | | | | |
52- | | | | `------ hash_start
53- | | | `------------ search_start 25
54- | | `---------------- pathname_start 21
55- | | `---------------- host_end 21
56- | `------------------------------ host_start 7
57- | `------------------------------ username_end 7
58- `-------------------------------- protocol_end 5
86+ ```
87+ http://www.google.com/bal?a==11#fddfds [38 bytes]
88+ | | | | |
89+ | | | | `------ hash_start
90+ | | | `------------ search_start 25
91+ | | `---------------- pathname_start 21
92+ | | `---------------- host_end 21
93+ | `------------------------------ host_start 7
94+ | `------------------------------ username_end 7
95+ `-------------------------------- protocol_end 5
5996```
6097
98+ #### Pipe Operator
6199
62-
63- ### Piping Example
64-
65- Ada can process URLs from piped input, making it easy to integrate with other command-line tools that produce ASCII or UTF-8 outputs. Here's an example of how to pipe the output of another command into Ada. Given a list of URLs, one by line, we may query the normalized URL string (` href ` ) and detect any malformed URL:
100+ Ada can process URLs from piped input, making it easy to integrate with other command-line tools
101+ that produce ASCII or UTF-8 outputs. Here's an example of how to pipe the output of another command into Ada.
102+ Given a list of URLs, one per line, we may query the normalized URL string (`href`) and detect any malformed URLs:
66103
67104``` bash
68105cat dragonball_url.txt | adaparse --get href
@@ -95,14 +132,16 @@ www.gohan.com
95132If you omit `-g`, it will only provide a list of invalid URLs. This might be
96133useful if you want to quickly validate a list of URLs.
97134
135+ ### Benchmark Runner
98136
99137The benchmark flag can be used to output the time it takes to process piped input:
100138
101139``` bash
102140cat wikipedia_100k.txt | adaparse -b
103141```
104142
105- ``` bash
143+ Output:
144+ ```
106145Invalid URL: 1968:_Die_Kinder_der_Diktatur
107146Invalid URL: 58957:_The_Bluegrass_Guitar_Collection
108147Invalid URL: 650luc:_Gangsta_Grillz
@@ -120,26 +159,29 @@ read 5209265 bytes in 32819917 ns using 100000 lines, used 160 loads
1201590.1587226744053009 GB/s
121160```
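
The reported GB/s figure appears to be the byte count divided by the elapsed nanoseconds (bytes per nanosecond equals GB/s when 1 GB = 10^9 bytes). The sketch below reproduces it from the numbers in the benchmark output; `throughput_gbps` is a hypothetical helper name, not an adaparse feature.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of adaparse): throughput in GB/s.
throughput_gbps() {
  # $1: bytes read, $2: elapsed time in nanoseconds
  # bytes / ns == GB/s, taking 1 GB as 10^9 bytes.
  LC_ALL=C awk -v bytes="$1" -v ns="$2" 'BEGIN { printf "%.4f\n", bytes / ns }'
}

# Figures copied from the "read ... bytes in ... ns" line of the output.
throughput_gbps 5209265 32819917
```

Rounded to four decimals, the result agrees with the value printed by the `-b` flag.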
122161
162+ #### Saving result to file system
163+
123164There is an option to output to a file on disk:
124165
125166``` bash
126-
127167cat wikipedia_100k.txt | adaparse -o wiki_output.txt
128168```
129169
130- as well as read in from a file on disk without going through cat:
170+ You can also read from a file on disk without going through `cat`:
131171
132172``` bash
133173adaparse -p wikipedia_top_100.txt
134174```
135175
176+ #### Advanced Usage
177+
136178You may also combine different flags. For example, to extract only the host from the URLs stored in wikipedia_top100.txt, write it to the test_write.txt file, and report the benchmark timing:
137179
138180``` bash
139181adaparse -p wikipedia_top100.txt -o test_write.txt -g host -b
140182```
141183
142- Console output :
184+ Output:
143185``` bash
144186read 5209265 bytes in 26737131 ns using 100000 lines, total_bytes is 5209265 used 160 loads
1451870.19483260937757307 GB/s(base)
@@ -160,51 +202,3 @@ en.wikipedia.org
160202en.wikipedia.org
161203(---snip---)
162204```
163-
164- ### Performance
165-
166- Our `adaparse` tool may outperform other popular alternatives. We offer a [collection of
167- sets of URLs](https://github.com/ada-url/url-various-datasets) for benchmarking purposes.
168- The following results are on a MacBook Air 2022 (M2 processor) using LLVM 14. We
169- compare against [trurl](https://github.com/curl/trurl) version 0.6 (libcurl/7.87.0).
170-
171- <details><summary>
172- With the wikipedia_100k dataset, we get that adaparse can generate normalized URLs about three
173- times faster than trurl.</summary>
174- <pre>
175- time cat url-various-datasets/wikipedia/wikipedia_100k.txt| trurl --url-file - &> /dev/null 1
176- cat url-various-datasets/wikipedia/wikipedia_100k.txt 0,00s user 0,01s system 3% cpu 0,179 total
177- trurl --url-file - &> /dev/null 0,14s user 0,03s system 98% cpu 0,180 total
178-
179-
180- time cat url-various-datasets/wikipedia/wikipedia_100k.txt| ./build/tools/cli/adaparse -g href &> /dev/null
181- cat url-various-datasets/wikipedia/wikipedia_100k.txt 0,00s user 0,00s system 10% cpu 0,056 total
182- ./build/tools/cli/adaparse -g href &> /dev/null 0,05s user 0,00s system 93% cpu 0,055 total
183- </pre>
184- </details>
185-
186- <details><summary>With the top100 dataset, the adaparse tool is twice as fast as the trurl.</summary>
187- <pre>
188- time cat url-various-datasets/top100/top100.txt| trurl --url-file - &> /dev/null 1
189- cat url-various-datasets/top100/top100.txt 0,00s user 0,00s system 4% cpu 0,115 total
190- trurl --url-file - &> /dev/null 0,09s user 0,02s system 97% cpu 0,113 total
191-
192- time cat url-various-datasets/top100/top100.txt| ./build/tools/cli/adaparse -g href &> /dev/null
193- cat url-various-datasets/top100/top100.txt 0,00s user 0,01s system 11% cpu 0,062 total
194- ./build/tools/cli/adaparse -g href &> /dev/null 0,05s user 0,00s system 94% cpu 0,061 total
195- </pre>
196- </details>
197-
198-
199-
200- The results will vary depending on your system. We invite you to run your own benchmarks.
201-
202- ```
203- wikipedia 100k
204- ada ▏ 55 ms ███████▋
205- trurl ▏ 180 ms █████████████████████████
206-
207- top100
208- ada ▏ 61 ms █████████████▍
209- trurl ▏ 113 ms █████████████████████████
210- ```