Commit 9f81aa6
ENH: pd.read_html argument to extract hrefs along with text from cells (#45973)
* ENH: pd.read_html argument to extract hrefs along with text from cells
* Fix typing error
* Simplify tests
* Fix still incorrect typing
* Summarise whatsnew entry and move detailed explanation into user guide
* More flexible link extraction
* Suggested changes
* extract_hrefs -> extract_links
* Move versionadded to correct place and improve docstring for extract_links (@attack68)
* Test for invalid extract_links value
* Test all extract_link options
* Fix for MultiIndex headers (also fixes tests)
* Test that text surrounding <a> tag is still captured
* Test for multiple <a> tags in cell
* Fix all tests, with both MultiIndex -> Index and np.nan -> None conversions resolved
* Add back EOF newline to test_html.py
* Correct user guide example
* Update pandas/io/html.py
* Update pandas/io/html.py
* Update pandas/io/html.py
* Simplify MultiIndex -> Index conversion
* Move unnecessary fixtures into test body
* Simplify statement
* Fix code checks
Co-authored-by: JHM Darbyshire <24256554+attack68@users.noreply.github.com>
1 parent c7b470c commit 9f81aa6
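The behaviour described in the commit message (hrefs extracted along with cell text, text surrounding the `<a>` tag still captured, `None` where a cell has no link) can be illustrated with a small standard-library sketch. This is not the code in `pandas/io/html.py`; the class `CellLinkParser`, the helper `extract_cells`, the sample table, and the choice to keep only the first link in a cell are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class CellLinkParser(HTMLParser):
    """Hypothetical helper: collect a (text, href) pair for every table cell."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of (text, href) pairs
        self._row = None    # row currently being built, or None outside <tr>
        self._text = None   # text fragments of the current cell, or None
        self._href = None   # first href seen in the current cell, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._text, self._href = [], None
        elif tag == "a" and self._text is not None and self._href is None:
            # Assumption of this sketch: only the first link in a cell is kept.
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        # Text inside and around <a> tags both belong to the cell.
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._text is not None:
            # Collapse runs of whitespace so the cell text is one clean string.
            text = " ".join("".join(self._text).split())
            if self._row is not None:
                self._row.append((text, self._href))
            self._text = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_cells(html):
    """Return every table row in `html` as a list of (text, href) pairs."""
    parser = CellLinkParser()
    parser.feed(html)
    parser.close()
    return parser.rows


SAMPLE = """
<table>
  <tr><th>Project</th><th>Notes</th></tr>
  <tr>
    <td>See <a href="https://github.com/pandas-dev/pandas">pandas</a> repo</td>
    <td>no link here</td>
  </tr>
</table>
"""

rows = extract_cells(SAMPLE)
for row in rows:
    print(row)
```

In pandas itself the feature is reached through the `extract_links` argument of `pd.read_html` named in the commit message; the sketch above only mirrors the idea of pairing each cell's text with its link.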
File tree
4 files changed (+186, -9 lines changed)
- doc/source
  - user_guide
  - whatsnew
- pandas
  - io
  - tests/io