Named Entity and Relation Extraction models for NFL play-by-play snippets
- Scrape Data
- Centralize Data
  - combine multiple files into a single one
- Build Dataset / Model
  - Split
    - splits off a random subset for manageable inspection (1% at random)
  - ITERATE
    - Annotate Data
      - builds a redacted file for quick visual inspection
    - Inspect Data
      - if issues, fix and annotate again
      - may require a complete reset of the "gold standard" dataset
    - Save
      - add data to be used in model building ("gold standard")
    - Build Model
    - Annotate Data
    - Split
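The Split step in the workflow above draws a 1% random subset so each round of inspection stays small. A minimal sketch of that idea, not the `extr-ds` implementation; the function name `sample_for_inspection` and its parameters are hypothetical:

```python
import random


def sample_for_inspection(rows, fraction=0.01, seed=None):
    """Split rows into a small random subset and the remainder.

    Hypothetical helper: `fraction` defaults to the 1% mentioned in the
    workflow, and `seed` makes the draw repeatable.
    """
    rng = random.Random(seed)
    # Always pick at least one row when there is anything to pick from.
    k = max(1, round(len(rows) * fraction)) if rows else 0
    picked = set(rng.sample(range(len(rows)), k))
    subset = [row for i, row in enumerate(rows) if i in picked]
    remainder = [row for i, row in enumerate(rows) if i not in picked]
    return subset, remainder


if __name__ == "__main__":
    plays = [f"play {i}" for i in range(1000)]
    subset, remainder = sample_for_inspection(plays, seed=0)
    print(len(subset), len(remainder))
```

Returning the remainder as well keeps the unsampled rows available for the next pass through the ITERATE loop.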
Scrape game IDs and play-by-play text from ESPN for the 2022 NFL regular season.
From the project root:
cd tasks\scrap
make scrap-schedules
Output files are found in "tasks/data/1/".
make scrap-pbp
Output files are found in "tasks/data/2/".
Create a main source file and split it into dev / holdout datasets.
From the project root:
cd tasks\scrap
make centralize-data
Output files are found in "tasks/data/3/".
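For readers who want to see what centralizing involves, here is a rough sketch of combining several text files into one source file and then splitting it into dev / holdout sets. It is an illustration only, not what the `make` target actually runs; the function names, the `*.txt` pattern, and the 20% holdout fraction are all assumptions:

```python
import random
from pathlib import Path


def centralize(input_dir, output_file, pattern="*.txt"):
    """Concatenate the non-empty lines of every matching file into one file.

    Hypothetical helper; write the output outside `input_dir` so a re-run
    does not pick up its own result.
    """
    lines = []
    for path in sorted(Path(input_dir).glob(pattern)):
        with path.open(encoding="utf-8") as handle:
            lines.extend(line.strip() for line in handle if line.strip())
    Path(output_file).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


def split_dev_holdout(lines, holdout_fraction=0.2, seed=0):
    """Shuffle a copy of the lines and cut it into (dev, holdout)."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * holdout_fraction)
    return shuffled[cut:], shuffled[:cut]
```

Fixing the seed keeps the holdout set stable between runs, which matters if the holdout is meant to stay unseen while the dev set is annotated.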
From the project root:
cd workspace
extr-ds --split
Output files are found in "workspace/2/".
extr-ds --annotate
Output files are found in "workspace/3/".
extr-ds --relate
Output files are found in "workspace/3/".
extr-ds --save
Output files are found in "tasks/data/4/".
make crf
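
The annotate step is described as building a redacted file for quick visual inspection. One plausible reading, sketched below, is that each annotated entity span is replaced by its label so mislabeled or missed entities stand out. This is an assumption about the idea, not the `extr-ds` output format; the `redact` and `span` helpers, the `PLAYER` / `QUANTITY` labels, and the sample sentence are made up for illustration:

```python
def redact(text, entities):
    """Replace each (start, end, label) span in text with <LABEL>.

    Hypothetical helper: spans are character offsets, end exclusive.
    A span that overlaps an earlier one is skipped.
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(entities):
        if start < cursor:
            continue  # overlaps the previous span
        pieces.append(text[cursor:start])
        pieces.append(f"<{label}>")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


def span(text, phrase, label):
    """Locate the first occurrence of phrase and return its labeled span."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


if __name__ == "__main__":
    # Invented sentence in the style of a play-by-play snippet.
    play = "(12:05) J.Smith pass short right to A.Jones for 8 yards."
    entities = [
        span(play, "J.Smith", "PLAYER"),
        span(play, "A.Jones", "PLAYER"),
        span(play, "8", "QUANTITY"),
    ]
    print(redact(play, entities))
    # (12:05) <PLAYER> pass short right to <PLAYER> for <QUANTITY> yards.
```

Scanning many redacted lines side by side makes repeated patterns easy to compare, which fits the "fix and annotate again" loop in the workflow.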