Validation

Creating the validation data, histograms and pdf maps

Set the appropriate options in default.yml and config_model_parameters.yml. The code that extracts the validation data is in tpcwithdnn/idc_data_validator.py for 1D fluctuations and in tpcwithdnn/data_validator.py for the old CNN part.

In default.yml:

docreatendvaldata - whether to produce a ROOT file with the full validation data described on the Validation Data Format page; the resulting file is required by the other options below
- the output is split into several parts to limit memory use. The output files can be merged with the script tpcwithdnn/merge_validation_trees.sh
docreatepdfmaps - whether to create ND histograms (as *.gzip files) and pdf maps (*.root) for all validation data:
- old input (U-Net): mean maps with id 0, 9, 18 (scaling: 1.0, 1.1, 0.9)
- IDC input (BDT): mean maps with id 0, 9, 18, 27, 36 (scaling: 1.00, 1.03, 0.97, 1.06, 0.94)
docreatepdfmapforvariable - whether to create ND histograms and pdf maps for the data specified in config_model_parameters.yml
domergepdfmaps - whether to merge pdf maps for different mean maps and factors into one file
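
The mean map ids and scaling factors listed for docreatepdfmaps can be kept in a small lookup table. The following is only a minimal sketch: the numbers are copied from the list above, while the names MEAN_SCALING and scaling_for and the keys "unet" and "bdt" are hypothetical and not part of tpcwithdnn.

```python
# Mean map id -> scaling factor, copied from the option description above.
# The dictionary layout and the keys "unet"/"bdt" are illustrative assumptions.
MEAN_SCALING = {
    "unet": {0: 1.0, 9: 1.1, 18: 0.9},
    "bdt": {0: 1.00, 9: 1.03, 18: 0.97, 27: 1.06, 36: 0.94},
}


def scaling_for(input_kind: str, mean_id: int) -> float:
    """Return the scaling factor for a mean map id, or raise if it is not listed."""
    try:
        return MEAN_SCALING[input_kind][mean_id]
    except KeyError:
        raise ValueError(
            f"no scaling factor listed for {input_kind!r} mean id {mean_id}"
        ) from None
```

For example, scaling_for("bdt", 27) looks up the factor for the IDC input with mean map id 27, and an id that is not in the list raises a ValueError instead of silently returning a default.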

Note: the data created by docreatendvaldata can now be explored interactively in Jupyter notebooks, including interactive histogramming, e.g. with notebooks/model_validation.ipynb. Creating pdf maps is therefore no longer necessary.

In config_model_parameters.yml:

dirtree - where to save validation data (ROOT tree files) and pdf maps
dirhist - where to save validation histograms
nd_val_events - number of scenarios for ND validation
nd_val_partition - where the validation scenarios should be taken from:
- random - sample randomly, but use only mean factors of 0.9, 1.0, 1.1 for U-Net data and 0.94, 0.97, 1.0, 1.03, 1.06 for BDT data
- train, val, apply - train / validation / apply data
nd_validate_model - whether the trained model (its predictions) should be evaluated as well; required for ND histograms and maps
pdf_map_var, pdf_map_mean_id - the variable and mean map id for which ND histograms and pdf maps are created if docreatepdfmapforvariable is set in default.yml
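
The constraints described for config_model_parameters.yml can be checked before starting a long run. The sketch below is a hedged illustration, not the project's own validation code: the key names come from the list above, but all values, the flat dictionary layout and the helper check_validation_settings are assumptions.

```python
ALLOWED_PARTITIONS = {"random", "train", "val", "apply"}

# Hypothetical example values; only the key names are taken from the text above.
validation_settings = {
    "dirtree": "/path/to/validation/trees",
    "dirhist": "/path/to/validation/histograms",
    "nd_val_events": 100,
    "nd_val_partition": "random",
    "nd_validate_model": True,
    "pdf_map_var": "some_variable",
    "pdf_map_mean_id": 0,
}


def check_validation_settings(cfg: dict, want_pdf_maps: bool = False) -> list:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    if cfg.get("nd_val_partition") not in ALLOWED_PARTITIONS:
        problems.append(
            "nd_val_partition must be one of " + ", ".join(sorted(ALLOWED_PARTITIONS))
        )
    events = cfg.get("nd_val_events")
    if not isinstance(events, int) or events <= 0:
        problems.append("nd_val_events must be a positive integer")
    # ND histograms and pdf maps need the model predictions as well.
    if want_pdf_maps and not cfg.get("nd_validate_model"):
        problems.append("nd_validate_model must be set to create ND histograms and maps")
    return problems
```

Calling check_validation_settings(validation_settings, want_pdf_maps=True) on the example values returns an empty list; setting nd_val_partition to an unknown name or switching off nd_validate_model adds a message for each problem.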

Plotting and browsing pdf maps and validation trees

The easiest way to examine the result files is to use the interactive Jupyter notebook available here.

Alternatively, plots can be drawn manually with ROOT from the *.root pdf maps.

Running the notebook on a remote workstation via ssh

Enter the notebooks directory:

cd notebooks/

Launch Jupyter without a browser (you will browse from your local machine later). It prints a URL with a token; copy and store it for the next step.

python -m notebook --no-browser --port=8887 # Or any other reasonable port number

On your local machine, forward local port 8888 to the notebook port on the remote machine:

ssh -N -L localhost:8888:localhost:{remote_port_number} {user@remote_machine}

You can now browse the notebooks at the URL printed by Jupyter; just change the port number in it to 8888.
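
The port bookkeeping in these steps is easy to get wrong, so the following minimal sketch builds the tunnel command and rewrites the Jupyter URL for the local side. The helper names tunnel_command and local_url, and the token in the usage note, are made up for illustration; the ssh options are the ones shown above.

```python
from urllib.parse import urlsplit, urlunsplit


def tunnel_command(user_host: str, remote_port: int, local_port: int = 8888) -> str:
    """Build the ssh port-forwarding command to run on the local machine."""
    return f"ssh -N -L localhost:{local_port}:localhost:{remote_port} {user_host}"


def local_url(jupyter_url: str, local_port: int = 8888) -> str:
    """Replace the port in the URL printed by Jupyter with the forwarded local port."""
    parts = urlsplit(jupyter_url)
    host = parts.hostname or "localhost"
    # Keep scheme, path and the token query string; only the port changes.
    return urlunsplit(parts._replace(netloc=f"{host}:{local_port}"))
```

With a notebook started on port 8887, local_url("http://localhost:8887/?token=abc123") gives http://localhost:8888/?token=abc123, and tunnel_command("user@remote_machine", 8887) gives the matching ssh command.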

Running the notebook - U-Net example

Prepare a list of pdf maps to be contained in the notebook:
1. adjust makePDFMapsList() in TPCwithDNN/notebooks/makePDFMapsLists.sh
2. run:

source makePDFMapsLists.sh
makePDFMapsList

You should get a new file pdfmaps.list with the paths to the matching pdf map files.
Open the notebook model_performance_evaluation.ipynb in the Jupyter web interface.
Adjust the directory variables at the top of the notebook.
Follow the rest of the code in the notebook, adjusting any file paths as needed. You might need to adjust the cuts if no data matches them.
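
Before opening the notebook, it can help to confirm that every entry in pdfmaps.list points to an existing file. This is a minimal sketch that assumes the list holds one path per line; that format, and the helper name missing_pdf_maps, are assumptions rather than something defined by makePDFMapsLists.sh.

```python
from pathlib import Path


def missing_pdf_maps(list_file: str) -> list:
    """Return the paths listed in list_file that do not exist as regular files."""
    lines = Path(list_file).read_text().splitlines()
    # Ignore blank lines and surrounding whitespace.
    paths = [line.strip() for line in lines if line.strip()]
    return [p for p in paths if not Path(p).is_file()]
```

An empty result from missing_pdf_maps("pdfmaps.list") means all listed files were found; otherwise the returned paths show which entries to fix in makePDFMapsList().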