Validation

ehellbar edited this page Mar 7, 2023 · 16 revisions

Creating the validation data, histograms and pdf maps

Set the appropriate options in default.yml and config_model_parameters.yml. The code that extracts the validation data lives in tpcwithdnn/idc_data_validator.py for 1D fluctuations and in tpcwithdnn/data_validator.py for the old CNN part.

In default.yml:

  • docreatendvaldata - whether to produce a ROOT file with the full validation data described on the Validation Data Format page; the resulting file is needed for the other options below to work
    • the output is split into several parts to limit memory use. The output files can be merged with the script tpcwithdnn/merge_validation_trees.sh
  • docreatepdfmaps - whether to create ND histograms (as *.gzip files) and pdf maps (*.root) for all validation data:
    • old input (U-Net): mean maps with id 0, 9, 18 (scaling: 1.0, 1.1, 0.9)
    • IDC input (BDT): mean maps with id 0, 9, 18, 27, 36 (scaling: 1.00, 1.03, 0.97, 1.06, 0.94)
  • docreatepdfmapforvariable - whether to create ND histograms and pdf maps for the data specified in config_model_parameters.yml
  • domergepdfmaps - whether to merge pdf maps for different mean maps and factors into one file
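The mean map ids and scaling factors listed above can be kept in a small lookup table when selecting validation output by id. The following is a minimal sketch, assuming the ids and factors are listed in matching order; the names `MEAN_ID_TO_FACTOR` and `mean_factor` and the keys `"unet"` and `"bdt"` are illustrative and not part of tpcwithdnn.

```python
# Mean map id -> scaling factor, copied from the lists above.
# Assumption: ids and scaling factors are given in matching order.
MEAN_ID_TO_FACTOR = {
    "unet": {0: 1.0, 9: 1.1, 18: 0.9},
    "bdt": {0: 1.00, 9: 1.03, 18: 0.97, 27: 1.06, 36: 0.94},
}


def mean_factor(input_type, mean_id):
    """Return the scaling factor for a mean map id (hypothetical helper)."""
    try:
        return MEAN_ID_TO_FACTOR[input_type][mean_id]
    except KeyError:
        raise ValueError(
            f"no pdf maps are created for {input_type!r} with mean id {mean_id}"
        ) from None


if __name__ == "__main__":
    for input_type, table in MEAN_ID_TO_FACTOR.items():
        for mean_id in table:
            print(input_type, mean_id, mean_factor(input_type, mean_id))
```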

Note: the data created by docreatendvaldata can now be visualized interactively with Jupyter notebooks, including interactive histogramming, e.g. with notebooks/model_validation.ipynb. Creating pdf maps is therefore no longer necessary.

In config_model_parameters.yml:

  • dirtree - where to save validation data (ROOT tree files) and pdf maps
  • dirhist - where to save validation histograms
  • nd_val_events - number of scenarios for ND validation
  • nd_val_partition - where the validation scenarios should be taken from:
    • random - sample randomly, but use only mean factors of 0.9, 1.0, 1.1 for U-Net data and 0.94, 0.97, 1.0, 1.03, 1.06 for BDT data
    • train, val, apply - train / validation / apply data
  • nd_validate_model - whether the trained model (its predictions) should be evaluated as well; required for ND histograms and maps
  • pdf_map_var, pdf_map_mean_id - the variable and mean map id for which ND histograms and pdf maps are created if docreatepdfmapforvariable is set in default.yml
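The dependencies between these options can be checked before starting a long run. The sketch below is not part of tpcwithdnn; it assumes both YAML files have already been loaded into plain dictionaries with the options as top-level keys (the real files may nest them differently), and `check_validation_options` is a hypothetical helper.

```python
VALID_PARTITIONS = {"random", "train", "val", "apply"}
PDF_FLAGS = ("docreatepdfmaps", "docreatepdfmapforvariable", "domergepdfmaps")


def check_validation_options(default_cfg, model_cfg):
    """Return a list of notes; an empty list means nothing looks suspicious.

    Assumption: both arguments are flat dicts keyed by option name.
    """
    notes = []
    active_pdf = [flag for flag in PDF_FLAGS if default_cfg.get(flag)]

    # The pdf map options read the files written by docreatendvaldata.
    if active_pdf and not default_cfg.get("docreatendvaldata"):
        notes.append(
            f"{', '.join(active_pdf)} set without docreatendvaldata: "
            "the validation ROOT files must already exist in dirtree"
        )

    # ND histograms and maps require the model predictions.
    makes_hists = [flag for flag in active_pdf if flag != "domergepdfmaps"]
    if makes_hists and not model_cfg.get("nd_validate_model"):
        notes.append("nd_validate_model must be enabled for ND histograms and pdf maps")

    # The per-variable option needs both the variable and the mean map id.
    if default_cfg.get("docreatepdfmapforvariable"):
        for key in ("pdf_map_var", "pdf_map_mean_id"):
            if model_cfg.get(key) is None:
                notes.append(f"{key} is not set in config_model_parameters.yml")

    partition = model_cfg.get("nd_val_partition")
    if partition not in VALID_PARTITIONS:
        notes.append(f"unknown nd_val_partition: {partition!r}")
    return notes


if __name__ == "__main__":
    default_cfg = {"docreatendvaldata": False, "docreatepdfmaps": True}
    model_cfg = {"nd_validate_model": False, "nd_val_partition": "random"}
    for note in check_validation_options(default_cfg, model_cfg):
        print("-", note)
```

The first note is a reminder rather than an error: the pdf map options only need the validation files to exist, so they may come from an earlier run.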

Plotting and browsing pdf maps and validation trees

The easiest way to examine the result files is to use the interactive Jupyter notebook available here.

Alternatively, one can manually draw plots with ROOT, from *.root pdf maps.

Running the notebook on a remote workstation via ssh

Enter the notebooks directory:

cd notebooks/

Launch Jupyter without a browser (you will browse from your local machine later). It prints a URL with a token; copy it and keep it for the final step.

python -m notebook --no-browser --port=8887 # Or any other reasonable port number

On your local machine, forward local port 8888 to the notebook port on the remote machine:

ssh -N -L localhost:8888:localhost:{remote_port_number} {user@remote_machine}

You can now browse the notebooks at the URL printed by Jupyter; just change the port number in it to 8888.
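The port substitution and the tunnel command can also be generated programmatically. This is a minimal sketch using only the Python standard library; `tunnel_command` and `local_notebook_url` are hypothetical helpers, and the URL, token, and host name in the usage part are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit


def tunnel_command(remote, remote_port, local_port=8888):
    """Build the ssh port-forwarding command shown above as an argument list."""
    forward = f"localhost:{local_port}:localhost:{remote_port}"
    return ["ssh", "-N", "-L", forward, remote]


def local_notebook_url(remote_url, local_port=8888):
    """Replace the port in the URL printed by Jupyter with the local port.

    Assumption: the URL host is a plain name or IPv4 address such as localhost.
    """
    parts = urlsplit(remote_url)
    netloc = f"{parts.hostname}:{local_port}"
    return urlunsplit(parts._replace(netloc=netloc))


if __name__ == "__main__":
    # Placeholder values; use your own login and the URL printed by Jupyter.
    print(" ".join(tunnel_command("user@remote_machine", 8887)))
    print(local_notebook_url("http://localhost:8887/?token=PLACEHOLDER"))
```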

Running the notebook - U-Net example

  1. Prepare a list of pdf maps to be contained in the notebook:
    1. adjust makePDFMapsList() in TPCwithDNN/notebooks/makePDFMapsLists.sh
    2. run:
source makePDFMapsLists.sh
makePDFMapsList
  2. You should get a new file pdfmaps.list with the paths to the pdf map files.
  3. Open the notebook model_performance_evaluation.ipynb in the Jupyter web interface.
  4. Adjust the directory variables at the top of the notebook.
  5. Follow the rest of the code in the notebook, adjusting any file paths as needed. You might need to adjust the cuts if no data match them.

CAUTION: The validation input can be very large. It is better to keep the default selection and to run only the plots of interest rather than the whole notebook at once.