Dr Troels C Petersen (Niels Bohr Institute, University of Copenhagen)
In particle physics, algorithms are typically trained on simulated (Monte Carlo, MC) data and then applied to real data. Although great effort goes into making the data and MC distributions agree, they are almost surely not identical, which leads to suboptimal performance in data. However, in many situations it is possible to obtain “approximate labels” in data through Tag&Probe approaches in control channels. Such labels are usually not powerful enough to obtain good training results from data alone, but combining data with MC in simultaneous “hybrid training” allows machine learning (ML) algorithms to learn the general relations mainly from the perfectly labelled MC, while at the same time learning the smaller adaptations needed for optimal performance in data. The approach will be shown through an example with electron energy regression in ATLAS.
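As a rough illustration of the hybrid idea, the toy sketch below fits one shared linear model to a pooled sample of exactly labelled "MC" and noisily labelled "data", with a per-sample weight on the data term. Everything here is an assumption made for illustration: the toy samples, the 0.1 offset between data and MC, the helper names `make_sample` and `fit_hybrid`, and the simple weighted least-squares loss. It is not the method or the model used in the ATLAS electron energy regression.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_sample(n, offset, label_noise):
    """Toy sample: y = 2*x + offset, with Gaussian noise on the label."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = 2.0 * x + offset + rng.normal(0.0, label_noise, size=n)
    return x, y

# "MC": exact labels. "Data": slightly shifted truth, approximate (noisy) labels.
x_mc, y_mc = make_sample(5000, offset=0.0, label_noise=0.0)
x_data, y_data = make_sample(5000, offset=0.1, label_noise=0.5)

def fit_hybrid(data_weight):
    """Minimise sum_MC (y - f(x))^2 + data_weight * sum_data (y - f(x))^2
    for the shared model f(x) = a*x + b, via weighted least squares."""
    x = np.concatenate([x_mc, x_data])
    y = np.concatenate([y_mc, y_data])
    w = np.concatenate([np.ones_like(x_mc), np.full_like(x_data, data_weight)])
    design = np.stack([x, np.ones_like(x)], axis=1)
    sqrt_w = np.sqrt(w)
    coeffs, *_ = np.linalg.lstsq(design * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return coeffs[0], coeffs[1]

a_mc, b_mc = fit_hybrid(0.0)          # MC only
a_hybrid, b_hybrid = fit_hybrid(1.0)  # MC and data weighted equally

print(f"MC only: slope={a_mc:.3f}, offset={b_mc:.3f}")
print(f"hybrid : slope={a_hybrid:.3f}, offset={b_hybrid:.3f}")
```

With `data_weight=0` the fit reproduces the MC relation, while a non-zero weight pulls the offset towards the value preferred by the data. In this toy the slope is constrained mainly by the noise-free MC labels, which mirrors the division of labour described above.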
An additional problem in applying machine learning (ML) techniques to particle physics data is the complexity and sparsity of the data, which only in its simplest form is tabular (i.e. the same number of input variables for each case). While non-tabular data is hard for both likelihood methods and “simple” ML algorithms, it fits a Graph Neural Network (GNN) perfectly, as GNNs are built for such geometric cases. Considering the IceCube experiment at the South Pole, which consists of 5000+ optical modules embedded in a billion tons of Antarctic ice, I will show how the GNN approach elegantly solves both the geometric complication and the non-fixed input size, and how it can be used in many places and further boosted with a transformer architecture.
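To make the point about non-fixed input size concrete, the sketch below runs a single message-passing step on a k-nearest-neighbour graph and then pools over nodes, so that events with different numbers of hits map to an embedding of the same length. The hit features (three position coordinates plus two extra values), the random weights, and the function names `knn_edges`, `message_passing_layer` and `embed_event` are all hypothetical; this is a minimal NumPy illustration of the general GNN idea, not the IceCube reconstruction model.

```python
import numpy as np

rng = np.random.default_rng(1)

def knn_edges(pos, k):
    """Indices of the k nearest neighbours of every node, shape (n_nodes, k)."""
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a node is not its own neighbour
    return np.argsort(dist, axis=1)[:, :k]

def message_passing_layer(feats, neighbours, w_self, w_neigh):
    """One step: combine each node with the mean of its neighbours, then ReLU."""
    neigh_mean = feats[neighbours].mean(axis=1)  # (n_nodes, n_feat)
    return np.maximum(0.0, feats @ w_self + neigh_mean @ w_neigh)

def embed_event(hits, w_self, w_neigh, k=4):
    """Map an event with any number of hits (> k) to a fixed-size vector."""
    neighbours = knn_edges(hits[:, :3], k)  # graph built from hit positions
    node_emb = message_passing_layer(hits, neighbours, w_self, w_neigh)
    # Global pooling over nodes removes the dependence on the number of hits.
    return np.concatenate([node_emb.mean(axis=0), node_emb.max(axis=0)])

n_feat, n_hidden = 5, 8
w_self = rng.normal(size=(n_feat, n_hidden))
w_neigh = rng.normal(size=(n_feat, n_hidden))

small_event = rng.normal(size=(12, n_feat))   # 12 hits
large_event = rng.normal(size=(300, n_feat))  # 300 hits

emb_small = embed_event(small_event, w_self, w_neigh)
emb_large = embed_event(large_event, w_self, w_neigh)
print(emb_small.shape, emb_large.shape)

# The embedding does not depend on the order in which the hits are listed.
perm = rng.permutation(len(small_event))
emb_shuffled = embed_event(small_event[perm], w_self, w_neigh)
```

Because the same weights are applied at every node and the pooling is a mean and a maximum over nodes, neither the number of hits nor their ordering has to be fixed in advance.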