13 Oct 2021

DeLTA seminar by Yi-Shan Wu


Testing hypotheses about the behavior of an unobserved policy in offline contextual bandits is challenging because no data from the distribution of interest is available. We propose a general testing procedure that first resamples from the observed data to construct an auxiliary data set (mimicking properties of the target distribution P) and then applies an existing test in the target domain. We prove that this procedure holds pointwise asymptotic level if the target test holds pointwise asymptotic level, the size of the resample is at most of order square root of n (with n the number of observations), and the resampling weights are well-behaved. The talk is based on joint work with Nikolaj Thams, Sorawit Saengkyongam and Jonas Peters.
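The resampling step described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the talk: the two-action bandit without contexts, the uniform logging policy, the target policy that picks action 1 with probability 0.9, the use of the policy ratio as resampling weight, sampling without replacement, and the helper name `resample_indices`. The resample size is set to the square root of n, in line with the size condition stated above.

```python
import numpy as np


def resample_indices(weights, m, rng):
    """Draw m distinct indices with probability proportional to weights.

    Hypothetical helper for illustration only.
    """
    weights = np.asarray(weights, dtype=float)
    probs = weights / weights.sum()
    return rng.choice(len(weights), size=m, replace=False, p=probs)


rng = np.random.default_rng(0)
n = 10_000

# Observed (logged) data: actions drawn uniformly from {0, 1},
# rewards equal to the action plus standard normal noise (toy model).
actions = rng.integers(0, 2, size=n)
rewards = actions + rng.normal(size=n)

# Assumed target policy: action 1 with probability 0.9, action 0 otherwise.
target_prob = np.where(actions == 1, 0.9, 0.1)
logging_prob = np.full(n, 0.5)

# Resampling weights: ratio of target to logging action probabilities.
weights = target_prob / logging_prob

# Resample size of order sqrt(n).
m = int(np.sqrt(n))
idx = resample_indices(weights, m, rng)

# Auxiliary data set that approximately behaves like data from the target policy.
aux_actions = actions[idx]
aux_rewards = rewards[idx]

print("share of action 1 in observed data:", actions.mean())
print("share of action 1 in auxiliary data:", aux_actions.mean())
```

Any existing test designed for data from the target policy could then be applied to `aux_actions` and `aux_rewards`; which test is appropriate depends on the hypothesis of interest.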

Source: Ku