We focus on designing a finite sequence of experiments, seeking fully optimal design policies (strategies) that can (a) adapt to newly collected data during the sequence (i.e., feedback) and (b) anticipate future changes (i.e., lookahead). We approach this sequential decision-making problem in a Bayesian setting with information-based utilities, and solve it numerically via policy gradient methods from reinforcement learning. In particular, we directly parameterize the policy and value functions with neural networks, thus adopting an actor-critic approach, and improve them using gradient estimates produced from simulated design and observation sequences. The overall method is demonstrated on an algebraic benchmark and a sensor movement application for source inversion. The results provide intuitive insights into the benefits of feedback and lookahead, and indicate computational advantages compared to previous approaches based on approximate dynamic programming.