Off-Policy Selection (OPS) aims to select the best policy from a set of candidate policies trained with offline reinforcement learning. In this work, we describe our custom OPS method and its successful application in Samsung Instant Plays to optimize ad delivery timing.
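To make the general idea of OPS concrete, here is a minimal sketch that scores each candidate policy on logged data and picks the highest-scoring one. It is not the custom method described in the post. It assumes a simplified single-step setting with no context, and it swaps in ordinary importance sampling as the value estimator. The names `is_estimate`, `select_policy`, and the toy log are all hypothetical.

```python
from typing import Dict, List, Tuple

# One logged step (assumed format): action index, observed reward,
# and the probability the logging policy assigned to that action.
LoggedStep = Tuple[int, float, float]


def is_estimate(policy: List[float], log: List[LoggedStep]) -> float:
    """Ordinary importance-sampling estimate of a policy's expected reward."""
    total = 0.0
    for action, reward, behavior_prob in log:
        # Reweight each logged reward by how much more (or less) likely
        # the candidate policy is to take the logged action.
        total += (policy[action] / behavior_prob) * reward
    return total / len(log)


def select_policy(candidates: Dict[str, List[float]], log: List[LoggedStep]) -> str:
    """Return the name of the candidate with the highest estimated value."""
    return max(candidates, key=lambda name: is_estimate(candidates[name], log))


# Toy data: a uniform logging policy over two actions.
log = [(0, 0.0, 0.5), (1, 1.0, 0.5), (1, 1.0, 0.5), (0, 1.0, 0.5)]
candidates = {"prefers_0": [0.9, 0.1], "prefers_1": [0.1, 0.9]}
best = select_policy(candidates, log)
```

In practice the quality of the selection depends on the estimator: importance sampling can have high variance when candidate policies differ a lot from the logging policy, which is one reason a tailored OPS method may be preferable.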