White RoomNEW

Why can doubly robust off-policy evaluation outperform importance sampling?

A doubly robust off-policy estimator combines importance ratios with an estimated action-value model. Under the method's standard support and logging-policy assumptions, which description is most accurate?