Seminar: Risk-averse Allocation Indices for Multi-armed Bandit Problem by Milad MalekiPirbazari

Department of Industrial Engineering 

Risk-averse Allocation Indices for Multi-armed Bandit Problem

Milad MalekiPirbazari

PhD Candidate, Bilkent University 


In the classical multi-armed bandit problem, the aim is to find a policy that maximizes the expected total reward, implicitly assuming that the decision maker is risk-neutral. In some real-life applications, however, decision makers are risk-averse. In this study, we design a new setting based on the concept of dynamic risk measures, where the aim is to find a policy with the best risk-adjusted total discounted outcome. We provide a theoretical analysis of the multi-armed bandit problem in this novel setting and propose a priority-index heuristic that yields risk-averse allocation indices with a structure similar to the Gittins index. Although we show that an optimal policy does not always have an index-based form, empirical results indicate that the heuristic performs very well and that risk-averse allocation indices yield interpretable policies that are optimal or near-optimal.

Milad MalekiPirbazari is currently a doctoral candidate in the Department of Industrial Engineering at Bilkent University. His research interests are twofold: statistical machine learning, with a particular emphasis on feature selection, and risk-averse sequential decision making. He received his BS and MS degrees in Chemical Engineering from the University of Tehran and Tarbiat Modares University, Iran. He also received an MS degree in Industrial and Systems Engineering from Istanbul Sehir University. At Bilkent, he is currently employed as a teaching instructor.

Date: Friday, May 7, 2021


Online Seminar Link:

Meeting ID: 934 3051 2540

Passcode: 335899