Working Paper


Abstract: Firms can approach innovation from different directions, and the directions that expedite innovation success are often more costly. How do firms choose the direction of experimentation in an innovation race? This paper develops a multi-armed bandit model to study the interaction between the direction of experimentation and the reward structure. In a winner-takes-all contest where players' action choices are public, players choose the faster but socially inefficient arm, leading to over-investment in the inefficient direction of experimentation. We find that a social planner can restore full efficiency by setting a dynamic reward structure that gives the winner a relatively small share of the prize while players are still optimistic and gradually increases the winner’s share over time as beliefs drop in the absence of success. Finally, while information disclosure makes no difference in a winner-takes-all contest in a two-armed model, we find that concealing players' action choices in a three-armed bandit model improves efficiency.


  • Dismissal and Tenure in Strategic Experimentation


Abstract: We study a dynamic principal-agent problem in which a principal hires an agent to experiment. The agent privately knows his own ability, while the principal learns the agent's ability over time. The agent chooses between a risky project (experimentation) and a safe project. The probability of successfully completing a risky project increases with the agent's ability, but a low-ability agent has an incentive to imitate the high-ability agent in an attempt to keep the job for longer. We study how the principal can use a retention policy to induce the agent to make the optimal decision in strategic experimentation. We show that if the principal cannot commit to a retention policy, efficient experimentation cannot be achieved. However, when the principal can commit in advance to a retention policy with probationary periods followed by infinite tenure, efficient experimentation can be achieved for parameters within a certain range.


  • Information Acquisition in Experimentation (Work in Progress)


Abstract: The classical multi-armed bandit problem focuses on the trade-off between exploration (finding the best arm) and exploitation (sticking to the arm yielding the highest expected payoff). The selection of any given arm results in an immediate monetary payoff together with improved information about the prospects of that arm. Some real-world applications, however, are better modeled as a multi-armed bandit problem in which some arms are purely informational, in the sense that they do not yield immediate gains but provide information about the prospects of other arms. We build an exponential bandit model with an experimentation arm and an information arm, where the information arm provides positive but inconclusive evidence about the state of the world, to study whether and when an agent would benefit from acquiring information. We find that if the information arm's arrival rate of information is sufficiently high, the agent chooses to acquire information when he is pessimistic, as a last-ditch effort before he quits. Moreover, choosing the information arm decreases the agent's stopping belief (i.e., he experiments for longer before quitting) relative to the case without the information arm.
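
To illustrate why a lower stopping belief means experimenting for longer, here is a minimal sketch. It assumes a textbook exponential bandit in which the experimentation arm yields a conclusive success at rate `lam` only if the state is good, so the belief falls steadily while no success arrives. The function names and numerical values are hypothetical and not taken from the paper, and the sketch does not model the information arm itself.

```python
import math


def posterior_no_success(p0: float, lam: float, t: float) -> float:
    """Belief that the state is good after experimenting for time t with no success.

    Assumption: successes arrive at rate lam in the good state and never in the bad state.
    """
    no_success_if_good = p0 * math.exp(-lam * t)
    return no_success_if_good / (no_success_if_good + (1.0 - p0))


def time_to_stop(p0: float, lam: float, p_stop: float) -> float:
    """Time at which the belief falls from p0 to the stopping belief p_stop.

    Uses the odds form of Bayes' rule: p/(1-p) = p0/(1-p0) * exp(-lam * t).
    """
    prior_odds = p0 / (1.0 - p0)
    stop_odds = p_stop / (1.0 - p_stop)
    return math.log(prior_odds / stop_odds) / lam


if __name__ == "__main__":
    # Hypothetical values: prior 0.5, arrival rate 1.0, two candidate stopping beliefs.
    for p_stop in (0.25, 0.10):
        print(p_stop, time_to_stop(0.5, 1.0, p_stop))
```

Under these assumptions, lowering the stopping belief from 0.25 to 0.10 strictly lengthens the time spent experimenting before the agent quits.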