Search results: 1-2 of 2 records matching "Management Contextual Bandits". Query time: 0.109 seconds.
Thompson Sampling for Contextual Bandits with Linear Payoffs
Thompson Sampling Contextual Bandits Linear Payoffs
2012/11/23
Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several s...
Efficient Optimal Learning for Contextual Bandits
Efficient Optimal Learning Contextual Bandits
2011/7/6
We address the problem of learning in an online setting where the learner repeatedly observes features, selects among a set of actions, and receives reward for the action taken.