This semester SIAM will be holding a Casino Tactic Competition that people with all levels of math and computer science background are welcome to participate! This year’s problem is about slot machines and you are asked for a strategy that wins the most cumulative rewards from playing multiple slot machines! Excited about the question? check below for more information!
Poster:
Prompt: You are in a casino and faced with 10 slot machines. Each time you pull the arm of a slot machine, it will return a reward between $0 and $5. The rewards of each slot machine are not constant but follow the normal distribution with an unknown mean and standard deviation (let’s assume that the mean of each distribution is between 0 and 5 and the standard deviation is small enough so that we don’t need to worry about getting rewards below 0 or above 5 on the scale of 100 draws).
Now you are given 100 pulls and you can only pull one slot machine at a time, what strategy can you devise to maximize your cumulative reward?
Prizes:
($50, $30, $20) Gold, Silver, Bronze Medal: One metal each, given to the solution with clear reasoning that best addresses the problem.
($15) Most Creative Award: Given to the most original, innovative solution.
($15) Latex-Master Award: Given to the solution that contains the highest ratio of math equations (unrelated equations don’t count!).
($15) Space Elevator Award: Given to the solution that is most artistic, imaginative, and least likely to be implemented in the real world.
Anyone who participates will receive a gift for participation.
Participation Guidelines:
- Present your strategy in a comprehensible way with a clear logical flow. Make sure it’s legible if you do not choose to type it.
- There does not exist a single answer to this problem, so please provide reasoning behind your strategy. Your reasoning will be a key component in deciding the prizes. Feel free to use equations, graphs, and diagrams. Below are a couple of things to consider, we have also included two examples at the end of this post.
- Why do you think it will work a certain way? What additional assumptions does your strategy rely on? For example, one strategy might be best when each machine’s distribution has the same standard deviation while another strategy might be better when the standard deviations are drastically different.
- What inspired you to come up with this strategy.
- Cite if outside sources are used. No need to follow any citation style, a link or article/book title would suffice.
- Do not include personal information in your submission file. This is for ensuring fairness of the reviewing process.
- You are allowed to work in a group or ask for help from a third party (faculty members, your friends, etc.), but make sure you indicate people who you worked with or received help from (Don’t include their name in your solution, just CC them when you email your solution).
- Email to either d_cheng@coloradocollege.edu or y_zhong@coloradocollege.edu. If you have any questions, direct them to one of these two emails as well.
Deadline: Nov.17.2021 (end of block 3)
Prizes Release Date: Third Week of Block 4
How prizes are determined:
After making sure that a submission follows the participation guidelines, it will be sent to the judges. The judges are professors and paraprofs around campus. To ensure fairness, the reviewing process is double-blind, meaning the judges will not know whose submission they are viewing, neither will you know who reviewed your submissions. After viewing a submission, a judge will provide a rating for different aspects of the submission, which will be used later to determine who gets what prize. Each person can win at most one prize.
Examples
Example 1 (a creative solution):
First divide 10 machines into 5 groups of 2; try each machine once, then choose the relatively higher one in every group. Now we have 5 “winners”; put the winner A that has the most reward aside since A is going to skip some rounds, and we will have 4 “quaterfinal winners” left. Then, divide 4 winners into 2 groups of 2; try all four winners once, then choose the relatively higher one in each group; we will have 2 “semi-final winner”, say B,C. Now let the relatively lower one B in the 2 semi-final winners compete with the winner A. The winner between A and B will compete with C once more for the final championship. In the end, pull the champion machine for the rest of 84 times.
Reasoning:
This strategy is similar to the Olympics grouping rules where there is semi-quarter final round, quarter final round, semi-final round, final round. So the final champion will be the one who has stably high output. The only difference is that we let the winner A skip one round and give loser B in the semi-final round another chance to compete with A. So we can create a tree diagram as below:
Example 2 (a rigorous solution)
Pull each slot machine five times and identify the slot machine with the highest mean reward, then stick to that machine for the next 50 pulls.
Reasoning: We know that the more samples we draw from a normal distribution, the closer the mean of these samples would be to the true mean. Let the S = {s_1, s_2, …, s_10}, be the set of slot machines and let the slot machine with the highest mean reward be s*, and let the slot machine with the lowest mean reward be s^. Now let p: S -> R be the function that maps a slot machine to its true mean reward, and p’: S -> R be a function that maps a slot machine to its mean reward from the five pulls. Let t be the difference between p(s*) and p(s^). If the empirical mean reward of the worst slot machine is larger than the empirical mean reward of the best slot machine, it must be true that either p’(s*) < p(s*) – t/2 or p’(s^) > p(s^) + t/2. From Hoeffding’s inequality, we know
The Hoeffding’s Inequality is based on distributions between 0 and 1. To use it in this problem where the rewards are between 0 and 5, we scale the reward of the slot machine to between 0 and 1 by dividing t/2 by 5. Because we know the actual difference between the two true means <= 5, so t/10 <= 0.5. The following table shows the upper bound of the probability that the reward mean of the worst arm is higher than that of the best arm after five pulls.
t/10 | Upper bound of the probability that the worst arm has a higher mean than the best |
0.1 | 100% |
0.2 | 100% |
0.3 | 81% |
0.4 | 40% |
0.5 | 16% |
From this table, we can see that using this method of pulling every slot machine 5 times and then sticking to the machine with the highest mean reward, we can be sure that the chance of picking the worst machine over the best is small when t > 4.
sources: https://web.eecs.umich.edu/~cscott/past_courses/eecs598w14/notes/03_hoeffding.pdf