NSCS 344, Week 4

Assignment 4: Reinforcement learning 1

*** Due date: Start of class in Week 5 ***

Part 1: Simulate the bandit (2 points)
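A minimal Python sketch of a bandit simulation. The course uses Matlab, so treat this as an illustration rather than the required code. It assumes a Bernoulli bandit in which arm k pays a reward of 1 with probability reward_probs[k] and 0 otherwise. The function names and the two example probabilities are hypothetical.

```python
import random

def pull_arm(arm, reward_probs, rng):
    # Bernoulli reward: 1 with probability reward_probs[arm], otherwise 0.
    return 1 if rng.random() < reward_probs[arm] else 0

def simulate_bandit(arm, reward_probs, n_trials, seed=0):
    # Pull the same arm n_trials times and return the list of rewards.
    # A seeded generator makes the simulation reproducible.
    rng = random.Random(seed)
    return [pull_arm(arm, reward_probs, rng) for _ in range(n_trials)]

if __name__ == "__main__":
    probs = [0.3, 0.7]  # hypothetical reward probabilities for a two-armed bandit
    rewards = simulate_bandit(1, probs, n_trials=1000)
    print(f"arm 1: mean reward {sum(rewards) / len(rewards):.3f} (true p = {probs[1]})")
```

With enough trials, the mean reward of an arm should approach its reward probability.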

Part 2: Implement the simple averaging model (2 points)
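A minimal Python sketch of the simple averaging model. It assumes the model's estimate after t trials is the plain mean of the t rewards seen so far. The function name is hypothetical, and the course itself uses Matlab.

```python
def simple_average(rewards):
    # Estimate after each trial = mean of all rewards received up to that trial.
    estimates = []
    total = 0.0
    for t, r in enumerate(rewards, start=1):
        total += r                 # running sum of rewards so far
        estimates.append(total / t)
    return estimates

if __name__ == "__main__":
    print(simple_average([1, 0, 1, 1]))
```

Keeping a running total avoids re-summing the whole reward history on every trial.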

Part 3: Implement the prediction error version of the averaging model (2 points)
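A minimal Python sketch of the prediction error version. It assumes this is the incremental form of the mean: on trial t the prediction error is the reward minus the current estimate, and the estimate moves by 1/t times that error. The function name and the starting value v0 are hypothetical. Because the first step uses 1/t = 1, the estimate after trial 1 equals the first reward whatever v0 is, and from then on it matches the simple average.

```python
def prediction_error_average(rewards, v0=0.0):
    # Incremental mean: move the estimate by (1/t) * prediction error on trial t.
    v = v0
    estimates = []
    for t, r in enumerate(rewards, start=1):
        delta = r - v        # prediction error: reward minus current estimate
        v = v + delta / t    # learning rate 1/t shrinks as trials accumulate
        estimates.append(v)
    return estimates

if __name__ == "__main__":
    print(prediction_error_average([1, 0, 1, 1]))
```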

Part 4: Implement the constant learning rate model (2 points)
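A minimal Python sketch of the constant learning rate model. It assumes this is the prediction error update with the 1/t step replaced by a fixed learning rate alpha. The function name and the initial estimate v0 are hypothetical.

```python
def constant_learning_rate(rewards, alpha, v0=0.0):
    # Prediction error update with a fixed learning rate alpha.
    v = v0
    estimates = []
    for r in rewards:
        v = v + alpha * (r - v)   # move a fraction alpha of the way toward the reward
        estimates.append(v)
    return estimates

if __name__ == "__main__":
    print(constant_learning_rate([1, 1, 1], alpha=0.5))
```

For example, with rewards [1, 1, 1], alpha = 0.5 and v0 = 0, the estimates are 0.5, 0.75, 0.875: each step closes half of the remaining gap to the reward.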

Part 5: Test the constant learning rate model (2 points)
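One way to test the model, sketched in Python under stated assumptions. The sketch simulates a single Bernoulli arm with a hypothetical reward probability of 0.7, runs the constant learning rate update for several values of alpha, and summarizes the late-trial estimates. The function name, probability, trial counts, and alpha values are all illustrative choices, not part of the assignment.

```python
import random

def run_constant_lr(p_reward, alpha, n_trials, v0=0.0, seed=0):
    # Simulate Bernoulli rewards with probability p_reward and record the
    # estimate produced by the constant learning rate update on every trial.
    rng = random.Random(seed)
    v = v0
    estimates = []
    for _ in range(n_trials):
        r = 1 if rng.random() < p_reward else 0
        v += alpha * (r - v)
        estimates.append(v)
    return estimates

if __name__ == "__main__":
    for alpha in (0.01, 0.1, 0.5):
        tail = run_constant_lr(0.7, alpha, n_trials=5000)[-2000:]
        mean = sum(tail) / len(tail)
        spread = max(tail) - min(tail)
        print(f"alpha={alpha}: mean of last 2000 estimates {mean:.3f}, range {spread:.3f}")
```

Small alpha gives a smooth estimate that converges slowly; large alpha adapts quickly but jumps around with every reward, which shows up as a wider range of late estimates.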

Part 6: Special cases for alpha (1 extra credit point)

The extra credit questions today involve math, not coding. To hand in your work, just put a Word document containing your equations in your Dropbox folder.
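For Part 6, a worked sketch assuming the constant learning rate update is written V_{t+1} = V_t + alpha (r_t - V_t), where V_t is the value estimate before trial t and r_t is the reward on trial t. This notation is an assumption; use whatever symbols Part 4 defines.

```latex
\begin{aligned}
V_{t+1} &= V_t + \alpha\,(r_t - V_t) = (1-\alpha)\,V_t + \alpha\,r_t \\
\alpha = 0 &\;\Rightarrow\; V_{t+1} = V_t \quad \text{(the estimate never changes)} \\
\alpha = 1 &\;\Rightarrow\; V_{t+1} = r_t \quad \text{(the estimate is just the most recent reward)}
\end{aligned}
```

Replacing the constant alpha with 1/t instead should recover the prediction error version of the averaging model from Part 3.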

Part 7: Reinterpreting the constant learning rate model as a recency-weighted average of rewards (2 extra credit points)

Let's start from the update equation of the constant learning rate model ...
What we are going to do is rewrite this as a sum over all the rewards received so far, with each reward multiplied by a weight that you are going to compute.
To do this, we first take the expression for the previous trial's value estimate and substitute it into the expression for the current value estimate to get ...
Now take the expression for the value estimate from two trials back and substitute it into this result. Then do the same again for the estimate from three trials back. Hopefully by this point you can guess the form of the weights.
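A sketch of where the substitutions lead, assuming the update is written V_{t+1} = V_t + alpha (r_t - V_t), with V_1 the initial estimate and w_i a hypothetical name for the weight on reward r_i. Adapt the symbols to the ones used in Part 4.

```latex
\begin{aligned}
V_{t+1} &= \alpha r_t + (1-\alpha) V_t \\
        &= \alpha r_t + (1-\alpha)\left[\alpha r_{t-1} + (1-\alpha) V_{t-1}\right] \\
        &= \alpha r_t + \alpha(1-\alpha) r_{t-1} + (1-\alpha)^2 V_{t-1} \\
        &= \alpha r_t + \alpha(1-\alpha) r_{t-1} + \alpha(1-\alpha)^2 r_{t-2} + (1-\alpha)^3 V_{t-2} \\
        &\;\;\vdots \\
        &= (1-\alpha)^t V_1 + \sum_{i=1}^{t} w_i\, r_i,
        \qquad w_i = \alpha(1-\alpha)^{t-i}
\end{aligned}
```

If the initial estimate V_1 is 0, the first term vanishes and V_{t+1} is exactly a weighted sum of the rewards.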
If you can guess the form of the weights, use Matlab to plot the weight as a function of the trial index i.
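A minimal Python sketch of the weights to plot, assuming they take the form w_i = alpha (1 - alpha)^(t - i) for rewards i = 1..t (the function name and the values of alpha and t are hypothetical). The returned list can be handed to any plotting routine; in the assignment that would be Matlab's plot.

```python
def recency_weights(alpha, t):
    # Weight w_i = alpha * (1 - alpha)**(t - i) on reward r_i, for i = 1..t.
    return [alpha * (1 - alpha) ** (t - i) for i in range(1, t + 1)]

if __name__ == "__main__":
    alpha, t = 0.2, 20  # hypothetical values for illustration
    w = recency_weights(alpha, t)
    for i, wi in enumerate(w, start=1):
        print(f"i={i:2d}  w_i={wi:.4f}")
    # The weights sum to 1 - (1 - alpha)**t; the remaining (1 - alpha)**t
    # multiplies the initial estimate.
    print("sum of weights:", sum(w), "expected:", 1 - (1 - alpha) ** t)
```

Plotted against i, the weights rise geometrically toward the most recent trial i = t, where the weight is alpha.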
What does this mean for how the value estimate is computed as a weighted sum of past rewards?
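A numerical check of the weighted-sum interpretation, sketched in Python under the assumed notation V_{t+1} = V_t + alpha (r_t - V_t) with initial estimate V_1. With 0 < alpha < 1, each reward's weight shrinks by a factor (1 - alpha) for every trial further in the past, so recent rewards count more than old ones. The function names and test values are hypothetical; the sketch confirms that running the update trial by trial gives the same number as the closed-form weighted sum.

```python
import math
import random

def iterate_update(rewards, alpha, v1):
    # Apply V_{t+1} = V_t + alpha * (r_t - V_t) once per reward, starting from V_1 = v1.
    v = v1
    for r in rewards:
        v += alpha * (r - v)
    return v

def weighted_sum(rewards, alpha, v1):
    # Closed form: (1 - alpha)^t * V_1 + sum_i alpha * (1 - alpha)^(t - i) * r_i.
    t = len(rewards)
    total = (1 - alpha) ** t * v1
    for i, r in enumerate(rewards, start=1):
        total += alpha * (1 - alpha) ** (t - i) * r
    return total

if __name__ == "__main__":
    rng = random.Random(0)
    rewards = [rng.random() for _ in range(50)]
    a = iterate_update(rewards, 0.3, v1=0.5)
    b = weighted_sum(rewards, 0.3, v1=0.5)
    print(a, b, math.isclose(a, b))
```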