NSCS 344, Week 7

Processing your data

Last time we introduced Expected Value theory and modeled how it might behave on the survey. Today you are going to explore your own data (assuming you did the survey) as well as data from other students who have taken this class.
To get the data you need to download the file
riskyChoiceData_2020.mat
from D2L.
Once you've downloaded the data make sure you move it into the directory you are writing your scripts in for this week (i.e. the Week_07 directory in your NSCS folder on Dropbox).
Then, from a script, you can load the data like this
clear
load riskyChoiceData_2020
This loads a bunch of variables into Matlab. Let's take a look at what we've got using the "whos" command
whos
Name        Size       Bytes  Class     Attributes

  BR        1x146       1168  double
  P         1x12          96  double
  QUS       1x12        1818  cell
  V         1x12          96  double
  rsk       146x12     14016  double

We've got 5 variables here.
Note: Because I am making this example before this year's class completed the survey, I have fewer participants in my data set (146) than you will have.

The questions, QUS

We can explore these variables more in the Command Window. Let's start with the text of the questions, which we can get by typing ...
QUS'
ans = 12×1 cell
'50% chance to win $20'
'25% chance to win $19.74'
'26% chance to win $23.12'
'58% chance to win $11.93'
'50% chance to win $16.04'
'62% chance to win $14.41'
'28% chance to win $35.52'
'58% chance to win $18.79'
'71% chance to win $16.88'
'34% chance to win $38.04'
'36% chance to win $38.51'
'75% chance to win $19.97'
Each line here corresponds to the text of the risky option in each question (e.g. question 1 had a risky option of 50% chance to win $20). Remember that the safe option in this survey was always $10 for sure.
If you want to look at the text of just one question, say question 4, you can type
QUS{4}
ans = '58% chance to win $11.93'
Note that you have to use curly brackets { and } with cell arrays.
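To see the difference the brackets make, compare the two kinds of indexing for yourself (just an illustrative aside):
QUS(4)   % round brackets give you back a 1x1 cell array containing the text
QUS{4}   % curly brackets give you back the text itself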
We're not actually going to do any more with QUS - it's mainly in here just to reorient you to the survey and in case you want to refer to the questions themselves.

Probabilities and values

Next let's take a closer look at P and V ...
P
P = 1×12
0.5000 0.2500 0.2600 0.5800 0.5000 0.6200 0.2800 0.5800 0.7100 0.3400 0.3600 0.7500
V
V = 1×12
20.0000 19.7400 23.1200 11.9300 16.0400 14.4100 35.5200 18.7900 16.8800 38.0400 38.5100 19.9700
P is a vector of 12 numbers telling you the probability of winning the risky option for each question. So
P(4)
ans = 0.5800
consistent with a 58% chance on question four.
V is a vector of 12 numbers telling you the payout if you win the risky option in each question. So
V(4)
ans = 11.9300
consistent with a winning payout of $11.93 on question 4.
Using P and V we can compute the Expected Value of the risky option for each question. One way to do this is directly with a for loop
for i = 1:length(P)
    EV_risky(i) = P(i) * V(i);
end
Or you could do it using element-wise multiplication
EV_risky = P.*V;
Or you could reuse your EVtheory_survey function from last week. If you want to do this, be sure to copy and paste the function file into this week's directory ...
for i = 1:length(P)
    [EV_safe(i), EV_risky(i), choice(i)] = EVtheory_survey(10, P(i), V(i));
end
This approach also gives you the choice that pure EV theory would make for "free."
Let's see what those Expected Values are
EV_risky'
ans = 12×1
   10.0000
    4.9350
    6.0112
    6.9194
    8.0200
    8.9342
    9.9456
   10.8982
   11.9848
   12.9336
   13.8636
   14.9775
Note: I used the transpose here to transform EV_risky from a row vector into a column vector for easier viewing.
To get an even better idea about EV_risky let's plot it as a function of question number
clf;
plot(EV_risky, '.', 'markersize', 50)
xlabel('question number')
ylabel('EV_risky', 'interpreter', 'none')
set(gca, 'fontsize', 18)
This reveals the design of the experiment. I started with an "easy" question
QUS{1}
ans = '50% chance to win $20'
which has simple numbers and an Expected Value of exactly 10. Then I used more complicated questions like
QUS{2}
ans = '25% chance to win $19.74'
These questions were designed such that EV_risky would go from about 5 to about 15 in steps of 1. Hence the (very near) linear increase in EV_risky with question number from question 2 to 12 in the plot.
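You can check this design for yourself by looking at the steps between consecutive Expected Values (just a quick sanity check):
diff(EV_risky(2:end))
Each step should come out close to 1.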
Why did I design the experiment like this? Well, I wanted a range of differences between EV_risky and EV_safe (which remember is always 10). This will allow us to compute the choice curve for people just like we computed the choice curve for the model last week. However, before we can compute the choice curve, we need to take a closer look at the choice data ...

The choices

The actual choices for each person on each question are in the matrix rsk. For me this matrix has size 146 x 12, meaning that there are 146 rows (1 per subject) and 12 columns (1 per question).
Note you will have more subjects in your data set than I do because it will also include this year's participants.
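You can check the dimensions of your own copy of the data with the size function:
[numSubjects, numQuestions] = size(rsk)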
So if I look at rsk(103, 10) I get the choice of subject number 103 on question 10
rsk(103,10)
ans = 1
A value of 1 indicates that this person chose the risky option on this question; a 0 would indicate the safe option.
If we want to look at all the choices made by participant 103 we can write
rsk(103,:)
ans = 1×12
1 0 0 0 1 1 1 1 1 1 1 1
which gives us a row vector of length 12.
Note the special use of the ":" here - it's saying "take the whole of row 103."
If we want to look at an entire column (i.e. all the responses to one question, say question 5) we can write
rsk(:,5)
ans = 146×1
     1
     0
     1
     0
     0
     0
     0
     0
     0
     0
(only the first 10 entries are shown here)
Which gives us a long column vector containing one entry per participant.
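Because the choices are coded as 0s and 1s, a quick way to summarize a whole column is to sum it - for example, to count how many people chose the risky option on question 5 (just an illustrative aside):
sum(rsk(:,5))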
Finally, we can view the whole matrix by just typing in
rsk
rsk = 146×12
     1     1     1     1     1     1     1     1     1     1     1     1
     1     0     0     0     0     0     0     0     1     1     1     1
     0     1     1     1     1     1     0     1     1     0     0     1
     0     0     0     0     0     0     0     1     1     0     0     1
     0     0     0     0     0     0     1     0     1     1     1     1
     0     0     0     0     0     0     0     1     1     0     1     1
     0     0     0     1     0     1     0     1     1     1     0     1
     0     0     0     0     0     0     0     0     1     1     1     1
     1     0     0     0     0     0     1     1     1     1     1     1
     1     0     0     0     0     1     1     0     1     0     0     1
(only the first 10 rows are shown here)
However, as useful as it is to look at individual raw data points (and I always recommend doing this if you have data of your own - for example in your project for this class), it would be nice if we could visualize things a bit better. One way to do this is to make an image of the matrix like this ...
imagesc(rsk)
xlabel('question number')
ylabel('participant number')
set(gca, 'fontsize', 18)
In this plot we get to see the whole matrix at once. Question number is on the x-axis and subject number is on the y-axis. 1s (risky choices) are yellow and 0s (safe choices) are blue. There's definitely some structure here - there's more yellow on the right of the plot and more blue towards the left. We'll explore this in more detail in a moment, but first a ghost story ...

The Matlab ghost

It turns out that Matlab is haunted by the ghost of a child. You can see this for yourself by typing imagesc without any input ...
clf;
imagesc
In fact it's even spookier than that and there are actually all sorts of creepy dogs, people, and objects hidden in this image at different scales ... here's a gif I made of some of them ...
ghost_loop.gif
You can get this for yourself (without saving as a gif) using this code ...
clear
% grab Matlab's default image data and scale it up into the integer range
defImage = pow2(get(0,'DefaultImageCData'),47);
% make a cell array to hold the different versions of the image
imgCell = repmat({zeros(size(defImage))},8,7);
% each bit-shift reveals the structure hidden at a different scale
for shift = 0:52
    imgCell{shift+1} = bitshift(defImage,shift);
end
% tile all the versions into one big montage and display it
allImages = cell2mat(imgCell.');
imshow(allImages,[min(allImages(:)) max(allImages(:))]);
% or step through the images one at a time, one per second
figure(1); clf
for i = 1:52
    i   % echo the loop index to the Command Window
    imagesc(imgCell{i})
    title(num2str(i))
    pause(1)
end
Hopefully you're not too spooked to be able to continue with Matlab! Back to the choice curves ...

Human choice curve

The next thing we'd like to do is actually compute the average choice curve over all people. Specifically, for each question we are going to compute two things:
1. the frequency with which people chose the safe option
2. the difference in Expected Values between the safe and risky options, EV_safe - EV_risky
We will then plot these things against each other to get the choice curve.
We can get the frequency with which people choose the safe option as one minus the frequency with which they choose the risky option. The latter is given by the mean of the rsk matrix over the participants, which we can compute like this
f_risk = mean(rsk);
Note: By default, mean takes the average over the first dimension of a matrix (i.e. over the rows), which for us is the subjects. We will take the mean over the second dimension (the questions) later on, when we consider how likely a given person is to choose the risky option.
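If you ever forget which dimension mean is averaging over, a quick sanity check (an illustrative aside) is to look at the size of the output:
size(mean(rsk))     % 1 x 12  - one number per question (averaged over subjects)
size(mean(rsk,2))   % 146 x 1 - one number per subject (146 for my data set)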
Then we compute f_safe as
f_safe = 1 - f_risk;
We also already computed EV_safe and EV_risky above (if you went the element-wise multiplication route, you can get EV_safe with EV_safe = 10*ones(size(P)), since the safe option is always worth $10), so we can plot the choice curve like this ...
clf;
plot(EV_safe - EV_risky, f_safe, '.', 'markersize', 50)
set(gca, 'fontsize', 18)
xlabel('EV_safe - EV_risky', 'interpreter', 'none')
ylabel('probability of choosing safe option')
This looks (roughly) like the kind of curve we had last week. Indeed, if we borrow our softmax function from last time (i.e. copy and paste it into this week's folder) we can try to superimpose the model curve on this plot, for a particular value of sigma. First, let's compute the softmax curve like this ...
sigma = 3;
Delta = [-6:0.1:6];
p_safe_softmax = softmax(sigma, Delta);
Then let's plot it on the same plot like this ...
clf;
plot(EV_safe - EV_risky, f_safe, '.', 'markersize', 50)
hold on;
plot(Delta, p_safe_softmax, 'linewidth', 3)
set(gca, 'fontsize', 18)
xlabel('EV_safe - EV_risky', 'interpreter', 'none')
ylabel('probability of choosing safe option')
As you can see, the fit isn't terrible, even though we haven't tried to tune it. Of course, if you try different values of sigma you will get better or worse fits. Figuring out what the "best" value of sigma is is something we will come back to in the next two classes. But for now this is pretty nice - people's behavior looks like it's pretty well described by a softmax choice function with a noise standard deviation of around 3.
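By the way, if you no longer have last week's softmax function, here is a minimal sketch of the kind of function being assumed here (the exact form should match whatever you wrote last week - the logistic form below is one standard choice):
function p_safe = softmax(sigma, Delta)
% p_safe : probability of choosing the safe option
% Delta  : difference in Expected Values, EV_safe - EV_risky
% sigma  : decision noise - bigger sigma gives a shallower choice curve
p_safe = 1 ./ (1 + exp(-Delta / sigma));
end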

Blink rates

Now let's take a look at the blink rates. The BR variable is just a row vector with one entry - the blinks per minute - for each participant
BR'
ans = 146×1
         0
   10.0000
       NaN
   16.5000
    6.5000
   25.0000
   17.5000
   19.0000
   27.5000
   20.0000
(only the first 10 entries are shown here)
Note that some people didn't complete this part of the experiment and so their blink rate is given as NaN - which means not a number. These NaNs are a problem if we want to compute things like the average blink rate ...
averageBR = mean(BR)
averageBR = NaN
That's because the "mean" function includes the NaNs and doesn't really know what to do with them, so it just reports NaN as the output. The way to get around this is with the function nanmean
averageBR = nanmean(BR)
averageBR = 20.0911
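Note: nanmean comes with the Statistics Toolbox. If you don't have it, newer versions of Matlab's built-in mean accept an 'omitnan' flag that should give the same answer:
averageBR = mean(BR, 'omitnan')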
This average blink rate - about 20 blinks per minute - is pretty close to values found in the literature, which gives us some confidence that this very crude way of measuring blink rates (i.e. by having you count your own blinks in a video) isn't totally crazy.
If we want to know more about the blink rates of the class we could plot the distribution of blink rates using the "hist" command
clf;
hist(BR, 20)
xlabel('blink rate')
ylabel('number of participants')
set(gca, 'fontsize', 18)
Here you get a sense of the variability in blinking across people - some people barely blink at all, while others blink more than once a second! Again this distribution is consistent with the literature so we can be reasonably confident that our measure of blink rate is good.
But why do we measure blink rate in the first place? Well, it turns out that blink rate correlates with dopamine in the striatum - with more blinking associated with higher dopamine levels.
This correlation between blinks and dopamine offers us an opportunity to do some cognitive neuroscience. Specifically, there's some literature out there to suggest that increased dopamine is associated with increased risk taking behavior. If this is true then we'd predict that, in our experiment, more blinking is associated with a greater propensity to choose the risky option. In the next section we will test whether that holds ...

The relationship between blink rate and individual differences in risk taking

If blinks = dopamine and dopamine = risk taking then we predict that blinks = risk taking. To test this hypothesis, we can ask whether there's a significant correlation between each person's blink rate and the fraction of questions on which they chose the risky option.
To get each person's average risk taking (the fraction of questions on which they chose the risky option), we can simply take the mean over the second dimension of rsk, i.e. take the mean over the questions for each subject
meanRiskTaking = mean(rsk,2);
Then we can plot the relationship between blink rate and risk taking like this ...
clf;
plot(BR, meanRiskTaking, '.', 'markersize', 50)
xlabel('blink rate')
ylabel('average risk taking')
set(gca, 'fontsize', 18)
Does this look like a real relationship? It's hard to say, so let's try adding a best fitting straight line - in Matlab you can get one with the lsline function
clf;
plot(BR, meanRiskTaking, '.', 'markersize', 50)
xlabel('blink rate')
ylabel('average risk taking')
set(gca, 'fontsize', 18)
l = lsline;
set(l, 'linewidth', 3, 'color', 'r')
Well, at least it's a line with a positive slope! But we can do some formal statistics in Matlab too. Let's use the "corr" function to compute the correlation coefficient and the statistical significance ...
[r,p] = corr(BR', meanRiskTaking)
r = NaN
p = NaN
Oops! We have to deal with the NaNs. Unfortunately there's no "nancorr" like there is a "nanmean"; instead, we will have to do it manually.
First find the indices of BR that are not NaNs ...
ind = find(~isnan(BR))
ind = 1×135
1 2 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 28 29 31 32 33 34 35 37 38 39 40 42 44 49 50 51 52 53 54 55 56 57 58 59 60
(only the first 50 entries are shown here)
That's most, but not all, of the data points. Then run the correlation on just these participants ...
[r,p] = corr(BR(ind)', meanRiskTaking(ind))
r = 0.0889
p = 0.3051
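(As an aside, the corr function does have a built-in option for handling NaNs - passing 'rows','complete' tells it to ignore any rows containing NaNs, and should give the same answer as the manual approach:
[r,p] = corr(BR', meanRiskTaking, 'rows', 'complete')
but doing it yourself with find and isnan is a good exercise.)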
That's a small correlation coefficient (r = 0.0889) and a high p value. Not strong support for our hypothesis.
If you do the Extra Credit item in the Assignment you will take another look at these data with a different analysis that provides a hint that maybe there is a relationship after all. Indeed, in another experiment in my other classes (PSY 333 and PSY 433) I've consistently found a relationship between blinks and risk taking, although the effect size is small, so it takes hundreds of participants' worth of data to detect.