Question about the random generator

asked Jan 28, 2020 in SoSci Survey (English) by s108637 (205 points)

Greetings:

I seem to be having some issues with the random generator, and I want to be sure I understand it properly. I conducted a study that resulted in 226 finished participants from a pool of 308. The study had eight conditions, and there was a significant imbalance among them even though I had selected the "Equal Distribution in Finished Interviews" option. The interviews were started at random over a period of about a month, so timing issues, particularly those related to when the people who did finish completed the study, may have affected the draw. Still, the draw seemed uneven. Below are the draws for all eight conditions, in four rows. The first row is the draws for all of the participants who finished. The second is the draws for all the participants who did not finish (did not reach the final page of the survey). The third row is the total, and the fourth is SoSci Survey's listing of the draws for the variable. It is typically one more than the unfinished total, which makes sense, as the first numbers include a test run that would not have been counted as an interview.

             Cond 1  Cond 2  Cond 3  Cond 4  Cond 5  Cond 6  Cond 7  Cond 8
Finished          6       5       9      32       1       3      21       5
Unfinished       21      13      34      44      29      12      34      39
Total            27      18      43      76      30      15      55      44
Sosci Draw       22      14      35      46      30      13      35      40

Should I have chosen a different randomization option?

commented Jan 28, 2020 by SoSci Survey (375k points)

commented Jan 30, 2020 by s108637 (205 points)
edited Jan 30, 2020 by s108637

Thank you for the quick response and my apologies for the delayed reply. The line "Sosci Draw" was what the random generator reported as to how often the conditions were drawn. This was a single questionnaire with only one set of responses per subject but it was a multi-wave survey spread out over 8 days in three sessions. The random generation occurred at the very end of the first session on the interrupt page.

I suspect the issue may be related to delays between when subjects finished and when the random cases were assigned. I downloaded the full finished data set, and here is the number of cases in each of my eight conditions:

1. 22 cases
2. 14 cases
3. 35 cases
4. 46 cases
5. 30 cases
6. 13 cases
7. 35 cases
8. 40 cases

One more apology. I labelled the rows incorrectly in my first report. The first two were exactly backwards. The first line was the unfinished cases, with per-condition numbers of (6,5,9,32,1,3,21,5), and the second line was the finished cases, at (21,13,34,44,29,12,34,39). The discrepancy between the numbers I am reporting and the finished cases in the full downloaded data set is primarily due to one case that I excluded from my analysis but that SoSci Survey would count as finished.
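As a rough check on how uneven the per-condition counts listed above are, a chi-square goodness-of-fit statistic against a perfectly even split can be computed by hand. This is only a sketch: the counts are the eight numbers from the downloaded data set, the variable names are made up for illustration, and 14.067 is the standard 5% critical value for 7 degrees of freedom.

```python
# Cases per condition, taken from the downloaded data set listed above.
counts = [22, 14, 35, 46, 30, 13, 35, 40]

total = sum(counts)
expected = total / len(counts)  # cases per condition under a perfectly even split

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi2 = sum((c - expected) ** 2 / expected for c in counts)

# Standard 5% critical value of the chi-square distribution with 7 degrees of freedom.
CRITICAL_5PCT_DF7 = 14.067

print(f"total={total}, expected per condition={expected:.3f}")
print(f"chi-square={chi2:.2f}, exceeds 5% critical value: {chi2 > CRITICAL_5PCT_DF7}")
```

With these counts the statistic comes out at roughly 34.4, well above the critical value, which is consistent with the imbalance being larger than chance alone would usually produce under an even draw.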

1 Answer

Answer 1 · 2020-01-30T19:20:14+0000

commented Feb 1, 2020 by s108637 (205 points)

"Equal distribution in finished interviews (draw without returning)" is definitely selected for the "type of drawing". I am fairly certain that this was the case throughout the study. I remember setting it in the development phase and have no recollection of ever changing it. In fact, this study is essentially a copy of two previously run studies, each of which has the exact same setting for the same random generator variable. Of course I can't prove that it was the same, but all evidence points that way.

I have done a further analysis of the drawing pattern and concluded that the algorithm is likely functioning as intended but was not a good choice for my case. The conditions of my experiment are as follows:
1. Start times varied randomly over a period of about 5 weeks.
2. The study required at least 8 days to complete so there were significant delays between start and finish times.
3. There was high variation in the amount of time subjects took to finish, ranging from 7.8 days to 28.86 days, with a mean of around 10.8 days and a standard deviation of about 3.8 days.

What this amounted to were periods in which a great many surveys were started while the counts of finished surveys per condition stayed the same. The algorithm would be strongly biased toward the conditions with low finished counts and would assign many cases to them. During certain periods, people in a given condition happened to finish more surveys than those in the others, and that condition would be starved for some time until the others caught up. Again, many surveys could start before that happened. In general, start times seemed to bunch together more than finish times.

All this to say that I don't think there is a bug, but I would strongly recommend adding a note to your documentation: if start times are spread randomly over a long period and the time from start to finish also varies randomly, it is inadvisable to choose the "Equal distribution in finished interviews (draw without returning)" option. A simple round robin would likely have performed better in these circumstances.
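The dynamic described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not SoSci Survey's actual algorithm: the "fewest_finished" policy here simply assigns each new start to a condition with the fewest finished interviews so far (ties broken at random), every participant is assumed to finish, start times are uniform over 35 days, and durations are drawn from a normal distribution clipped at 7.8 days to roughly match the figures reported above. The function name `simulate` and its parameters are made up for illustration.

```python
import heapq
import random


def simulate(policy, n_participants=308, n_conditions=8, seed=1):
    """Return the number of participants assigned to each condition."""
    rng = random.Random(seed)
    # Assumption: start times are uniform over about five weeks (in days).
    starts = sorted(rng.uniform(0, 35) for _ in range(n_participants))
    assigned = [0] * n_conditions
    finished = [0] * n_conditions
    pending = []  # min-heap of (finish_time, condition) for running interviews

    for i, start in enumerate(starts):
        # Count every interview that finished before this participant starts.
        while pending and pending[0][0] <= start:
            _, done = heapq.heappop(pending)
            finished[done] += 1

        if policy == "fewest_finished":
            # Assumed behaviour: favour conditions with the fewest finished cases.
            low = min(finished)
            cond = rng.choice([c for c in range(n_conditions) if finished[c] == low])
        elif policy == "round_robin":
            cond = i % n_conditions
        else:
            raise ValueError(f"unknown policy: {policy}")

        assigned[cond] += 1
        # Assumption: normal durations clipped at the reported minimum of 7.8 days.
        duration = max(7.8, rng.gauss(10.8, 3.8))
        heapq.heappush(pending, (start + duration, cond))

    return assigned


for policy in ("fewest_finished", "round_robin"):
    counts = simulate(policy)
    print(policy, counts, "spread:", max(counts) - min(counts))
```

Round robin keeps the per-condition counts within one of each other by construction. The other policy reacts only to finished counts, which lag the starts by more than a week, so it can pile many consecutive starts onto whichever condition currently trails.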

commented Feb 1, 2020 by SoSci Survey (375k points)
