01-27-2017, 04:58 PM
woj
Join Date: Jul 2002
Location: Chicago
Posts: 47,882
Quote:
Originally Posted by Elli
No. You can't have it both ways. Is the data flawed because the respondents lie and self-select, OR can you use the data to draw conclusions?

IF you accept the data is relevant to your cause, then I will direct you back to the article:

"We begin with an example. Suppose a survey question is asked of 20,000 respondents, and that, of these persons, 19,500 have a given characteristic (e.g., are citizens) and 500 do not. Suppose that 99.9 percent of the time the survey question identifies correctly whether people have a given characteristic, and 0.1 percent of the time respondents who have a given characteristic incorrectly state that they do not have that characteristic. (That is, they check the wrong box by mistake.) That means, 99.9 percent of the time the question correctly classifies an individual as having a characteristic—such as being a citizen of the United States—and 0.1 percent of the time it classifies someone as not having a characteristic, when in fact they do. This rate of misclassification or measurement error is extremely low and would be tolerated by any survey researcher. It implies, however, that one expects 19 people out of 20,000 to be incorrectly classified as not having a given characteristic, when in fact they do.

Normally, this is not a problem. In the typical survey of 1,000 to 2,000 persons, such a low level of measurement error would have no detectable effect on the sample. Even in very large sample surveys, survey practitioners expect a very low level of measurement error would have effects that wash out between two categories. The non-citizen voting example highlights a potential pitfall with very large databases in the study of low frequency categories. Continuing with the example of citizenship and voting, the problem is that the citizen group is very large compared to the non-citizen group in the survey. So even if the classification is extremely reliable, a small classification error rate will cause the bigger category to influence analysis of the low frequency category in substantial ways. Misclassification of 0.1 percent of 19,500 respondents leads us to expect that 19 respondents who are citizens will be classified as non-citizens and 1 non-citizen will be classified as a citizen. (This is a statistical expectation—the actual numbers will vary slightly.) The one non-citizen classified as a citizen will have trivial effects on any analyses of the overall pool of people categorized as citizens, as that individual will be 1 of 19,481 respondents. However, the 19 citizens incorrectly classified as non-citizens can have significant effects on analyses, as they are 3.7 percent (19 of 519) of respondents who said they are non-citizens.

Such misclassifications can explain completely the observed low rate of a behavior, such as voting, among a relatively rare or low-frequency group, such as non-citizens. Suppose that 70 percent of those with a given characteristic (e.g., citizens) engage in a behavior (e.g., voting). Suppose, further, that none of the people without the characteristic (e.g., non-citizens) are allowed to engage in the behavior in question (e.g., vote in federal elections). Based on these suppositions, of the 19 misclassified people, we expect 13 (70%) to be incorrectly determined to be non-citizen voters while 0 correctly classified non-citizens would be voters. Hence, a 0.1 percent rate of misclassification—a very low level of measurement error—would lead researchers to expect to observe that 13 of 519 (2.8 percent) people classified as non-citizens voted in the election, when those results are due entirely to measurement error, and no non-citizens actually voted."
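The quoted arithmetic is easy to check. Here is a minimal sketch using only the figures from the passage (20,000 respondents, 19,500 citizens, a 0.1% misclassification rate, 70% turnout); it works with unrounded expectations, so the results differ slightly from the article's rounded 19 and 13:

```python
# Reproduce the article's expected-value arithmetic with its own inputs.
n_total = 20_000
n_citizens = 19_500
n_noncitizens = n_total - n_citizens        # 500 actual non-citizens
err = 0.001                                 # 0.1% misclassification rate
turnout = 0.70                              # 70% of citizens vote

# Citizens wrongly recorded as non-citizens (article rounds this to 19)
exp_misclassified = n_citizens * err        # ~19.5

# Those misclassified citizens vote at the citizen rate, so they show up
# as apparent "non-citizen voters" even though no non-citizen voted.
false_voters = exp_misclassified * turnout  # ~13.7

# Everyone recorded as a non-citizen: real ones plus the misclassified
apparent_group = n_noncitizens + exp_misclassified  # ~519.5
apparent_rate = false_voters / apparent_group       # a bit over 2.5%

print(f"expected misclassified citizens: {exp_misclassified}")
print(f"apparent non-citizen voters from error alone: {false_voters}")
print(f"apparent non-citizen voting rate: {apparent_rate:.2%}")
```

The point the article is making falls out directly: a tiny error rate applied to the huge majority group produces a non-trivial apparent voting rate inside the tiny minority group, even when the true rate is zero.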
Imagine this hypothetical scenario:

You ask a group of 1,000 people in a room, "raise your hand if you cheated on your taxes last year"... 20 people raise their hands. What conclusion would a reasonable person draw from that? That at LEAST 20 people in the room cheated on their taxes. Is it likely the actual number is higher, perhaps much higher? Of course, since there is a strong bias to under-report illegal activity...

Now along comes some wise-guy professor who tries to muddy the water with some statistical BS: "such misclassifications can explain completely the observed low rate of a behavior"... implying that there is no tax evasion at all, because all 20 people could have raised their hands by mistake...

It's possible, but let's be real here: what is more likely in this hypothetical scenario? That all 20 people made a mistake and there is no tax evasion, or that out of 100 people who cheated on their taxes, only 20 raised their hands?
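The two explanations can actually be compared numerically. Borrowing the article's own 0.1% "checked the wrong box" error rate (an assumption; the hand-raising scenario above doesn't specify one), the chance that all 20 raised hands are mistakes is vanishingly small:

```python
from math import comb

# Hand-raising scenario, modeled with the article's own error rate (assumed).
n = 1_000         # people in the room
p_err = 0.001     # 0.1% chance a non-cheater raises a hand by mistake
observed = 20     # hands actually raised

# Expected accidental hands if nobody cheated
expected_false = n * p_err   # 1.0

# Probability of seeing at least 20 accidental hands under a
# Binomial(1000, 0.001) model, i.e. the "it's all error" explanation
p_tail = sum(comb(n, k) * p_err**k * (1 - p_err)**(n - k)
             for k in range(observed, n + 1))

print(f"expected accidental hands: {expected_false}")
print(f"P(>= {observed} accidental hands): {p_tail:.2e}")
```

With that error rate you'd expect about one accidental hand, not twenty, so measurement error alone is an implausible explanation here. The article's scenario differs in one crucial way: there the errors come from the huge majority group (19,500 citizens), so even a 0.1% rate generates enough misclassifications to swamp a 500-person minority group.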