HUMAN Protocol and the intricacies of bias
With the distinction between human and statistical bias highlighted in The basics of bias, this article examines the specific forms of bias that can adversely affect research. Bias is, at root, an imbalance in a dataset, and is best thought of as error. It can manifest as poor voice-recognition technology or as accidental racism; the two differ in severity, but the underlying problem, and its solution, apply equally to both. Briefly, statistical bias occurs when the results derived from a sample pool of respondents over- or underestimate the results that would be found if the entire population were sampled.
The error of bias can, to a large extent, be mitigated by volume, which increases the diversity of the sample datasets. However, even the largest datasets do not necessarily mitigate bias; the choice of respondents is only one fragment of the bias problem, which is, in fact, systemic across the entire research process, from the sourcing of respondents (selection bias), to the procedure (measurement bias), to the interpretation of results (confounding). As we define each of these forms of bias, we examine the ways in which HUMAN Protocol offers real solutions to mitigate bias of all kinds throughout the research process.
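To see why volume alone is not enough, consider a small simulation. This is a minimal sketch with entirely hypothetical numbers, using only numpy: a biased sampling rule keeps the same systematic error no matter how large the sample grows, while a random sample converges on the true value.

```python
# A minimal sketch with hypothetical numbers: more volume does not remove
# a systematic sampling error, it only makes the wrong answer more precise.
import numpy as np

rng = np.random.default_rng(0)

# Toy population of 1,000,000 people; 30% hold some opinion of interest.
population = rng.random(1_000_000) < 0.30
true_rate = population.mean()

# Biased sampling rule: people who hold the opinion are 3x as likely to respond.
weights = np.where(population, 3.0, 1.0)
weights = weights / weights.sum()

for n in (1_000, 10_000, 100_000):
    random_sample = rng.choice(population, size=n)             # unbiased draw
    biased_sample = rng.choice(population, size=n, p=weights)  # self-selected draw
    print(f"n={n:>7}  true={true_rate:.3f}  "
          f"random={random_sample.mean():.3f}  biased={biased_sample.mean():.3f}")

# The random estimate tightens around ~0.30 as n grows; the biased estimate
# settles near ~0.56 and stays there.
```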
At the highest level, selection bias indicates an error by which the tested population does not represent the population the researcher intends to study. There is an important distinction to be aware of here: selection bias does not relate to the idiosyncrasies in a study group, but rather to the error of that group not representing the group the researcher wishes to study.
Selection bias is often not a matter of intent; resource limitations, rather than any wayward intention, determine what is practically feasible. Research institutions are currently so limited in resources that the researchers themselves are often used as data sources. ML PhD students are a minute niche, and a heavily biased sample of data labelers.
If you send a CAPTCHA to a website selling dentures, your sample population is more likely to self-select as elderly. This is an example of sampling bias, a subset of selection bias.
While selection bias is often defined as the result of non-random methods of curating respondents, randomness does not, by itself, produce representative datasets. Randomness is simply the best way of preventing selection bias: what is really meant by randomness is the exclusion of extraneous variables from the study group once the parameters have been set. Randomness is not a case of complete blindness; in fact, the opposite is true.
If a researcher wants to poll opinion about the food in a restaurant, a random selection of the target population is required. The target population is ‘all people who have eaten at this restaurant’. To achieve a representative sample, the researcher must remove non-random influences by including, for example, diners from breakfast, lunch, and dinner across every day of the week.
That said, a researcher would not want complete randomness if the area of study is opinions about the food in a restaurant. The researcher wants only people who have eaten the food, so the selection is not random but highly specified. Randomness refers to there being no extra variable left unaccounted for, such as the mistake of only surveying people who visit the restaurant on Tuesdays. Selecting a specific day would make the sample non-random, which is a problem if the purpose is to measure overall sentiment about the food in general.
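In code, that balance of a precisely defined target population and randomness within it looks like stratified random sampling. The sketch below is illustrative only: the `diners` DataFrame, its `day` and `meal` columns, and the `stratified_sample` helper are hypothetical, and pandas is assumed.

```python
# A minimal sketch, assuming a hypothetical `diners` DataFrame with one row per
# visit and columns 'day' and 'meal'. First restrict to the target population
# (people who actually ate here), then randomize *within* each day/meal stratum
# so that no single slice, such as Tuesday lunch, dominates the sample.
import pandas as pd

def stratified_sample(diners: pd.DataFrame, per_stratum: int, seed: int = 0) -> pd.DataFrame:
    """Draw an equal-sized random sample from every (day, meal) combination."""
    return (
        diners
        .groupby(["day", "meal"], group_keys=False)
        .apply(lambda g: g.sample(n=min(per_stratum, len(g)), random_state=seed))
    )

# Usage: survey = stratified_sample(diners, per_stratum=20)
```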
When we are talking about AI products that understand the subtleties of the world they operate in, the target population can hardly be too big: accuracy improves with scale, and the intention is to receive global feedback. The means, however, are often missing.
What is required is both randomization and precision: precision in defining the target population, and randomization thereafter in selecting the corresponding pool of respondents. HUMAN Protocol provides the balance of both.
Measurement bias in the field of ML breaks down into many categories. An example is interviewer bias, whereby interviewers skew test results through their particular interview style; discrepancies between groups are then attributed to innate group dynamics and qualities rather than to differences in interviewer style.
Essentially, in this instance, the interviewer is the data scientist who frames the question differently from her colleague. With the same research brief, determining the likelihood of a cat appearing in a given image, both send out a million images to be labelled. One data scientist asks ‘Are there cats in this photo?’, and the other asks ‘Click all the squares containing a cat.’
The question itself carries many presumptions and a subtle fingerprint of the scientist who wrote it. Every word the scientist selects reflects a position and an assumption, and will inevitably affect the outcome of the test.
If a CAPTCHA includes images without cats, the question framed as ‘Are there cats in this photo?’ has a clear response: the responder knows the answer is binary, and can therefore select an image in which there are no cats. If the question is ‘Click all the images containing cats,’ the possibility of ‘no cats’ is not accounted for; the question, while not ruling out an image without a cat, subtly suggests that there is a cat to be found, and that the test measures one’s ability to detect it. The latter question is therefore likely to return inaccurate responses, with responders eager to find a cat.
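One practical way to keep that fingerprint out of the data is to make the absence of a cat an explicit, first-class answer. The sketch below is hypothetical; it is not HUMAN Protocol’s actual task format, just an illustration of how the two framings translate into different answer schemas.

```python
# A hypothetical labeling-task schema (not HUMAN Protocol's real job format),
# contrasting a framing that presumes a cat exists with one that lets the
# labeler say there is no cat at all.
from dataclasses import dataclass, field
from typing import List

@dataclass
class LabelingTask:
    image_url: str
    question: str
    options: List[str] = field(default_factory=list)

# Suggestive framing: every valid response presumes a cat is somewhere in the image.
biased_task = LabelingTask(
    image_url="https://example.com/img_001.jpg",
    question="Click all the squares containing a cat.",
    options=["square_1", "square_2", "square_3", "square_4"],
)

# Neutral framing: absence is a legitimate answer, so cat-free images can be
# labeled correctly instead of nudging the responder toward finding a cat.
neutral_task = LabelingTask(
    image_url="https://example.com/img_001.jpg",
    question="Are there cats in this photo?",
    options=["yes", "no"],
)
```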
Whether or not confounding is technically a type of bias is up for debate; the bottom line is the same. Confounding occurs when an unaccounted-for variable influences the ‘cause and effect’ relationship being established. A confounding variable is correlated with both the cause and the effect, but is not necessarily causal.
For example, using a grammar-labeling tool, a researcher observes an association between an increase in the usage of the US English ‘soccer’, as opposed to the UK English ‘football’, and an increase in the spelling of the US English ‘flavor’, as opposed to the UK ‘flavour’. The result would be an incorrect presumption of causality, whereby an increase in the usage of one word causes an increase in the usage of the other.
Are they causally related? No. The confounding factor is nationality, the style or system of education, or the audience being written for.
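A quick simulation makes the point concrete. This is a minimal sketch with entirely synthetic, made-up data, using only numpy: nationality, the confounder, drives both word choices, so the two appear strongly correlated overall even though neither influences the other.

```python
# A minimal sketch with synthetic data: the confounder (nationality) drives
# both word choices, producing a strong raw correlation that vanishes once we
# condition on nationality.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Confounder: 0 = UK-style writer, 1 = US-style writer.
nationality = rng.integers(0, 2, size=n)

# Each word choice depends on nationality plus independent noise.
uses_soccer = rng.random(n) < np.where(nationality == 1, 0.9, 0.1)
uses_flavor = rng.random(n) < np.where(nationality == 1, 0.9, 0.1)

print("overall correlation:", np.corrcoef(uses_soccer, uses_flavor)[0, 1])
for group in (0, 1):
    mask = nationality == group
    print(f"within group {group}:", np.corrcoef(uses_soccer[mask], uses_flavor[mask])[0, 1])

# Overall correlation comes out strongly positive (~0.64), yet within each
# nationality it is roughly zero: neither word choice causes the other.
```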
For the latest updates on HUMAN Protocol, follow us on Twitter or join our community Telegram channel.
Legal Disclaimer
The HUMAN Protocol Foundation makes no representation, warranty, or undertaking, express or implied, as to the accuracy, reliability, completeness, or reasonableness of the information contained here. Any assumptions, opinions, and estimations expressed constitute the HUMAN Protocol Foundation’s judgment as of the time of publishing and are subject to change without notice. Any projection contained within the information presented here is based on a number of assumptions, and there can be no guarantee that any projected outcomes will be achieved.