To standardize the ratings that multiple fact-checkers assign to the claims featured in our COVIDGlobal and COVIDGeo Misinformation Dashboards, we adopted a simplified fact-checking rating system with four mutually exclusive categories (labels):
- “True” – a fact-checker reviewed a claim and deemed that it’s truthful in its entirety;
- “False” – a fact-checker reviewed a claim and deemed that it’s false in its entirety;
- “Misleading” – a fact-checker reviewed a claim and deemed that the claim contains some level of falsehood (e.g., partially false, questionable, or misleading);
- “Unproven” – a fact-checker reviewed a claim and deemed that the claim cannot currently be proven, either due to a lack of scientific knowledge or for other reasons (e.g., unproven, unsupported, or unfounded).
Below we briefly explain our decision to use this simplified rating system.
Fact-checking rating systems give readers the ability to quickly discern whether and to what extent a claim is truthful. However, to date, the development of fact-checking rating systems has suffered from a “Tower of Babel” effect: the words (or symbols) used carry similar but not identical meanings for different audiences and in different contexts, which creates confusion. Individual fact-checking groups often use different terminologies and systems to categorize claims across a spectrum that ranges from true, accurate, and mostly true to false, fake, misleading, partisan and “pants on fire.” Some use a numerical scale, while others use textual ratings like “misleading” or “half-true.” Some fact-checkers use colour-coded ratings, from red to green, to indicate varying levels of truth, or employ visual systems with a number of crows, bloodhounds, Pinocchios and the like. Table 1 summarizes common types of rating systems used by various fact-checkers. Notably, some fact-checkers do not use rating systems at all.
Table 1: Rating systems of COVID-19 fact-checkers in the Fact-Checkers dataset
|Rating System Type||Definition||Example|
|Numerical System||A rating system based on a numerical scale.||For example, 1-5 or 1-10, etc.|
|Colour-coded System||A rating system based on a series or groupings of colours to denote different levels of veracity.||red=false, orange=misleading, green=true|
|Pictorial System||A rating system based on pictures or pictograms that denote veracity.||A series of Pinocchios, pants-on-fire or crows, etc. (Source: The Washington Post)|
|Textual System||A rating system based on words that denote veracity. Examples include: True, Mostly True, Misleading, Mostly False, False, etc.||True; Mostly True; Mostly False; Misleading; More Context Needed/Wrong Context; Insufficient Evidence Analysis; False (Source: Dubawa, Nigeria)|
Based on our Fact-Checkers dataset, at least 113 of the 223 COVID-19 fact-checkers (51 percent) use a rating system to convey the truthfulness of a claim to the public. The remaining 110 either do not explicitly use a rating system or do not provide a description of one (see Figure 1).
Figure 1: Fact-checkers organized by rating system
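The share quoted above follows directly from the counts in the text. A quick check, using only the figures reported here (the variable names are ours, chosen for illustration):

```python
# Counts reported in the text for the Fact-Checkers dataset.
total_fact_checkers = 223
without_rating_system = 110  # no explicit or documented rating system

# Fact-checkers with a rating system, and their share of the total.
with_rating_system = total_fact_checkers - without_rating_system
share_with = round(100 * with_rating_system / total_fact_checkers)

print(f"{with_rating_system} of {total_fact_checkers} ({share_with} percent)")
```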
When we examined fact-checkers that use a textual rating system with clearly defined categories for labeling claims (see Figure 2), the most common number of categories was 5, used by 25 (11 percent) of the fact-checkers, followed by 6- and 4-category systems (used by 9 and 8 percent of the fact-checkers, respectively). The 122 fact-checkers marked as N/A in the chart below did not use a rating system or did not make one publicly available.
Figure 2: Number of numerical or textual categories used in the rating systems.
The need for multiple categories often arises when the truthfulness of a claim cannot be captured by a binary label of true or false. As a result, depending on the rating system, one or more “in-between” categories may be used to describe claims that are neither completely true nor completely false (see Figure 3).
Figure 3: Some of the categories used in between true and false.
In general, we find that “in-between” categories or labels (see Figure 3 above) can broadly be organized into three groups:
- The first group of categories is used to flag the level of falsehood (e.g., partially false, questionable, or misleading).
- The second group contains categories that describe a technique or strategy used to mislead (e.g., misattribution or decontextualization).
- The third group of categories refers to claims that cannot currently be proven, either due to a lack of scientific knowledge or for other reasons (e.g., unproven, unsupported, or unfounded).
While categories in Group 2 refer to a technique or strategy used to mislead, they also imply that a claim is not completely truthful. Thus, there is a strong overlap between categories in Groups 1 and 2. Following this observation, one approach to standardizing rating systems is to use a single label, such as “Misleading,” for claims in Groups 1 and 2.
Since categories in Group 3 are not meant to communicate the level of falsehood, but rather the fact that a fact-checker is unable to verify a claim at this time, we treat them as a standalone category labelled “Unproven.”
To avoid information loss from such standardization, we include a direct link to the Review Article provided by a fact-checker (see the “Review Article” column in the COVIDGlobal Misinformation Dashboard as shown in Figure 4).
Figure 4: COVID-19 claims organized into the 4-item rating system, as they appear in the COVIDGlobal Misinformation Dashboard
In sum, we propose that most of the claims can be labelled using just four mutually exclusive categories/labels: “True,” “False,” “Misleading” and “Unproven.”
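The grouping described above can be sketched as a simple lookup. This is an illustration only, not the procedure used for the dashboards: the keys are just the example labels quoted in the text, and the function names, the `original_rating` and `review_url` fields, and the example URL are hypothetical.

```python
from typing import Optional

# Illustrative lookup: keys are the example labels quoted in the text,
# not the full set of labels that was reviewed.
STANDARD_LABELS = {
    "true": "True",
    "false": "False",
    # Group 1: level of falsehood
    "partially false": "Misleading",
    "questionable": "Misleading",
    "misleading": "Misleading",
    # Group 2: technique or strategy used to mislead
    "misattribution": "Misleading",
    "decontextualization": "Misleading",
    # Group 3: cannot currently be verified
    "unproven": "Unproven",
    "unsupported": "Unproven",
    "unfounded": "Unproven",
}


def standardize_rating(original_rating: str) -> Optional[str]:
    """Return one of the four labels, or None if the rating is not in the lookup."""
    # Normalize case and collapse repeated whitespace before the lookup.
    key = " ".join(original_rating.lower().split())
    return STANDARD_LABELS.get(key)


def standardize_claim(claim: dict) -> dict:
    """Add the standardized label; the original rating and review link are kept."""
    return {**claim, "standard_rating": standardize_rating(claim["original_rating"])}


# Hypothetical record, for illustration only.
example = standardize_claim(
    {"original_rating": "Partially  False", "review_url": "https://example.org/review"}
)
print(example["standard_rating"])
```

Returning `None` for an unrecognized rating, rather than guessing, leaves such labels for manual review, and keeping the original fields alongside the new label mirrors the idea of linking back to the Review Article.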
As an initial validation of our approach, we manually reviewed the textual rating categories assigned to 6,593 fact-checked COVID-related claims collected via the Google Fact Check Tools API between January 22 and December 12, 2020. Since 56 percent of the claims and rating categories were not in English, we used the Google Translate API to automatically translate non-English categories into English. The main objective was to see whether the original rating categories could be mapped to one of the four categories proposed earlier (“True,” “False,” “Misleading” and “Unproven”).
Based on our initial review, we were able to assign the original textual ratings to one of the four categories. Figure 5 illustrates the outcome of our mapping procedure. After removing duplicate rating labels, there were 99 unique labels assigned to the False category, 51 to Misleading, 15 to True, and 16 to Unproven.
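As a rough sketch of the data preparation step, the function below flattens an already-parsed API response into one row per review. The field names (`claims`, `claimReview`, `textualRating`, `languageCode`, `url`, `text`) follow the response format of the Google Fact Check Tools API claim search as we understand it and should be checked against the current documentation. The sample payload is invented, no network request is made, and translation is only flagged, not performed.

```python
def extract_ratings(response: dict) -> list:
    """Flatten a parsed claim-search response into one row per claim review."""
    rows = []
    for claim in response.get("claims", []):
        for review in claim.get("claimReview", []):
            language = review.get("languageCode", "")
            rows.append({
                "claim_text": claim.get("text", ""),
                "original_rating": review.get("textualRating", ""),
                "review_url": review.get("url", ""),
                "language": language,
                # Anything not tagged as English (including a missing tag)
                # is flagged for translation.
                "needs_translation": not language.lower().startswith("en"),
            })
    return rows


# Invented payload shaped like the assumed response format (not real data).
sample_response = {
    "claims": [
        {
            "text": "Example claim A",
            "claimReview": [
                {"url": "https://example.org/a", "textualRating": "False", "languageCode": "en"}
            ],
        },
        {
            "text": "Example claim B",
            "claimReview": [
                {"url": "https://example.org/b", "textualRating": "Falso", "languageCode": "pt"}
            ],
        },
    ]
}

rows = extract_ratings(sample_response)
for row in rows:
    print(row["original_rating"], row["language"], row["needs_translation"])
```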
Figure 5: Visual representation of the mapping between the original rating categories and the proposed four categories.
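The per-category tallies of unique labels can be reproduced along the following lines. The sample pairs are hypothetical, and treating labels that differ only in case or surrounding whitespace as duplicates is our assumption, since the text does not say how duplicates were identified.

```python
from collections import Counter


def unique_labels_per_category(pairs):
    """Count distinct original labels under each standardized category."""
    # Assumption: labels differing only in case or surrounding whitespace
    # are treated as duplicates.
    unique_pairs = {(label.strip().lower(), category) for label, category in pairs}
    return Counter(category for _, category in unique_pairs)


# Hypothetical (original label, standardized category) pairs, not the study data.
sample_pairs = [
    ("False", "False"),
    ("false ", "False"),
    ("Fake", "False"),
    ("Half true", "Misleading"),
    ("Unproven", "Unproven"),
]
counts = unique_labels_per_category(sample_pairs)
print(dict(counts))

# Counts reported in the text; their sum is the number of unique original labels.
reported = {"False": 99, "Misleading": 51, "True": 15, "Unproven": 16}
total_unique_labels = sum(reported.values())
print(total_unique_labels)
```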