Assessment of Discoverability Metrics for Harmful Content

Many stakeholders are interested in tracking prevalence metrics of harmful content on social media platforms. TrustLab tracks several metrics and has produced a report for the European Commission's Code of Practice. One of TrustLab's prevalence metrics, which they refer to as "discoverability," is calculated by simulating a set of user searches, classifying the results as harmful or not, and reporting the proportion of harmful results.
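
The calculation described above can be sketched in a few lines. This is only an illustration, not TrustLab's implementation: the names `discoverability`, `search`, and `is_harmful` are hypothetical, and the choice to pool all results across queries (rather than, say, averaging per-query proportions) is an assumption that the description above does not settle.

```python
from typing import Callable, Iterable, List


def discoverability(
    queries: Iterable[str],
    search: Callable[[str], List[str]],
    is_harmful: Callable[[str], bool],
) -> float:
    """Proportion of search results classified as harmful, pooled over all queries.

    `search` and `is_harmful` are hypothetical stand-ins for a platform
    search simulator and a harm classifier.
    """
    total = 0
    harmful = 0
    for query in queries:
        for result in search(query):
            total += 1
            harmful += is_harmful(result)  # a bool counts as 0 or 1
    if total == 0:
        raise ValueError("no search results were returned")
    return harmful / total


# Toy data standing in for simulated searches (purely illustrative).
FAKE_INDEX = {
    "query a": ["ok-1", "bad-1"],
    "query b": ["ok-2", "ok-3"],
}

score = discoverability(
    queries=FAKE_INDEX.keys(),
    search=lambda q: FAKE_INDEX[q],
    is_harmful=lambda item: item.startswith("bad"),
)
```

Each of the three inputs (which queries are simulated, how results are retrieved, and how they are classified) is a design choice that can change the resulting number.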

At TrustLab's request, the University of Michigan Center for Social Media Responsibility (CSMR) has identified key concerns and considerations in producing and interpreting any discoverability metric, and some possible approaches for addressing these concerns. There is a family of possible discoverability metrics, each based on alternative design choices. This report analyzes the impacts of design alternatives on two key principles from measurement theory, validity and reliability, alongside a third principle, "robustness to strategic actors."

- Validity: the accuracy of the metric – whether it correctly measures the thing that it is supposed to measure. For discoverability metrics in particular, a key element of validity is comparability – the extent to which comparisons across platforms, countries, and time periods are meaningful. For example, is harmful content more prevalent on X or YouTube? Is it more prevalent in Slovakia or Poland? Has its prevalence on YouTube in Poland declined since last quarter?
- Reliability: the consistency of the metric – a measure could be accurate on average yet vary widely between repeated individual measurements. In that case, any single measurement would have to be treated as unreliable.
- Robustness to strategic actors: whether, for example, a platform could manipulate or game the discoverability metric without changing what real users experience on the platform.
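
To make the reliability point concrete, the sketch below compares two hypothetical series of repeated measurements that share the same average but differ in spread. The numbers are invented for illustration, and using the sample standard deviation as the measure of variability is an assumption rather than a method taken from the report.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def summarize(measurements: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of repeated measurements."""
    return mean(measurements), stdev(measurements)


# Invented values: both series average 0.10, but the first varies far more.
noisy_runs = [0.10, 0.02, 0.18, 0.06, 0.14]
steady_runs = [0.10, 0.09, 0.11, 0.10, 0.10]

noisy_mean, noisy_sd = summarize(noisy_runs)
steady_mean, steady_sd = summarize(steady_runs)

print(f"noisy:  mean={noisy_mean:.3f}, sd={noisy_sd:.3f}")
print(f"steady: mean={steady_mean:.3f}, sd={steady_sd:.3f}")
```

Under these assumptions, a single reading from the first series says much less about the underlying prevalence than a single reading from the second, even though both are equally accurate on average.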

Read the assessment report

— Paul Resnick, Siqi Wu, and James Park