We used publicly available, anonymized, and aggregated national-level data from Google's Symptom Search Dataset (SSD), which reports the relative frequency of Internet searches for 420 signs, symptoms, and health conditions and has well-documented privacy protections.31 For comparison, we used data from (1) the National Syndromic Surveillance Program (NSSP) of the Centers for Disease Control and Prevention (CDC), which tracks emergency department (ED) visits for various health conditions in 48 US states;6 and (2) the US Census Bureau's Household Pulse Survey (HPS), which assesses the social and economic impact of the pandemic.7 The main characteristics of these data sets are summarized in Table 1.
Data sources
The SSD is publicly available30 and provides daily and weekly time series of the relative volume of searches in the United States, in English or Spanish, for common symptoms and conditions. Data are available at the national, state, and county levels in the United States and five other English-speaking countries. Search queries for each symptom or condition are collected and anonymized using differential privacy32 and then normalized by the total search volume in that region, as detailed elsewhere.31
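The two processing steps named above (privatize, then normalize by regional search volume) can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the Laplace mechanism, the sensitivity of 1 (each user contributes at most one query to a count), and the hypothetical function and parameter names `privatized_relative_volume` and `epsilon`. It is not Google's published pipeline, whose details are in the cited documentation.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def privatized_relative_volume(symptom_count: int, total_searches: int,
                               epsilon: float, rng: random.Random) -> float:
    """Hypothetical sketch: add Laplace noise to a count, then normalize.

    Assumes sensitivity 1, so the Laplace scale is 1 / epsilon. These are
    illustrative parameters, not the values used to build the SSD.
    """
    noisy = symptom_count + laplace_noise(1.0 / epsilon, rng)
    noisy = max(noisy, 0.0)  # post-processing: keep the noisy count non-negative
    return noisy / total_searches


if __name__ == "__main__":
    rng = random.Random(42)
    print(privatized_relative_volume(500, 100_000, epsilon=1.0, rng=rng))
```

Smaller `epsilon` values add more noise; because the noise is added before division, the relative frequency inherits that noise scaled by the regional total.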
The SSD was created using Google's web search tools, which map queries to Knowledge Graph33,34 entities by continuously learning the relationship between words in user queries and the entities described in the web pages viewed after those queries. The 420 symptoms and conditions included in the SSD are the most searched entities (by query volume). Each entity (symptom or condition) is associated with tens or hundreds of thousands of individual queries submitted by Google users on desktop computers or mobile devices. Queries are case-insensitive, quotation marks are ignored, and spelling errors are automatically corrected. Example queries include [lexapro], [depression test], or [signs of depression] for depression; [trazodone], [agoraphobia], or [panic attack] for anxiety; and [I want to die], [how to die], or [I want to kill myself] for suicidal ideation.
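The query handling described above (ignore quotation marks and capitalization, then assign a query to an entity) can be sketched in Python. This is a simplification under stated assumptions: the hypothetical `EXAMPLE_QUERY_TO_ENTITY` table contains only the example queries quoted in the text, whereas the real mapping is learned from search behavior, and spelling correction is not modeled.

```python
# Hypothetical lookup table built only from the example queries quoted in the text.
# The real SSD mapping is learned from Knowledge Graph signals, not a fixed table.
EXAMPLE_QUERY_TO_ENTITY = {
    "lexapro": "depression",
    "depression test": "depression",
    "signs of depression": "depression",
    "trazodone": "anxiety",
    "agoraphobia": "anxiety",
    "panic attack": "anxiety",
    "i want to die": "suicidal ideation",
    "how to die": "suicidal ideation",
    "i want to kill myself": "suicidal ideation",
}

# Straight and curly double quotation marks to delete from queries.
QUOTE_CHARS = str.maketrans("", "", '"\u201c\u201d')


def normalize_query(query: str) -> str:
    """Drop quotation marks, ignore capitalization, and collapse whitespace."""
    return " ".join(query.translate(QUOTE_CHARS).lower().split())


def map_query(query: str):
    """Return the entity for a known example query, or None (no spelling correction)."""
    return EXAMPLE_QUERY_TO_ENTITY.get(normalize_query(query))


if __name__ == "__main__":
    for q in ['"Panic  Attack"', "LEXAPRO", "motion sickness"]:
        print(q, "->", map_query(q))
```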
For the current study, we focused on SSD searches for anxiety, depression, and suicidal ideation between January 1, 2018, and December 31, 2020. We selected these entities a priori because they are commonly searched, general conditions and are therefore highly relevant to population mental health. We also considered searches for motion sickness as a potential negative control in part of our analyses.
We compared national-level, weekly data on Internet searches measured by the SSD with national-level data on ED visits reported by the NSSP. The NSSP is a CDC-led collaboration to collect, analyze, and share electronic health data from approximately 3,500 emergency departments, urgent and ambulatory care centers, inpatient health facilities, and laboratories (hereafter collectively referred to as ED facilities) in 48 states (all except Hawaii and Wyoming) and Washington, DC.6 These facilities account for approximately 70% of all US ED facilities. Data used in this analysis were previously reported by Holland et al. (2021)20 and are reused in the current study with the permission of the authors.
We focused on two variables reported by Holland et al. (2021)20: (1) national weekly counts of ED visits for mental health conditions associated with natural or human-made disasters, such as stress, anxiety, symptoms consistent with acute stress disorder or post-traumatic stress disorder, and panic; and (2) national weekly counts of ED visits for suicide attempts. The data set includes weekly ED visit counts from December 30, 2018, to October 10, 2020.
We additionally compared Internet search data with HPS data. The HPS is a national survey designed to measure the effects of the COVID-19 pandemic on the economic, physical, and mental health of American households.7 Phase 1 of the survey ran from April 23, 2020, to July 21, 2020; Phase 2 from August 19, 2020, to October 26, 2020; and Phase 3 from October 28, 2020, to March 29, 2021. Although the survey is ongoing, the current analysis used HPS data from these three phases.35
Questions about symptoms of anxiety and depression were included in all phases of the survey, and questions about mental health care were included in Phases 2 and 3. The anxiety and depression questions comprised 4 items adapted from the two-item Patient Health Questionnaire (PHQ-2) and the two-item Generalized Anxiety Disorder (GAD-2) scale. Responses to each question covered the past 7 days and were coded as follows: not at all = 0, several days = 1, more than half the days = 2, and nearly every day = 3. Anxiety and depression scores were obtained by summing the responses to the two questions for each construct, and the percentage of respondents scoring 3 or higher on these summed scores was used in the analysis of the survey results. Items indexing mental health care assessed the percentage of adults who reported, in the past 4 weeks, taking prescription medication for mental health, receiving counseling or therapy from a mental health professional, or needing but not receiving counseling or therapy from a mental health professional (i.e., unmet need).
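The scoring rule above (code each item 0 to 3, sum the two items per construct, report the percentage scoring 3 or higher) can be expressed as a short Python sketch. The function name `percent_screening_positive` and the optional `weights` argument are hypothetical; the text does not say how survey weights were handled, so weighting is shown only as an option.

```python
from typing import Optional, Sequence, Tuple

# Response coding described in the text.
RESPONSE_CODES = {
    "not at all": 0,
    "several days": 1,
    "more than half the days": 2,
    "nearly every day": 3,
}


def percent_screening_positive(item_pairs: Sequence[Tuple[int, int]],
                               weights: Optional[Sequence[float]] = None,
                               cutoff: int = 3) -> float:
    """Percentage of respondents whose two-item sum is at least `cutoff`.

    item_pairs: one (item1, item2) pair of 0-3 codes per respondent for a
    single construct (PHQ-2-style depression or GAD-2-style anxiety items).
    weights: optional per-respondent survey weights (an assumption; unweighted if None).
    """
    if weights is None:
        weights = [1.0] * len(item_pairs)
    if len(weights) != len(item_pairs):
        raise ValueError("one weight per respondent is required")
    positive = total = 0.0
    for (a, b), w in zip(item_pairs, weights):
        if not (0 <= a <= 3 and 0 <= b <= 3):
            raise ValueError("item codes must lie in 0..3")
        total += w
        if a + b >= cutoff:
            positive += w
    return 100.0 * positive / total


if __name__ == "__main__":
    pair = (RESPONSE_CODES["several days"], RESPONSE_CODES["more than half the days"])
    print(percent_screening_positive([pair, (0, 0)]))
```

In this example the first respondent sums to 3 and screens positive while the second does not, so the unweighted percentage is 50.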
Statistical analyses
First, we used graphical approaches and descriptive statistics to identify temporal patterns in Internet searches for anxiety, depression, and suicidal ideation. We then fit a generalized linear model with a log link function to quantify the effects of the weeks of Thanksgiving and Christmas and of the onset of the COVID-19 pandemic (defined as the first 4 weeks of March 2020) on relative search volumes, adjusting for calendar year and season.
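A minimal Python sketch of this kind of model follows (the authors' own R code is in the linked repository). It rests on several assumptions not stated in the text: a Poisson (quasi-likelihood) error family, since only the log link is specified; weeks that start on Mondays; separate indicators for the Thanksgiving and Christmas weeks; "first 4 weeks of March 2020" taken as weeks starting March 1 to 28, 2020; and meteorological seasons. The helpers `design_row` and `fit_log_link_glm` are hypothetical names.

```python
import datetime as dt

import numpy as np

# Column order of the hypothetical design matrix (reference: winter weeks of 2018).
COLUMNS = ["intercept", "thanksgiving_week", "christmas_week", "covid_onset",
           "y2019", "y2020", "spring", "summer", "fall"]
# Meteorological seasons by month (assumption): 0 = winter (reference).
SEASON = {12: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 2, 7: 2, 8: 2, 9: 3, 10: 3, 11: 3}


def thanksgiving(year: int) -> dt.date:
    """US Thanksgiving: the fourth Thursday of November."""
    nov1 = dt.date(year, 11, 1)
    first_thursday = nov1 + dt.timedelta(days=(3 - nov1.weekday()) % 7)
    return first_thursday + dt.timedelta(weeks=3)


def design_row(week_start: dt.date) -> list:
    """Covariates for the 7-day week beginning on week_start."""
    days = {week_start + dt.timedelta(days=i) for i in range(7)}
    year = week_start.year
    # Assumption: "first 4 weeks of March 2020" = weeks starting March 1-28, 2020.
    covid = dt.date(2020, 3, 1) <= week_start <= dt.date(2020, 3, 28)
    season = SEASON[week_start.month]
    return [1.0,
            float(thanksgiving(year) in days),
            float(dt.date(year, 12, 25) in days),
            float(covid),
            float(year == 2019), float(year == 2020),
            float(season == 1), float(season == 2), float(season == 3)]


def fit_log_link_glm(X, y, max_iter=100, tol=1e-10):
    """Poisson-family GLM with log link, fit by iteratively reweighted least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # the first column is the intercept
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu  # working response
        XtW = X.T * mu           # IRLS weights equal mu for the Poisson/log link
        new_beta = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta
```

Usage: build `X = np.array([design_row(w) for w in weekly_dates])` for the weekly series, fit `beta = fit_log_link_glm(X, searches)`, and read `100 * (np.exp(beta) - 1)` as the estimated percentage change in relative search volume associated with each indicator. With a log link, coefficients act multiplicatively, which is why effects are naturally reported as percentage changes.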
Second, we quantified pandemic-related changes in search volumes by calculating the percentage change in search frequency for each topic in each week from January 1, 2020, to December 31, 2020, relative to the same week 1 year earlier. We estimated the change in the proportion of ED visits for mental health symptoms and suicide attempts from the NSSP in the same way.
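The year-over-year comparison can be sketched as follows. The sketch assumes that "the same week 1 year earlier" means the week starting exactly 52 weeks before, which keeps the weekday alignment; matching on calendar week number would be an equally plausible reading. The function name `yoy_percent_change` is hypothetical, and the same function applies to a weekly series of ED-visit proportions.

```python
import datetime as dt


def yoy_percent_change(weekly: dict,
                       start: dt.date = dt.date(2020, 1, 1),
                       end: dt.date = dt.date(2020, 12, 31)) -> dict:
    """Percentage change for each week in [start, end] vs. the week 52 weeks earlier.

    weekly maps week-start dates to a value (relative search volume or a
    proportion of ED visits). Weeks without a nonzero comparison value are skipped.
    """
    out = {}
    for week, value in sorted(weekly.items()):
        if not (start <= week <= end):
            continue
        prior = weekly.get(week - dt.timedelta(weeks=52))
        if prior is None or prior == 0:
            continue
        out[week] = 100.0 * (value - prior) / prior
    return out


if __name__ == "__main__":
    base = dt.date(2020, 3, 2)
    series = {base - dt.timedelta(weeks=52): 40.0, base: 50.0}
    print(yoy_percent_change(series))
```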
Third, we calculated pairwise Pearson correlation coefficients between contemporaneous measures derived from the SSD, NSSP, and HPS. Results were similar when Spearman rather than Pearson correlation coefficients were used. We also used scatterplots to visualize the association between specific pairs of measures. In sensitivity analyses, we considered a potential 1- or 2-week lag between a change in search volumes and a change in rates of ED visits for mental health conditions or suicidality. Specifically, we used a generalized linear model with a log link function to quantify the relative change in ED visits associated with same-week, prior-week, and 2-weeks-prior searches. We fit separate models for each search concept. All analyses were performed using R (version 4.0.2). The code to replicate these analyses is publicly available via GitHub at https://github.com/anthonysun95/Google_SSD_and_Mental_Health.
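The correlation and lag analyses can be sketched in Python as a rough illustration; the authors' R code in the repository above is the reference implementation. Pearson and Spearman (Pearson on average ranks) are standard. For the lag models, the sketch assumes one model per lag with a single predictor, log E[ED_t] = a + b * search_(t-k), and a Poisson error family; the text specifies only the log link, and the actual models may include further covariates. All function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def fit_lagged_log_link(searches, ed_visits, lag, max_iter=100, tol=1e-12):
    """Fit log E[ED_t] = a + b * search_(t - lag) by IRLS (Poisson family assumed).

    Both weekly series must be aligned by week. exp(b) is the relative change
    in ED visits per unit of search volume `lag` weeks earlier.
    """
    x = list(searches[: len(searches) - lag])
    y = list(ed_visits[lag:])
    a, b = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        eta = [a + b * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working response
        # Weighted least-squares normal equations with IRLS weights mu.
        sw = sum(mu)
        swx = sum(m * xi for m, xi in zip(mu, x))
        swxx = sum(m * xi * xi for m, xi in zip(mu, x))
        swz = sum(m * zi for m, zi in zip(mu, z))
        swxz = sum(m * xi * zi for m, xi, zi in zip(mu, x, z))
        det = sw * swxx - swx * swx
        new_a = (swxx * swz - swx * swxz) / det
        new_b = (sw * swxz - swx * swz) / det
        done = abs(new_a - a) < tol and abs(new_b - b) < tol
        a, b = new_a, new_b
        if done:
            break
    return a, b
```

Usage: call `fit_lagged_log_link(searches, ed_visits, lag=k)` for k = 0, 1, and 2 on each search concept in turn, and compare `math.exp(b)` across lags.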