Representativeness and Response Quality of Survey Data
1University of Mannheim, Germany; 2GESIS - Leibniz Institute for the Social Sciences
Relevance & Research Question:
In the social sciences, research is often based on findings from survey data. Common research topics include political behavior, societal attitudes and opinions, and personal values. Survey results shape societal debates and can have an impact on policy decisions. However impactful the results might be, research and policy debates based on survey data rely on the assumption that the data are of high enough quality to support inferences about a broader population. Yet collecting high-quality survey data is challenging. Common methodological issues include declining response rates, concerns about biases due to systematic misrepresentation of members of the target population, and measurement error.
That survey data can be wrong has recently been shown repeatedly in the area of election polling. Prominent in British news coverage, for example, were the mispredictions of many polls with regard to the 2015 general election. Most polls had predicted the Conservative Party to be tied with Labour. Yet the final election outcome was a clear win for the Conservatives. Similarly, most British polls predicted that the British public would vote to remain in the European Union in the 2016 referendum. Yet the outcome of the referendum was that Britain would leave the EU. Other examples of failed predictions from survey data include failure to predict voter turnout, income, and religious attendance.
Because results from survey data can be inaccurate and possibly lead to wrong predictions, it is necessary to ask whether survey data have the necessary quality to be able to draw valid inferences. This question is, however, often difficult to answer because multiple error sources can influence survey data quality. Among other factors, the quality of the survey data is influenced by survey design characteristics, such as the sampling method and the survey mode. In order to ensure that research findings from survey data are accurate, researchers need to keep the various error sources in mind and be able to detect them by applying a Total Survey Error perspective on overall survey data quality. With my four dissertation papers described below, I contribute to reaching this goal.
Methods, Data, and Results (by paper):
In the first paper, I synthesize the existing literature on measuring survey representativeness and I assess common associations between survey characteristics (such as the sampling type and mode of data collection) and representativeness (as measured using R-Indicators and benchmark comparisons). I find that probability-based samples, mixed-mode surveys, and surveys in modes other than the web are more representative than nonprobability samples, single-mode surveys, and web surveys, respectively. In addition, I find that there is a positive association between representativeness and the response rate. I conclude that there is an association between survey characteristics and representativeness and that these results are partly robust across two common representativeness measures. There is, however, a strong need for more primary research into the representativeness of different types of surveys.
In the second paper, I compare five common measures of survey representativeness and assess their informative value in the context of representativeness comparisons within as well as across two probability-based online panels in Germany. Both panels share a number of survey design characteristics but differ in other aspects. I assess the informative value of each representativeness measure in this study (response rates, R-Indicators, Fractions of Missing Information, subgroup response rates, and benchmark comparisons). I find that all five measures have advantages and disadvantages and that each sheds light on a different aspect of survey representativeness. Therefore, the extent to which these representativeness measures lend themselves to comparative analyses depends on the purpose of the investigation. I conclude from this study that, for comparative analyses of survey representativeness, it is advisable to apply at least one measure at the aggregate level (response rates or, preferably, R-Indicators) in addition to at least one measure at the variable or category level (for example, subgroup response rates or benchmark comparisons) to obtain a comprehensive picture.
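To make the distinction between aggregate-level and category-level measures concrete, the following is a minimal sketch in Python, not the procedure used in the paper. It assumes that response propensities are estimated simply as subgroup response rates, and it uses the common definition of the R-Indicator, R = 1 - 2 * S(rho), with S taken here as the unweighted sample standard deviation of the propensities. The toy data and all function names are hypothetical.

```python
from collections import defaultdict
from statistics import stdev


def subgroup_response_rates(sample):
    """Category-level measure: response rate per subgroup.

    sample is a list of (group_label, responded) pairs for the gross sample.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [respondents, sampled units]
    for group, responded in sample:
        counts[group][0] += int(responded)
        counts[group][1] += 1
    return {group: r / n for group, (r, n) in counts.items()}


def r_indicator(propensities):
    """Aggregate-level measure: R = 1 - 2 * S(rho).

    Assumption: S is the unweighted sample standard deviation; design
    weights are ignored in this sketch.
    """
    return 1.0 - 2.0 * stdev(propensities)


# Made-up gross sample: age group and response status.
sample = (
    [("young", True)] * 20 + [("young", False)] * 80
    + [("old", True)] * 60 + [("old", False)] * 40
)

rates = subgroup_response_rates(sample)
# Assumption: each unit's propensity equals the response rate of its subgroup.
propensities = [rates[group] for group, _ in sample]
response_rate = sum(responded for _, responded in sample) / len(sample)
r_value = r_indicator(propensities)
```

In this toy sample the overall response rate alone hides that one subgroup responds three times as often as the other, whereas the subgroup rates show it directly and the R-Indicator summarizes it in a single number.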
In the third paper, I take a closer look at the value of commonly used types of auxiliary data. I examine the utility of different sources of auxiliary data (sampling frame data, interviewer observations, and micro-geographic area data) for modeling survey response in a probability-based online panel in Germany. I explore whether auxiliary data are systematically missing by survey response. In addition, I investigate the correlations of the auxiliary data with survey response as well as the predictive power and the significance of coefficients of the auxiliary data in survey response models. I find that all of these data have disadvantages (for example scarcity, missing values, transparency problems, or high levels of aggregation) and none of them predict survey response to any substantial degree. I conclude that more research into the quality and predictive power of similar and other types of auxiliary data is needed to allow meaningful application of auxiliary data in survey practice, as for example in measuring representativeness, monitoring fieldwork, nonresponse adjustment, or conducting responsive design surveys.
In the last paper of my dissertation, I shift the methodological focus to the measurement part of the Total Survey Error framework. In this paper, I investigate response quality in a study of seven nonprobability online panels and three probability-based online panels. In the analysis, I apply three response quality indicators: straight-lining in grid questions, item nonresponse, and midpoint selection in a visual design experiment. I find that there is significantly more straight-lining in the nonprobability online panels than in the probability-based online panels. However, I find no systematic pattern indicating that response quality is lower in nonprobability online panels than in probability-based online panels with regard to item nonresponse and midpoint selection. I conclude that there is a difference between nonprobability online panels and probability-based online panels in response quality on one out of three satisficing indicators.
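As an illustration of how two of these indicators could be operationalized, here is a minimal Python sketch. It is not the coding scheme used in the paper: it assumes that straight-lining means giving the identical answer to every item of a fully answered grid, and that a missing answer is stored as None. The toy panels and function names are hypothetical.

```python
def is_straightlining(grid_answers):
    """True if every item of the grid received the same answer.

    Assumption: a grid with any missing answer (None) is not counted
    as straight-lining.
    """
    answered = [a for a in grid_answers if a is not None]
    return len(answered) == len(grid_answers) and len(set(answered)) == 1


def straightlining_share(panel):
    """Share of respondents in a panel who straight-line the grid."""
    return sum(is_straightlining(row) for row in panel) / len(panel)


def item_nonresponse_rate(panel):
    """Share of all grid items in a panel that were left unanswered."""
    items = [a for row in panel for a in row]
    return sum(a is None for a in items) / len(items)


# Made-up answers to a four-item grid on a five-point scale.
panel_a = [[3, 3, 3, 3], [1, 2, 3, 4], [5, 5, 5, 5], [2, None, 2, 2]]
panel_b = [[1, 2, 2, 3], [4, 4, 4, 4], [2, 3, None, 1], [5, 4, 3, 2]]

share_a = straightlining_share(panel_a)
share_b = straightlining_share(panel_b)
missing_a = item_nonresponse_rate(panel_a)
missing_b = item_nonresponse_rate(panel_b)
```

The two toy panels differ in straight-lining but not in item nonresponse, which mirrors the kind of pattern described above where panels differ on one indicator only.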
The findings from this dissertation lead to the conclusion that great care has to be put into measuring and ensuring high survey data quality and more research is needed to fully understand how high representativeness and response quality can be reached. To undertake this research is imperative, because if survey data quality is compromised, research findings based on the data can be misleading.
Multilevel Modeling for Data Streams with Dependent Observations
1University of Liège, Belgium; 2Tilburg University
In the last decade, technological innovations have been rapidly changing how we study social phenomena. Instead of being mailed to respondents on paper, questionnaires are now often web-based. And instead of diary studies, in which people have to write down what they did during the day, Experience Sampling (ES; Barrett & Barrett, 2001; Trull & Ebner-Priemer, 2009) techniques collect data throughout the day on what people are doing at that moment. Using these digital approaches, it has become cheaper and faster to collect data from many persons at the same time and to monitor these persons over time. As a result, these technological innovations have led to an increase in digital data, which are collected on a large scale.
Analyzing these data can be challenging, because storing the data requires a large amount of computer memory. Streams of data complicate the analyses even further, because the analyses often have to be redone whenever new data enter in order to remain up to date.
When analyzing data streams, it might be necessary to act upon the data in real time: warn patients to take their medication, or give people an extra nudge to respond to the questionnaire. Failing to act in real time might result in a deterioration of the patient’s health due to lack of medication, or in a respondent failing to answer the questionnaire in time. These two examples illustrate that in many situations failing to analyze the data in real time makes the analysis rather ineffective.
Besides collecting data more efficiently, these developments have also created new opportunities to study individuals’ behavior. Using ES, respondents are asked to fill out a questionnaire about their current feelings instead of recalling their feelings from memory. ES commonly uses a smartphone application to alert respondents at random intervals to answer the questionnaire. ES has become a common method to collect data in social science (Hamaker & Wichers, 2017) and, even though rarely analyzed as such, the method does give rise to a data stream.
Especially when data enter rapidly, the demand for more computational power to analyze the data in real time and the memory capacity to store the data increases continuously. Even though computational power and memory capacity have grown substantially over the last decades, obtaining up-to-date predictions in a data stream is still a challenge. Due to the influx of data points, traditional methods which revisit all data to update the predictions when new data enter are bound to become too slow to be useful in a data stream.
In Ch. 2, multiple approaches for analyzing data streams are discussed, though the main focus is on online learning. Online learning refers to an updating method in which parameter estimates are updated as the data enter, without revisiting older data. In this chapter, the standard computations of several common models for independent observations are adapted such that these models can be computed online. These online computations are illustrated with R code, e.g., to compute linear regression online. For more complex models that do not have simple (closed-form) computations, Stochastic Gradient Descent is introduced. This method approximates the solution (e.g., the Maximum Likelihood solution) one data point at a time.
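The idea of updating estimates without revisiting older data can be sketched as follows. This is an illustrative Python example, not the R code from the thesis. The first part updates a simple linear regression with one predictor from running means and co-moments. The second part shows a single Stochastic Gradient Descent step for logistic regression. The class name, function name, learning rate, and toy data are all assumptions made for this sketch.

```python
import math


class OnlineLinearRegression:
    """Simple linear regression (one predictor) updated one point at a time."""

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.sxx = 0.0  # running sum of squared deviations of x
        self.sxy = 0.0  # running sum of cross-products of deviations

    def update(self, x, y):
        self.n += 1
        dx = x - self.mean_x                 # deviation from the old mean of x
        self.mean_x += dx / self.n
        self.mean_y += (y - self.mean_y) / self.n
        self.sxx += dx * (x - self.mean_x)   # old deviation times new deviation
        self.sxy += dx * (y - self.mean_y)   # old x deviation, new y deviation

    @property
    def slope(self):
        return self.sxy / self.sxx

    @property
    def intercept(self):
        return self.mean_y - self.slope * self.mean_x


def sgd_logistic_step(weights, features, label, learning_rate=0.1):
    """One SGD step on the log-likelihood of a logistic regression."""
    z = sum(w * f for w, f in zip(weights, features))
    p = 1.0 / (1.0 + math.exp(-z))
    return [w + learning_rate * (label - p) * f for w, f in zip(weights, features)]


# Closed-form online update: points on the line y = 2x + 1 arrive one by one.
model = OnlineLinearRegression()
for x in range(5):
    model.update(float(x), 2.0 * x + 1.0)

# SGD: features are (constant, predictor); label is 1 when the predictor is positive.
weights = [0.0, 0.0]
stream = [([1.0, -2.0], 0), ([1.0, 2.0], 1), ([1.0, -1.0], 0), ([1.0, 1.0], 1)] * 50
for features, label in stream:
    weights = sgd_logistic_step(weights, features, label)
```

In both cases each data point is used once and then discarded, so memory use does not grow with the length of the stream. The regression update reproduces the offline least-squares solution, whereas the SGD step only approximates the Maximum Likelihood solution.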
Ch. 2 focuses on data streams consisting of independent observations. In many data streams, however, the same individuals are observed repeatedly over time. These repeated measures result in dependencies between the data from the same individuals. In the following chapter, four methods that deal with dependent observations are developed. These four methods combine the observations of an individual with the data of all the other individuals to obtain more accurate predictions than when using only the individual’s observations. However, fitting a model that accounts for both nested observations and binary outcomes in a data stream can be computationally challenging. The presented methods are based on existing shrinkage factors. The prediction accuracy of the offline and online shrinkage factors is compared in a simulation study. While the existing methods differ in their prediction accuracy, the differences in accuracy between the online and the traditional shrinkage factors are small.
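The general principle of shrinkage in a stream of binary outcomes can be illustrated with a short Python sketch. It is explicitly not one of the four shrinkage factors studied in the thesis: it uses a generic weight n / (n + k) with a fixed constant k, chosen here only for illustration. The class name, the constant, and the toy stream are hypothetical.

```python
from collections import defaultdict


class OnlineShrinkagePredictor:
    """Shrinks each individual's observed proportion toward the grand proportion."""

    def __init__(self, prior_strength=5.0):
        # Hypothetical tuning constant k: larger values shrink more strongly.
        self.prior_strength = prior_strength
        self.total_n = 0
        self.total_successes = 0
        self.counts = defaultdict(lambda: [0, 0])  # person -> [successes, n]

    def update(self, person, outcome):
        """Process one binary observation (0 or 1) and then discard it."""
        self.total_n += 1
        self.total_successes += outcome
        self.counts[person][0] += outcome
        self.counts[person][1] += 1

    def predict(self, person):
        """Predicted probability of a 1 for this person's next observation."""
        grand = self.total_successes / self.total_n if self.total_n else 0.5
        successes, n = self.counts.get(person, (0, 0))
        weight = n / (n + self.prior_strength)  # weight on the person's own data
        own = successes / n if n else 0.0
        return weight * own + (1.0 - weight) * grand


# Made-up stream: person "a" has five observations, person "b" only one.
predictor = OnlineShrinkagePredictor(prior_strength=5.0)
for person, outcome in [("a", 1), ("a", 1), ("b", 0), ("a", 1), ("a", 1), ("a", 1)]:
    predictor.update(person, outcome)

pred_a = predictor.predict("a")
pred_b = predictor.predict("b")
pred_new = predictor.predict("c")  # unseen person: falls back on the grand proportion
```

Only a few counts per individual are stored, so the predictor can be updated as each observation arrives. An individual with few observations is pulled strongly toward the grand proportion, while an individual with many observations is predicted mostly from their own data.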
Datasets with nested structures are typically analyzed using multilevel models. However, in the context of data streams, estimating multilevel models can be challenging: the algorithms used to fit multilevel models repeatedly revisit all data and, whenever new data enter, have to redo this procedure to remain up to date. Ch. 4 presents an algorithm called the Streaming Expectation Maximization Approximation (SEMA), which fits random intercept models online. In a simulation study, we show that SEMA is competitive with traditional methods in prediction accuracy and much faster.
Ch. 5 provides an extension of the SEMA algorithm to allow online multilevel modeling with fixed and random effects. The performance of SEMA is illustrated in a simulation study and using empirical data where individuals’ weight was predicted in a
We developed methods to account for binary nested observations in a data stream, using four shrinkage factors adapted for online use. These online shrinkage factors obtained predictions as accurate as those of their traditional counterparts. In the thesis, we show that SEMA can compete with traditional multilevel model fitting procedures. An R package is available on GitHub to facilitate the use of multilevel models when analyzing data streams.
This thesis contributes to the literature by providing an introduction to data streams for social scientists and by developing new methods to analyze data streams. First, by introducing computationally efficient methods to estimate well-known models, it makes data streams more accessible for social scientists. Second, the state-of-the-art methods currently used to analyze data streams often do not account for nested observations (e.g., Neal & Hinton, 1998). In this thesis, computationally efficient approaches to multilevel modeling are developed to account for the nested structure commonly found in data streams.
Barrett, L. F., & Barrett, D. J. (2001). An introduction to computerized experience sampling in psychology. Social Science Computer Review, 19(2), 175–185.
Hamaker, E. L., & Wichers, M. (2017). No time like the present. Current Directions in Psychological Science, 26(1), 10–15.
Neal, R., & Hinton, G. E. (1998). A view of the EM algorithm that justifies incremental, sparse, and other variants. In Learning in graphical models (pp. 355–368).
Trull, T. J., & Ebner-Priemer, U. W. (2009). Using Experience Sampling Methods/Ecological Momentary Assessment (ESM/EMA) in Clinical Assessment and Clinical Research: Introduction to the Special Section. Psychological Assessment,
Recruitment strategies for a probability-based online panel: Effects of interview length, question sensitivity, incentives and interviewers
GESIS - Leibniz Institute for the Social Sciences, Germany
The thesis provides fundamental research in the field of probability-based online panel recruitment.
Probability-based online panels are widely discussed in the scientific community as an alternative to interviewer-administered studies of the general population (Blom et al., 2016; Bosnjak, Das, & Lynn, 2016; Hays, Liu, & Kapteyn, 2015). They are characterized by a multistep recruitment process (Callegaro & DiSogra, 2008) where future panelists go through several stages before becoming an active panel member (Vehovar, Lozar Manfreda, Zaletel, & Batagelj, 2002).
To date, there is little empirical evidence on online panel recruitment. The overall objective of this thesis is to identify ways to optimize the telephone recruitment process of a probability-based online panel in Germany and derive practical recommendations.
Based on the framework of survey participation (Groves & Couper, 1998), the four studies of this dissertation focus on several aspects of the recruitment process that researchers need to decide upon and have control over. In three survey experiments, the effect of varying survey features on the success of the recruitment process is analyzed. The experimental factors are the length of the recruitment interview, the inclusion of a sensitive question, and the amount of the incentive. In addition, the role of interviewers as an error source during the recruitment process is examined.
The analyses are based on data from the GESIS Online Panel Pilot - a methodological project with the aim of developing best practices for the recruitment and maintenance of a probability-based online panel in Germany.
The dissertation is written as a monograph; however, the four analytical chapters are self-contained studies and include the respective literature. Chapters 1 to 3 form the introductory part, which gives an overview of the motivation of the dissertation (Chapter 1), describes the framework of survey participation as the conceptual framework of the work and reviews the pertinent general literature (Chapter 2), and presents the data basis of the analyses (Chapter 3). In Chapters 4, 5, and 6, the survey experiments are presented. To assess the quality of the recruitment process, three indicators are used: (1) the proportion of respondents who are recruited for the online panel, (2) the proportion of respondents who participate in the online surveys, and (3) the selection bias that is introduced by experimental variations at the stages of recruitment and online participation.
In Chapter 4, the effect of the length of the recruitment interview on the quality indicators is examined. Questionnaire length is one factor that is assumed to contribute to respondent burden. The study tested two versions of the telephone recruitment interview: 3 minutes vs. 10 minutes in duration. The analysis revealed that the shorter interview does not significantly increase the recruitment probability compared to the longer version. Neither the sample composition of the respondents who were recruited into the panel nor that of the resulting online panel was affected by the experimental treatment.
In Chapter 5, the effect of the inclusion of a sensitive question was investigated. During the process of designing a recruitment interview, researchers are often concerned about including questions that are perceived as being sensitive by the respondents. The split-half experiment tested the effect of including a question about household income versus not including it in the recruitment interview. The analysis revealed no difference in the recruitment probability of the two experimental groups. However, respondents that refused to provide the income information had an almost 50% lower recruitment probability compared to the respondents that provided the income information.
In Chapter 6, the study investigated the effect of different amounts of a promised incentive for online participation. The study revealed that promising an incentive increases the recruitment probability. This finding is promising for panel recruitment strategies that are based on an RDD-sample without addresses available for sending a prepaid incentive. The analyses show an increasing recruitment and online participation probability with increasing incentive amount. The innovative part of the study was to test the effect of promising a bonus for loyal respondents. Adding a bonus for loyal respondents had a double positive effect: 1) it increased the participation probability, and 2) it increased the proportion of loyal respondents that participated in all online surveys.
The comparison of the sample composition across the incentive groups did not reveal systematic differences at the stages of recruitment and online participation. Contrary to expectations, I did not find that the share of respondents who are usually underrepresented in surveys of the general population increased with the incentive amount.
In contrast to the survey experiments, which focused on factors of the survey design, Chapter 7 focused on the interviewer as an additional factor that influences recruitment success. The chapter is divided into two separate analytical parts. The aim of the first part was to quantify and explain interviewer variance in the propensity to recruit respondents and to compare the performance of the interviewers from the two fieldwork agencies that conducted the recruitment interviews. The two agencies represented prototypes of agencies: (1) an academic agency with a strong focus on social research and a highly qualified interviewer staff, and (2) a market research agency with less experienced and less qualified interviewers. The analyses revealed major differences in the interviewer variance in panel recruitment between the two agencies. The variance in the ability to recruit respondents was smaller for the interviewers of the social research agency. Contrary to expectations, general work experience of the interviewers does not explain the differences in their recruitment abilities. By contrast, survey-specific experience, measured as the number of interviews conducted, adds significantly to the explanation of recruitment propensity.
The second part addressed the question of whether interviewers differed in their ability to make use of the experimentally varied features of the survey (incentive amount, length of the recruitment interview). The analysis showed that the effect of the experimentally varied survey features on recruitment is uniform across interviewers. This result is highly desirable from a data quality perspective in standardized interviews.
The results of this dissertation are of high practical relevance and provide empirical evidence on the processes that contribute to the quality of the recruitment process of probability-based online panels. These results can guide researchers who plan to build online panels, as well as researchers who are designing additional experimental studies.