Associations in Probability-Based and Nonprobability Online Panels: Evidence on Bivariate and Multivariate Analyses
Carina Cornesse, Tobias Rettig, Annelies Blom
University of Mannheim, Germany
Relevance & Research Question:
A number of studies have shown that probability-based surveys lead to more accurate univariate estimates than nonprobability surveys. However, some researchers claim that, while they do not produce accurate univariate estimates, nonprobability surveys are “fit for purpose” regarding bivariate and multivariate analyses. In this study, we therefore assess to what extent bivariate and multivariate survey estimates from probability-based and nonprobability online panels lead to accurate conclusions.
Methods & Data:
We answer our research question using data from a large-scale comparison study in which three waves of data collection were commissioned in parallel from two academic probability-based online panels and eight commercial nonprobability online panels in Germany. For each of the online panels, we calculate bivariate associations and estimate multivariate models, and we compare the results to gold-standard benchmarks, examining whether the strength, direction, and statistical significance of the coefficients accurately reflect the expected outcomes.
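To make the comparison criteria concrete, here is a minimal sketch that checks one panel coefficient against the corresponding benchmark coefficient on direction, significance, and strength. It assumes each source yields a coefficient and a p-value for the same predictor. The `Estimate` type, the `compare` function, the 0.05 threshold, and the numbers are hypothetical illustrations, not the authors' procedure.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    """A coefficient and its p-value for one predictor (e.g. age on CDU vote)."""
    coef: float
    p_value: float


def compare(panel: Estimate, benchmark: Estimate, alpha: float = 0.05) -> dict:
    """Compare a panel estimate with the gold-standard benchmark estimate."""
    same_direction = (panel.coef > 0) == (benchmark.coef > 0)
    panel_sig = panel.p_value < alpha
    benchmark_sig = benchmark.p_value < alpha
    return {
        # Does the panel coefficient point the same way as the benchmark?
        "same_direction": same_direction,
        # Do both agree on whether the association is statistically significant?
        "same_significance": panel_sig == benchmark_sig,
        # A significant association with the opposite sign to the benchmark.
        "contrary": panel_sig and not same_direction,
        # Absolute deviation in coefficient strength.
        "abs_difference": abs(panel.coef - benchmark.coef),
    }


# Hypothetical numbers, for illustration only.
benchmark = Estimate(coef=0.02, p_value=0.001)
panel = Estimate(coef=-0.015, p_value=0.01)
print(compare(panel, benchmark))
```

Here the panel estimate would be flagged as `contrary`: it is significant but has the opposite sign to the benchmark, which is the pattern described for the nonprobability panels in the results.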
Results:
For key substantive political science variables (voter turnout and voting for Germany's main conservative party, the CDU), we find that the probability-based online panels in our study generally produce more accurate associations than the nonprobability online panels. Unlike the probability-based online panels, the nonprobability online panels produce a number of significant associations that run contrary to expected outcomes (e.g., that older people are significantly less likely to vote for the main conservative party). Furthermore, while the two probability-based online panels in our study produce similar findings, the results vary considerably across the nonprobability online panels, and none of them consistently outperforms the others.
Added Value:
While a number of studies have assessed the accuracy of univariate estimates in probability-based and nonprobability online panels, our study is one of the few that examine bivariate associations and multivariate models. Our preliminary results do not support the claim that nonprobability surveys are fit for the purpose of bivariate and multivariate analyses.
Semiautomatic dictionary-based classification of environment tweets by topic
Michela Cameletti2, Stephan Schlosser1, Daniele Toninelli2, Silvia Fabris2
1University of Göttingen, Germany; 2University of Bergamo, Italy
Relevance & Research Question:
In the era of social media, the vast availability of digital data makes it possible to conduct many types of research across a wide range of fields. Such data offers several advantages: low collection costs, short retrieval times, and almost real-time outputs. At the same time, the data is unstructured and unclassified in terms of content. This study aims to develop an efficient way to filter tweets related to a specific topic and to analyze their sentiment.
Methods & Data:
We developed a semiautomatic, unsupervised, dictionary-based method to filter tweets related to a specific topic (in our study, the environment). Starting from the tweets sent by a selection of official social media accounts linked to this topic, we identify a list of keywords, bigrams and trigrams to build a topic-oriented dictionary. We test the performance of our method by applying the dictionary to more than 54 million tweets posted in Great Britain between January and May 2019. Since the analyzed tweets are geolocated owing to the data collection method, we also analyze the spatial variability of sentiment on this topic across the country's sub-areas.
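The dictionary-building and filtering steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: the frequency rule (keep n-grams that appear in at least `min_count` seed tweets), the tiny stopword list, the tokenizer, and all function names are hypothetical assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (hypothetical); a real application would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "on", "for", "our", "we"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word/hashtag tokens."""
    return re.findall(r"[a-z0-9#]+", text.lower())


def ngrams(tokens: list[str], n_max: int = 3):
    """Yield unigrams, bigrams and trigrams that contain no stopwords."""
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if not any(t in STOPWORDS for t in gram):
                yield " ".join(gram)


def build_dictionary(seed_tweets: list[str], min_count: int = 2) -> set[str]:
    """Keep n-grams that occur in at least `min_count` distinct seed tweets."""
    counts = Counter(g for tweet in seed_tweets for g in set(ngrams(tokenize(tweet))))
    return {g for g, c in counts.items() if c >= min_count}


def matches_topic(tweet: str, dictionary: set[str]) -> bool:
    """A tweet is on-topic if any of its n-grams is in the dictionary."""
    return any(g in dictionary for g in ngrams(tokenize(tweet)))


# Hypothetical seed tweets standing in for posts from topic-related accounts.
seeds = [
    "Climate change is accelerating, act on climate change now",
    "New report on climate change and plastic pollution",
    "Plastic pollution threatens marine life",
]
dictionary = build_dictionary(seeds)
print(matches_topic("Worried about plastic pollution on our beaches", dictionary))  # True
print(matches_topic("Great match last night", dictionary))  # False
```

Re-running `build_dictionary` on fresh seed tweets is one simple way the dictionary could be periodically updated.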
Results:
All the performance indexes considered indicate that our semiautomatic dictionary-based approach is able to filter tweets linked to the topic of interest. Despite the short time window considered, we observe a growing inclination toward the environment in every area of Great Britain. Nevertheless, the spatial analysis revealed no spatial correlation (probably because the environment is a broad topic that is also strongly affected by local factors).
Added Value:
Our method can build (and periodically update) a dictionary for selecting tweets about a specific topic. Building on this, we classify the selected tweets and apply a spatial sentiment analysis. Focusing on the environment, our method of setting up a dictionary and selecting tweets by topic led to interesting results. It could therefore be reused in the future as a starting point for a wide variety of analyses, including on other topics and other social phenomena.
What is the measurement quality of questions on environmental attitudes and supernatural beliefs in the GESIS Panel?
Hannah Schwarz1, Wiebke Weber1, Isabella Minderop2, Bernd Weiß2
1Pompeu Fabra University (UPF), Spain; 2GESIS Leibniz Institute for the Social Sciences
Relevance & Research Question:
The measurement quality of a survey question, defined as the product of validity and reliability, indicates how well a latent concept is measured by that question. Measurement quality must also be estimated in order to correct for measurement error, and Multitrait-Multimethod (MTMM) experiments allow us to do this. Our research aims to determine the measurement quality resulting from variations in formal characteristics, such as the number of scale points and partial versus full labelling of scale points, for the given questions in web mode.
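As a worked illustration of the definition and of why quality matters for correcting measurement error: in the true-score framework commonly used with MTMM data, reliability and validity are expressed on the squared scale (r², v²) and quality is q² = r²·v². The minimal sketch below assumes the simple attenuation model observed = q1 · q2 · true, with q1 and q2 the quality coefficients of two questions and no correlated method effects between them. The function names and input values are hypothetical; in the study itself, these quantities are estimated by structural equation modelling rather than supplied by hand.

```python
import math


def quality(reliability: float, validity: float) -> float:
    """Measurement quality (q^2) as the product of reliability (r^2) and validity (v^2)."""
    return reliability * validity


def corrected_correlation(observed: float, quality_1: float, quality_2: float) -> float:
    """Correct an observed correlation for measurement error.

    Assumes observed = q1 * q2 * true, where q1 and q2 are quality
    coefficients (square roots of the quality values) and the two
    questions share no correlated method effects.
    """
    return observed / (math.sqrt(quality_1) * math.sqrt(quality_2))


# Hypothetical values, for illustration only.
q = quality(reliability=0.8, validity=0.9)  # about 0.72
corrected = corrected_correlation(0.36, q, q)
print(corrected)  # about 0.5
```

If the two questions used the same method, the shared method variance would first have to be removed from the observed correlation; this sketch deliberately leaves that case out.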
Methods & Data:
We conducted two MTMM experiments in the mixed-mode (predominantly web) GESIS Panel, one on environmental attitudes and the other on supernatural beliefs. For each experiment, we estimate the quality of three different response scales by means of structural equation modelling.
Results:
We do not have results yet. Based on evidence from face-to-face surveys, we would expect that, in both cases, a continuous scale with fixed reference points will lead to the highest measurement quality among the three, that a partially labelled 11-point scale will result in the second highest measurement quality and that a fully labelled 7-point scale will yield the lowest measurement quality.
Added Value:
A considerable body of research exists on MTMM experiments in more traditional modes, especially face-to-face. However, only a few MTMM experiments in web mode have been conducted and analyzed so far.
Open Lab: a web application for conducting and sharing online-experiments
Yury Shevchenko1, Felix Henninger2
1University of Konstanz, Germany; 2University of Koblenz-Landau, Germany
Relevance & Research Question:
Online experiments have become a popular way of collecting data in the social sciences. However, the high technical hurdles of setting up a server can prevent researchers from starting an online study, while proprietary software restricts their freedom to customize or share the code. We present Open Lab, a server-side application that makes online data collection simple and flexible. Open Lab is not dedicated to one particular study; rather, it is a hub where online studies can easily be carried out.
Methods & Data:
Available online at https://open-lab.online, the application offers a fast, secure and transparent way to deploy a study. It takes care of uploading experiment scripts, changing test parameters, managing the participants’ database and aggregating the study results. Open Lab is integrated with the lab.js experiment builder (https://lab.js.org/), which enables the creation of new studies from scratch or the use of templates. The lab.js study can be directly uploaded to Open Lab and is ready to run. Integration with the Open Science Framework allows researchers to automatically store the collected data in an OSF project.
Results:
At the conference, we will present the main features of the web application together with results of empirical studies conducted with Open Lab.
Added Value:
Open Lab enables interdisciplinary projects in which behavioral scientists work together and in which participants do not merely play the role of passive subjects, but can also learn about the science, talk to a researcher, or even propose and implement new versions of the task.
Using Nonprobability Web Surveys As Informative Priors in Bayesian Inference
Joseph Sakshaug1,2,3
1Institute for Employment Research, Germany; 2Ludwig Maximilian University of Munich, Germany; 3University of Mannheim, Germany
Relevance & Research Question: Survey data collection costs have risen to a point where many survey researchers and polling companies are abandoning large, expensive probability-based samples in favour of less expensive nonprobability samples. The empirical literature suggests this strategy may be suboptimal for multiple reasons, among them that probability samples tend to outperform nonprobability samples on accuracy when assessed against population benchmarks. However, nonprobability samples are often preferred due to their convenience and cost-effectiveness.
Methods & Data: Instead of forgoing probability sampling entirely, we propose a method of combining probability and nonprobability samples within a Bayesian inferential framework in a way that exploits their strengths to overcome their weaknesses. Using simulated data, we evaluate supplementing inferences based on small probability samples with prior distributions derived from nonprobability data. The method is also illustrated with actual probability and nonprobability survey data.
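One simple way to realise this idea for linear model coefficients is a conjugate normal update, sketched below under several stated assumptions: the error variance is treated as known (plugged in from the probability-sample fit), the prior mean and covariance come from an OLS fit to the nonprobability data, and a hypothetical inflation factor down-weights that prior. The abstract does not specify how its priors are constructed, so the simulation settings, the bias values, the inflation factor, and all function names here are illustrative assumptions rather than the author's method.

```python
import numpy as np

rng = np.random.default_rng(42)
TRUE_BETA = np.array([1.0, 0.5])  # intercept and slope of the simulated population model


def simulate(n: int, bias: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Draw n observations; `bias` shifts the coefficients to mimic selection bias."""
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    y = X @ (TRUE_BETA + bias) + rng.normal(size=n)
    return X, y


def ols(X, y):
    """OLS coefficients, their covariance matrix, and the residual variance."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return beta, sigma2 * np.linalg.inv(X.T @ X), sigma2


def posterior(X, y, prior_mean, prior_cov, sigma2):
    """Conjugate normal update for regression coefficients with sigma2 treated as known."""
    prior_prec = np.linalg.inv(prior_cov)
    post_cov = np.linalg.inv(prior_prec + X.T @ X / sigma2)
    post_mean = post_cov @ (prior_prec @ prior_mean + X.T @ y / sigma2)
    return post_mean, post_cov


# Large, cheap nonprobability sample with (hypothetical) bias; small probability sample.
X_np, y_np = simulate(5000, bias=np.array([0.1, 0.05]))
X_p, y_p = simulate(100, bias=np.zeros(2))

np_beta, np_cov, _ = ols(X_np, y_np)
prior_cov = np_cov * 50.0  # hypothetical inflation factor to down-weight the nonprobability data

p_beta, p_cov, sigma2 = ols(X_p, y_p)
post_mean, post_cov = posterior(X_p, y_p, np_beta, prior_cov, sigma2)

print("probability-only:", p_beta, np.diag(p_cov))
print("with informative prior:", post_mean, np.diag(post_cov))
```

Because the prior adds precision, the posterior variances are always smaller than the probability-only variances. Whether the mean-squared error also falls depends on how large the nonprobability bias is relative to that variance reduction, which is the trade-off the inflation factor controls.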
Results: We demonstrate that informative priors based on nonprobability data can lead to reductions in variances and mean-squared errors for linear model coefficients.
Added Value: A summary of these findings, their implications for survey practice, and possible research extensions will be provided in conclusion.