Online Data Generated by Voice Assistants – Data Collection and Analysis Using the Example of the Google Assistant
Ruhr-Universität Bochum, Germany
Relevance & Research Question:
Voice assistants play an increasing role in many people's everyday lives. They can be found in cars, cell phones, smart speakers and watches, and their fields of application are growing. Their use is seldom questioned, even though children now grow up with them and a voice assistant is often a person’s only "conversation partner" over the course of a day. At the same time, a large amount of data is generated automatically, and continuous online logs are created in the form of conversations. This raises two questions: how can this mass of personal data be used for sociological research and, building on that, what are the special features of communication between humans and voice assistants?
Methods & Data:
The data under consideration consist of conversation logs between one person and the Google Assistant, covering a whole year. In addition, there is information about the person's whereabouts, the music they listened to, shopping lists and much more. The log entries carry timestamps and, in most cases, are stored together with the recorded audio files. The logs can be downloaded as PDF files from the user’s personal account. They are strictly anonymized and examined with a qualitative approach using conversation analysis.
Collecting and processing the data for sociological research requires considerable effort. The barriers to obtaining the data are very high, but once the data are available, they are of great value because they contain an enormous amount of information. Communication between humans and voice assistants is also distinctive, as it differs greatly from other forms of communication. It is characterized by an imperative way of speaking, paraphrases and constant repair mechanisms. The personalization of the voice assistant is a further key finding in the analysis of human-technology communication.
The study provides initial results and suggestions for approaches to the sociological handling of data from voice assistants. In addition, the findings on the specifics of communication between people and voice assistants are relevant because voice assistants are increasingly becoming part of households, workplaces and public space, and are thus changing social dynamics.
Eyes, Eyes, Baby: BYOD Smartphone Eye Tracking
1HTW Berlin, Germany; 2oculid UG (haftungsbeschränkt), Germany
Relevance & Research Question
Eye tracking is an established methodology that is typically used in a laboratory setting. The established toolset of infrared devices produces solid results but makes remote testing in the field impossible. Researchers have accepted the lower quality of webcams as a trade-off for better access to more diverse research samples.
With the rise of smartphones as the preferred digital device, the methodology has not kept pace so far. Tests of app concepts or mobile websites still take place in a confined environment with established hardware that is in effect more suitable for eye tracking on bigger screens.
The approach presented brings the technology right into the hands of a research participant, who can use their own device’s camera while performing research tasks. The idea of BYOD (Bring your own device) is not new, but now it offers a high-tech toolset with exceptional quality.
Methods & Data
The presented approach offers an online-based framework for setting up studies, with which even less tech-savvy researchers can design, distribute and analyze a smartphone eye tracking test. The tool captures a participant's eye movements and touch interactions on the screen. Recording participants as they think aloud helps to better understand the individual’s attention while performing research tasks. All interaction data are uploaded to the online platform and can be analyzed individually or in comparison.
The contribution shows the first experiments with the new eye tracking app from the Berlin-based start-up Oculid, demonstrating how advertising material, online task solving and a market research questionnaire can be tested by tracking both eye movements and user behaviour.
The contribution will show the process of setting up a study, distribution and analysis using several experiments performed by external researchers using the tool. The entire process of set-up, field recruitment, connection to external tools and analysis will be explained with all their advantages, insights and challenges.
Smartphone usage is not only growing in quantity; mobile camera technology is also outperforming non-mobile installations. The smartphone BYOD concept may therefore be more than just competitive.
Separating the wheat from the chaff: a combination of passive and declarative data to identify unreliable news media
1Respondi; 2Université Paris Nanterre; 3Toulouse School of Economics
Relevance & Research Question: Fake news website detection
Hype aside, fake news has grown massively and threatens the proper functioning of our democracies. The detection of fake news has thus become a major focus of research both in the social media industry and in academia. While most approaches to the issue aim at classifying news items as fake or legitimate, one may also wish to look at the problem in terms of the reliability of sources, aiming at a classification of news emitters as trustworthy or deceptive. Our aim in the present research is to explore the prospects for an automated solution to this problem, by trying to predict and extend an existing human-made classification of news sources in France.
Methods & Data: browsing data, random forest, NLP, deep learning
A sample of 3192 French panelists aged 16 to 85 had their online browsing activity recorded for one year, from November 2019 to October 2020. Additionally, a survey was conducted in May 2020 to gather information about their socio-demographics and degrees of belief in various fake news items. On this basis, we use four kinds of predictors: (1) websites’ traffic (mean time spent, etc.), (2) origins of traffic, (3) websites’ audience features, (4) types of articles read (clustering of title embeddings obtained via a fine-tuned BERT language model). Our predictive target is the binary adjusted version of Le Monde’s media classification, in which media outlets are either reliable or not (61% vs. 39% of the total sample).
Predictions are made with a random forest algorithm and evaluated with K-fold cross-validation (K=10). Combining all sets of variables, we achieve 75.42% accuracy on the test set. The top 5 predictors are average age, number of pages viewed, total time spent on websites, category of preceding visits and panelists’ clusters based on degrees of belief in fake news.
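The cross-validation step can be sketched as follows. This is a minimal standard-library illustration, not the authors' pipeline: the random forest is replaced by a majority-class baseline, the function names are hypothetical, and the data are synthetic toy labels that reproduce only the 61% vs. 39% class split reported above. The sketch shows how 10-fold cross-validation averages accuracy over held-out folds and which baseline a classifier has to beat on such a split.

```python
import random
from collections import Counter

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_majority(train_labels):
    """Baseline 'model' (stand-in for the random forest): the most common label."""
    return Counter(train_labels).most_common(1)[0][0]

def cross_validate(labels, k=10, seed=0):
    """Return the mean held-out accuracy of the baseline over k folds."""
    folds = k_fold_indices(len(labels), k, seed)
    accuracies = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the i-th, test on the i-th.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit_majority([labels[j] for j in train_idx])
        correct = sum(1 for j in test_idx if labels[j] == model)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k

# Synthetic toy labels (assumption): 61 reliable and 39 unreliable sources.
toy_labels = ["reliable"] * 61 + ["unreliable"] * 39
baseline_accuracy = cross_validate(toy_labels, k=10)
```

On these toy labels the majority-class baseline scores about 61%, which is the reference point against which the reported 75.42% can be read.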
Added Value: combining passive and declarative data
Combining passive and declarative data is a new standard for online research. In this study, we show the potential of such an approach to fake news detection, which is usually tackled by means of brute-force NLP or pattern-based algorithms.
Measuring smartphone operating system versions in surveys: How to identify who has devices compatible with survey apps
1University of Essex, United Kingdom; 2University of Michigan, USA
Data collection using mobile apps relies on sample members having compatible smartphones, in terms of operating system (OS) and OS version. This potentially introduces selection bias. Measuring OS version is, however, difficult. In this paper we compare the quality of data on smartphone OS version collected with different methods. This research arose from analyses of the uptake of the coronavirus test & trace app in the UK, which requires smartphones running Android 6.0 and up or iOS 13.5 and up.
We use data from the Understanding Society COVID-19 study, a probability sample aged 16+ in the UK. The analyses are based on 10,563 web respondents who reported having an Android or iOS smartphone. We compare three ways of measuring smartphone OS version: i) using the user agent string (UAS), which captures characteristics of the device used to complete the survey, ii) asking respondents to report the make and model of their smartphone and matching that to an external database, and iii) asking respondents to report the OS version of their smartphone (by checking its settings, typing “whatismyos.com” into its browser, or scanning a QR code opening that webpage).
The UAS provided a smartphone OS version for just 58% of respondents, as the rest did not use a smartphone to complete the survey; 5% of the OS versions were too old to use the coronavirus app.
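The UAS-based measure can be illustrated with a short sketch. It is a hedged example, not the procedure used in the study: the regular expressions, the function names and the sample strings are assumptions, and real user agent strings vary more than these patterns cover. Only the minimum versions (Android 6.0, iOS 13.5) are taken from the text above.

```python
import re

# Minimum OS versions required by the app, as stated in the abstract.
MIN_VERSION = {"Android": (6, 0), "iOS": (13, 5)}

# Illustrative patterns (assumptions): "Android 10" or "Android 5.1.1",
# and "iPhone OS 13_5_1" as commonly seen in mobile browser user agents.
ANDROID_RE = re.compile(r"Android (\d+(?:\.\d+)*)")
IPHONE_RE = re.compile(r"iPhone OS (\d+(?:_\d+)*)")

def parse_os(uas):
    """Return (os_name, version_tuple), or None if no smartphone OS is found."""
    match = ANDROID_RE.search(uas)
    if match:
        return "Android", tuple(int(p) for p in match.group(1).split("."))
    match = IPHONE_RE.search(uas)
    if match:
        return "iOS", tuple(int(p) for p in match.group(1).split("_"))
    return None

def is_compatible(uas):
    """True/False for a recognised smartphone OS, None if the UAS gives no answer."""
    parsed = parse_os(uas)
    if parsed is None:
        return None  # e.g. the survey was completed on a desktop browser
    name, version = parsed
    # Compare major.minor only; a missing minor version counts as 0.
    major_minor = (version + (0,))[:2]
    return major_minor >= MIN_VERSION[name]

# Hypothetical example strings, shortened for readability.
old_android = "Mozilla/5.0 (Linux; Android 5.1.1; SM-J320F) AppleWebKit/537.36"
new_iphone = "Mozilla/5.0 (iPhone; CPU iPhone OS 13_5_1 like Mac OS X) AppleWebKit/605.1.15"
desktop = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
```

Here `is_compatible(desktop)` returns `None` rather than `False`, which mirrors the limitation reported above: the UAS yields no smartphone OS version when the survey is not completed on a smartphone.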
Matching the self-reported smartphone make and model to a database provided an OS version for 88% of respondents; only 2% did not answer the question, but 10% of answers could not be matched to the database; 10% of OS versions were too old for the app.
When asked for the OS version of their smartphone, 66% answered, 31% said don’t know and 3% refused or gave an incomplete answer; 15% reported an OS version that was too old.
Further analyses will examine the reasons respondents gave for not providing the OS version and cross-validate the three measures.
This study provides evidence on how to identify sample members who have smartphones with the required OS version for mobile app-based data collection.