Conference Agenda

Overview and details of the sessions of this conference.

 
 
Session Overview
Session
GOR Thesis Award 2021 Competition
Time:
Thursday, 09/Sept/2021:
11:30 - 12:30

Session Chair: Olaf Wenzel, Wenzel Marktforschung, Germany

sponsored by Tivian

Presentations

Generalized Zero and Few Shot Transfer for Facial Forgery Detection

Shivangi Aneja

Technical University of Munich, Germany

Relevance & Research Question:

With recent developments in computer graphics and deep learning, it is now possible to create high-quality fake videos that look extremely realistic. Over the last two years, there has been tremendous progress in the creation of these altered videos, especially Deepfakes. This has several benign applications in computer graphics, but it can also have dangerous implications for society, such as political propaganda and public shaming. In particular, fake videos of politicians can be used to spread misinformation, which makes building a reliable fake-video detector urgent. New manipulation methods appear every day, so even if we build a reliable detector for fake videos generated with one manipulation method, the question remains how well it will detect videos forged with a different, unseen manipulation method. This thesis is a step in that direction. Taking advantage of available fake-video creation methods and using as few images as possible from a new, unseen manipulation method, the aim is to build a universal detector that, to the best of its capability, detects most of the fraudulent videos surfacing on the internet.

Methods & Data:

We begin the thesis by exploring the relationship between different computer-graphics and learning-based manipulation methods, i.e., we evaluate how well a model trained on one manipulation method generalizes to a different, unseen manipulation method. We then investigate how to boost performance for a different manipulation method or dataset when data availability is limited. For this, we explore a variety of transfer learning approaches and propose a new transfer learning technique and an augmentation strategy. The proposed technique proves surprisingly effective at detecting facial manipulations in zero-shot (the model has no knowledge of the new videos) and few-shot (the model has seen very few frames from the new videos) settings.

We used a standard classification backbone (ResNet) for all our experiments and evaluated different pointwise, metric-based domain transfer methods such as MMD, Deep CORAL, CCSA, and d-SNE. Since none of these methods worked well on unseen videos and datasets, we proposed a distribution-based approach in which we model each of our classes (real or fake) as a component of a mixture model; our model learns these distribution components, which we enforce with a loss function based on the Wasserstein distance. Building on our insights, we also propose a simple data augmentation strategy that spatially mixes up images from the same class but different domains. The proposed loss function and augmentation together outperform existing state-of-the-art supervised methods as well as transfer learning methods. We benchmarked our results on several face forgery datasets, such as FaceForensics++, Google DF, and AIF, and even evaluated our results on in-the-wild deepfake videos (the Dessa dataset).
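To make the approach concrete, the following is a minimal sketch in PyTorch under our own assumptions, not the thesis code: it models each class as a diagonal Gaussian in embedding space, penalises the closed-form 2-Wasserstein distance between batch statistics and the learned class components, and spatially mixes two same-class images. All names (GaussianClassLoss, spatial_mixup) are illustrative.

    import torch
    import torch.nn as nn

    class GaussianClassLoss(nn.Module):
        """Pull embeddings of each class toward a learned Gaussian component."""
        def __init__(self, num_classes: int, dim: int):
            super().__init__()
            self.mu = nn.Parameter(torch.randn(num_classes, dim))       # class means
            self.log_var = nn.Parameter(torch.zeros(num_classes, dim))  # class log-variances

        def forward(self, emb: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
            loss = emb.new_zeros(())
            for c in labels.unique():
                e = emb[labels == c]  # embeddings of class c in this batch
                mu_b = e.mean(dim=0)
                var_b = e.var(dim=0, unbiased=False) + 1e-6
                mu_c, var_c = self.mu[c], self.log_var[c].exp()
                # Closed-form squared 2-Wasserstein distance between diagonal Gaussians
                w2 = ((mu_b - mu_c) ** 2).sum() + ((var_b.sqrt() - var_c.sqrt()) ** 2).sum()
                loss = loss + w2
            return loss

    def spatial_mixup(x_a: torch.Tensor, x_b: torch.Tensor) -> torch.Tensor:
        """Spatially combine two same-class images from different domains:
        left half from one image, right half from the other."""
        w = x_a.shape[-1]
        return torch.cat([x_a[..., : w // 2], x_b[..., w // 2:]], dim=-1)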

The FaceForensics++ dataset provides fake videos created with four different manipulation techniques (Face2Face, FaceSwap, Deepfakes, and Neural Textures), together with the corresponding real videos. The Google DF dataset provides high-quality deepfake videos. The AIF dataset, donated to the authors by the AI Foundation, is the most challenging; it consists of deepfake videos recorded under very poor illumination conditions and in cluttered environments. Finally, we used the Dessa dataset, which consists of high-quality deepfake videos downloaded from YouTube.

Results:

We compare our results with current state-of-the-art transfer learning methods, and the experimental evaluation suggests that our approach consistently outperforms them. We also provide a thorough analysis of transferability among the different manipulation methods, which gives a clear picture of which methods are more closely related to each other and exhibit good transfer. We notice that combined learning- and graphics-based methods transfer relatively well among each other, whereas purely graphics-based methods do not exhibit such transfer. Additionally, we compare transfer across different datasets to explore out-of-distribution generalization. Overall, we achieve a large improvement of 10 percentage points (from 64% to 74%) over the baseline for cross-dataset generalization where the model has never seen the videos (zero-shot), and of 7 percentage points (from 78% to 85%) for few-shot transfer on in-the-wild deepfake videos.

Added Value:

The standard supervised classification models built by researchers detect fakes very well on the datasets they are trained on, but fail to generalize to unseen videos and datasets, a problem commonly known as out-of-domain generalization. With this thesis, we combat these failure cases: we were able to build an unsupervised algorithm whose model has little or no knowledge of the unseen datasets and still generalizes much better than standard supervised methods. Our proposed technique generalizes better than other state-of-the-art methods and hence produces more reliable predictions, so it can be deployed to detect in-the-wild videos on social media and video-sharing platforms. The proposed method is novel and effective: the thesis proposes a new loss function based on learning the class distributions that empirically generalizes much better than other loss functions, and the added spatial augmentation further boosts the performance of our model by 2-3%. The proposed technique is not limited to faces but can also be applied to various other domains where datasets are diverse and scarce.



How Does Broadband Supply Affect the Participation in Panel Surveys? An analysis of mode choice and panel attrition

Maikel Schwerdtfeger1,2

1GESIS - Leibniz-Institut für Sozialwissenschaften, Germany; 2University of Mannheim

Relevance & Research Question:

Over the last decades, online surveys have become a crucial part of quantitative research in the social sciences. This development has yielded coverage strategies such as implementing mixed-mode surveys and has motivated many scientific studies of coverage problems. From the perspective of coverage research, having a broadband connection often implies that people can participate in online surveys without any problems. In reality, the quality of the broadband supply can vary massively and thereby affect the online experience. Negative experiences lower the motivation to use online services and thus also reduce individual skills and preferences. Considering this, I expect regional differences in broadband supply to have a major impact on survey participation behavior, which leads me to the following research questions:

1st Research Question: How does the broadband supply affect the participation mode choice in a mixed-mode panel survey?

2nd Research Question: How does broadband supply determine attrition in panel surveys?

Methods & Data:

In order to investigate the effects of broadband supply on participation mode choice and panel attrition, I combine geospatial broadband data from the German “Breitbandatlas” with geocoded survey data from the recruitment interview and 16 waves of the mixed-mode GESIS Panel. The geospatial broadband data classifies 432 administrative districts in Germany into five ordinal categories according to their proportion of broadband supply of at least 50 Mbit/s, which is regarded as the threshold for sufficient data transmission.
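As an illustration of this linkage step, a join along the following lines attaches the district-level broadband categories to the geocoded respondent records; file and column names are assumptions for the sketch, not the study's actual data.

    import pandas as pd

    # Hypothetical inputs: one row per district with its ordinal supply
    # category (1-5), and geocoded respondent records with a district identifier.
    broadband = pd.read_csv("breitbandatlas_districts.csv")  # district_id, supply_category
    respondents = pd.read_csv("gesis_panel_geocoded.csv")    # respondent_id, district_id, ...

    # Attach each respondent's district-level broadband category.
    linked = respondents.merge(broadband, on="district_id", how="left")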

To answer the first research question, I apply a binomial logistic regression model that estimates the odds of choosing the online participation mode from broadband supply, internet familiarity, and further control variables. Besides broadband supply, I include internet familiarity as a substantively relevant independent variable, based on previous research on participation mode choice.
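A minimal sketch of such a model, assuming hypothetical variable names and using statsmodels rather than the study's actual software:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("gesis_panel_recruitment.csv")  # hypothetical linked dataset

    # Binomial logit: online_mode is 1 if the respondent chose the web mode;
    # broadband_category is the ordinal district-level supply measure.
    model = smf.logit(
        "online_mode ~ C(broadband_category) + internet_familiarity + age + C(education)",
        data=df,
    ).fit()

    print(model.summary())
    print(np.exp(model.params))  # coefficients expressed as odds ratios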

Following the theoretical background (see 2.2. Mode choice), I expect a person deciding between online or offline participation in a recruitment interview to consider their last and most prominent internet experiences with a particular focus on their internet familiarity and their perceived waiting times. The waiting times are largely affected by the data transmission rate of the available broadband supply.

Consequently, I derive the following two hypotheses for participation mode choice in mixed-mode panel surveys that provide web-based and paper questionnaires:

1st Hypothesis: Higher internet familiarity increases the probability of choosing online participation in a mixed-mode panel.

2nd Hypothesis: Living in a region with better broadband supply increases the probability of choosing online participation in a mixed-mode panel.

To answer the second research question, I apply a Cox regression model that estimates the hazard ratios of panel dropout from broadband supply, perceived survey duration, and further control variables. Besides broadband supply, I consider perceived survey duration substantively relevant, based on previous research on panel attrition.
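Such a model could be estimated, for example, with the lifelines package; the sketch below uses hypothetical column names and is not the study's actual code.

    import pandas as pd
    from lifelines import CoxPHFitter

    panel = pd.read_csv("gesis_panel_waves.csv")  # hypothetical person-level file
    # Expected columns: waves_participated (duration until dropout or censoring),
    # dropped_out (1 = attrition event), broadband_category, perceived_duration, age.

    cph = CoxPHFitter()
    cph.fit(
        panel[["waves_participated", "dropped_out",
               "broadband_category", "perceived_duration", "age"]],
        duration_col="waves_participated",
        event_col="dropped_out",
    )
    cph.print_summary()  # the exp(coef) column gives the hazard ratios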

According to the theoretical background (see 2.3. Panel attrition), I expect a person in a panel survey to constantly evaluate their satisfaction with, and the burden of, participation, with the flow experience and the perceived expenditure of time as the crucial factors in the decision process. The flow experience is largely determined by the quality of the available broadband supply. Consequently, I derive the following two hypotheses for attrition in panel surveys:

3rd Hypothesis: Living in a region with better broadband supply decreases the risk of attrition in an online panel survey.

4th Hypothesis: Evaluating the survey duration as shorter decreases the risk of attrition in an online panel survey.

Results:

The results of the first analysis show that both living in a region with better broadband supply and having higher internet familiarity increase the probability of choosing the online mode in a mixed-mode panel survey. However, the effect of internet familiarity is found to be substantially stronger and more stable.

The results of the second analysis show that a longer perceived survey duration increases the risk of panel dropout, whereas the effect of broadband supply is small, contrary to the hypothesis, and not significant.

For the interpretation of the results in the overall context, it must be noted that the classification of about 400 administrative districts in Germany into five groups with different proportions of sufficient broadband supply is not ideal for the purpose of this analysis. Despite this limitation, the weak but present effect of broadband supply in the first analysis suggests that the methodological approach holds greater potential with finer-grained data. In the discussion section, I provide further details on this issue and an outlook on a follow-up study that can test the presented methodological approach with more precise broadband data.

Added Value:

The present study aims to expand methodological research on online surveys in two ways. First, the approach of combining geospatial data on broadband supply with survey data is a novelty in survey methodology. The advantage is that there is no need to ask additional questions about the quality of the internet connection, which reduces survey duration. Additionally, geospatial data is not affected by respondents' motivated or unintentional misreporting. This is particularly important for information that is strongly biased by subjective perceptions or by misjudgments due to a lack of knowledge or interest; technical details on broadband supply are vulnerable to exactly this kind of bias.

Second, analyzing response behavior in the context of available broadband supply makes it possible to draw conclusions about whether participants with poor broadband supply still choose the online mode and, if so, whether they have a higher probability of panel attrition than panelists with better broadband supply. These conclusions can be used to develop targeting strategies that actively guide the participation mode choice based on the panelists' residence, thereby reducing the likelihood of panel attrition.



Voice in Online Interview Research

Aleksei Tiutchev

HTW Berlin, Germany

Relevance & Research Question: Voice and speech technologies have improved significantly in recent years, reaching high speech-recognition accuracy for the English language. Among other fields, these technologies can also be applied in market research. In recent years, only a few studies have addressed the possibility of using speech recognition in online market research. The thesis further investigates the possibility of incorporating speech recognition technology into online surveys in various languages across six continents. The research question is: “What is the impact of voice in global online interviewing, using the example of several languages and countries, regarding…

... technological capabilities of participants?

... willingness to participate?

... quality of voice answers?

... the respondents’ level of engagement?

... respondents’ satisfaction?”

Methods & Data: Based on a review of the current state of speech recognition and the related literature, online questionnaires with voice input and text input in five languages (English, German, French, Russian, and Spanish) were created and distributed through an online panel to 19 countries. The questionnaires consisted of 40 questions, including 14 open questions on various topics, which participants could answer either with text or with voice, depending on their technical possibilities and their willingness to participate in a voice study. In addition to the open questions, the surveys included Kano model questions to measure how respondents perceive the possibility of answering the survey with voice, a Net Promoter Score question, and others. The data were collected between September 3, 2020, and October 27, 2020, and 1,958 completed questionnaires became the focus of the study. Of all completed surveys, 1,000 were filled in with text input, whereas 958 were filled in with voice input. The collected data were analysed with IBM SPSS Statistics v.27.
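As a brief note on the metrics used: the Net Promoter Score is computed from the standard 0-10 recommendation scale as the share of promoters (9-10) minus the share of detractors (0-6). The sketch below illustrates this with a hypothetical ratings column in Python, rather than the study's SPSS workflow.

    import pandas as pd

    responses = pd.read_csv("voice_survey_responses.csv")  # hypothetical export
    rating = responses["nps_rating"]                       # 0-10 recommendation scale

    promoters = (rating >= 9).mean()   # share answering 9 or 10
    detractors = (rating <= 6).mean()  # share answering 0 through 6
    nps = (promoters - detractors) * 100
    print(f"NPS: {nps:.1f}")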

Results: The results of the study demonstrated that the technological capability of respondents to participate in voice research varied from country to country. The highest number of browsers and devices supporting voice input was observed in developing countries; those countries also had the highest number of participants using smartphones to fill in the questionnaires. At the same time, in developed countries it was more challenging to conduct voice research due to the popularity of iOS devices, which did not support voice input. Even where the technical possibilities existed, 43 per cent of respondents were still unwilling to grant access to their microphones. The answers collected through voice input were 1.8 times longer than the text-input answers. At the same time, questions with voice input took on average two seconds longer to answer, and surveys with voice input had a dropout rate twice as high. Participants with voice input were more satisfied with the surveys and showed a very high willingness to participate in voice studies again. Meanwhile, respondents' technological capabilities to participate in voice surveys, dropout rates, response times, and the quality of voice answers differed significantly by country. Analysis of the Kano model questions demonstrated the participants' indifference to the possibility of answering the surveys with voice. A Key Driver Analysis demonstrated that categories such as tech-savviness, early adoption, or data-security concerns did not influence respondents' willingness to participate in voice research again; the most important categories influencing this decision were frequency of internet usage and information-seeking behaviour.

Added Value: The study results partially confirm previous research on the use of speech recognition in online questionnaires with regard to higher dropout rates and longer answers (in characters) for voice input. At the same time, some results contradict previous studies: the voice answers took longer in time than text-input answers, thus not confirming the lower response burden of voice input in online surveys. In addition, the results complement existing research by providing more information about the use of voice input in online surveys across different countries. The technology is still new, and currently not all devices support it, which makes such research more complicated, more expensive, and more time-consuming in countries where the number of unsupported devices is large. From the technological possibilities of voice questionnaires to dropout rates and the amount of data received through voice input, everything varied significantly and depended notably on the geographical location of the study. Even though voice input in online surveys requires more effort, demands higher costs for participant recruitment, and yields transcriptions that are not perfect in quality, especially in non-English languages, marketers and researchers in different industries might consider using voice input in their studies to receive extensive, high-quality data through online questionnaires. This method may allow professionals to conduct research among people who otherwise cannot or do not want to participate in classical text surveys.