Conference Agenda

General Online Research 2019

B06: Social Media and Online Communities
Thursday, 07/Mar/2019: 5:00 - 6:00

Session Chair: René Schallner, NIM, Germany
Location: Room 158
TH Köln – University of Applied Sciences


Optimized Strategies for Enhancing the Territorial Coverage in Twitter Data Collection

Stephan Schlosser1, Michela Cameletti2, Daniele Toninelli2

1University of Göttingen, Germany; 2University of Bergamo, Italy

Relevance & Research Question: The use of social media as a promising data source has become increasingly important in recent years. Social media data, such as tweets, not only pave the way for new research possibilities but also raise entirely new methodological and substantive questions in many research fields (e.g., the social sciences and statistics). This work aims to find an efficient, optimized data collection strategy. In particular, we compare different strategies for collecting Twitter data, for example in order to enhance the territorial coverage of different geographical areas.

Methods & Data: For this purpose, we collected Twitter data across the whole United Kingdom for a period of 90 days, implementing three parallel tweet collection strategies:

1) the borders of the 12 UK territorial regions (NUTS) were precisely mapped by means of a large number of medium-sized sub-areas, while big cities were covered by many smaller sub-areas;

2) the same borders were mapped as precisely as possible while adapting the size of the sub-areas to the actual population density;

3) a large number of small, equally sized sub-areas was used to map the NUTS regions, without considering population density.

In total, we collected more than 300 million tweets, 1% of which include geographical metadata (useful for checking the accuracy of the geo-coordinates used for data collection).
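The idea behind the population density-adapted strategy (sub-areas shrink where more people live) can be illustrated with a simple quadtree-style subdivision. This is a minimal sketch under stated assumptions, not the authors' implementation: `Cell`, `adaptive_cells`, `toy_population`, the bounding box, and the thresholds `max_pop` and `min_width` are all hypothetical, and the sketch makes no claim about the query geometry that the Twitter API actually accepts.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Cell:
    """Hypothetical rectangular sub-area, bounds in degrees."""
    west: float
    south: float
    east: float
    north: float

    def width(self) -> float:
        return self.east - self.west

    def area(self) -> float:
        return (self.east - self.west) * (self.north - self.south)

    def quadrants(self) -> List["Cell"]:
        # Split the cell into four equally sized quadrants.
        mid_x = (self.west + self.east) / 2
        mid_y = (self.south + self.north) / 2
        return [
            Cell(self.west, self.south, mid_x, mid_y),
            Cell(mid_x, self.south, self.east, mid_y),
            Cell(self.west, mid_y, mid_x, self.north),
            Cell(mid_x, mid_y, self.east, self.north),
        ]


def adaptive_cells(cell: Cell, population: Callable[[Cell], float],
                   max_pop: float, min_width: float) -> List[Cell]:
    """Recursively split a cell while it holds more than max_pop people,
    unless splitting would make it narrower than min_width."""
    if population(cell) > max_pop and cell.width() / 2 >= min_width:
        cells: List[Cell] = []
        for child in cell.quadrants():
            cells.extend(adaptive_cells(child, population, max_pop, min_width))
        return cells
    return [cell]


def toy_population(cell: Cell) -> float:
    # Hypothetical density: 1,000 people per square degree everywhere,
    # plus a single "city" of 1,000,000 people at (-0.1, 51.5).
    people = 1_000 * cell.area()
    lon, lat = -0.1, 51.5
    if cell.west <= lon < cell.east and cell.south <= lat < cell.north:
        people += 1_000_000
    return people


# Rough, hypothetical bounding box; real NUTS borders are far more detailed.
region = Cell(-8.0, 50.0, 2.0, 60.0)
cells = adaptive_cells(region, toy_population, max_pop=200_000, min_width=0.5)
print(len(cells))  # → 13
```

Sparse areas stay covered by a few large cells, while the cells around the dense point get progressively smaller; `min_width` keeps the number of sub-areas bounded, mirroring the trade-off between coverage and a reasonably sized dataset.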

Results: The analysis of tweets including geographical metadata reveals that these tweets were indeed posted in the expected regions. This suggests that the same is probably true for tweets without geographical metadata. Moreover, the strategy using population density-adapted sub-areas covered the posted tweets most accurately.
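The accuracy check on geotagged tweets, i.e. whether their coordinates fall inside the region they were collected for, can be sketched with the standard even-odd (ray-casting) point-in-polygon test. The helpers `point_in_polygon` and `share_in_expected_region` are hypothetical, and the sketch assumes each region is a single simple polygon in (longitude, latitude); real NUTS boundaries with islands and multiple parts would normally be handled with a GIS library. It is not the authors' procedure.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Even-odd rule: count how often a ray from the point to +x crosses an edge."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's latitude can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def share_in_expected_region(tweets: Sequence[Point],
                             region: Sequence[Point]) -> float:
    """Fraction of geotagged tweets whose coordinates lie inside the region."""
    if not tweets:
        return 0.0
    hits = sum(point_in_polygon(p, region) for p in tweets)
    return hits / len(tweets)


# Toy data, not from the study: a square "region" and four geotagged tweets.
region = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
geotagged = [(1.0, 1.0), (2.0, 3.0), (5.0, 5.0), (3.0, 0.5)]
print(share_in_expected_region(geotagged, region))  # → 0.75
```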

Added Value: Our findings indicate that, using our second collection strategy, tweets can be correctly assigned to territorial regions such as cities or country units. Furthermore, we were able to identify an efficient and exhaustive strategy for collecting Twitter data that balances territorial coverage against the need to work with a reasonably sized dataset.


Exploring Instagram Data: What’s in Instagram for Market Research and Social Sciences?

Yannick Rieder1, Simon Kühne2, Daniel Jörgens3

1Janssen-Cilag GmbH, Germany; 2Universität Bielefeld, Germany; 3KTH Royal Institute of Technology, Sweden

Relevance & Research Question: With 1 billion users posting around 100 million photos per day, Instagram has become one of the most relevant social media platforms. Many users document their daily life events on Instagram and share their photos immediately. Instagram therefore offers data for researching social interaction and social phenomena. However, due to Instagram's restrictive data access policies, almost no research has made use of this valuable data source.

Methods & Data: With access to the Instagram API, we collected over 300,000 posts (photos and accompanying texts) over various time frames in late 2017 and 2018. We focused on selected geographic areas such as Berlin. We analyze the data in an exploratory manner, applying state-of-the-art text-mining techniques and face and object recognition algorithms, and matching the geocoded posts with structural data.

Results: Our preliminary results provide insights into the mechanisms of Instagram usage, e.g., the most "instagrammable" places, photo aesthetics, and the content of posts. Furthermore, we identified patterns of social behaviour: what the occasions for posting are and what influences whether a post is made. The analysis will be completed in early 2019.

Added Value: We explore the possibilities of analyzing Instagram data for market research and the social sciences and illustrate examples of applicable research approaches. Applying a broad range of analysis techniques provides insights into their quality, costs, benefits and limitations. Finally, we discuss general limitations and challenges, focusing on the upcoming API restrictions recently announced by Instagram.


The keyboard is the key—Language cues in online dating

Dorothea C. Adler1, Maximilian T. P. Freiherr von Andrian-Werburg1, Frank Schwab1, Sascha Schwarz2, Benjamin P. Lange1

1Julius-Maximilians-Universität Würzburg, Germany; 2Bergische Universität Wuppertal, Germany

Relevance & Research Question

Online dating changes how we meet people (e.g., Koch et al., 2005). As gender-typical communication styles carry over to computer-mediated communication (CMC; e.g., Guiller & Durndell, 2007) and written cues are used for personality assessments (e.g., Heisler & Crabill, 2006), the choice of words could be particularly important in online dating.

RQ 1: Can the personality of a conversation partner be detected based on a chat?

RQ 2: Are linguistic features linked both to a person's personality and to the receiver's assessment of it?

Methods & Data

Two two-step experiments were conducted.

1) Participants completed an online questionnaire assessing several personality traits (study 1: N=189, e.g., Big Five; study 2: N=610, e.g., IQ).

2) Groups of up to 6 participants were invited to the lab (study 1: N=58; study 2: N=116). They chatted anonymously with opposite-gender participants for 8 minutes (study 1) or 10 minutes (study 2) and assessed their chat partners' personality afterwards.

Analyses. The senders' personality scores were correlated with the receivers' respective assessments. In addition, both the senders' personality and the personality assessments were correlated with linguistic markers (LIWC; Pennebaker et al., 2007).
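The correlational logic of these analyses can be sketched with a plain Pearson correlation. This is an illustration only: `pearson_r`, `comma_rate`, and all data below are hypothetical, and `comma_rate` (commas per 100 words) is a crude stand-in for a LIWC punctuation category, not the LIWC dictionary or its scoring.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def comma_rate(text: str) -> float:
    """Commas per 100 words: a crude stand-in for a LIWC punctuation score."""
    words = text.split()
    return 100.0 * text.count(",") / len(words) if words else 0.0


# Hypothetical data for five senders (not taken from the studies).
self_reported = [1.0, 2.0, 3.0, 4.0, 5.0]  # sender's questionnaire score
assessed = [2.0, 1.0, 4.0, 3.0, 5.0]       # receiver's rating after the chat
print(round(pearson_r(self_reported, assessed), 2))  # → 0.8

# Linguistic marker computed from each sender's (hypothetical) chat text,
# then correlated with the same trait score.
chats = ["hello there", "hi, how are you", "well, yes, sure, fine",
         "ok", "so, what, now"]
marker_r = pearson_r(self_reported, [comma_rate(c) for c in chats])
```

The first correlation corresponds to "can receivers detect the trait"; the second to "is the marker linked to the sender's trait". Correlating the same marker with `assessed` instead would address the receiver side of RQ 2.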

Results

Study 1. Participants correctly assessed sociosexual desire (SD) (r=.33) and IQ (r=.29). SD correlated with commas (r=-.34), colons (r=-.33), and parentheses (r=-.33). Recipients detected SD based on commas (r=-.27). A person's IQ correlated with word count (r=.31), apostrophes (r=.27), and certain topics (anger, r=.28; social aspects, r=-.31; other references, r=-.29; humans, r=-.33). Receivers inferred IQ from words referring to the present (r=-.26), achievement (r=.28), money (r=-.30), metaphors (r=.28), religion (r=.31), and death (r=-.28); all ps < .05.

Study 2. Participants correctly detected openness (r=.20, p=.04), extraversion (r=.17), female gender role (r=.20, p=.04), male gender role (r=.21, p=.02), and IQ (r=.26, p=.01) (ps = .01).

Added Value

Our studies indicate the signaling character of language and provide first insights into how humans perceive linguistic markers in online dating. In the presentation, we will discuss the implications of our findings for the practical use of big data approaches and for future research.
