Conference Agenda

Session B 2: Turning Unstructured (Survey) Data into Insight with Machine Learning

Time: Thursday, 10/Sep/2020, 11:40 - 13:00

Session Chair: Stefan Oglesby, data IQ AG, Switzerland

Presentations

Huge and extremely deep language models for verbatim coding at human level accuracy

Pascal de Buren

Caplena GmbH, Switzerland

Relevance & Research Question:

In the last two years, substantial improvements on many natural language processing tasks have been achieved by building models with several hundred million parameters and pre-training them on massive amounts of publicly available text in an unsupervised manner [1], [2]. These tasks include many classification problems similar to verbatim coding, i.e. the categorization of free-text responses to open-ended questions that is popular in market research. We wanted to find out whether these new models could set new state-of-the-art results in automated verbatim coding and thus potentially be used to accelerate or replace the tedious manual coding process.

Methods & Data:

We trained a model based on [1], adapted to the task of verbatim coding through architecture tweaks and further pre-training on millions of reviewed verbatims from https://caplena.com. This model was benchmarked against simpler models [3] and widely available tools [4], as well as against human coders, on real survey data. The chosen datasets were provided by two independent market research institutes in the Netherlands (Dutch) and in Brazil (Portuguese and English), with n=525 and n=5,900 respectively, to test the new model on small and large surveys alike.
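
The abstract does not disclose the exact architecture, but the general approach of fine-tuning a pre-trained BERT-style model [1] for multi-label verbatim coding can be sketched as follows; the model name, code frame and example data are purely illustrative and not Caplena's actual setup:

    # Illustrative sketch only: fine-tune a pre-trained BERT-style model for
    # multi-label verbatim coding (one verbatim can carry several codes).
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    CODES = ["price", "service", "delivery"]        # hypothetical code frame
    verbatims = ["Delivery was late but support was great."]
    labels = torch.tensor([[0.0, 1.0, 1.0]])        # multi-hot targets (float for BCE loss)

    tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-multilingual-cased",
        num_labels=len(CODES),
        problem_type="multi_label_classification",  # sigmoid + BCE instead of softmax
    )

    enc = tokenizer(verbatims, padding=True, truncation=True, return_tensors="pt")
    out = model(**enc, labels=labels)               # out.loss, out.logits
    out.loss.backward()                             # one training step (optimizer omitted)

    probs = torch.sigmoid(out.logits)               # per-code probabilities
    predicted = [c for c, p in zip(CODES, probs[0]) if p > 0.5]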

Results:

Our model outperformed both our previous, simpler models and standard online tools on a variety of surveys in multiple languages, improving the weighted F1 on the Brazilian data from 0.37 to 0.59. We achieve new state-of-the-art results for automated verbatim coding, matching or even surpassing the agreement between human coders: on the Dutch dataset, our model reaches an F1 of 0.65 versus 0.61 for human intercoder agreement.
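
For reference, the weighted F1 quoted above can be computed with scikit-learn over multi-hot code assignments; the toy values below are illustrative only, not the study's data:

    # Weighted F1 over a multi-label (multi-hot) coding matrix; toy data only.
    import numpy as np
    from sklearn.metrics import f1_score

    # rows = verbatims, columns = codes; 1 = code assigned
    y_true = np.array([[1, 0, 1],
                       [0, 1, 0],
                       [1, 1, 0]])
    y_pred = np.array([[1, 0, 0],
                       [0, 1, 0],
                       [1, 0, 0]])

    # "weighted" averages the per-code F1 scores by how often each code occurs in y_true
    print(f1_score(y_true, y_pred, average="weighted"))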

Added Value:

Researchers now have a tool that performs verbatim coding almost instantaneously, at a fraction of the cost and with accuracy similar to full human coding. Besides the quantitative benchmark results, we also provide qualitative examples and guidance on which surveys are well suited for automated coding and which are less so.

References:

[1] Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://arxiv.org/abs/1810.04805

[2] Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language Models are Unsupervised Multitask Learners. https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf

[3] Support vector machines with bag-of-words features, and long short-term memory networks with word embeddings

[4] https://monkeylearn.com/



A Framework for Predicting Mentoring Needs in Digital Learning Environments

Cathleen M. Stuetzer1, Ralf Klamma2, Milos Kravcik3

1TU Dresden, Germany; 2RWTH Aachen, Germany; 3DFKI - Deutsches Forschungszentrum für Künstliche Intelligenz, Berlin, Germany

Relevance & Research Question:

Modeling online behavior is a common tool for studying social “ecologies” in digital environments. Most existing models are used for analysis and prediction. However, more predictive tools are needed to provide evidence, especially for exploring online social behavior and identifying the right target group within the right context. As a prominent use case, in higher education research the quality of learning processes is to be ensured by providing suitable instruments, such as mentoring, for the pedagogical and social support of students. But how can we identify mentoring needs in digital learning environments? And to what extent can predictive models contribute to the quality assurance of learners’ progress?

Methods & Data:

To address these research questions, we first extract and analyze text data from online discussion boards in a distance learning environment using (automated) text mining procedures (e.g. topic modelling, semantic analytics, and sentiment analytics). Based on the results, we identify suitable behavioral indicators for modeling specific mentoring occasions using network analytic instruments. To analyze the emergence of (written) language and to explore latent sentiment patterns within students’ discussions, GAT (discourse analytic language transcription) instruments are applied. Finally, we build a predictive model of social support in digital learning environments and compare the results with findings from neurolinguistic models.
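
As an illustration of one step in this pipeline, a minimal topic-modelling sketch over discussion-board posts could look like the following; the posts and parameters are placeholders, not the study's data or exact method:

    # Illustrative sketch: LDA topic modelling of discussion-board posts.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    posts = [
        "I am stuck on assignment two, can anyone explain the grading rubric?",
        "The lecture recording for week three is missing from the platform.",
        "Thanks for the feedback, the mentoring session really helped me.",
    ]

    vectorizer = CountVectorizer(stop_words="english")
    dtm = vectorizer.fit_transform(posts)            # document-term matrix

    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    doc_topics = lda.fit_transform(dtm)              # per-post topic mixture

    terms = vectorizer.get_feature_names_out()
    for k, component in enumerate(lda.components_):
        top = [terms[i] for i in component.argsort()[-5:][::-1]]
        print(f"topic {k}:", ", ".join(top))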

Results & Added Value:

The contribution proposes an analytical implementation framework for prediction procedures that handle students' behavioral data in order to explore social online behavior in digital learning environments. We present a multidisciplinary model that implements theories from current research, depicts behavioral indicators, and takes systemic properties into account, so that it can be used across the board in applied research. In addition, the findings provide insights into online social behavior that can inform suitable pedagogical and social support in digital learning environments. The study is still a work in progress.



Using AI for a Better Customer Understanding

Stefan Reiser1, Steffen Schmidt1, Frank Buckler2

1LINK Institut, Switzerland; 2Success Drivers, Germany

Relevance & Research Question:

Many companies are struggling with the growing amount of customer data and touchpoint-based feedback. Instead of learning from this feedback and using it to continuously improve their processes and products, many tend to waste it. Moreover, traditional causal models such as regression analyses do not help to truly understand why KPIs end up at a certain level. Our research objective was to develop a new process, based on AI models, that helps to...

a) structure and analyze customer feedback largely automatically,

b) better understand customers, including hidden drivers of their behaviour,

c) help companies take action based on their customer feedback.

Methods & Data:

Our approach makes use of information that almost every company has: the NPS score and open-ended text feedback (i.e. each customer's reasons for giving this NPS score). The text feedback is coded by means of supervised learning, including the sentiment behind statements (NLP); the outcomes are then analyzed by means of neural network analysis.
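
A minimal sketch of the second stage, relating coded topic/sentiment features to the NPS rating with a small neural network, might look like this; feature names, data and network size are assumptions for illustration, not the authors' actual model:

    # Illustrative sketch: small neural network as NPS driver model.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    # columns: coded sentiment towards "price", "support", "app usability"
    X = np.array([
        [-1.0,  1.0,  0.0],
        [ 0.0, -1.0, -1.0],
        [ 1.0,  1.0,  0.0],
        [-1.0,  0.0,  1.0],
    ])
    y = np.array([6, 3, 10, 7])                      # 0-10 NPS ratings

    model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
    model.fit(X, y)

    # simple what-if: predicted shift when support sentiment turns from negative to positive
    print(model.predict([[0.0, 1.0, 0.0]]) - model.predict([[0.0, -1.0, 0.0]]))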

Results:

The explorative nature of this approach (open-ended feedback combined with machine learning algorithms) reveals which influence individual criteria have on the NPS. Unexpected nonlinearities and interactions can be unveiled, and hidden drivers to leverage the NPS can be uncovered. A large amount of data is reduced to the most significant aspects; we also developed a simple dashboard to illustrate these.

Added Value:

We found that this approach produces far better explanatory power than traditional methods like manual coding and linear regression - usually, the explanatory power is twice as high. Moreover, these analyses can be implemented for any industry or product, and they can produce insights from historical data. Finally, when dealing with big data sets, the machine learning approach helps to be faster and more efficient.



Read my hips. How t9 address AI transcription issues

André Lang1, Stephan Müller1, Holger Lütters2, Malte Friedrich-Freksa3

1Insius, Germany; 2HTW Berlin, Germany; 3GapFish GmbH, Germany

Relevance & Research Question:

Analyzing voice content at large scale is a promising but difficult task. Various sources of voice data exist, ranging from audio streams of YouTube videos to voice responses in online surveys. While automated solutions for speech-to-text transcription exist, the question remains how to validate, quality-check and leverage the output of these services to provide insights for market research. This study focuses on methods for evaluating the quality, filtering and processing of the texts returned by automated transcription solutions.

Methods & Data:

The feasibility and challenges of processing text transcripts from voice are assessed along the process steps of error recognition, error correction and content processing. As a baseline, a set of 400 voice responses, transcribed with Google’s Cloud Speech-to-Text service, is enriched with the real responses taken from the original audio. Differences, i.e. recognition errors, are categorized by their causes, such as unknown names or similar-sounding words. Different methods for error detection and correction are proposed, applied and evaluated against this gold-standard corpus. The resulting texts are processed with an NLP concept detection method usually applied to UGC content in order to check whether further insights, such as inherent topics, can be derived.
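
The error-recognition step can be illustrated with a simple word error rate (WER) comparison between an automated transcript and its manually corrected gold version; the sketch below is self-contained and not the study's actual tooling:

    # Illustrative sketch: word error rate between gold and ASR transcript.
    def wer(reference: str, hypothesis: str) -> float:
        ref, hyp = reference.lower().split(), hypothesis.lower().split()
        # word-level edit-distance table
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
        return d[len(ref)][len(hyp)] / max(len(ref), 1)

    gold = "I usually buy the brand Nivea at the drugstore"
    asr = "I usually by the brand never at the drug store"
    print(round(wer(gold, asr), 2))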

Results:

Although speech-to-text has improved substantially over the past years, handling its output remains challenging. Unsystematic noise generated by mumbling or individual pronunciation is less problematic, but systematic errors such as mis- or undetected person or brand names, caused by difficult or ambiguous pronunciation, may distort results substantially. The error detection methods shown, along with detection confidence values retrieved from the transcription service, provide a first baseline for filtering, rejecting low-quality input, and further processing in order to obtain meaningful insights.
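
A minimal sketch of such confidence-based filtering, with made-up segments and an assumed threshold, could look like this:

    # Illustrative sketch: reject transcript segments below a confidence cut-off
    # before any further NLP processing. Values and threshold are made up.
    segments = [
        {"text": "the delivery was quick", "confidence": 0.94},
        {"text": "never at the drug store", "confidence": 0.41},
        {"text": "customer service was polite", "confidence": 0.88},
    ]

    THRESHOLD = 0.6  # assumed cut-off, to be tuned against a gold-standard corpus
    usable = [s["text"] for s in segments if s["confidence"] >= THRESHOLD]
    rejected = [s for s in segments if s["confidence"] < THRESHOLD]

    print(usable)                 # passed on to concept/topic detection
    print(len(rejected), "segment(s) flagged for manual review")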

Added Value:

Previous studies have shown that voice responses are more intense and longer than typed ones. Having ways to evaluate and control the output of speech-to-text services, knowing their limitations, and checking the applicability of NLP methods for further processing is vital to building robust analytical services. This study covers the topics that have to be addressed in order to draw substantial benefits from voice as a source of (semi-)automated analytics.