
AI-Assisted Qualitative Text Analysis Bootcamp

The MAGICS Aalto / Creative Technologies (CT) community is organizing a hands-on bootcamp to demonstrate how generative AI models can strengthen one’s own research. Our purpose is to show how to use these tools effectively on large datasets, with examples from conversation analysis and survey answers.


Time: Thursday, October 31, 12:00-16:00
PLEASE NOTE changed Venue: Aalto University, TUAS building, Maarintie 8, 02150 Espoo, Floor 1, room 1521 (AS6)
Target audience: Academic and industry researchers and designers working with text data such as interview transcripts, surveys, and think-aloud data from user tests.
Format: An introductory talk + hands-on browser-based exercises (bring your own laptop). 
Materials: All materials are linked from https://github.com/PerttuHamalainen/LLMCode; further links are at the bottom of this page.

Does your research involve analysis of text corpora? Join us to learn how to harness open-source tools based on Large Language Models (LLMs) for AI-assisted data visualization, exploration, filtering, and inductive and deductive coding. No technical expertise is needed. The tools are designed to be useful across multiple qualitative analysis paradigms such as thematic analysis, qualitative content analysis, and grounded theory.

Tools 

The bootcamp will be given by Prof. Perttu Hämäläinen, Dr. Enrico Glerean, and doctoral researcher Joel Oksanen. The primary tool utilized is the LLMCode Python library and notebooks: https://github.com/PerttuHamalainen/LLMCode. LLMCode is based on this ACM CHI 2023 best paper, with various extensions: https://dl.acm.org/doi/full/10.1145/3544548.3580688  

Following an introductory lecture, we will proceed to hands-on tutorials and exercises that are also usable by those without programming experience. Exercises (Colab notebooks) and other materials are linked from the repository: https://github.com/PerttuHamalainen/LLMCode

Design principles 

Our tool design philosophy can be summarized as: 

  1. AI should do the dishes and other chores, letting humans focus on what they love. Our aim is not to replace researchers with AI. Instead, we strive to empower researchers to tackle new research questions and work with datasets that are too large for traditional approaches. 

  2. If parts of the research process are outsourced to AI, one needs to be able to scrutinize and measure the quality of the results. 
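One simple way to make the second principle concrete is to compare AI-assigned codes against codes a human assigned to the same units of text. The sketch below is only an illustration under stated assumptions: it is not LLMCode's evaluation method, the function names are hypothetical, and mean Jaccard overlap is just one of several possible agreement measures.

```python
def jaccard(codes_a, codes_b):
    """Overlap between two sets of codes, from 0.0 (disjoint) to 1.0 (identical)."""
    a, b = set(codes_a), set(codes_b)
    if not a and not b:
        return 1.0  # both coders left the unit uncoded: treat as full agreement
    return len(a & b) / len(a | b)


def mean_agreement(human_codes, ai_codes):
    """Average Jaccard overlap across units coded by both a human and the AI."""
    if len(human_codes) != len(ai_codes) or not human_codes:
        raise ValueError("need two non-empty code lists of equal length")
    scores = [jaccard(h, a) for h, a in zip(human_codes, ai_codes)]
    return sum(scores) / len(scores)


# Hypothetical example: three units of text, each with human and AI codes.
human = [["usability"], ["usability", "navigation"], []]
ai = [["usability"], ["navigation"], []]
score = mean_agreement(human, ai)
print(f"mean agreement: {score:.2f}")
```

A low score does not by itself say who is wrong; it flags units worth reading again by hand.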

Based on these principles, we cater to different target groups and use cases: 

If you love doing qualitative analysis, but you have too much data to analyze, you can use AI to 1) filter the data and produce a manageable sample of relevant data, and 2) verify and question your findings with the full dataset.  

If you value doing qualitative analysis manually but it’s not the main focus of your work, you can do as much manual analysis as you can, offload the rest to AI, and use your manually analyzed data (extracts, codes) to guide and evaluate the AI results. 

If you view qualitative analysis as simply something that needs to be done as efficiently as possible (e.g., for industry user research), you can run things more automatically to generate quick reports such as a table of most frequent themes with counts and quotes. 
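As a rough illustration of such a quick report, the sketch below counts how often each theme occurs in already-coded data and picks one quote per theme. It uses only the Python standard library, not LLMCode itself; the function name, the dictionary keys, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict


def theme_table(coded_units, top_n=3):
    """Return (theme, count, example quote) rows for the most frequent themes."""
    counts = Counter()
    quotes = defaultdict(list)
    for unit in coded_units:
        # dict.fromkeys removes duplicate codes within a unit, keeping their order
        for theme in dict.fromkeys(unit["codes"]):
            counts[theme] += 1
            quotes[theme].append(unit["text"])
    # Use the first quote seen for each theme as its example
    return [(theme, n, quotes[theme][0]) for theme, n in counts.most_common(top_n)]


# Hypothetical coded user-test comments
coded = [
    {"text": "The menus were confusing.", "codes": ["usability"]},
    {"text": "I could not find the save button.", "codes": ["usability", "navigation"]},
    {"text": "Loved the art style.", "codes": ["aesthetics"]},
]
table = theme_table(coded)
for theme, count, quote in table:
    print(f"{theme}\t{count}\t{quote}")
```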

For all types of qualitative analysis, we believe that AI-assisted data visualization and exploration can help in the initial stage of immersing oneself in the data and thinking of research questions and analysis approaches. 

Bring your own data? 

The tutorials and exercises use public datasets. If you want to use your own data in the bootcamp, you need to format it in one of the following ways:

  1. A Word document (.docx) in which independently analyzed units of text, such as paragraphs, are separated by “-----” and examples of codes are provided as comments. Example: https://raw.githubusercontent.com/PerttuHamalainen/LLMCode/master/test_data/bopp_test_augmented_1.docx

  2. A spreadsheet (.csv or .xlsx) with units of text in one column and examples of codes in another column. Example: https://raw.githubusercontent.com/PerttuHamalainen/LLMCode/master/test_data/bopp_test.xlsx  
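To check your own data against these two layouts before the bootcamp, something like the following minimal sketch may help. It uses only the Python standard library and is not how LLMCode loads data. The column names `text` and `codes`, the semicolon between multiple codes, and both function names are assumptions made for illustration. The separator check works on plain text only; it does not read .docx files or their comments.

```python
import csv
import io

SEPARATOR = "-----"  # the unit separator described for the .docx layout


def split_units(raw_text):
    """Split plain text into independently analyzed units, dropping empty ones."""
    return [unit.strip() for unit in raw_text.split(SEPARATOR) if unit.strip()]


def read_coded_rows(csv_text, text_col="text", codes_col="codes"):
    """Read units of text and their example codes from CSV content.

    Column names and the ';' code delimiter are assumptions for this sketch.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw_codes = row.get(codes_col) or ""
        codes = [c.strip() for c in raw_codes.split(";") if c.strip()]
        rows.append({"text": row[text_col].strip(), "codes": codes})
    return rows


# Hypothetical sample data: one coded row and one row without example codes
sample_csv = (
    "text,codes\n"
    '"I kept playing because of the story.",narrative; motivation\n'
    '"The controls felt clumsy.",\n'
)
rows = read_coded_rows(sample_csv)
units = split_units("First paragraph.\n-----\nSecond paragraph.\n-----\n")
print(len(rows), "rows;", len(units), "units")
```

Rows without example codes are kept with an empty code list, since only some units need to carry examples.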

We are happy to answer data-related questions ahead of time to make sure your efforts are not wasted. Email [email protected] with your questions. 

Materials used during the workshop

All materials are linked from the LLMCode GitHub repository: https://github.com/PerttuHamalainen/LLMCode

Please note that as the bootcamp uses OpenAI’s APIs, we cannot support processing of personal or sensitive data during the workshop. This is merely a practical limitation; while we possess technical means that enable GDPR-safe processing, we cannot offer them to participants without Aalto University user accounts. We are happy to provide pointers that should help you get things up and running with your own organisation’s computational resources. 
