Social Data Science

Online materials for Social Data Science


David Garcia, 2024

Welcome to the online materials for Social Data Science.

Social Data Science is an emerging field that studies human behavior and social interaction through digital traces. The revolution in measurement brought by our digital society gives us data at global scales, very high frequencies, and unprecedented levels of depth and resolution.

This course covers both the fundamentals and the applications of Data Science in the Social Sciences, including technologies for data retrieval. Students of Social Data Science learn how to plan, execute, and interpret complete Data Science projects to address questions about human behavior. After this course, students will know how to gather data from social media, search trends, and other online and offline sources, how to process and store that data, and how to combine, analyze, and visualize data to address specific questions. The course places special emphasis on the interpretation and critique of Data Science in the Social Sciences, aiming at an interdisciplinary approach that can inform students from various disciplines.

Who am I?

I am the Professor for Social and Behavioral Data Science at the University of Konstanz. You can find more about my research group here: http://dgarcia.eu. My background is in Computer Science, but I have worked my whole career with psychologists, sociologists, and physicists to learn new ways to understand human behavior. I got my PhD from ETH Zurich in 2012 and my habilitation in 2018, became a full professor at TU Graz in 2020, and moved to the University of Konstanz in 2022. To learn more about my research, check my publications.

Course Contents

The course is organized as a five-day block course, with several topics per day. There is an R crash course and four exercises in which you apply what you learned in the block. In the exercises, you collect your own data to answer Social Data Science questions. The online materials do not contain the solutions to the exercises, but if you are stuck or want an easier starting point, the GitHub folder of each exercise contains a version of the exercise with hints in the form of parts of the solution code.

  1. Introduction to Social Data Science
    1.1. What is Social Data Science?[Slides]
    1.2. SDS Story: Google Flu Trends[Slides]
    1.3. Measuring temporal orientation with Google Trends[Slides]
    1.4. Measuring correlation[Slides]
    1.5. R crash course[R crash course materials]
    1.6. Accessing the World Development Indicators from R[Tutorial files]
    1.7. Google Trends data in R[Tutorial files]
    Exercise 1: Future orientation and economic development[Exercise materials]

  2. Social dynamics
    2.1. Social Impact Theory[Slides]
    2.2. The Simmel Effect[Slides]
    2.3. SDS Story: Baby name trends[Slides]
    2.4. Data wrangling with dplyr[Tutorial files]
    2.5. Linear regression[Slides]
    2.6. Bootstrapping[Tutorial files]
    2.7. Accessing the Reddit API from R[Tutorial files]
    Exercise 2: Division of impact on Reddit[Exercise materials]

  3. Computational Affective Science
    3.1. Measuring emotions[Slides]
    3.2. Unsupervised sentiment analysis[Slides]
    3.3. SDS Story: Emotions in pagers after 9/11[Slides]
    3.4. Running unsupervised sentiment analysis in R[Tutorial files]
    3.5. The Measurement of meaning from text[Slides]
    3.6. Supervised sentiment analysis[Slides]
    3.7. Sentiment Analysis Applications[Slides]
    3.8. Training supervised sentiment analysis in R[Tutorial files]
    Exercise 3: Evaluating sentiment analysis methods[Exercise materials]

  4. Social network analysis
    4.1. Introduction to social networks[Slides]
    4.2. The Friendship paradox[Slides]
    4.3. Handling network data in R[Tutorial files]
    4.4. SDS Story: Sampling opinions on Twitter[Slides]
    4.5. Centrality in social networks[Slides]
    4.6. Handling Twitter network data[Tutorial files in Exercise 4 folder]
    4.7. Privacy in online social networks[Slides]
    Exercise 4: Assortativity among Swiss politicians on Twitter[Exercise materials]

  5. Social network phenomena
    5.1. Social resilience[Slides]
    5.2. SDS Story: The death of social networks[Slides]
    5.3. Structural holes and communities[Slides]
    5.4. Permutation tests[Tutorial materials]
    5.5. Ethics in social data science[Slides]
    5.6. Assortativity[Slides]
    5.7. Network analysis in R[Tutorial materials]

Where to access materials

  • Handouts, code, and data can be found in the GitHub repository of the course.
  • Students at ETH Zurich can access the course Moodle for information about the evaluation criteria and to participate in online quizzes.

To learn more about my research in Social Data Science

  1. Measuring Gender Divides with Facebook Data
  2. Analyzing the Digital Traces of Collective Emotions after a Terrorist Attack
  3. Measuring large-scale emotion aggregates through social media text
  4. Complex Privacy in Online Social Networks
  5. Food Polarization on Social Media
  6. Linguistic Embeddings for the Identification of Affect