Social Data Science
David Garcia, 2021
Welcome to the online materials for Social Data Science.
Social Data Science is an emerging field that studies human behavior and social interaction through digital traces. The revolution in measurement brought by our digital society gives us data at global scales, very high frequencies, and unprecedented levels of depth and resolution.
This course focuses on both the fundamentals and the applications of Data Science in the Social Sciences, including technologies for data retrieval. Students of Social Data Science learn how to plan, execute, and interpret complete Data Science projects that address questions about human behavior. After this course, students will know how to gather data from social media, search trends, and other online and offline sources, how to process and store that data, and how to combine, analyze, and visualize data to address specific questions. The course places special emphasis on the interpretation and critique of Data Science in the Social Sciences, aiming for an interdisciplinary approach that can inform students from various disciplines.
Who am I?
I am the Professor for Computational Behavioral and Social Sciences at Graz University of Technology, where I lead the Computational Social Science Lab. I am also a group leader at the Medical University of Vienna and at the Complexity Science Hub Vienna, and I am a member of the teaching faculty at ETH Zurich, where I teach this course every February. My background is in Computer Science, but throughout my career I have worked with psychologists, sociologists, and physicists to learn new ways to understand human behavior. I received my PhD from ETH Zurich in 2012 and my habilitation in 2018, and I became a full professor at TU Graz in 2020. To learn more about my research, check my publications.
The course is organized in five blocks with several topics each. Each block has one or more exercises in which you apply what you learned in that block. In the exercises, you collect your own data and try to answer Social Data Science questions. The online materials do not contain the solutions to the exercises, but if you are stuck or want an easier starting point, you can find a version of each exercise with hints in its GitHub folder.
Introduction to Social Data Science
1.1. What is Social Data Science?
1.2. SDS Story: Google Flu Trends
1.3. Measuring temporal orientation with Google Trends
1.4. Measuring correlation
1.5. R crash course – [R crash course github folder]
1.6. Accessing the World Development Indicators from R
1.7. Google Trends data in R
1.8. Exercise: Future orientation and economic development – [Exercise github folder]
2.1. Social Impact Theory
2.2. The Simmel Effect
2.3. SDS Story: Baby name trends
2.4. Linear regression
2.5. Bootstrapping – [Github folder]
2.6. Data wrangling with dplyr – [Tutorial github folder]
2.7. The Twitter API in R – [Tutorial github folder]
2.8. Exercise: Division of impact on Twitter – [Exercise github folder]
Computational Affective Science
3.1. Measuring emotions
3.2. Unsupervised sentiment analysis
3.3. SDS Story: Emotions in pagers after 9/11
3.4. Supervised sentiment analysis
3.5. Running unsupervised sentiment analysis in R
3.6. Training supervised sentiment analysis in R
3.7. Exercise: Evaluating sentiment analysis methods – [Exercise github folder]
3.8. Exercise: Twitter sentiment and retweeting – [Exercise github folder]
Social network analysis
4.1. Introduction to social networks
4.2. The Friendship paradox
4.3. SDS Story: Sampling opinions on Twitter
4.4. Centrality in social networks
4.5. Handling network data in R
4.6. Twitter network data
4.7. Exercise: Swiss politicians on Twitter – [Exercise github folder]
Social network phenomena
5.1. Social resilience
5.2. SDS Story: The death of social networks
5.3. Structural holes and communities
5.5. Permutation tests
5.6. Network analysis in R
5.7. Exercise: Assortativity among Swiss politicians on Twitter – [Exercise github folder]