What is the aim of Social Data Science?
The aim of Social Data Science is the Quantitative Understanding of Human Behavior.
Each word in this aim is important:
- Quantitative: As opposed to qualitative or descriptive, we aim for robust findings grounded in strong evidence that can be quantified.
- Understanding: Not just predicting, we want to be able to generalize and combine knowledge, and even to motivate interventions or policies.
- Human: We will not study particles or objects. Measurement validity and ethics will be a challenge.
- Behavior: Observable changes, structures, dynamics, and patterns; not just stories or theories
How are we going to do it?
Retrieving, processing, analyzing, and interpreting Digital traces.
Digital traces are the leftovers of information that we leave behind when we use Information and Communication Technologies. These happen when you use your mobile phone, when you use a search engine, and when you post something on social media. A famous example of digital trace data is this map:
This maps shows the friendship links among half a billion people. The links connnect the locations where these people live. Note that there are no map lines drawn below the friendship links, the contour of countries and continents is visible thanks to the data of user locations. You can also see that, while there are some international friendships, most links happen between close regions, making the inner part of countries brighter than the onceans between. You can learn more about this map in this Facebook blog post.
What is data?
Data: Facts in the form of stored and transmittable information.
- Data is the plural of the Latin word datum. Data means ”given (things)”.
- Data is given to us, it is not fabricated nor simulated.
- Is Data singular or plural?
What is Social Data Science?
Data Science is the application of methods from Computer Science and Statistics to empirical questions and practical problems. In Data Science, sometimes the ability to combine methods is more important than excellence in individual disciplines. Communicating resutls and methods and synthesizing what we learn from data is very important.
Social Data Science is the application of Data Science to study human behavior and social interaction. We gather digital trace data with Computer Science techniques, performing statistical analyses of ”messy” Social Data. It is important to intrepret results with respect to what is known in Sociology, Political Science, Psychology, and even Anthropology and Economics.
Why digital traces?
The data captured by digital traces can have six qualities that complement other data sources:
What are the limitations of digital traces?
With the great potential of digital traces, aslo come great challenges:
- Platform biases: The design of platforms and their algorithms can affect the behavior we observe. Some platforms have non-human actors (e.g. bots) that affect the data.
- Data gatekeepers: Not everyone can access the same data. Some research might be impossible to reproduce.
- Performative behavior: Talking online is not the same than offiline. We have to consider what is the purpose of the people involved and if that distorts their behavior.
- Representativity issues: Not everyone leaves digital traces. We should always ask oursevels if the people in a sample reflect the group we are interested to analyze.
- Observational data: Testing causal mechanisms is not straightforward. We should always consider alternative explanations.
- The data deluge: Too much data enables black-box predictions that can be useful but limit our understanding.
A great reference about these limitations is the 2019 review by Olteanu et al..
This course has Social Data Science Stories that illustrate these limitations with examples from previous research. One of the aims of the course is for you to get a critical understanding of the use of digital trace data to study human behavior. Even though they have limitations, like every other scientific method, digital traces are a great data source and in this course you will learn how to use them!
The importance of questions in Social Data Science
In the book series The Hitchhiker’s Guide to the Galaxy, and advanced civilization builds Deep Thought, a supercomputer to find the answer to the Ultimate Question of Life, the Universe, and Everything. After millions of years of computation, it answered “42”.
Understanding our questions is a prerequisite to understanding their answers.
Machine Learning is data-driven, statistics is method-driven, and many social sciences are theory-driven. Social Data Science is question-driven. In this course, you will see example of questions about human behavior. We will study how social science theories give us expectations about their answers, what data to retrieve to answer the questions, and which methods to apply to make it possible.