Chapter 1: Introduction to Data Science Ethics
1.1 The rise of Data Science (Ethics)
It is crucial for businesses, both large and small, to understand the ethical aspects of data science and
its outcomes, both positive and negative, for businesses and society.
What is data science ethics?
Ethics is about what is right and what is wrong.
1.2 Why Care
Data scientists and business students are not inherently unethical, but neither are they trained to
think these issues through.
1. Expected from society
The power of data science has become clear to data subjects, data scientists and
business leaders alike.
Generation Z => cares about social justice and ethics
2. Huge potential risks
Be aware of the risks and countermeasures
Risks for humans
o Physical and mental well-being (self-driving)
o Privacy
o Discrimination
Risks for businesses
o Reputational => reputational risk can easily translate into financial risk
o Financial
3. Potential benefits
Understanding ethical concerns and applying techniques to address them can improve the
model beyond the ethical aspect => more accurate predictions or better user acceptance
o Remove bias in data: improve the accuracy and fairness of the model
o Explain predictions: improve trust in the model
o Ensure proper data gathering: better data quality
o Part of a company’s brand (cf. expected from society)
Summary:
Life goal in itself (philosophical goal)
Societal and business reasons:
1. Expected from society
2. Huge potential risks
3. Data science ethics can bring value
Future:
Increased digitalization
Increased automation
Increased use of AI
=> An EU legal framework is coming
1.3 Right and Wrong
Ethics: moral principles that control or influence a person’s behaviour
Moral: Concerned with principles of right and wrong behaviour
Ethics Theories: Utilitarianism vs Deontological ethics
Utilitarianism:
o Ethical theory that determines right from wrong by focusing on outcomes
= consequentialism: what matters is what is produced as the consequence of the act
An action is moral if its consequence is moral; the action is seen as a means to an
end => trolley problem
Risk: can justify immoral actions
Deontology
o Ethical theory that the morality of an action should be based on whether that action
itself is right or wrong under a series of rules, rather than based on the consequences
Not doing immoral actions
Aristotle’s Nicomachean Ethic
We study ethics to improve our lives
Through proper upbringing and teaching we can learn the righteous actions to take => the right
habits => a good, stable character
Moral behaviour can be found at the mean between two extremes: excess and deficiency
‘Golden mean’ condition
o Deficiency: Not using any data at all
o Excess: Using all data available without any concern for issues such as privacy,
discrimination or transparency.
Law <-> Ethics
• The law tells us what we can and can’t do
• Ethics tells us what we should do => What is right and what is wrong?
Who decides what is ethical?
If there are no clear laws guiding us, each person and business must decide for themselves where
they want to be on the continuum between excess and deficiency. This is influenced by how we as a
society value data science ethics. => Ethics are subjective
Discrimination:
Data science in itself is all about discrimination, e.g. discriminating between loan applicants
who are likely to repay the loan and those who are not.
Discrimination is also an ethical aspect:
o It is important not to discriminate against sensitive groups. But who decides what
sensitive groups are? => Ethics are subjective
Application-dependency
o Fair to use gender and race data?
o Typically, discrimination based on race, gender and religion is considered unfair. But
this is not always the case and depends on the application: credit scoring vs medical
diagnosis
Time dependent
o Women: allowed to vote in US in 1920, in Belgium in 1948, in Moldova in 1978
o Black people: slavery in the US
o Victims of our time:
What we find normal right now, might not be so ethical in the future
Can become sensitive groups in the future => 2 groups
Those we consider it not wrong to discriminate against, but who
currently have the same rights as all humans: elderly people, people on
low incomes, etc.
o In future we might consider this unacceptable, making age
and income also sensitive variables
Those we consider to have fewer rights than humans: animals, robots
Location dependent
o Respect for elders, disrespect for criminals, etc.
o Respect for the right of individuals vs. the state
o Trolley-problem:
The preference to spare young over older characters is much weaker in Eastern
countries than in Western countries
Latin-American countries have a weaker preference to spare humans over
pets
1.4 Data Science
Data, Algorithms and Models
Data: facts or information, especially when examined and used to find out things or to make
decisions
The data itself can be unethical
o E.g. personal data for which you do not have consent or data with a bias against
sensitive groups
Algorithm: a set of rules that must be followed when solving a particular problem
An algorithm is nothing more than a set of rules/steps to be followed => not straightforwardly
ethical or unethical in itself => garbage in, garbage out
o E.g. decision tree algorithm => if the data and the model are biased, is the algorithm biased as well?
Prediction or AI Model: the decision-making formula, which has been learnt from data by a
prediction/AI algorithm
A predictive model can also be unethical (especially when built on unethical data)
o E.g. a predictive model that discriminates against sensitive groups.
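The "garbage in, garbage out" point above can be sketched in a few lines of Python: a trivial one-feature "decision stump" learner, trained on hypothetical loan data in which historical approvals track gender, latches onto the sensitive attribute. All feature names and records below are illustrative assumptions, not real data.

```python
# Minimal sketch (pure Python, hypothetical data): a one-feature
# "decision stump" learner applied to biased historical loan data.

def stump_accuracy(rows, feature):
    """Accuracy of predicting approved=1 exactly when feature=1."""
    correct = sum(1 for r in rows if (r[feature] == 1) == (r["approved"] == 1))
    return correct / len(rows)

def learn_stump(rows, features):
    """Pick the single feature that best predicts the label."""
    return max(features, key=lambda f: stump_accuracy(rows, f))

# Historical data in which approvals track gender more than income.
rows = [
    {"male": 1, "high_income": 1, "approved": 1},
    {"male": 1, "high_income": 0, "approved": 1},
    {"male": 1, "high_income": 1, "approved": 1},
    {"male": 0, "high_income": 1, "approved": 0},
    {"male": 0, "high_income": 0, "approved": 0},
    {"male": 0, "high_income": 1, "approved": 0},
]

best = learn_stump(rows, ["male", "high_income"])
print(best)  # prints "male": the learner latches onto the sensitive attribute
```

Nothing in the learner itself is unethical; it simply picks the feature that best fits the biased labels, which is how bias in the data becomes bias in the model.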
Types of Data
Personal data: ‘personal data’ means any information relating to an identified or identifiable natural
person (‘data subject’);
An identifiable natural person is one who can be identified, directly or indirectly, in
particular by reference to an identifier such as a name, an identification number, location
data, an online identifier or to one or more factors specific to the physical, physiological,
genetic, mental, economic, cultural or social identity of that natural person; (GDPR, Article 4)
Sensitive data: personal data revealing racial or ethnic origin, political opinions, religious or
philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data
for the purpose of uniquely identifying a natural person, data concerning health or data concerning a
natural person's sex life or sexual orientation shall be prohibited. (GDPR, Article 9)
Behavioural data: data providing evidence of actions taken by persons, such as location data,
Facebook likes, online browsing data, payment data.
Data science
Data science: a set of fundamental principles that support and guide the principled extraction of
information and knowledge from data
Data mining: the actual extraction of knowledge from data via technologies that incorporate these
(data science) principles
Artificial Intelligence: methods of improving the knowledge or performance of an intelligent agent
over time, in response to the agent’s experience
Most discussions within data science ethics concern predictive modelling or supervised learning,
because these are used to find patterns in the form of a prediction model that predicts the value of
some target variable.
Descriptive modelling or unsupervised learning also extracts patterns, but these are used to
describe the data rather than to predict a target variable.
A major ethical issue with artificial intelligence models (e.g. large artificial neural networks) is that
they are typically black boxes and provide no further information on why a certain prediction is made.
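The transparency contrast can be illustrated with a minimal sketch (pure Python, hypothetical feature names and weights): a linear scoring model is "white box" in the sense that its prediction decomposes into per-feature contributions, which is exactly what a black-box model does not offer.

```python
# Minimal sketch (hypothetical weights): a transparent linear credit score
# whose prediction can be explained feature by feature.

weights = {"income": 2.0, "debt": -3.0, "age": 0.5}  # illustrative values

def predict(applicant):
    """Linear score: sum of weight * feature value."""
    return sum(weights[f] * applicant[f] for f in weights)

def explain(applicant):
    """Per-feature contributions to the score, largest magnitude first."""
    contrib = {f: weights[f] * applicant[f] for f in weights}
    return sorted(contrib.items(), key=lambda kv: abs(kv[1]), reverse=True)

applicant = {"income": 1.2, "debt": 0.8, "age": 3.0}
print(predict(applicant))
for feature, contribution in explain(applicant):
    # Each line answers "why": how much this feature pushed the score.
    print(feature, contribution)
```

A large neural network would return only the final score; this decomposition into contributions is what "explain predictions => improve trust in the model" refers to.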
Ethical AI Frameworks => Context, no need to memorize
IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems (2018)
1. Human Rights: A/IS shall be created and operated to respect, promote, and protect internationally
recognized human rights.
2. Well-being: A/IS creators shall adopt increased human well-being as a primary success criterion for
development.
3. Data Agency: A/IS creators shall empower individuals with the ability to access and securely share
their data, to maintain people’s capacity to have control over their identity.
4. Effectiveness: A/IS creators and operators shall provide evidence of the effectiveness and fitness
for purpose of A/IS.
5. Transparency: The basis of a particular A/IS decision should always be discoverable.
6. Accountability: A/IS shall be created and operated to provide an unambiguous rationale for all
decisions made.
7. Awareness of Misuse: A/IS creators shall guard against all potential misuses and risks of A/IS in
operation.
8. Competence: A/IS creators shall specify and operators shall adhere to the knowledge and skill
required for safe and effective operation.