LECTURE 1
Patterns and Pattern Recognition
Pattern: (relevant part of) acquired or sensed data, e.g. a face or an iris in an image, the sound of a gun shot in an audio signal, or a specific combination of words in a text.

Pattern recognition (the process):
1) data acquisition
2) segmentation of relevant data
3) extraction of features (descriptors)
4) classification / detection / identification / authentication / localization
The pattern recognition framework chains these steps: acquisition → segmentation → feature extraction → classification.

Learning: using training data to automatically adjust some inner parameters of a classifier system. Commonly called "machine learning" (ML). The type of learning in which each feature vector is provided with a label that specifies the class/identity/location of the object is called 'supervised learning'.
• Relation between PR and ML: PR makes use of ML; ML concerns only one part of PR.
• Relation between PR and AI: AI is 'software that does things that look intelligent to educated people', or 'intelligence exhibited by computers'. Most AI applications involve some form of PR. Some PR applications may not be considered AI, e.g. bar-code recognition.
Types of pattern recognition:
• detection: e.g. spam detection
• classification: e.g. tumor classification
• identification: e.g. fingerprints at a crime scene
• authentication: e.g. fingerprint scan on phones
• regression: e.g. linear regression, PCA

Features: mapping of objects into a multi-dimensional feature space.
Bar code: a pattern designed to be easily recognized by a computer. Universal Product Code (UPC): 30 bars that code for 95 bits. A group of 7 bits codes one decimal digit. The code starts and ends with 101 and has 01010 in the middle.

Automatic UPC recognition (feature extraction):
1) segment the UPC in an image
2) divide the UPC into 95 segments of equal width
3) assign 1 to dark segments and 0 to light segments
4) result: 95 bits (binary features)
A given product always has the same binary number.
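As a rough illustration of steps 2–4, here is a minimal sketch (not the course's reference code). It assumes the UPC has already been segmented to a 1-D intensity scanline; the global dark/light threshold is my own simplification.

```python
import numpy as np

def upc_bits_from_scanline(scanline, n_segments=95):
    """Divide a 1-D intensity scanline of a segmented UPC into equal-width
    segments and assign 1 to dark segments and 0 to light ones."""
    scanline = np.asarray(scanline, dtype=float)
    segments = np.array_split(scanline, n_segments)      # step 2: 95 equal-width segments
    means = np.array([seg.mean() for seg in segments])   # average intensity per segment
    threshold = means.mean()                              # simple global dark/light threshold (assumption)
    return (means < threshold).astype(int)                # steps 3-4: dark -> 1, light -> 0

# Toy usage: a synthetic scanline of 950 pixels (10 pixels per segment).
rng = np.random.default_rng(0)
true_bits = rng.integers(0, 2, size=95)
scan = np.repeat(np.where(true_bits == 1, 0.1, 0.9), 10) + 0.02 * rng.standard_normal(950)
print((upc_bits_from_scanline(scan) == true_bits).all())  # True: the 95-bit code is recovered
```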
Classification using a decision boundary
In a single-feature (one-dimensional) case, classification can be done by choosing a certain value, called the decision criterion, and assigning objects to class A or class B depending on whether their feature value is larger or smaller than the decision criterion. One single feature (e.g. length or lightness) might not be enough to achieve good classification. We can pack several features into a feature vector x = (length, lightness). In that multi-dimensional case, we use a decision boundary instead. There are many different possible decision boundaries.
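A minimal sketch of the one-dimensional case follows; the feature name, criterion value and class labels are illustrative assumptions, not values from the lecture.

```python
def classify_by_criterion(feature_value, criterion=10.0):
    """1-D classification: class 'A' if the feature exceeds the decision
    criterion, class 'B' otherwise (which side is which is a convention)."""
    return "A" if feature_value > criterion else "B"

# Example: classifying objects by a single length feature (illustrative numbers).
for length in (7.2, 9.9, 10.5, 14.0):
    print(length, "->", classify_by_criterion(length))
```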
Statistical Pattern Recognition
The values of the features extracted from natural objects exhibit a certain spread. This spread is dealt with by statistical methods. The inferences made are stated in terms of probabilities and the quality of classification is measured in terms of error rates.

Person authentication by iris pattern
We can use an iris scan to extract a binary code from an iris image. Then we can calculate the normalized Hamming distance between two iris codes A and B of N bits:

HD = ‖code A ⊗ code B‖ / N

where ⊗ (XOR) is 0 if the two bits are equal and 1 if the two bits are different, and ‖·‖ counts the 1-bits.

We can make histograms of the Hamming distances of pairs of iris codes: one comparing two iris patterns of the same person and another comparing two iris patterns of different persons. The distribution of the Hamming distance of the iris codes of different persons can be modeled very well by a normal distribution, i.e. the envelope of that histogram can be fitted by a Gaussian function:

N(µ, σ²) = 1/(σ√(2π)) · exp(−(x − µ)²/(2σ²))

This mathematical model is useful because we can use it to compute the probability of measuring a certain value of the Hamming distance also in domains in which the histogram contains no data. For the histogram of the HD of different persons' irises, we have to select a threshold C (decision criterion); then:
• If HD < C, we conclude that the two irises are identical. (Obtaining HD ≤ C for two different irises is still possible, but it is such a rare event that it is considered an anomaly.)
• If HD > C, we conclude that this HD comes from the comparison of two different irises.
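A small sketch of the comparison step, assuming the iris codes are already available as NumPy 0/1 arrays; the code length of 2048 bits and the threshold C = 0.32 are illustrative assumptions (the lecture only says that a criterion C must be chosen).

```python
import numpy as np

def hamming_distance(code_a, code_b):
    """Normalized Hamming distance: fraction of bit positions where the
    two codes differ (XOR is 1 exactly when the bits are different)."""
    return np.mean(code_a ^ code_b)

def same_iris(code_a, code_b, criterion=0.32):
    """Decide 'identical iris' when HD < C, 'different irises' otherwise."""
    return hamming_distance(code_a, code_b) < criterion

rng = np.random.default_rng(1)
code_a = rng.integers(0, 2, size=2048)
code_b_same = code_a.copy()
flip = rng.random(2048) < 0.1                    # same eye, ~10% noisy bits
code_b_same[flip] ^= 1
code_b_other = rng.integers(0, 2, size=2048)     # unrelated eye, HD around 0.5

print(hamming_distance(code_a, code_b_same), same_iris(code_a, code_b_same))    # ~0.10, True
print(hamming_distance(code_a, code_b_other), same_iris(code_a, code_b_other))  # ~0.50, False
```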
Statistical decision theory (hypothesis testing)
Given the pdf of the HD for two different irises, formulate the hypotheses:
• H0, null hypothesis: the two irises are different
• Ha, alternative hypothesis: the two irises are identical
Select a decision criterion (an HD value) C and a rejection region according to the desired significance level (the odds of a false decision = false rejection of H0). Compare two iris codes and obtain their HD. Test the H0 hypothesis:
• if HD > C: do not reject H0 (an erroneous decision here is a type II error, a false negative)
• if HD ≤ C: reject H0 and accept Ha (an erroneous decision here is a type I error, a false positive)
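The criterion C itself can be placed using the Gaussian impostor model and the chosen significance level. Below is a minimal sketch (not from the lecture): it assumes the impostor HD follows N(p, p(1−p)/n) with illustrative values p = 0.5 and n = 250, and picks C at the lower-tail quantile for α = 10⁻⁶ using scipy.

```python
from scipy.stats import norm

p, n = 0.5, 250            # assumed impostor model: HD ~ N(p, p(1-p)/n)
alpha = 1e-6               # desired significance level (odds of falsely rejecting H0)

sigma = (p * (1 - p) / n) ** 0.5
C = norm.ppf(alpha, loc=p, scale=sigma)   # rejection region: HD <= C rejects H0

print(f"decision criterion C = {C:.3f}")  # about 0.35 for these illustrative numbers
```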
Applications of iris recognition: national border control, computer login, data security, anti-terrorism, secure e-banking, etc.

Missing binary features
Origins of missing data:
• Occlusion by eyelids or eyelashes
• Reflections from eye-glasses
• Ring shadow cast by hard contact lenses
• Local signal-to-noise ratio not good enough
• No good local iris texture
Iris code mask: takes value 1 (0) where the data is considered good (bad).

Hamming distance of iris codes A and B with missing-data masks:

HD = ‖(code A ⊗ code B) ∩ mask A ∩ mask B‖ / ‖mask A ∩ mask B‖

where ‖mask A ∩ mask B‖ is the number of bits n that are good in both iris codes (n can change from scan to scan).
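A sketch of this masked Hamming distance, assuming codes and masks are NumPy 0/1 arrays (mask bit 1 = good data); the array names and toy values are mine.

```python
import numpy as np

def masked_hamming_distance(code_a, code_b, mask_a, mask_b):
    """HD = ||(A xor B) & maskA & maskB|| / ||maskA & maskB||,
    i.e. disagreement counted only over bits that are good in both scans."""
    good = mask_a & mask_b                  # bits usable in both iris codes
    n = good.sum()                          # n can change from scan to scan
    if n == 0:
        raise ValueError("no overlapping good bits")
    return ((code_a ^ code_b) & good).sum() / n

# Toy usage with 8-bit codes (1 in a mask means the bit is good).
a      = np.array([1, 0, 1, 1, 0, 0, 1, 0])
b      = np.array([1, 1, 1, 0, 0, 0, 1, 1])
mask_a = np.array([1, 1, 1, 1, 1, 0, 1, 1])
mask_b = np.array([1, 1, 0, 1, 1, 1, 1, 1])
print(masked_hamming_distance(a, b, mask_a, mask_b))  # 3 disagreements over 6 good bits -> 0.5
```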
Statistics reminder
Let a (coin) tail (encoded as 1) appear with probability p in a coin-tossing experiment, and let the coin be tossed n times. The number of tails d is then approximately normally distributed:

d ∼ N(pn, p(1 − p)n)

Normal distribution of the normalized iris code Hamming distance (HD):

HD = d/n,  HD ∼ N(p, p(1 − p)/n)

Degrees of freedom: (in general) the number of independent quantities used in the computation of a statistic. Example: there are n degrees of freedom in the computation of the normalized Hamming distance of binary vectors of n statistically independent bits.

Binary degrees of freedom: not all bits in an iris code are statistically independent. We know that the theoretical HD ∼ N(p, p(1 − p)/n), while the empirical HD ∼ N(p, σ²). From the empirical values of σ and p we can compute an n = p(1 − p)/σ² (with n < N) that represents the number of statistically independent bits needed to encode an iris pattern; we call it the effective number of (binary) degrees of freedom. The higher n, the larger the number 2ⁿ of unique objects that can be represented and discriminated.

Generalized binary degrees of freedom: in other applications (e.g. fingerprint recognition) the values used to compute a dissimilarity are not binary. How, then, should we compare the discriminative power of different methods?
• Assume that the dissimilarity is normally distributed, D ∼ N(µ, σ²).
• Transform D to D′ = D/(2µ), so that D′ ∼ N(1/2, (σ/(2µ))²).
• We can now think of D′ as the Hamming distance of two binary vectors of n statistically independent bits, D′ ∼ N(p, p(1 − p)/n) with p = 1/2.
• From N(1/2, (σ/(2µ))²) = N(p, p(1 − p)/n) we get n = (µ/σ)².
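A short sketch of both computations: the effective number of binary degrees of freedom from an empirical impostor HD distribution (n = p(1 − p)/σ²), and the generalized version for a non-binary dissimilarity (n = (µ/σ)²). The numbers plugged in are made-up illustrations, not results from the lecture.

```python
def binary_dof(p, sigma):
    """Effective number of binary degrees of freedom of an impostor
    HD distribution with empirical mean p and standard deviation sigma."""
    return p * (1 - p) / sigma**2

def generalized_dof(mu, sigma):
    """Generalized degrees of freedom for a non-binary dissimilarity
    D ~ N(mu, sigma^2): rescaling D' = D/(2*mu) gives n = (mu/sigma)^2."""
    return (mu / sigma) ** 2

# Illustrative (made-up) empirical values:
print(binary_dof(p=0.499, sigma=0.032))       # ~244 statistically independent bits
print(generalized_dof(mu=40.0, sigma=2.5))    # ~256 equivalent binary degrees of freedom
```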
LECTURE 2

Elements of Bayesian Decision Theory

Performance measures in binary classification tests or systems:
• Sensitivity (also called true positive rate or recall) measures the proportion of positives that are correctly identified as such. (In pattern recognition and information retrieval the equivalent of the medical-test term 'sensitivity' is called 'recall'.)
• Specificity (also called true negative rate) measures the proportion of negatives that are correctly identified as such.
• Positive predictive value (PPV): the fraction of true positives among all (true + false) positives. The PPV depends on the sensitivity of the test and on the prevalence of the disease. In pattern recognition and information retrieval the PPV is called 'precision'.

Bayes rule (x – positive test result, wj – being sick):

P(wj|x) = p(x|wj) P(wj) / p(x),  where p(x) = Σ_{j=1}^{c} p(x|wj) P(wj)

where c is the number of classes, P(wj) is the prior, p(x) the evidence, p(x|wj) the likelihood (the pdf as a function of the first argument, the feature value x, with the second argument, the class, fixed) and P(wj|x) the posterior probability (the last two are also examples of class-conditional probabilities).
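A worked sketch of Bayes rule for the medical-test reading (x = positive test, w1 = sick, w2 = healthy): the likelihoods are the test's sensitivity and false-positive rate (1 − specificity), and the prior is the prevalence. The numerical values are illustrative assumptions.

```python
def posterior_sick(sensitivity, specificity, prevalence):
    """P(sick | positive) = p(pos|sick) P(sick) / p(pos), with the evidence
    p(pos) summed over both classes (sick and healthy)."""
    p_pos_given_sick = sensitivity          # likelihood for class 'sick'
    p_pos_given_healthy = 1 - specificity   # likelihood for class 'healthy'
    evidence = (p_pos_given_sick * prevalence
                + p_pos_given_healthy * (1 - prevalence))
    return p_pos_given_sick * prevalence / evidence

# Illustrative numbers: a fairly good test applied to a rare disease.
print(posterior_sick(sensitivity=0.99, specificity=0.95, prevalence=0.001))  # ~0.019
```

Note how the posterior (the PPV) is small despite the high sensitivity, because the prevalence (prior) is low.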
Probabilistic approach to classification: for each point, estimate the probability of each class and choose the class with the highest probability. So we would select a threshold to apply a simple selection rule using the prior probabilities, with the rule being:

decide w1 if P(w1|x) > P(w2|x), otherwise decide w2.
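A minimal sketch of this two-class rule: with assumed Gaussian class-conditional likelihoods p(x|w1), p(x|w2) and priors P(w1), P(w2), each x is assigned to the class with the larger posterior. All parameter values are illustrative.

```python
from scipy.stats import norm

# Assumed class-conditional densities p(x|wj) and priors P(wj) (illustrative).
likelihoods = {"w1": norm(loc=2.0, scale=1.0), "w2": norm(loc=5.0, scale=1.5)}
priors = {"w1": 0.7, "w2": 0.3}

def classify(x):
    """Decide w1 if P(w1|x) > P(w2|x), otherwise w2. The evidence p(x) is the
    same for both classes, so comparing p(x|wj) * P(wj) is enough."""
    scores = {wj: likelihoods[wj].pdf(x) * priors[wj] for wj in priors}
    return max(scores, key=scores.get)

for x in (1.0, 3.5, 6.0):
    print(x, "->", classify(x))   # low x -> w1, high x -> w2
```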