Homework 1 solutions
CS229T/STATS231 (Fall 2018–2019)
Note: please do not copy or distribute.

Due date: Wed, Oct 10, 11pm




1. Value of labeled data (11 points)
In many applications, labeled data is expensive and therefore limited, while unlabeled data is cheap and
therefore abundant. For example, there are tons of images on the web, but getting labeled images is much
harder. What is the statistical value of having labeled data versus unlabeled data? This problem will explore
this formally using asymptotics.
Specifically, suppose we have an exponential family model over a discrete latent variable h and a discrete
observed variable x:
$$p_\theta(h, x) = \exp\{\theta \cdot \phi(h, x) - A(\theta)\},$$
where $A(\theta) = \log \sum_{h,x} \exp\{\theta \cdot \phi(h, x)\}$ is the usual log-partition function.
Suppose that n examples (h(1) , x(1) ), . . . , (h(n) , x(n) ) are drawn i.i.d. from some true distribution pθ∗ .
Define the following two estimators:
$$\hat\theta_{\mathrm{sup}} = \operatorname*{arg\,max}_{\theta \in \mathbb{R}^d} \; \frac{1}{n} \sum_{i=1}^n \log p_\theta(h^{(i)}, x^{(i)}) \quad (1)$$
$$\hat\theta_{\mathrm{unsup}} = \operatorname*{arg\,max}_{\theta \in \mathbb{R}^d} \; \frac{1}{n} \sum_{i=1}^n \log \sum_h p_\theta(h, x^{(i)}). \quad (2)$$

The supervised estimator θ̂sup uses the variable h(i) and maximizes the joint likelihood, while the unsuper-
vised estimator θ̂unsup marginalizes out the latent variable h.
One important caveat: our results will hold when we assume that data is actually generated from our
model family and that unsupervised learning is possible. Otherwise, labeled data is worth a lot more.



a. (2 points) (supervised asymptotic variance) Compute the asymptotic variance of $\hat\theta_{\mathrm{sup}}$: that is, given that $\sqrt{n}(\hat\theta_{\mathrm{sup}} - \theta^*) \xrightarrow{d} \mathcal{N}(0, V_{\mathrm{sup}})$, write an expression for $V_{\mathrm{sup}}$ that depends on expectations/variances involving $\phi$.

Solution:
Using notation from class, let $\ell$ denote the log-likelihood and $L$ the expected log-likelihood. Recall that

$$V_{\mathrm{sup}} = \nabla^2 L(\theta^*)^{-1}\,\mathrm{Cov}_{\theta^*}[\nabla\ell(z,\theta^*)]\,\nabla^2 L(\theta^*)^{-1} = \mathrm{Cov}_{\theta^*}[\nabla\ell(z,\theta^*)]^{-1},$$

where the second equality follows from Bartlett's identity, since we have assumed the model is well specified. Now, letting $z = (h, x)$,

$$\nabla\ell(z,\theta^*) = \nabla\big(\theta^* \cdot \phi(h,x) - A(\theta^*)\big) = \phi(h,x) - \mathbb{E}_{\theta^*}[\phi(h,x)],$$

so

$$V_{\mathrm{sup}} = \mathrm{Cov}_{\theta^*}\big[\phi(h,x) - \mathbb{E}_{\theta^*}[\phi(h,x)]\big]^{-1} = \mathrm{Cov}_{\theta^*}[\phi(h,x)]^{-1}.$$
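As a numerical sanity check (an illustration, not part of the assigned solution): taking the two-coin family from part (d), with an arbitrary choice $\theta^* = 1$, $V_{\mathrm{sup}} = \mathrm{Cov}_{\theta^*}[\phi]^{-1}$ can be computed by exact enumeration over the four outcomes.

```python
import math

theta = 1.0                                           # arbitrary choice of θ*
pairs = [(h, x) for h in (0, 1) for x in (0, 1)]
Z = sum(math.exp(theta * h * x) for h, x in pairs)    # partition function
p = {(h, x): math.exp(theta * h * x) / Z for h, x in pairs}

# φ(h, x) = h·x is scalar here, so Cov_{θ*}[φ] is just a variance.
mean_phi = sum(p[(h, x)] * h * x for h, x in pairs)
var_phi = sum(p[(h, x)] * (h * x - mean_phi) ** 2 for h, x in pairs)
V_sup = 1.0 / var_phi
```

Since $\phi = hx$ is a Bernoulli variable with success probability $e^{\theta^*}/Z$, `var_phi` agrees with the closed form $p(1-p)$.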



b. (2 points) (unsupervised asymptotic variance) Compute the asymptotic variance of $\hat\theta_{\mathrm{unsup}}$: that is, given that $\sqrt{n}(\hat\theta_{\mathrm{unsup}} - \theta^*) \xrightarrow{d} \mathcal{N}(0, V_{\mathrm{unsup}})$, write an expression for $V_{\mathrm{unsup}}$ that depends on expectations/variances involving $\phi$.

Solution:
Similar to the previous part, we have

$$\nabla\ell(z,\theta^*) = \nabla \log \sum_h \exp\{\theta^* \cdot \phi(h,x) - A(\theta^*)\}$$
$$= \frac{\sum_h p_{\theta^*}(h,x)\,\big(\phi(h,x) - \mathbb{E}_{\theta^*}[\phi(h,x)]\big)}{\sum_h p_{\theta^*}(h,x)}$$
$$= \mathbb{E}_{\theta^*}[\phi(h,x) \mid x] - \mathbb{E}_{\theta^*}[\phi(h,x)].$$

Therefore,

$$V_{\mathrm{unsup}} = \mathrm{Cov}_{\theta^*}\big[\mathbb{E}_{\theta^*}[\phi(h,x) \mid x] - \mathbb{E}_{\theta^*}[\phi(h,x)]\big]^{-1} = \mathrm{Cov}_{\theta^*}\big[\mathbb{E}_{\theta^*}[\phi(h,x) \mid x]\big]^{-1}.$$
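As a companion sanity check (again an illustration using the two-coin family from part (d) with the arbitrary choice $\theta^* = 1$), the conditional mean $\mathbb{E}_{\theta^*}[\phi \mid x]$ and hence $V_{\mathrm{unsup}}$ can be computed by exact enumeration:

```python
import math

theta = 1.0
pairs = [(h, x) for h in (0, 1) for x in (0, 1)]
Z = sum(math.exp(theta * h * x) for h, x in pairs)
p = {(h, x): math.exp(theta * h * x) / Z for h, x in pairs}

# Marginal p(x) and conditional mean E_{θ*}[φ | x] with φ(h, x) = h·x
p_x = {x: p[(0, x)] + p[(1, x)] for x in (0, 1)}
cond_mean = {x: sum(p[(h, x)] * h * x for h in (0, 1)) / p_x[x]
             for x in (0, 1)}

# V_unsup = Var_{θ*}[E[φ | x]]^{-1}  (scalar case)
m = sum(p_x[x] * cond_mean[x] for x in (0, 1))
var_cm = sum(p_x[x] * (cond_mean[x] - m) ** 2 for x in (0, 1))
V_unsup = 1.0 / var_cm
```

By the tower property, `m` here matches the unconditional mean $\mathbb{E}_{\theta^*}[\phi]$ computed in the part (a) check.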




c. (3 points) (comparing estimators) Prove that $\hat\theta_{\mathrm{sup}}$ has lower (or equal) asymptotic variance compared to $\hat\theta_{\mathrm{unsup}}$. That is, show that
$$V_{\mathrm{sup}} \preceq V_{\mathrm{unsup}}.$$


Solution:
We have, by the law of total covariance,

$$V_{\mathrm{sup}}^{-1} = \mathrm{Cov}_{\theta^*}[\phi(h,x)] = \mathbb{E}_{\theta^*}\big[\mathrm{Cov}_{\theta^*}[\phi(h,x) \mid x]\big] + \mathrm{Cov}_{\theta^*}\big[\mathbb{E}_{\theta^*}[\phi(h,x) \mid x]\big]$$
$$= \mathbb{E}_{\theta^*}\big[\mathrm{Cov}_{\theta^*}[\phi(h,x) \mid x]\big] + V_{\mathrm{unsup}}^{-1}$$
$$\succeq V_{\mathrm{unsup}}^{-1},$$

where the last inequality follows since the conditional covariance is positive semi-definite. Inverting both sides (which reverses the Loewner order) gives $V_{\mathrm{sup}} \preceq V_{\mathrm{unsup}}$; i.e., $\hat\theta_{\mathrm{sup}}$ has lower (or equal) asymptotic variance.
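In the scalar case the ordering $V_{\mathrm{sup}} \preceq V_{\mathrm{unsup}}$ reduces to $V_{\mathrm{sup}} \le V_{\mathrm{unsup}}$, which can be spot-checked numerically on the two-coin family from part (d) across a range of $\theta^*$ values (a sketch, not part of the assigned solution):

```python
import math

def variances(theta):
    """Return (V_sup, V_unsup) for p_θ(h,x) ∝ exp(θ·h·x), h, x ∈ {0, 1}."""
    pairs = [(h, x) for h in (0, 1) for x in (0, 1)]
    Z = sum(math.exp(theta * h * x) for h, x in pairs)
    p = {(h, x): math.exp(theta * h * x) / Z for h, x in pairs}
    # Supervised: V_sup = Var[φ]^{-1} with φ = h·x
    mean = sum(p[(h, x)] * h * x for h, x in pairs)
    v_sup = 1.0 / sum(p[(h, x)] * (h * x - mean) ** 2 for h, x in pairs)
    # Unsupervised: V_unsup = Var[E[φ | x]]^{-1}
    p_x = {x: p[(0, x)] + p[(1, x)] for x in (0, 1)}
    cm = {x: p[(1, x)] * x / p_x[x] for x in (0, 1)}
    m = sum(p_x[x] * cm[x] for x in (0, 1))
    v_unsup = 1.0 / sum(p_x[x] * (cm[x] - m) ** 2 for x in (0, 1))
    return v_sup, v_unsup

# The supervised estimator is never worse, matching the proof above.
for theta_star in (-2.0, -0.5, 0.5, 1.0, 3.0):
    vs, vu = variances(theta_star)
    assert vs <= vu + 1e-9
```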



d. (4 points) Consider the exponential family
$$p_\theta(h, x) = \frac{1}{Z} \exp(\theta h x),$$
where $h, x \in \{0, 1\}$ and $Z = \sum_{h,x \in \{0,1\}^2} \exp(\theta h x)$ is the partition function. Essentially, $(h, x)$ is a pair of correlated biased coin flips, where $p_\theta(1,1) = \exp(\theta)/Z$ and $p_\theta(0,0) = p_\theta(0,1) = p_\theta(1,0) = 1/Z$.
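As a worked illustration of parts (a) and (b) on this family (a sketch of one possible continuation, not necessarily what part (d) asks): here $Z = 3 + e^\theta$, and $\phi(h,x) = hx \in \{0,1\}$ is a Bernoulli variable with success probability $p = p_\theta(1,1) = e^{\theta^*}/(3 + e^{\theta^*})$, so

$$V_{\mathrm{sup}} = \mathrm{Cov}_{\theta^*}[hx]^{-1} = \frac{1}{p(1-p)} = \frac{(3 + e^{\theta^*})^2}{3 e^{\theta^*}}.$$

Likewise $\mathbb{E}_{\theta^*}[hx \mid x] = x \cdot \frac{e^{\theta^*}}{1 + e^{\theta^*}}$ and $P(x = 1) = (1 + e^{\theta^*})/(3 + e^{\theta^*})$, giving

$$V_{\mathrm{unsup}} = \mathrm{Var}_{\theta^*}\big[\mathbb{E}_{\theta^*}[hx \mid x]\big]^{-1} = \frac{(1 + e^{\theta^*})(3 + e^{\theta^*})^2}{2 e^{2\theta^*}},$$

which is indeed at least $V_{\mathrm{sup}}$ for every $\theta^*$, consistent with part (c).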


