SOLUTION MANUAL
Linear Algebra and Optimization for Machine Learning
1st Edition by Charu Aggarwal. Chapters 1 – 11
Contents
1 Linear Algebra and Optimization: An Introduction
2 Linear Transformations and Linear Systems
3 Diagonalizable Matrices and Eigenvectors
4 Optimization Basics: A Machine Learning View
5 Optimization Challenges and Advanced Solutions
6 Lagrangian Relaxation and Duality
7 Singular Value Decomposition
8 Matrix Factorization
9 The Linear Algebra of Similarity
10 The Linear Algebra of Graphs
11 Optimization in Computational Graphs
Chapter 1
Linear Algebra and Optimization: An Introduction
1. For any two vectors x and y, which are each of length a, show that (i) x − y is orthogonal to x + y, and (ii) the dot product of x − 3y and x + 3y is negative.
(i) The first dot product is simply (x − y) · (x + y) = x · x − y · y, using the distributive property of matrix multiplication; the cross terms x · y and y · x cancel. The dot product of a vector with itself is its squared length. Since both vectors have the same length a, the result is a^2 − a^2 = 0, so the two vectors are orthogonal. (ii) In the second case, one can use a similar argument to show that the result is a^2 − 9a^2 = −8a^2, which is negative.
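As a quick numerical sanity check (not part of the original solution), both claims can be verified in NumPy using arbitrary vectors rescaled to a common length; the dimension, seed, and length a = 3 below are arbitrary choices for illustration.

import numpy as np

rng = np.random.default_rng(0)
a = 3.0                                  # common length of both vectors
x = rng.standard_normal(5)
x *= a / np.linalg.norm(x)               # rescale x to length a
y = rng.standard_normal(5)
y *= a / np.linalg.norm(y)               # rescale y to length a

print(np.dot(x - y, x + y))              # (i) approximately 0
print(np.dot(x - 3 * y, x + 3 * y))      # (ii) a^2 - 9a^2 = -72, negative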
2. Consider a situation in which you have three matrices A, B, and C, of sizes 10 × 2, 2 × 10, and 10 × 10, respectively.
(a) Suppose you had to compute the matrix product ABC. From an efficiency perspective, would it computationally make more sense to compute (AB)C or would it make more sense to compute A(BC)?
(b) If you had to compute the matrix product CAB, would it make more sense to compute (CA)B or C(AB)?
The main point is to keep the size of the intermediate matrix as small as possible in order to reduce both computational and space requirements. In the case of ABC, it makes sense to compute BC first, since BC is only of size 2 × 10, whereas AB would be a 10 × 10 intermediate. In the case of CAB, it makes sense to compute CA first, since CA is only of size 10 × 2. This type of associativity property is used frequently in machine learning in order to reduce computational requirements.
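To make the comparison concrete (a sketch, not part of the original solution), the scalar multiplications for each parenthesization can be tallied under the standard m · k · n cost model for multiplying an m × k matrix by a k × n matrix; the helper function below is introduced here purely for illustration.

def matmul_cost(shape1, shape2):
    # Scalar multiplications for an (m x k) times (k x n) product.
    (m, k), (k2, n) = shape1, shape2
    assert k == k2, "inner dimensions must match"
    return m * k * n

A, B, C = (10, 2), (2, 10), (10, 10)

# (a) ABC: (AB)C versus A(BC)
cost_AB_first = matmul_cost(A, B) + matmul_cost((10, 10), C)   # 200 + 1000 = 1200
cost_BC_first = matmul_cost(B, C) + matmul_cost(A, (2, 10))    # 200 + 200  = 400

# (b) CAB: (CA)B versus C(AB)
cost_CA_first = matmul_cost(C, A) + matmul_cost((10, 2), B)    # 200 + 200  = 400
cost_AB_last  = matmul_cost(A, B) + matmul_cost(C, (10, 10))   # 200 + 1000 = 1200

print(cost_AB_first, cost_BC_first)   # 1200 400 -> compute BC first
print(cost_CA_first, cost_AB_last)    # 400 1200 -> compute CA first

In both cases the cheaper ordering is the one whose intermediate matrix is small (2 × 10 or 10 × 2), which is exactly the point made above.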
3. Show that if a matrix A satisfies A = −A^T, then all the diagonal elements of the matrix are 0.
Note that A + A^T = 0. However, this matrix also contains twice the diagonal elements of A on its diagonal, because A and A^T have the same diagonal. Therefore, the diagonal elements of A must be 0.
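A small numerical illustration (not part of the original solution): any matrix of the form M − M^T satisfies A = −A^T by construction, and its diagonal is zero.

import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M - M.T                      # skew-symmetric: A = -A^T by construction
print(np.allclose(A, -A.T))      # True
print(np.diag(A))                # all diagonal entries are 0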
4. Show that if we have a matrix satisfying A = −A^T, then for any column vector x, we have x^T A x = 0.
Note that the transpose of the scalar x^T A x remains unchanged. Therefore, we have x^T A x = (x^T A x)^T = x^T A^T x = −x^T A x. Therefore, we have 2 x^T A x = 0, which implies x^T A x = 0.
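The same construction as in the previous exercise gives a quick numerical check of this identity (again a sketch, not part of the original solution).

import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M - M.T                      # skew-symmetric: A = -A^T
x = rng.standard_normal(4)
print(x @ A @ x)                 # approximately 0, up to floating-point error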