SOLUTION MANUAL
Linear Algebra and Optimization for Machine Learning
1st Edition by Charu Aggarwal. Chapters 1 – 11
Contents
1 Linear Algebra and Optimization: An Introduction
2 Linear Transformations and Linear Systems
3 Diagonalizable Matrices and Eigenvectors
4 Optimization Basics: A Machine Learning View
5 Optimization Challenges and Advanced Solutions
6 Lagrangian Relaxation and Duality
7 Singular Value Decomposition
8 Matrix Factorization
9 The Linear Algebra of Similarity
10 The Linear Algebra of Graphs
11 Optimization in Computational Graphs
Chapter 1
Linear Algebra and Optimization: An Introduction
1. For any two vectors x and y, which are each of length a, show that
(i) x − y is orthogonal to x + y, and (ii) the dot product of x − 3y and
x + 3y is negative.
(i) The first dot product is simply $(x - y) \cdot (x + y) = x \cdot x - y \cdot y$,
using the distributive property of matrix multiplication (the cross terms
$x \cdot y$ and $-y \cdot x$ cancel). The dot product of a vector with itself is
its squared length. Since both vectors have the same length $a$, the result is
$a^2 - a^2 = 0$. (ii) In the second case, a similar argument shows that the
result is $(x - 3y) \cdot (x + 3y) = x \cdot x - 9\, y \cdot y = a^2 - 9a^2 = -8a^2$,
which is negative whenever $a > 0$.
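The two identities can be checked numerically. The following is a minimal sketch, not part of the original solution: the helper names `dot` and `random_vector_of_length`, the dimension 5, and the length `a = 2.0` are all illustrative assumptions.

```python
import math
import random

def dot(u, v):
    """Dot product of two equal-length lists of numbers."""
    return sum(ui * vi for ui, vi in zip(u, v))

def random_vector_of_length(n, a, rng):
    """Hypothetical helper: random n-dimensional vector rescaled to length a."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(dot(v, v))
    return [a * vi / norm for vi in v]

rng = random.Random(0)   # fixed seed so the run is repeatable
a = 2.0                  # common length of x and y (assumed value)
x = random_vector_of_length(5, a, rng)
y = random_vector_of_length(5, a, rng)

# Part (i): (x - y) . (x + y) should be 0 up to rounding error.
part_i = dot([xi - yi for xi, yi in zip(x, y)],
             [xi + yi for xi, yi in zip(x, y)])

# Part (ii): (x - 3y) . (x + 3y) should equal a^2 - 9a^2 = -8a^2.
part_ii = dot([xi - 3 * yi for xi, yi in zip(x, y)],
              [xi + 3 * yi for xi, yi in zip(x, y)])

print(part_i, part_ii, -8 * a ** 2)
```

Because the vectors are built in floating point, the comparison holds only up to a small rounding tolerance.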
2. Consider a situation in which you have three matrices A, B, and C, of
sizes 10 × 2, 2 × 10, and 10 × 10, respectively.
(a) Suppose you had to compute the matrix product ABC. From an
efficiency perspective, would it make more computational sense to
compute (AB)C or A(BC)?
(b) If you had to compute the matrix product CAB, would it make more
sense to compute (CA)B or C(AB)?
The main point is to keep the size of the intermediate matrix as small
as possible in order to reduce both computational and space
requirements. In the case of ABC, it makes sense to compute BC first,
because the intermediate matrix BC is only 2 × 10, whereas AB is 10 × 10.
In the case of CAB, it makes sense to compute CA first, because the
intermediate matrix CA is only 10 × 2. The associativity of matrix
multiplication is used frequently in this way in machine learning in
order to reduce computational requirements.
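The effect of the ordering can be made concrete by counting scalar multiplications. The sketch below assumes the schoolbook algorithm, in which a (p × q) times (q × r) product costs p·q·r multiplications; the function names are hypothetical and matrices are represented only by their shapes.

```python
def matmul_cost(p, q, r):
    """Scalar multiplications for a (p x q) times (q x r) schoolbook product."""
    return p * q * r

def cost_left_first(X, Y, Z):
    """Cost of (XY)Z, where X, Y, Z are (rows, cols) shape tuples."""
    first = matmul_cost(X[0], X[1], Y[1])    # XY has shape (X rows, Y cols)
    second = matmul_cost(X[0], Y[1], Z[1])   # (XY) times Z
    return first + second

def cost_right_first(X, Y, Z):
    """Cost of X(YZ), where X, Y, Z are (rows, cols) shape tuples."""
    first = matmul_cost(Y[0], Y[1], Z[1])    # YZ has shape (Y rows, Z cols)
    second = matmul_cost(X[0], X[1], Z[1])   # X times (YZ)
    return first + second

A, B, C = (10, 2), (2, 10), (10, 10)

cost_AB_C = cost_left_first(A, B, C)    # (AB)C: 200 + 1000
cost_A_BC = cost_right_first(A, B, C)   # A(BC): 200 + 200
cost_CA_B = cost_left_first(C, A, B)    # (CA)B: 200 + 200
cost_C_AB = cost_right_first(C, A, B)   # C(AB): 200 + 1000

print(cost_AB_C, cost_A_BC, cost_CA_B, cost_C_AB)  # prints: 1200 400 400 1200
```

Under this cost model the preferred orderings are three times cheaper, and the gap widens as the large dimension grows relative to the small one.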
3. Show that if a matrix A satisfies $A = -A^T$, then all the diagonal
elements of the matrix are 0.
Note that $A + A^T = 0$. However, this matrix also contains twice the
diagonal elements of A on its diagonal. Therefore, the diagonal
elements of A must be 0.
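As a quick sanity check, one can build a matrix satisfying the condition and inspect its diagonal. This is a minimal sketch under stated assumptions: the construction of A as M minus its transpose, the size 4, and the integer entries (chosen so that the arithmetic is exact) are illustrative choices, not part of the original solution.

```python
import random

rng = random.Random(1)   # fixed seed so the run is repeatable
n = 4

# Arbitrary integer matrix M; A = M - M^T then satisfies A = -A^T.
M = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
A = [[M[i][j] - M[j][i] for j in range(n)] for i in range(n)]

# Confirm the defining property A = -A^T entry by entry.
is_skew = all(A[i][j] == -A[j][i] for i in range(n) for j in range(n))

# The diagonal entries a_ii satisfy a_ii = -a_ii, so each must be 0.
diagonal = [A[i][i] for i in range(n)]

print(is_skew, diagonal)  # prints: True [0, 0, 0, 0]
```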
4. Show that if we have a matrix satisfying $A = -A^T$, then for any
column vector x, we have $x^T A x = 0$.
Note that the scalar $x^T A x$ is unchanged by transposition. Therefore, we
have $x^T A x = (x^T A x)^T = x^T A^T x = -x^T A x$. It follows that
$2 x^T A x = 0$, and hence $x^T A x = 0$.
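The same result can be spot-checked on random inputs. The sketch below is illustrative only: the helper `quadratic_form`, the size 5, and the integer test vectors (used so that every value is exact) are assumptions, and the check covers a finite sample rather than all vectors.

```python
import random

def quadratic_form(A, x):
    """Compute x^T A x for a square matrix A (list of rows) and a vector x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

rng = random.Random(2)   # fixed seed so the run is repeatable
n = 5

# Build a matrix with A = -A^T from an arbitrary integer matrix M.
M = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
A = [[M[i][j] - M[j][i] for j in range(n)] for i in range(n)]

# Evaluate the quadratic form on 100 random integer vectors.
values = [
    quadratic_form(A, [rng.randint(-9, 9) for _ in range(n)])
    for _ in range(100)
]

print(all(v == 0 for v in values))  # prints: True
```

The terms cancel in pairs: the (i, j) and (j, i) contributions are negatives of each other, and the diagonal contributions vanish.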