Country Meat Consumption Cancer Rate
(grams/day/person) (per 100,000 women/year)
--------------------------------------------------------------------------------
Japan 26 7.5
Finland 101 9.8
Israel 124 16.4
Great Britain 205 23.3
US 284 34How closely related is meat consumption to cancer rate? At first glance, it looks as if there may be some connection. It looks like as meat consumption goes up, there is a corresponding increase in cancer rates. We should take a closer look. We can use MATLAB to plot the data:
>> x = [ 26, 101, 124, 205, 284 ]
>> y = [ 7.5, 9.8, 16.4, 23.3, 34]
>> plot(x,y,'*',x,y)
Based on this graph, is it reasonable to think that the data should
lie along a straight line? Adjusting somehow for experimental error
could we fit this data onto a straight line? We get a better idea
if adjust the data so that our theoretical line would run through the origin.
To do this we compare to the means, that is we compute the average meat
consumption and the average cancer rate and for each country we look at
how far above or below the average it is. Again MATLAB
is a help:
>> xm = x - sum(x)/5
>> ym = y - sum(y)/5tells us that in Japan meat consumption per person per day is 122 grams below the average for all 5 countries. In general we get a new table
Country Meat Consumption Cancer Rate
(grams/day/person) (per 100,000 women/year)
--------------------------------------------------------------------------------
Japan -122 -10.7
Finland -47 -8.4
Israel -24 -1.8
Great Britain 57 5.1
US 136 15.8
Question: Is the rise in cancer rates we see here proportional
to the increase in meat consumption?
If so then these two vectors in 5-space would be multiples of each other. This is not the case -- 122/10.7 = 11.4019 whereas 47/8.4 = 5.5952 and 24/1.8 = 13.3333. On the other hand, it is a bit much to expect that measured data would demonstrate this relationship perfectly. Mathematically we can ask, how close do these vectors come to being multiples of each other. To answer this question we can compute the angle between the two vectors. If this angle is small, then the vectors are nearly multiples of each other. We can determine this angle by computing the dot product of the vectors and dividing out their lengths and finding the inverse cosine:
>> costheta = sum( xm.* ym)/(norm(xm)*norm(ym))
% Note that xm.* ym multiplies the vectors xm and ym componentwise.
% Applying sum to this product vector gives the dot product of xm and ym
% As we might expect norm(xm) is just the length of xm.
>> theta = acos( costheta )In this way we find that the angle between these two vectors in 5-space is about 12 degrees and so we can view this data as evidence that there is a direct relationship between meat consumption and cancer rates. Based on our data what should the constant of proportionality be? If there is no meat consumption at all would we still expect to see occurrences of cancer? What can we deduce from this very limited data?
Assume that their is a linear relationship between meat consumption
and cancer rates. that is
Cancer Rate = a*(Meat Consumption Rate) + bfor some constants a and b. This translates into a matrix equation
[ 26 1 ] [ 7.5 ]
[ 101 1 ] * [ a ] = [ 9.8 ]
[ 124 1 ] [ b ] [ 16.4 ]
[ 205 1 ] [ 23.3 ]
[ 284 1 ] [ 34 ]This system is inconsistent as we already know. Otherwise the data points (26,7.5), (101,9.8), etc would all have fallen on a line of slope a and intercept b when we plotted them. We are assuming that their failure to lie on a line is due to experimental error. We want to correct for that error by finding a slope a and an intercept b that comes as close as possible to our measured data. How can we do this?
More generally, if we wish to solve Ax = b but we find that it has no solutions, we want to find the best approximate solution. We must consider all possible choices of x and pick out the one so that Ax comes closest to b.
Claim: The best approximation is found when we choose x so that Ax is the orthogonal projection of b onto the image of A.
Explanation: Let v be any vector in the image of A. Let p be the orthogonal projection of b onto the image of A. Then b - p is in the orthogonal complement of the image of A. Note that
b - p is orthogonal to p since p is in the image of A.
b - p is orthogonal to v since v is in the image of A.
b - p is therefore orthogonal to p - v as well.Thus we have a right triangle whose edges are the vectors b - p and p - v and their vector sum b - v. By the Pythagorean theorem we know that
|| b - v ||2 = || b - p ||2 + || p - v ||2
Therefore, || b - v ||2 is greater than or equal to || b - p ||2 for all v in image(A).The vector v in the image of A that makes the norm of b - v smallest is therefore the projection vector p.
Upshot: The best approximation to b in the image of A is the orthogonal projection of b onto the image of A. We must choose our approximate solution x* so that
Ax* = pHow do we do that? If x* makes Ax* = p then b - Ax* is orthogonal to the image of A. Now we claim that the orthogonal complement of the image of A is just the kernel of AT .
A = [ v1 | v2 | ...| vn] where vi is the ith column of A.Since the image of A is just the span of the columns of A, a vector x is in the orthogonal complement of the image of A if and only if its dot product with any linear combination of the vi's is 0. But this is true if and only if the dot product of x with all the individual vi's is zero. In matrix terms this is exactly the same as saying that x is in the kernel of the matrix whose ith row is vi. Thus x is in the orthogonal complement of the image of A if and only if x is in the kernel of AT .
Going back to our problem of finding the best approximate solution to
Ax = b, we need to choose x* so that b - Ax* will be in the kernel of AT
. Thus
AT (b-Ax*) = 0 -----> ATb = ATA x*In our example we find that although
[ 26 1 ] [ 7.5 ]
[101 1 ] *[ a ] = [ 9.8 ]
[124 1 ] [ b ] [ 16.4 ]
[205 1 ] [ 23.3 ]
[284 1 ] [ 34 ]has no solution, we can find the best approximate solution by solving
[ 26 1 ] [ 7.5 ]
[ 26 101 124 205 284 ] [101 1 ] *[ a ] = [ 26 101 124 205 284 ] *[ 9.8 ]
[ 1 1 1 1 1 ] * [124 1 ] [ b ] [ 1 1 1 1 1 ] [ 16.4 ]
[205 1 ] [ 23.3 ]
[284 1 ] [ 34 ]
[ 148934 740 ] *[ a ] = [ 17650.9 ]
[ 740 5 ] [ b ] [ 91 ]instead. This we could solve by hand or in MATLAB to get that the line of best fit to the original data is
y = 0.1061*x + 2.4932In general we see that if we have n data points (xi,yi) and we want to find the line of best fit y = ax + b, then we will end up with a 2 x 2 linear system of the form
[ sum of xi^2 sum of xi ]*[ a ] = [ sum of xi*yi ]
[ sum of xi n ] [ b ] [ sum of yi ]and in MATLAB we could set up and solve the above system using
>> M = [ sum(x.*x) sum(x) sum(x.*y) ;
sum(x) 5 sum(y) ]
>> rref(M)A final question -- will this always work out? Will there always be a unique line of best fit and will the method above give it to us. This is the same as asking whether the system
ATb = ATA x*will always have a unique solution. The matrix ATA will be square, so the equation above will have exactly one solution x* if and only if this matrix is invertible. But we claim that this matrix will be invertible if and only if the columns of A are independent.
Theorem: The matrix ATA is invertible if and
only if the columns of A are independent.
Proof:
First notice that ATA is invertible if and only if the kernel
of ATA is the zero vector and similarly the columns of
A are independent if and only if the kernel of A is the zero vector.
So it would suffice to show that
kernel(ATA) = kernel(A)If x is in the kernel of A, then Ax = 0 and therefore ATA x = 0. Thus the kernel of A is a subset of the kernel of ATA. Conversely, if x is in the kernel of ATA then ATA x = 0 and therefore xTATA x = xT0 = 0. But the left hand side of this equation is the same as Ax dotted with itself. In other words if x is in the kernel of ATA, then Ax must be a vector whose dot product with itself is 0 so Ax must be the zero vector and x is in the kernel of A. Thus the kernel of ATA is a subset of the kernel of A. We conclude that they are equal.