MIDTERM -- SPRING 2009

There are two options: you can write an essay related to a topic covered in class, or you can solve a problem that requires mathematical thinking. In the list below, the first category is marked with (Essay), the second with (Math). Regardless whether you take a more mathematical project or an essay, it is important to not just rehash things you found, but to submit a serious effort on your part.

For essays, we expect a text of about 3,500 words (or more!).
(If you have the wonderful gift of writing succinct yet an intelligent, cogent essay that illuminates many aspects of the topic you picked, you are welcome to hand in a shorter essay. Experience has taught us, however, that students who feel they need much less space typically need to do some more research. Whatever you do, if your text is shorter than our expectations, don't start padding it, PLEASE.)

For mathematical projects we want to see something that is related to the material covered in class, but was not actually covered in the class or in the labs. Again, we ask that you prepare a substantial report, explaining what the problem is you looked at, how it fits in with a topic we saw in class, and then explaining clearly what you figured out, and how you did it. If there are some conclusions to be drawn, give them! There is no length minimum for a mathematical project.

You can also do something that is intermediate, where you need to explain some mathematical concept to make your point, and where you then go on to a more general discussion. In this case our length expectations apply again.

If you pick a topic not on the list below, you MUST clear it with Professors I. Daubechies or S. Hughes BEFORE the midterm break.

In every case, you MUST mention all the sources you have used for your paper.

POSSIBLE TOPICS (not an exhaustive list):

Cryptography.

Error Correction and Compression.

Probability and Statistics.

### Cryptography

CQJXUCQJYSISEDIYIJIEVFHELYDWJXUCEIJERLY
EKIJXYDWYDJXUBUQIJERLYEKIMQOWUEHWUFEBOQ

The next example was encrypted with a more serious encryption algorithm than the previous one. Decipher and explain how you did it:

"Bpm barm tnvavb zu wmsvfpb, bpm mkqsbqbvze, bpm tmetm zu omvef izam bpqe Iqe, cpvxp vt bzrxptbzem zu bpm pvfpmtb mkxmssmexm, vt bz om uzrew ve iqbpmiqbvxt qt tramsg qt ve nzmbag." Omabaqew Arttmss
The last example is the most difficult. Decipher and explain how you did it:

GCTNYPZDPRNDMEQQXMMCMJUOQPFSKXFSKPYZMCM
JURWCJNKXFSKCLPGCYNJWNKFNYPDCXXMQXXSCYG
CYJWNZDPRNDRWNYCJCKZPDSWXRLPYUKXFSKPDND
NBECDNZJXONIEQQNZXEJJXONKEDNXMGNJJCYGPI
PCDJXRNPDKJEICZINXIQNRCQQMPQQCYJXJWNJDP
IPYZGCTNJWNPYKRNDMCMJUXYNWXRNTNDJWNDNPQ
PYKRNDCKXMFXEDKNXYNWEYZDNZPKLXKJXMJWNLR
CQQWPTNWXQNKCYJWNLPYZKXONEYRNPDPOQN

Let us use modular arithmetic to try to generate a sequence of random digits. We will use the prime number p = 1069 in our computation. We will use the number 342 as our "seed", or starting point. Now we generate the our first three digits. We calculate seed * seed (modulo p) = 342 * 342 (mod 1069) = 443. Then we generate the next three digits by multiplying the result 443 by the seed 342 modulo p, and we will get 777. Then we continue: we multiply each result by the seed modulo 1069 to get the next 3-digit entry of the sequence. In case we obtain a number below 100, we add leading zeros to get three digits total. In case we get a number more than 999, we disregard this number and continue (that is, we do not use it in our sequence, but we still multiply it with 342 to obtain the next 3-digit number).

Explain why we want to add leading zeros to numbers with fewer than 3-digits, and explain why we disregard numbers with more than 3 digits. That is, explain why the digits would be less randomly distributed if we didn't follow these procedures.

Explain why this pseudo random digit generator will always end up generating digits in cycles. (That is why it is a *pseudo*-random digit generator.)

Try different seeds. Find seeds that generate cycles of different lengths. Explain how we should try to choose seeds to make the cycle longer. Explain which seeds we need to avoid, so as not to get easy patterns.

Wilson's Theorem. Let p be an integer greater than one. p is prime if and only if (p-1)! = -1 (mod p).

This beautiful result is of mostly theoretical value because it is relatively difficult to calculate (p-1)! In contrast it is easy to calculate ap-1, so elementary primality tests are built using Fermat's Little Theorem rather than Wilson's Theorem. For example, the largest prime ever shown prime by Wilson's theorem is most likely 1099511628401 (as far as we know), and even with a clever approach to calculating n!, this still to about one day on a SPARC processer. But numbers with tens of thousands of digits have been shown prime using a converse of Fermat's theorem in a less than an hour.

Explain why Wilson's theorem is true.
Make an estimation of how much time proving primality with Wilson's theorem can take.

On the page: http://www.utm.edu/research/primes/prove/index.html you can find a discussion on primality testing.

There are several primality tests there which are good for special case for the number n (the primality of which we would like to prove). Pick one of the test and explain how it works.

There are some probabilistic tests there. Pick one of them and explain how it works.

Search the Internet and find the biggest known prime. How many digits does it have? What is special about this number?

Explain why the biggest known prime has much more digits than one of the numbers nobody can factor with present-day factoring algorithm.

"Secret sharing schemes were discovered independently by Blakley and Shamir. The motivation for secret sharing is secure key management. In some situations, there is usually one secret key that provides access to many important files. If such a key is lost (for example, the person who knows the key becomes unavailable, or the computer which stores the key is destroyed), then all the important files become inaccessible. The basic idea in secret sharing is to divide the secret key into pieces and distribute the pieces to different persons in a group so that certain subsets of the group can get together to recover the key."

The security scheme of Shamir is based on polynomial interpolation. Please describe in detail how and why this scheme works, and give an example of a set of 4 keys which together encodes a message using Shamir's scheme. If any of these keys are missing, one should not be able to decode the message! For some good descriptions of this scheme, please check out:

Note: There are other secret sharing schemes as well. If you would like to use another secret sharing scheme instead, you must ask Prof. Daubechies ahead of time.
"The good news about cryptography is that we already have the algorithms and protocols we need to secure our systems. The bad news is that that was the easy part; implementing the protocols successfully requires considerable expertise. The areas of security that interact with people -- key management, human/computer interface security, access control -- often defy analysis. And the disciplines of public-key infrastructure, software security, computer security, network security, and tamper-resistant hardware design are very poorly understood. Companies often get the easy part wrong, and implement insecure algorithms and protocols. But even so, practical cryptography is rarely broken through the mathematics; other parts of systems are much easier to break. The best protocol ever invented can fall to an easy attack if no one pays attention to the more complex and subtle implementation issues. Netscape's security fell to a bug in the random-number generator. Flaws can be anywhere: the threat model, the system design, the software or hardware implementation, the system management. Security is a chain, and a single weak link can break the entire system. Fatal bugs may be far removed from the security portion of the software; a design decision that has nothing to do with security can nonetheless create a security flaw."

Please write a 10 page essay describing 5 or 6 incidences that involve security flaws in computer systems. Please describe in detail how the hacker broke into the system, and whether or not these break-ins were due to mathematical flaws in the encryption scheme. Please try to focus your discussion away from specific hacker's personal lives and try to concentrate on the way these hackers were able to bypass security systems by finding flaws in the current system. Try to talk mostly about technical problems with the systems and touch only briefly on social aspects.

Just to satisfy your curiousity, one hacker you might want to read about (but only briefly mention in your essay!) is Kevin Mitnick, AKA Condor. He did a lot of hacking by tricking people into giving him passwords and codes. It's generally known that the weakest link in security are the employees, not the computers! Here's a little paragraph about him from the following website: http://www.takedown.com/bio/index.html

"That break-in occurred over Memorial Day weekend in 1981, when Kevin and two friends decided to physically enter Pacific Bell's COSMOS phone center in downtown Los Angeles. COSMOS, or Computer System for Mainframe Operations, was a database used by many of the nation's phone companies for controlling the phone system's basic recordkeeping functions. The group talked their way past a security guard and ultimately found the room where the COSMOS system was located. Once inside they took lists of computer passwords, including the combinations to the door locks at nine Pacific Bell central offices and a series of operating manuals for the COSMOS system. To facilitate later social engineering they planted their pseudonyms and phone numbers in a rolodex sitting on one of the desks in the room. With a flourish one of the fake names they used was "John Draper," who was an actual computer programmer also known as the legendary phone phreak, Captain Crunch, the phone numbers were actually misrouted numbers that would ring at a coffee shop pay phone in Van Nuys."
"Digital watermarking, sometimes called "fingerprinting," allows copyright owners to incorporate into their work identifying information invisible to the human eye. When combined with new tracking services offered by some of the same companies that provide the watermarking technology, copyright owners can, in theory, find all illegal copies of their photos and music on the Internet and take appropriate legal action."
Please create your own "digital watermark" by altering 4 images in a way which is imperceptible to the human eye (unless one knows where to look), so that the new image contains your watermark. The watermark must have the following properties:
• One should not be able to identify the presence of the watermark, even if one zooms in very closely (unless of course, they know how you did it).
• If one alters the image slightly to try to remove the watermark, they should not easily be able to do it.

Please write an essay explaining exactly how you implemented this watermark (please describe your algorithm in detail) and describe how you would test whether or not an image had been watermarked by you. Also, please tell us how one might destroy your watermark in a way that would only alter the image slightly (perhaps by changing the brightness, size, adding random noise, rotating the image, or other). In other words, how could someone destroy your watermark even if they didn't know what it was (does it have a weakness)?

You can find the game of Nim here, and there is a link from this page describing the winning strategy, which relies on parity addition. http://www.cut-the-knot.com/nim_st.shtml

Currently, the game is set up so that the first player can always win using the strategy (if the amount of objects in the three piles do not parity add to 0 at the start of the game). Please answer the following questions about Nim.

1) True or false: If there were 4 piles instead of 3, the first player could still always win. If it's true, explain why, if it's false, give a counterexample of a game where the first player always tries to play by the strategy but still loses.

2) True or false: If there were 3 players instead of 2, and assuming players 2 and 3 don't play by the winning strategy described and assuming players 2 and 3 both want to win, the first player almost always loses if she or he plays by the strategy. Explain why or why not.

3) If we change the rules a little bit, so that the last player who picks up a piece loses (instead of wins), is it possible to alter the first player's strategy so that she or he can still win?

The webpage http://www.cut-the-knot.com/front.shtml actually has a lot of games that are Nim in disguise! Pick 2-3 of these games (i.e. Nimble, Northcott's Game, Plainim, Scoring, or Turning Turtles) and describe in detail why these games and their winning strategies are the same as Nim. (You must describe the winning strategy!) Also, for each game you pick, play a sample game between you (who knows the winning strategy) and an opponent (who doesn't know the strategy), describing every move and how you knew to make that move. Draw a diagram of the game board after every move that is made to show us what you mean, and after each diagram, write down how you knew to make that particular move.

### Error Correction and Compression

 b5 = b1 b2 b3 b6 = b1 b3 b4 b7 = b2 b3 b4

to make the transition from an arbitrary 4-bit sequence b1 b2 b3 b4 to the corresponding 7-bit codeword b1 b2 b3 b4 b5 b6 b7.

There is also a variant on this construction, on which one uses a slightly different correspondence. In this variant, a 4-bit sequence d1 d2 d3 d4 gets translated to the 7-bit codeword d1 d2 d3 d4 d5 d6 d7 with:

 d5 = d1 d2 d4 d6 = d1 d3 d4 d7 = d2 d3 d4

The advantage of this method is that there is a very neat way to check whether a given 7-bit string is a codeword or not; if it isn't, it also gives a neat trick to find the digit that should be changed to revert back to a codeword. Here is this trick: given d1 d2 d3 d4 d5 d6 d7,

1) Compute the 3-digit string c1 c2 c3 as follows:

 c1 = d4 d5 d6 d7 c2 = d2 d3 d6 d7 c3 = d1 d3 d5 d7

2) If c1 c2 c3 turns out to be 000 then d1 d2 d3 d4 d5 d6 d7 is a codeword.

3) If c1 c2 c3 is not 000, then it gives, in binary, the label of the d-bit to be changed. For instance, if c1 c2 c3 is 101, then d5 (and only d5) should be changed (because 101 is the binary notation for 5); if c1c2c3 is 011, then d3 (and only d3) should be changed (because 011 is the binary notation for 3 - the leading zero could be disregarded).

The assignment for this midterm topic is to

• check that this alternate 4-7 Hamming code, with its neat correction trick, works.
• this way of Hamming coding is really constructed by requiring that the trick with the c1 c2 c3 works, and deriving what then the formulas for the d5, d6, d7 (and c1, c2, c3) should be. Use the same idea to construct the 11-15 Hamming code, with the corresponding neat correction procedure.
This feature of the utility is known as recursive or iterative compression, and will enable you to compress your data files to a tiny fraction of the original size. In fact, virtually any amount of computer data can be compressed to under 1024 bytes using [the method] to compress its own output files muliple times. Then, by repeating in reverse the steps taken to perform the recusive compression, all original data can be decompressed to its original form without the loss of a single bit.

Sounds like a great idea. Try it with the LZ applet used in the Compression Lab and with a standard compression program (e.g., WINZIP). That is, compress a text/file and then try compressing the compressed text/file. For the applet, this can be done by copying the compressed text from the compression window into the (cleared) input text window. Does the compression work a second time? Try the process again with the doubly compressed text/file. And again. What happens?

There is a fundamental mathematical principle at work here. This principle is one of the foundations of a branch of mathematics called Information Theory.

It is possible to show (using the mathematics you've learned in Math Alive), that the 1992 claim is clearly false. Let us assume the claim is that there exists an algorithm which can compress every possible string of 64K = 64*1024*8 bits = 524,288 bits into a string of just 64*1024*8/16 = 32,768 bits. We can show that this is impossible using a simple counting argument.

Despite the fact that we can show such claims as the ones made by the company are ludicrous, they are made with alarming regularity. In fact, a number of U.S. patents have been issued for schemes which we can prove could never work. For example, in 1996, patent #5533051 entitled "Method for data compression" was granted which claims a method "for reducing an input string by one bit regardless of the bit pattern of the input string." This can also be proven false via a counting argument.

• Show that the company's claim from 1992 cannot be true.
• Show that the 1996 patent's claim cannot be true.
• Discuss why it is difficult to compress random data as compared to English text.
• Experiment with the LZ applet (or WINZIP) compressing large English texts and estimate the redudancy of English text encoded using ASCII. Describe your experiments and justify your estimate.
• Estimate the redundancy of other languages (e.g., French, Spanish, German, Hebrew) using the same technique. Which language did you find is the most redundant? Why do you think this is the case?
• Define in your own words what "entropy" means with regard to information theory (the listed reference will be useful) and discuss how it relates to the redundancy of a language.
• Describe some of the techniques used to estimate the redundancy or entropy of English in the listed reference and relate those estimates to your estimates of redundancy.

The following reference will be useful: Silicon Dreams, Robert Lucky

### Probability and Statistics

Choose a topic that interests you from one of the "Chance" articles. These articles are often followed up with Letters to the Editor offering counterpoints, as well as rebuttals from the original authors. Follow one such chain of exchanges. Consult the original cited sources, and write your own Letter to the Editor.

To illustrate the wide range of topics covered, here is a list of the feature articles in issues of a few years ago:

STATISTICS FOR HOMELAND DEFENSE by David L. Banks

DOES HAVING BOYS OR GIRLS RUN IN THE FAMILY (Fall 2001) by Joseph Rodgers and Debby Doughty

SURPRISES FROM SELF EXPERIMENTATION by Seth Roberts with discussion by Robert Rosenthal and Donald Rubin

THE EFFECT OF ADMISSIONS TEST PREPARATION: EVIDENCE FROM NELS88 by Derek Briggs

IS THE USGA GOLF HANDICAP SYSTEM EQUITABLE? by Lawrence Kupper, Leonard Hearne, Sandra Martin and Jeffrey Griffin

THE POTS OF PHYLAKOPI: APPLYING STATISTICAL TECHNIQUES TO ARCHAEOLOGY by Ina Berg and Selwyn Blieden

ARE THOSE OTHER DRIVERS REALLY GOING FASTER? by Donald Redelmeier and Robert Tibshirani

REFINING THE POINT(S)-AFTER-TOUCHDOWN DECISION by Harold Sackrowitz

FOUR MILLION ADOLESCENTS SMOKE: OR DO THEY? by Mary Grace Kovar

CRIMINAL VIOLENCE OF NFL PLAYERS by Alfred Blumstein and Jeff Benedict

MYTH OF SELF-DEFENSE GUN USES by David Hemenway (related exchange of letters from Winter 2000 issue)

IS USING A CAR PHONE LIKE DRIVING DRUNK? by Donald A. Redelmeier and Robert J. Tibshirani

TIME MANAGEMENT IN SPORTS: BALL CONTROL AND OTHER MYTHS by Harold Sackrowitz and Daniel Sackrowitz

Case 1. As chance would have it, you neither have, nor have access to, any coins. You do, however, have a fair die. Explain how you would use the die to make your decision.

Case 2. You have one coin, which is biased in such a way that tails occur with probability close to 49%. You would like to use it, but you do not want to flip it more than twice. What can you do to make your probability much closer to 1/2, than the original 49%?

Case 3. You have only one coin, which is biased in such a way that tails occur with probability 1/3. Can you use this coin? Explain how. Can you make the probability exactly 1/2?

Case 4. You have only one coin, which is, however, biased in such a way that tails occurs with probability 1/4. Do you think you could use this coin to help you make your decision? If you can, explain how; if you can't explain why.

Case 5. You have one coin, which is biased in such a way that tails occur with probability p. Can you use this coin? Explain how.

Case 6. You have a biased coin, and you do not know the probability with which tails occur. How can you use the coin to create the probability close to 1/2?

Case 7. You have a biased coin, and you do not know the probability with which tails occur. You need to design a way to use this coin that allows a yes/no decision, where each alternative (yes or no) has probability exactly 1/2. Because this may not be possible, you are willing to accept a plan in which with low probability you cannot make the decision at all, but if you make it you will do so with probability exactly 1/2 for each alternative.

Case 8. In another situation you have a perfect coin, but you need to make one decision out of three. That is, you want to use it to generate probability 1/3 (you are allowed to make many flips). Can you do that? Explain. Can you generate any fixed probability with one coin? Explain. Describe the probabilities that you can generate exactly.

Case 9. A classical puzzle is to design ways to "toss a fair coin over the phone". That is, Alice and Bob agree that one of them will be in charge of a chore that neither really likes. If they were in the same room, they would toss a coin for it, but they are at opposite sides of Princeton and have to reach a decision over the phone. Bob first proposes to toss a coin and to tell Alice, but Alice doesn't trust that he will relay the answer truthfully, and she refuses to use this method. Bob next proposes the following: he will open the Princeton White Pages phone book for this year, he will read a phone number to her, and she has to reply immediately whether it is on a left or right page. Bob then tells her the name corresponding to the phone number, and she can look up for herself whether she was correct or not. If she is correct, she wins the coin toss, if not, she loses. What do you think of this protocol?, That is, would you use it to toss a coin over the phone? Is there a possibility for either Bob or Alice to cheat?

In your essay discuss the arguments for and against the "placebo effect". What statistical analysis led to the "placebo effect"? What analysis led to the questions of whether the "placebo effect" actually exists? What lurking variables may cause the improvements often attributed to the "placebo effect"? Can you think of any study that would resolve the issue?

You may find the following three pdf files and webpage useful (in addition, the web has a great deal of information on this topic):

When all the cards are collected the government takes 50% of all the money paid. The rest of the money goes to a pot to be divided among the winners. The winners are decided during a TV show when an innocent child picks randomly 6 balls from a pile of balls numbered 1 through 49. If all the numbers you crossed belong to the set of 6 numbers drawn, you are a winner.

What is the probability to become a winner if you cross out 3 numbers? Or 4? Or 5? Or 6 numbers?

The money in the pot is distributed among the winners in such a way that those who guessed 3 numbers receive 10 times less than those who guessed 4, 100 times less than those who guessed 5 and 1000 times less than those who guessed 6.

What is your expected return on each dollar you invest in playing this lottery if you cross out 3 numbers? Or 4? Or 5? Or 6 numbers?

Many people are playing this lottery every week. There are always winners who selected 3 or 4 numbers. But winners who select 5 or 6 numbers do not occur every week. Suppose that you know that this week nobody guessed 6 correct numbers. What is your expected return on 3, 4, or 5 guesses? Answer the same question if nobody wins on 5 guesses. What about when nobody wins on either 5 or 6 guesses?

The proportion of people who guess 3 numbers varies very much from week to week. Sometimes it is one out of 50, sometimes it is one out of 500. Can you explain how can this happen?

A similar lottery was run for a while in Russia. Some players had found a way to make a living of this lottery (without cheating). When the government noticed that some people won far too often, they started investigation because they suspected mischief. As soon as the strategy of the frequently winning players became public, it stopped being effective. Can you find an explanation for both the strategy and its loss of effectiveness.

Find some big lottery in the US with a jackpot. Describe it. Describe the probabilities of winning with different guesses. How large should the jackpot be at the closing time so that your expected return would be more than 100%? Does it depend on the number of people playing that time? Suppose you buy millions of cards and cover all possible combinations of numbers. For which jackpots would this be profitable? Does this depend on the number of winners? For which jackpots is it profitable when you are the only winner? When is it profitable if there are 2 winners? 3 winners? Can you estimate the number of winners?

Of course other suggestions you may have would be very welcome! Several of you have contacted me with great proposals for topics, and if anyone else still wants to do so as well, please get in touch with me!

As announced in class, the absolutely firm deadline for handing in the midterm papers is at noon on Friday, March 27.