**MIDTERM -- SPRING 2009**

There are two options: you can write an essay related to a topic covered in class, or you can solve a problem that requires mathematical thinking. In the list below, the first category is marked with (Essay), the second with (Math). Regardless whether you take a more mathematical project or an essay, it is important to not just rehash things you found, but to submit a serious effort on your part.

For essays, we expect a text of about 3,500 words (or more!).

(If you have the wonderful gift of writing succinct yet an intelligent, cogent essay that illuminates many aspects of the topic you picked, you are welcome to hand in a shorter essay. Experience has taught us, however, that students who feel they need much less space typically need to do some more research. Whatever you do, if your text is shorter than our expectations, don't start padding it, PLEASE.)

For mathematical projects we want to see something that is related to the material covered in class, but was not actually covered in the class or in the labs. Again, we ask that you prepare a substantial report, explaining what the problem is you looked at, how it fits in with a topic we saw in class, and then explaining clearly what you figured out, and how you did it. If there are some conclusions to be drawn, give them! There is no length minimum for a mathematical project.

You can also do something that is intermediate, where you need to explain some mathematical concept to make your point, and where you then go on to a more general discussion. In this case our length expectations apply again.

If you pick a topic not on the list below, you MUST clear it with Professors I. Daubechies or S. Hughes BEFORE the midterm break.

In every case, you MUST mention all the sources you have used for your paper.

**POSSIBLE TOPICS** (not an exhaustive list):

**Cryptography.**

- Decipher (Math).
- Pseudo Random Number Generation (Math).
- Primality Tests (Math).
- Secret Sharing Schemes (Essay).
- The Math isn't the Problem (Essay).
- Digital Watermarking (Essay).
- The Game of Nim and its Variations (Essay).

**Error Correction and Compression.**

**Probability and Statistics.**

**Decipher (Math).**

To warm up start with deciphering the following text which was encrypted with the simplest possible algorithm:

EKIJXYDWYDJXUBUQIJERLYEKIMQOWUEHWUFEBOQ

The next example was encrypted with a more serious encryption algorithm than the previous one. Decipher and explain how you did it:

"Bpm barm tnvavb zu wmsvfpb, bpm mkqsbqbvze, bpm tmetm zu omvef izam bpqe Iqe, cpvxp vt bzrxptbzem zu bpm pvfpmtb mkxmssmexm, vt bz om uzrew ve iqbpmiqbvxt qt tramsg qt ve nzmbag." Omabaqew Arttmss

JURWCJNKXFSKCLPGCYNJWNKFNYPDCXXMQXXSCYG

CYJWNZDPRNDRWNYCJCKZPDSWXRLPYUKXFSKPDND

NBECDNZJXONIEQQNZXEJJXONKEDNXMGNJJCYGPI

PCDJXRNPDKJEICZINXIQNRCQQMPQQCYJXJWNJDP

IPYZGCTNJWNPYKRNDMCMJUXYNWXRNTNDJWNDNPQ

PYKRNDCKXMFXEDKNXYNWEYZDNZPKLXKJXMJWNLR

CQQWPTNWXQNKCYJWNLPYZKXONEYRNPDPOQN

**Pseudo Random Digits Generation (Math).**

Very often, scientists need random numbers for their computer programs,
for example, when they are simulating random processes that occur in
nature. Since computers don't actually do anything randomly, it is
impossible to actually get a random number from a computer. But for many
applications, a set of numbers that seem random are good enough, so we
settle for that. A computer does possess a pseudorandom number generator,
which is just a program that generates numbers that seem random in many
respects, but are not, in fact, random.

**Primality Tests (Math).**

In 1770 Edward Waring announced the following theorem by his former student John Wilson.

On the page: http://www.utm.edu/research/primes/prove/index.html you can find a discussion on primality testing.

There are several primality tests there which are good for special case for the number n (the primality of which we would like to prove). Pick one of the test and explain how it works.

There are some probabilistic tests there. Pick one of them and explain how it works.

Search the Internet and find the biggest known prime. How many digits does it have? What is special about this number?

Explain why the biggest known prime has much more digits than one of the numbers nobody can factor with present-day factoring algorithm.

**Secret Sharing Schemes (Essay).**

Imagine the government trusted you and 15 other people to do a secret
project this Saturday. To enter the extreme-security building you work in,
you and your co-workers must enter a secret number, call it X, into a
computer on the door of the building. You get a secret key (which is
different from X), and your co-workers get their own secret keys, but the
government doesn't trust any one of you to enter the building alone, so
for security reasons, you must get together with at least 10 people with
different keys (one or two of the 15 people might be sick, so you'll still
be fine without them) and do a computation together to figure out the
secret key to the door. The computation won't work without all 11 keys.
What sort of computation should the government set up for you to do?

Here's a quote from the website http://www.rsasecurity.com :

"Secret sharing schemes were discovered independently by Blakley and Shamir. The motivation for secret sharing is secure key management. In some situations, there is usually one secret key that provides access to many important files. If such a key is lost (for example, the person who knows the key becomes unavailable, or the computer which stores the key is destroyed), then all the important files become inaccessible. The basic idea in secret sharing is to divide the secret key into pieces and distribute the pieces to different persons in a group so that certain subsets of the group can get together to recover the key."

The security scheme of Shamir is based on polynomial interpolation. Please describe in detail how and why this scheme works, and give an example of a set of 4 keys which together encodes a message using Shamir's scheme. If any of these keys are missing, one should not be able to decode the message! For some good descriptions of this scheme, please check out:

- http://www.rsasecurity.com/rsalabs/faq/2-1-9.html
- http://www.rsasecurity.com/rsalabs/faq/3-6-12.html

**The Math isn't the Problem (Essay).**

Here is a quote from http://www.counterpane.com/whycrypto.html :

"The good news about cryptography is that we already have the algorithms and protocols we need to secure our systems. The bad news is that that was the easy part; implementing the protocols successfully requires considerable expertise. The areas of security that interact with people -- key management, human/computer interface security, access control -- often defy analysis. And the disciplines of public-key infrastructure, software security, computer security, network security, and tamper-resistant hardware design are very poorly understood. Companies often get the easy part wrong, and implement insecure algorithms and protocols. But even so, practical cryptography is rarely broken through the mathematics; other parts of systems are much easier to break. The best protocol ever invented can fall to an easy attack if no one pays attention to the more complex and subtle implementation issues. Netscape's security fell to a bug in the random-number generator. Flaws can be anywhere: the threat model, the system design, the software or hardware implementation, the system management. Security is a chain, and a single weak link can break the entire system. Fatal bugs may be far removed from the security portion of the software; a design decision that has nothing to do with security can nonetheless create a security flaw."

Please write a 10 page essay describing 5 or 6 incidences that involve security flaws in computer systems. Please describe in detail how the hacker broke into the system, and whether or not these break-ins were due to mathematical flaws in the encryption scheme. Please try to focus your discussion away from specific hacker's personal lives and try to concentrate on the way these hackers were able to bypass security systems by finding flaws in the current system. Try to talk mostly about technical problems with the systems and touch only briefly on social aspects.

Just to satisfy your curiousity, one hacker you might want to read about (but only briefly mention in your essay!) is Kevin Mitnick, AKA Condor. He did a lot of hacking by tricking people into giving him passwords and codes. It's generally known that the weakest link in security are the employees, not the computers! Here's a little paragraph about him from the following website: http://www.takedown.com/bio/index.html

"That break-in occurred over Memorial Day weekend in 1981, when Kevin and two friends decided to physically enter Pacific Bell's COSMOS phone center in downtown Los Angeles. COSMOS, or Computer System for Mainframe Operations, was a database used by many of the nation's phone companies for controlling the phone system's basic recordkeeping functions. The group talked their way past a security guard and ultimately found the room where the COSMOS system was located. Once inside they took lists of computer passwords, including the combinations to the door locks at nine Pacific Bell central offices and a series of operating manuals for the COSMOS system. To facilitate later social engineering they planted their pseudonyms and phone numbers in a rolodex sitting on one of the desks in the room. With a flourish one of the fake names they used was "John Draper," who was an actual computer programmer also known as the legendary phone phreak, Captain Crunch, the phone numbers were actually misrouted numbers that would ring at a coffee shop pay phone in Van Nuys."

**Digital Watermarking (Essay).**

From the site http://webreference.com/content/watermarks/

"Digital watermarking, sometimes called "fingerprinting," allows copyright owners to incorporate into their work identifying information invisible to the human eye. When combined with new tracking services offered by some of the same companies that provide the watermarking technology, copyright owners can, in theory, find all illegal copies of their photos and music on the Internet and take appropriate legal action."Please create your own "digital watermark" by altering 4 images in a way which is imperceptible to the human eye (unless one knows where to look), so that the new image contains your watermark. The watermark must have the following properties:

- One should not be able to identify the presence of the watermark, even if one zooms in very closely (unless of course, they know how you did it).
- If one alters the image slightly to try to remove the watermark, they should not easily be able to do it.

Please write an essay explaining exactly how you implemented this watermark (please describe your algorithm in detail) and describe how you would test whether or not an image had been watermarked by you. Also, please tell us how one might destroy your watermark in a way that would only alter the image slightly (perhaps by changing the brightness, size, adding random noise, rotating the image, or other). In other words, how could someone destroy your watermark even if they didn't know what it was (does it have a weakness)?

**The Game of Nim and its Variations (Essay).**

Here is how Nim is played:

- There are three piles of objects. Each pile can have any number of objects in it.
- At your turn, you choose a pile and remove any number of objects you wish. Those objects are disregarded.
- If you are the last person to pick up an object, you win.

Currently, the game is set up so that the first player can always win using the strategy (if the amount of objects in the three piles do not parity add to 0 at the start of the game). Please answer the following questions about Nim.

1) True or false: If there were 4 piles instead of 3, the first player could still always win. If it's true, explain why, if it's false, give a counterexample of a game where the first player always tries to play by the strategy but still loses.

2) True or false: If there were 3 players instead of 2, and assuming players 2 and 3 don't play by the winning strategy described and assuming players 2 and 3 both want to win, the first player almost always loses if she or he plays by the strategy. Explain why or why not.

3) If we change the rules a little bit, so that the last player who picks up a piece loses (instead of wins), is it possible to alter the first player's strategy so that she or he can still win?

The webpage http://www.cut-the-knot.com/front.shtml actually has a lot of games that are Nim in disguise! Pick 2-3 of these games (i.e. Nimble, Northcott's Game, Plainim, Scoring, or Turning Turtles) and describe in detail why these games and their winning strategies are the same as Nim. (You must describe the winning strategy!) Also, for each game you pick, play a sample game between you (who knows the winning strategy) and an opponent (who doesn't know the strategy), describing every move and how you knew to make that move. Draw a diagram of the game board after every move that is made to show us what you mean, and after each diagram, write down how you knew to make that particular move.

**Hamming Codes (Math).**

In class we discussed the 4-7 Hamming code, and we used the rules (as inspired by Venn diagrams in the movie):

b5 = b1 b2 b3 |

b6 = b1 b3 b4 |

b7 = b2 b3 b4 |

to make the transition from an arbitrary 4-bit sequence b1 b2 b3 b4 to the corresponding 7-bit codeword b1 b2 b3 b4 b5 b6 b7.

There is also a variant on this construction, on which one uses a slightly different correspondence. In this variant, a 4-bit sequence d1 d2 d3 d4 gets translated to the 7-bit codeword d1 d2 d3 d4 d5 d6 d7 with:

d5 = d1 d2 d4 |

d6 = d1 d3 d4 |

d7 = d2 d3 d4 |

The advantage of this method is that there is a very neat way to check whether a given 7-bit string is a codeword or not; if it isn't, it also gives a neat trick to find the digit that should be changed to revert back to a codeword. Here is this trick: given d1 d2 d3 d4 d5 d6 d7,

**1) **Compute the 3-digit string c1 c2 c3 as follows:

c1 = d4 d5 d6 d7 |

c2 = d2 d3 d6 d7 |

c3 = d1 d3 d5 d7 |

**2) **If c1 c2 c3 turns out to be 000 then d1 d2 d3 d4 d5 d6 d7 is a codeword.

**3) **If c1 c2 c3 is not 000, then it gives, in binary, the label of the d-bit to be changed.
For instance, if c1 c2 c3 is 101,
then d5 (and only d5) should be changed (because 101 is the binary notation for 5); if c1c2c3 is 011,
then d3 (and only d3) should be changed (because 011 is the binary notation for 3 -
the leading zero could be disregarded).

The assignment for this midterm topic is to

- check that this alternate 4-7 Hamming code, with its neat correction trick, works.
- this way of Hamming coding is really constructed by requiring that the trick with the c1 c2 c3 works, and deriving what then the formulas for the d5, d6, d7 (and c1, c2, c3) should be. Use the same idea to construct the 11-15 Hamming code, with the corresponding neat correction procedure.

**Miraculous Compression (Essay).**

In April of 1992, a small technology company announced that they had
developed a compression algorithm that could compress all files of more
than 64K in size to about 1/16th their original size. They claimed that
the compressed output file created by their algorithm can be used as
the
input file to subsequent executions of the program. In their words:

This feature of the utility is known as recursive or iterative compression, and will enable you to compress your data files to a tiny fraction of the original size. In fact, virtually any amount of computer data can be compressed to under 1024 bytes using [the method] to compress its own output files muliple times. Then, by repeating in reverse the steps taken to perform the recusive compression, all original data can be decompressed to its original form without the loss of a single bit.

Sounds like a great idea. Try it with the LZ applet used in the Compression Lab and with a standard compression program (e.g., WINZIP). That is, compress a text/file and then try compressing the compressed text/file. For the applet, this can be done by copying the compressed text from the compression window into the (cleared) input text window. Does the compression work a second time? Try the process again with the doubly compressed text/file. And again. What happens?

There is a fundamental mathematical principle at work here. This principle is one of the foundations of a branch of mathematics called Information Theory.

It is possible to show (using the mathematics you've learned in Math Alive), that the 1992 claim is clearly false. Let us assume the claim is that there exists an algorithm which can compress every possible string of 64K = 64*1024*8 bits = 524,288 bits into a string of just 64*1024*8/16 = 32,768 bits. We can show that this is impossible using a simple counting argument.

Despite the fact that we can show such claims as the ones made by the company are ludicrous, they are made with alarming regularity. In fact, a number of U.S. patents have been issued for schemes which we can prove could never work. For example, in 1996, patent #5533051 entitled "Method for data compression" was granted which claims a method "for reducing an input string by one bit regardless of the bit pattern of the input string." This can also be proven false via a counting argument.

In your paper,

- Show that the company's claim from 1992 cannot be true.
- Show that the 1996 patent's claim cannot be true.
- Discuss why it is difficult to compress random data as compared to English text.
- Experiment with the LZ applet (or WINZIP) compressing large English texts and estimate the redudancy of English text encoded using ASCII. Describe your experiments and justify your estimate.
- Estimate the redundancy of other languages (e.g., French, Spanish, German, Hebrew) using the same technique. Which language did you find is the most redundant? Why do you think this is the case?
- Define in your own words what "entropy" means with regard to information theory (the listed reference will be useful) and discuss how it relates to the redundancy of a language.
- Describe some of the techniques used to estimate the redundancy or entropy of English in the listed reference and relate those estimates to your estimates of redundancy.

The following reference will be useful: Silicon Dreams, Robert Lucky

** Bar Code Schemes (Math).**

Find some bar code schemes that we did NOT discuss (and that are
not written up in the notes), and try to find out how they work. Don't
forget to discuss how effective the check digit system is, for several
types of mistakes! You could also find another check digit system, figure
out what the formula is for the check digit, and discuss its effectiveness.

**"Chance" (Essay).**

"Chance" is a magazine about statistics and its uses in society, from government issues to sports
(http://www.public.iastate.edu/~chance99/).
The feature articles demonstrate the use (and sometimes misuse) of statistical analysis of data in the social,
biological, physical and medical sciences.

STATISTICS FOR HOMELAND DEFENSE by David L. Banks

DOES HAVING BOYS OR GIRLS RUN IN THE FAMILY (Fall 2001) by Joseph Rodgers and Debby Doughty

THE EFFECT OF ADMISSIONS TEST PREPARATION: EVIDENCE FROM NELS88 by Derek Briggs

THE POTS OF PHYLAKOPI: APPLYING STATISTICAL TECHNIQUES TO ARCHAEOLOGY by Ina Berg and Selwyn Blieden

ARE THOSE OTHER DRIVERS REALLY GOING FASTER? by Donald Redelmeier and Robert Tibshirani

REFINING THE POINT(S)-AFTER-TOUCHDOWN DECISION by Harold Sackrowitz

FOUR MILLION ADOLESCENTS SMOKE: OR DO THEY? by Mary Grace Kovar

CRIMINAL VIOLENCE OF NFL PLAYERS by Alfred Blumstein and Jeff Benedict

MYTH OF SELF-DEFENSE GUN USES by David Hemenway (related exchange of letters from Winter 2000 issue)

IS USING A CAR PHONE LIKE DRIVING DRUNK? by Donald A. Redelmeier and Robert J. Tibshirani

TIME MANAGEMENT IN SPORTS: BALL CONTROL AND OTHER MYTHS by Harold Sackrowitz and Daniel Sackrowitz

**Probability and Coin Flipping (Math).**

You have to make a decision and need to flip a coin.

**The Placebo Effect (Essay).**

In 1955 Henry Beecher published a paper in the The Journal of the
American Medical Association entitled "The powerful placebo" which
essentially claimed that one third of people who take a suger pill
(but are told the pill is an effective drug) will report an improvement
in their condition. This originated the idea of the "placebo effect"
and for decades this effect was accepted as scientific fact.
Recently, the "placebo effect" has come under attack by a number
of physicians who claim that the "placebo effect" is simply due to a
misinterpretation of data, and may truly be just in our heads (and
not exist at all).

- Hrbjartsson A, Gtzsche PC. Is the placebo powerless? An analysis of clinical trials comparing placebo with no treatment. N Engl J Med 2001;344:1594-1602.
- Bailar JC III. The powerful placebo and the Wizard of Oz. N Engl J Med 2001;344:1630-1632.
- Letters to the Editor - N Engl J Med, Vol. 345, No. 17, pp. 1276-1282
- http://skepdic.com/placebo.html

**Lottery (Math).**

The government of Chandan conducts a lottery every week. You can buy a card for one dollar listing the numbers from 1 to
49. You can cross out 3, 4, 5, or 6 numbers and submit your card.

What is the probability to become a winner if you cross out 3 numbers? Or 4? Or 5? Or 6 numbers?

Of course other suggestions you may have would be very welcome! Several of you have contacted me with great proposals for topics, and if anyone else still wants to do so as well, please get in touch with me!

As announced in class, the absolutely firm deadline for handing in the midterm papers is at noon on Friday, March 27.

Ingrid Daubechies and Shannon M. Hughes
*
Last modified: ###date###
*