Probability and Computing
Randomized Algorithms and Probabilistic Analysis
Michael Mitzenmacher
Eli Upfal
Randomization and probabilistic techniques play an important role in modern computer science, with applications ranging from combinatorial optimization and machine learning to communication networks and secure protocols.

This textbook is designed to accompany a one- or two-semester course for advanced undergraduates or beginning graduate students in computer science and applied mathematics. It gives an excellent introduction to the probabilistic techniques and paradigms used in the development of probabilistic algorithms and analyses. It assumes only an elementary background in discrete mathematics and gives a rigorous yet accessible treatment of the material, with numerous examples and applications.

The first half of the book covers core material, including random sampling, expectations, Markov's inequality, Chebyshev's inequality, Chernoff bounds, balls-and-bins models, the probabilistic method, and Markov chains. In the second half, the authors delve into more advanced topics such as continuous probability, applications of limited independence, entropy, Markov chain Monte Carlo methods, coupling, martingales, and balanced allocations. With its comprehensive selection of topics, along with many examples and exercises, this book is an indispensable teaching tool.
Michael Mitzenmacher is John L. Loeb Associate Professor in Computer Science at Harvard University. He received his Ph.D. from the University of California, Berkeley, in 1996. Prior to joining Harvard in 1999, he was a research staff member at Digital Systems Research Laboratory in Palo Alto. He has received an NSF CAREER Award and an Alfred P. Sloan Research Fellowship. In 2002, he shared the IEEE Information Theory Society "Best Paper" Award for his work on error-correcting codes.
Eli Upfal is Professor and Chair of Computer Science at Brown University. He received his Ph.D. from the Hebrew University, Jerusalem, Israel. Prior to joining Brown in 1997, he was a research staff member at the IBM research division and a professor at the Weizmann Institute of Science in Israel. His main research interests are randomized computation and probabilistic analysis of algorithms, with applications to optimization algorithms, communication networks, parallel and distributed computing, and computational biology.
Probability and Computing
Randomized Algorithms and Probabilistic Analysis

Michael Mitzenmacher
Harvard University

Eli Upfal
Brown University

CAMBRIDGE UNIVERSITY PRESS
PUBLISHED BY THE PRESS SYNDICATE OF THE UNIVERSITY OF CAMBRIDGE
The Pitt Building, Trumpington Street, Cambridge, United Kingdom

CAMBRIDGE UNIVERSITY PRESS
The Edinburgh Building, Cambridge CB2 2RU, UK
40 West 20th Street, New York, NY 10011-4211, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
Ruiz de Alarcón 13, 28014 Madrid, Spain
Dock House, The Waterfront, Cape Town 8001, South Africa

http://www.cambridge.org

© Michael Mitzenmacher and Eli Upfal 2005

This book is in copyright. Subject to statutory exception and
to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without
the written permission of Cambridge University Press.

First published 2005

Printed in the United States of America

Typeface Times 10.5/13 pt.    System AMS-TEX    [FH]

A catalog record for this book is available from the British Library.

Library of Congress Cataloging in Publication data

Mitzenmacher, Michael, 1969-
Probability and computing : randomized algorithms and probabilistic
analysis / Michael Mitzenmacher, Eli Upfal.
    p.    cm.
Includes index.
ISBN 0-521-83540-2 (alk. paper)
1. Algorithms. 2. Probabilities. 3. Stochastic analysis. I. Upfal, Eli, 1954-. II. Title.
QA274.M574    2005
518'.1 - dc22    2004054540

ISBN 0 521 83540 2 hardback
Contents

Preface    page xiii

1  Events and Probability    1
   1.1  Application: Verifying Polynomial Identities    1
   1.2  Axioms of Probability    3
   1.3  Application: Verifying Matrix Multiplication    8
   1.4  Application: A Randomized Min-Cut Algorithm    12
   1.5  Exercises    14

2  Discrete Random Variables and Expectation    20
   2.1  Random Variables and Expectation    20
        2.1.1  Linearity of Expectations    22
        2.1.2  Jensen's Inequality    23
   2.2  The Bernoulli and Binomial Random Variables    25
   2.3  Conditional Expectation    26
   2.4  The Geometric Distribution    30
        2.4.1  Example: Coupon Collector's Problem    32
   2.5  Application: The Expected Run-Time of Quicksort    34
   2.6  Exercises    38

3  Moments and Deviations    44
   3.1  Markov's Inequality    44
   3.2  Variance and Moments of a Random Variable    45
        3.2.1  Example: Variance of a Binomial Random Variable    48
   3.3  Chebyshev's Inequality    48
        3.3.1  Example: Coupon Collector's Problem    50
   3.4  Application: A Randomized Algorithm for Computing the Median    52
        3.4.1  The Algorithm    53
        3.4.2  Analysis of the Algorithm    54
   3.5  Exercises    57

4  Chernoff Bounds    61
   4.1  Moment Generating Functions    61
   4.2  Deriving and Applying Chernoff Bounds    63
        4.2.1  Chernoff Bounds for the Sum of Poisson Trials    63
        4.2.2  Example: Coin Flips    67
        4.2.3  Application: Estimating a Parameter    67
   4.3  Better Bounds for Some Special Cases    69
   4.4  Application: Set Balancing    71
   4.5* Application: Packet Routing in Sparse Networks    72
        4.5.1  Permutation Routing on the Hypercube    73
        4.5.2  Permutation Routing on the Butterfly    78
   4.6  Exercises    83

5  Balls, Bins, and Random Graphs    90
   5.1  Example: The Birthday Paradox    90
   5.2  Balls into Bins    92
        5.2.1  The Balls-and-Bins Model    92
        5.2.2  Application: Bucket Sort    93
   5.3  The Poisson Distribution    94
        5.3.1  Limit of the Binomial Distribution    98
   5.4  The Poisson Approximation    99
        5.4.1* Example: Coupon Collector's Problem, Revisited    104
   5.5  Application: Hashing    106
        5.5.1  Chain Hashing    106
        5.5.2  Hashing: Bit Strings    108
        5.5.3  Bloom Filters    109
        5.5.4  Breaking Symmetry    112
   5.6  Random Graphs    112
        5.6.1  Random Graph Models    112
        5.6.2  Application: Hamiltonian Cycles in Random Graphs    113
   5.7  Exercises    118
   5.8  An Exploratory Assignment    124

6  The Probabilistic Method    126
   6.1  The Basic Counting Argument    126
   6.2  The Expectation Argument    128
        6.2.1  Application: Finding a Large Cut    129
        6.2.2  Application: Maximum Satisfiability    130
   6.3  Derandomization Using Conditional Expectations    131
   6.4  Sample and Modify    133
        6.4.1  Application: Independent Sets    133
        6.4.2  Application: Graphs with Large Girth    134
   6.5  The Second Moment Method    134
        6.5.1  Application: Threshold Behavior in Random Graphs    135
   6.6  The Conditional Expectation Inequality    136
   6.7  The Lovász Local Lemma    138
        6.7.1  Application: Edge-Disjoint Paths    141
        6.7.2  Application: Satisfiability    142
   6.8* Explicit Constructions Using the Local Lemma    142
        6.8.1  Application: A Satisfiability Algorithm    143
   6.9  Lovász Local Lemma: The General Case    146
   6.10 Exercises    148

7  Markov Chains and Random Walks    153
   7.1  Markov Chains: Definitions and Representations    153
        7.1.1  Application: A Randomized Algorithm for 2-Satisfiability    156
        7.1.2  Application: A Randomized Algorithm for 3-Satisfiability    159
   7.2  Classification of States    163
        7.2.1  Example: The Gambler's Ruin    166
   7.3  Stationary Distributions    167
        7.3.1  Example: A Simple Queue    173
   7.4  Random Walks on Undirected Graphs    174
        7.4.1  Application: An s-t Connectivity Algorithm    176
   7.5  Parrondo's Paradox    177
   7.6  Exercises    182

8  Continuous Distributions and the Poisson Process    188
   8.1  Continuous Random Variables    188
        8.1.1  Probability Distributions in R    188
        8.1.2  Joint Distributions and Conditional Probability    191
   8.2  The Uniform Distribution    193
        8.2.1  Additional Properties of the Uniform Distribution    194
   8.3  The Exponential Distribution    196
        8.3.1  Additional Properties of the Exponential Distribution    197
        8.3.2* Example: Balls and Bins with Feedback    199
   8.4  The Poisson Process    201
        8.4.1  Interarrival Distribution    204
        8.4.2  Combining and Splitting Poisson Processes    205
        8.4.3  Conditional Arrival Time Distribution    207
   8.5  Continuous Time Markov Processes    210
   8.6  Example: Markovian Queues    212
        8.6.1  M/M/1 Queue in Equilibrium    213
        8.6.2  M/M/1/K Queue in Equilibrium    216
        8.6.3  The Number of Customers in an M/M/∞ Queue    216
   8.7  Exercises    219

9  Entropy, Randomness, and Information    225
   9.1  The Entropy Function    225
   9.2  Entropy and Binomial Coefficients    228
   9.3  Entropy: A Measure of Randomness    230
   9.4  Compression    234
   9.5* Coding: Shannon's Theorem    237
   9.6  Exercises    245

10  The Monte Carlo Method    252
    10.1  The Monte Carlo Method    252
    10.2  Application: The DNF Counting Problem    255
          10.2.1  The Naïve Approach    255
          10.2.2  A Fully Polynomial Randomized Scheme for DNF Counting    257
    10.3  From Approximate Sampling to Approximate Counting    259
    10.4  The Markov Chain Monte Carlo Method    263
          10.4.1  The Metropolis Algorithm    265
    10.5  Exercises    267
    10.6  An Exploratory Assignment on Minimum Spanning Trees    270

11* Coupling of Markov Chains    271
    11.1  Variation Distance and Mixing Time    271
    11.2  Coupling    274
          11.2.1  Example: Shuffling Cards    275
          11.2.2  Example: Random Walks on the Hypercube    276
          11.2.3  Example: Independent Sets of Fixed Size    277
    11.3  Application: Variation Distance Is Nonincreasing    278
    11.4  Geometric Convergence    281
    11.5  Application: Approximately Sampling Proper Colorings    282
    11.6  Path Coupling    286
    11.7  Exercises    289

12  Martingales    295
    12.1  Martingales    295
    12.2  Stopping Times    297
          12.2.1  Example: A Ballot Theorem    299
    12.3  Wald's Equation    300
    12.4  Tail Inequalities for Martingales    303
    12.5  Applications of the Azuma-Hoeffding Inequality    305
          12.5.1  General Formalization    305
          12.5.2  Application: Pattern Matching    307
          12.5.3  Application: Balls and Bins    308
          12.5.4  Application: Chromatic Number    308
    12.6  Exercises    309

13  Pairwise Independence and Universal Hash Functions    314
    13.1  Pairwise Independence    314
          13.1.1  Example: A Construction of Pairwise Independent Bits    315
          13.1.2  Application: Derandomizing an Algorithm for Large Cuts    316
          13.1.3  Example: Constructing Pairwise Independent Values Modulo a Prime    317
    13.2  Chebyshev's Inequality for Pairwise Independent Variables    318
          13.2.1  Application: Sampling Using Fewer Random Bits    319
    13.3  Families of Universal Hash Functions    321
          13.3.1  Example: A 2-Universal Family of Hash Functions    323
          13.3.2  Example: A Strongly 2-Universal Family of Hash Functions    324
          13.3.3  Application: Perfect Hashing    326
    13.4  Application: Finding Heavy Hitters in Data Streams    328
    13.5  Exercises    333

14* Balanced Allocations    336
    14.1  The Power of Two Choices    336
          14.1.1  The Upper Bound    336
    14.2  Two Choices: The Lower Bound    341
    14.3  Applications of the Power of Two Choices    344
          14.3.1  Hashing    344
          14.3.2  Dynamic Resource Allocation    345
    14.4  Exercises    345

Further Reading    349
Index    350

Note: Asterisks indicate advanced material.
Preface
Why Randomness?

Why should computer scientists study and use randomness? Computers appear to behave far too unpredictably as it is! Adding randomness would seemingly be a disadvantage, adding further complications to the already challenging task of efficiently utilizing computers.
Science has learned in the last century to accept randomness as an essential component in modeling and analyzing nature. In physics, for example, Newton's laws led people to believe that the universe was a deterministic place; given a big enough calculator and the appropriate initial conditions, one could determine the location of planets years from now. The development of quantum theory suggests a rather different view; the universe still behaves according to laws, but the backbone of these laws is probabilistic. "God does not play dice with the universe" was Einstein's anecdotal objection to modern quantum mechanics. Nevertheless, the prevailing theory today for subparticle physics is based on random behavior and statistical laws, and randomness plays a significant role in almost every other field of science, ranging from genetics and evolution in biology to modeling price fluctuations in a free-market economy.
Computer science is no exception. From the highly theoretical notion of probabilistic theorem proving to the very practical design of PC Ethernet cards, randomness and probabilistic methods play a key role in modern computer science. The last two decades have witnessed a tremendous growth in the use of probability theory in computing. Increasingly more advanced and sophisticated probabilistic techniques have been developed for use within broader and more challenging computer science applications. In this book, we study the fundamental ways in which randomness comes to bear on computer science: randomized algorithms and the probabilistic analysis of algorithms.
Randomized algorithms: Randomized algorithms are algorithms that make random choices during their execution. In practice, a randomized program would use values generated by a random number generator to decide the next step at several branches of its execution. For example, the protocol implemented in an Ethernet card uses random numbers to decide when it next tries to access the shared Ethernet communication medium. The randomness is useful for breaking symmetry, preventing different cards from repeatedly accessing the medium at the same time. Other commonly used applications of randomized algorithms include Monte Carlo simulations and primality testing in cryptography. In these and many other important applications, randomized algorithms are significantly more efficient than the best known deterministic solutions. Furthermore, in most cases the randomized algorithms are also simpler and easier to program.
These gains come at a price; the answer may have some probability of being incorrect, or the efficiency is guaranteed only with some probability. Although it may seem unusual to design an algorithm that may be incorrect, if the probability of error is sufficiently small then the improvement in speed or memory requirements may well be worthwhile.
Probabilistic analysis of algorithms: Complexity theory tries to classify computation problems according to their computational complexity, in particular distinguishing between easy and hard problems. For example, complexity theory shows that the Traveling Salesman problem is NP-hard. It is therefore very unlikely that there is an algorithm that can solve any instance of the Traveling Salesman problem in time that is subexponential in the number of cities. An embarrassing phenomenon for the classical worst-case complexity theory is that the problems it classifies as hard to compute are often easy to solve in practice. Probabilistic analysis gives a theoretical explanation for this phenomenon. Although these problems may be hard to solve on some set of pathological inputs, on most inputs (in particular, those that occur in real-life applications) the problem is actually easy to solve. More precisely, if we think of the input as being randomly selected according to some probability distribution on the collection of all possible inputs, we are very likely to obtain a problem instance that is easy to solve, and instances that are hard to solve appear with relatively small probability. Probabilistic analysis of algorithms is the method of studying how algorithms perform when the input is taken from a well-defined probabilistic space. As we will see, even NP-hard problems might have algorithms that are extremely efficient on almost all inputs.
The Book
This textbook is designed to accompany one- or two-semester courses for advanced undergraduate or beginning graduate students in computer science and applied mathematics. The study of randomized and probabilistic techniques in most leading universities has moved from being the subject of an advanced graduate seminar meant for theoreticians to being a regular course geared generally to advanced undergraduate and beginning graduate students. There are a number of excellent advanced, research-oriented books on this subject, but there is a clear need for an introductory textbook. We hope that our book satisfies this need.
The textbook has developed from courses on probabilistic methods in computer science taught at Brown (CS 155) and Harvard (CS 223) in recent years. The emphasis in these courses and in this textbook is on the probabilistic techniques and paradigms, not on particular applications. Each chapter of the book is devoted to one such method or technique. Techniques are clarified through examples based on analyzing randomized algorithms or developing probabilistic analysis of algorithms on random inputs. Many of these examples are derived from problems in networking, reflecting a prominent trend in the networking field (and the taste of the authors).
The book contains fourteen chapters. We may view the book as being divided into two parts, where the first part (Chapters 1-7) comprises what we believe is core material. The book assumes only a basic familiarity with probability theory, equivalent to what is covered in a standard course on discrete mathematics for computer scientists. Chapters 1-3 review this elementary probability theory while introducing some interesting applications. Topics covered include random sampling, expectation, Markov's inequality, variance, and Chebyshev's inequality. If the class has sufficient background in probability, then these chapters can be taught quickly. We do not suggest skipping them, however, because they introduce the concepts of randomized algorithms and probabilistic analysis of algorithms and also contain several examples that are used throughout the text.

Chapters 4-7 cover more advanced topics, including Chernoff bounds, balls-and-bins models, the probabilistic method, and Markov chains. The material in these chapters is more challenging than in the initial chapters. Sections that are particularly challenging (and hence that the instructor may want to consider skipping) are marked with an asterisk. The core material in the first seven chapters may constitute the bulk of a quarter- or semester-long course, depending on the pace.
The second part of the book (Chapters 8-14) covers additional advanced material that can be used either to fill out the basic course as necessary or for a more advanced second course. These chapters are largely self-contained, so the instructor can choose the topics best suited to the class. The chapters on continuous probability and entropy are perhaps the most appropriate for incorporating into the basic course. Our introduction to continuous probability (Chapter 8) focuses on uniform and exponential distributions, including examples from queueing theory. Our examination of entropy (Chapter 9) shows how randomness can be measured and how entropy arises naturally in the context of randomness extraction, compression, and coding.

Chapters 10 and 11 cover the Monte Carlo method and coupling, respectively; these chapters are closely related and are best taught together. Chapter 12, on martingales, covers important issues on dealing with dependent random variables, a theme that continues in a different vein in Chapter 13's development of pairwise independence and derandomization. Finally, the chapter on balanced allocations (Chapter 14) covers a topic close to the authors' hearts and ties in nicely with Chapter 5's analysis of balls-and-bins problems.
The order of the subjects, especially in the first part of the book, corresponds to their relative importance in the algorithmic literature. Thus, for example, the study of Chernoff bounds precedes more fundamental probability concepts such as Markov chains. However, instructors may choose to teach the chapters in a different order. A course with more emphasis on general stochastic processes, for example, may teach Markov chains (Chapter 7) immediately after Chapters 1-3, following with the chapter on balls, bins, and random graphs (Chapter 5, omitting the Hamiltonian cycle example). Chapter 6 on the probabilistic method could then be skipped, following instead with continuous probability and the Poisson process (Chapter 8). The material from Chapter 4 on Chernoff bounds, however, is needed for most of the remaining material.
Most of the exercises in the book are theoretical, but we have included some programming exercises, including two more extensive exploratory assignments that require some programming. We have found that occasional programming exercises are often helpful in reinforcing the book's ideas and in adding some variety to the course.
We have decided to restrict the material in this book to methods and techniques based on rigorous mathematical analysis; with few exceptions, all claims in this book are followed by full proofs. Obviously, many extremely useful probabilistic methods do not fall within this strict category. For example, in the important area of Monte Carlo methods, most practical solutions are heuristics that have been demonstrated to be effective and efficient by experimental evaluation rather than by rigorous mathematical analysis. We have taken the view that, in order to best apply and understand the strengths and weaknesses of heuristic methods, a firm grasp of underlying probability theory and rigorous techniques, as we present in this book, is necessary. We hope that students will appreciate this point of view by the end of the course.
Acknowledgments
Our first thanks go to the many probabilists and computer scientists who developed the beautiful material covered in this book. We chose not to overload the textbook with numerous references to the original papers. Instead, we provide a reference list that includes a number of excellent books giving background material as well as more advanced discussion of the topics covered here.
The book owes a great deal to the comments and feedback of students and teaching assistants who took the courses CS 155 at Brown and CS 223 at Harvard. In particular we wish to thank Aris Anagnostopoulos, Eden Hochbaum, Rob Hunter, and Adam Kirsch, all of whom read and commented on early drafts of the book.
Special thanks to Dick Karp, who used a draft of the book in teaching CS 174 at Berkeley during fall 2003. His early comments and corrections were most valuable in improving the manuscript. Peter Bartlett taught CS 174 at Berkeley in spring 2004, also providing many corrections and useful comments.
We thank our colleagues who carefully read parts of the manuscript, pointed out many errors, and suggested important improvements in content and presentation: Artur Czumaj, Alan Frieze, Claire Kenyon, Joe Marks, Salil Vadhan, Eric Vigoda, and the anonymous reviewers who read the manuscript for the publisher.
We also thank Rajeev Motwani and Prabhakar Raghavan for allowing us to use some of the exercises in their excellent book Randomized Algorithms.
We are grateful to Lauren Cowles of Cambridge University Press for her editorial help and advice in preparing and organizing the manuscript.
Writing of this book was supported in part by NSF ITR Grant no. CCR-0121154.
CHAPTER ONE
Events and Probability
This chapter introduces the notion of randomized algorithms and reviews some basic concepts of probability theory in the context of analyzing the performance of simple randomized algorithms for verifying algebraic identities and finding a minimum cut-set in a graph.
1.1. Application: Verifying Polynomial Identities
Computers can sometimes make mistakes, due for example to incorrect programming or hardware failure. It would be useful to have simple ways to double-check the results of computations. For some problems, we can use randomness to efficiently verify the correctness of an output.

Suppose we have a program that multiplies together monomials. Consider the problem of verifying the following identity, which might be output by our program:

    (x + 1)(x - 2)(x + 3)(x - 4)(x + 5)(x - 6) ≟ x^6 - 7x^3 + 25.
There is an easy way to verify whether the identity is correct: multiply together the terms on the left-hand side and see if the resulting polynomial matches the right-hand side. In this example, when we multiply all the constant terms on the left, the result does not match the constant term on the right, so the identity cannot be valid. More generally, given two polynomials F(x) and G(x), we can verify the identity

    F(x) ≟ G(x)

by converting the two polynomials to their canonical forms (Σ_{i=0}^{d} c_i x^i); two polynomials are equivalent if and only if all the coefficients in their canonical forms are equal. From this point on let us assume that, as in our example, F(x) is given as a product F(x) = Π_{i=1}^{d} (x - a_i) and G(x) is given in its canonical form. Transforming F(x) to its canonical form by consecutively multiplying the ith monomial with the product of the first i - 1 monomials requires Θ(d²) multiplications of coefficients. We assume in what follows that each multiplication can be performed in constant time, although if the products of the coefficients grow large then it could conceivably require more than constant time to add and multiply numbers together.
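As a concrete illustration of this Θ(d²) cost, the following sketch (ours, not the book's) expands a product of monomials into canonical coefficient form; multiplying in the ith monomial touches all i coefficients computed so far, giving Θ(d²) coefficient multiplications overall:

```python
def to_canonical(roots):
    """Return coefficients c_0..c_d with F(x) = sum(c_i * x**i),
    where F(x) = (x - roots[0]) * ... * (x - roots[d-1])."""
    coeffs = [1]  # start with the constant polynomial 1
    for a in roots:
        # Multiply the current polynomial by (x - a).
        new = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            new[i + 1] += c      # contribution of c*x**i times x
            new[i] -= c * a      # contribution of c*x**i times -a
        coeffs = new
    return coeffs

# (x + 1)(x - 2) = x^2 - x - 2, i.e. coefficients [-2, -1, 1]
```

For the example identity, `to_canonical([-1, 2, -3, 4, -5, 6])` yields a constant term of -720, not 25, which is exactly the constant-term mismatch noted above.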
So far, we have not said anything particularly interesting. To check whether the computer program has multiplied monomials together correctly, we have suggested multiplying the monomials together again to check the result. Our approach for checking the program is to write another program that does essentially the same thing we expect the first program to do. This is certainly one way to double-check a program: write a second program that does the same thing, and make sure they agree. There are at least two problems with this approach, both stemming from the idea that there should be a difference between checking a given answer and recomputing it. First, if there is a bug in the program that multiplies monomials, the same bug may occur in the checking program. (Suppose that the checking program was written by the same person who wrote the original program!) Second, it stands to reason that we would like to check the answer in less time than it takes to try to solve the original problem all over again.
Let us instead utilize randomness to obtain a faster method to verify the identity. We informally explain the algorithm and then set up the formal mathematical framework for analyzing the algorithm.
Assume that the maximum degree, or the largest exponent of x, in F(x) and G(x) is d. The algorithm chooses an integer r uniformly at random in the range {1, ..., 100d}, where by "uniformly at random" we mean that all integers are equally likely to be chosen. The algorithm then computes the values F(r) and G(r). If F(r) ≠ G(r) the algorithm decides that the two polynomials are not equivalent, and if F(r) = G(r) the algorithm decides that the two polynomials are equivalent.

Suppose that in one computation step the algorithm can generate an integer chosen uniformly at random in the range {1, ..., 100d}. Computing the values of F(r) and G(r) can be done in O(d) time, which is faster than computing the canonical form of F(x). The randomized algorithm, however, may give a wrong answer.
How can the algorithm give the wrong answer?

If F(x) = G(x), then the algorithm gives the correct answer, since it will find that F(r) = G(r) for any value of r.

If F(x) ≠ G(x) and F(r) ≠ G(r), then the algorithm gives the correct answer since it has found a case where F(x) and G(x) disagree. Thus, when the algorithm decides that the two polynomials are not the same, the answer is always correct.

If F(x) ≠ G(x) and F(r) = G(r), the algorithm gives the wrong answer. In other words, it is possible that the algorithm decides that the two polynomials are the same when they are not. For this error to occur, r must be a root of the equation F(x) - G(x) = 0. The degree of the polynomial F(x) - G(x) is no larger than d and, by the fundamental theorem of algebra, a polynomial of degree up to d has no more than d roots. Thus, if F(x) ≠ G(x), then there are no more than d values in the range {1, ..., 100d} for which F(r) = G(r). Since there are 100d values in the range {1, ..., 100d}, the chance that the algorithm chooses such a value and returns a wrong answer is no more than 1/100.
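The algorithm just described can be sketched in a few lines (the function and variable names below are ours, not the book's); F is represented by its roots and G by its coefficients, and each evaluation takes O(d) arithmetic operations:

```python
import random

def eval_product(roots, r):
    """Evaluate F(r) where F(x) = (x - a_1) * ... * (x - a_d)."""
    result = 1
    for a in roots:
        result *= (r - a)
    return result

def eval_canonical(coeffs, r):
    """Evaluate G(r) by Horner's rule; coeffs[i] multiplies x**i."""
    result = 0
    for c in reversed(coeffs):
        result = result * r + c
    return result

def identity_holds(roots, coeffs, d):
    """One run of the randomized check: compare F and G at a random r."""
    r = random.randint(1, 100 * d)  # uniform over {1, ..., 100d}
    return eval_product(roots, r) == eval_canonical(coeffs, r)

# The identity from the text: F(x) = (x+1)(x-2)(x+3)(x-4)(x+5)(x-6),
# claimed equal to G(x) = x^6 - 7x^3 + 25 (it is not).
roots = [-1, 2, -3, 4, -5, 6]
claimed = [25, 0, 0, -7, 0, 0, 1]
```

A "not equivalent" answer is always correct; an "equivalent" answer is wrong only when the random r happens to be one of the at most d roots of F - G among the 100d candidates.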
1.2. Axioms of Probability
We turn now to a formal mathematical setting for analyzing the randomized algorithm. Any probabilistic statement must refer to the underlying probability space.

Definition 1.1: A probability space has three components:

1. a sample space Ω, which is the set of all possible outcomes of the random process modeled by the probability space;
2. a family of sets F representing the allowable events, where each set in F is a subset of the sample space Ω; and
3. a probability function Pr: F → R satisfying Definition 1.2.

An element of Ω is called a simple or elementary event.

In the randomized algorithm for verifying polynomial identities, the sample space is the set of integers {1, ..., 100d}. Each choice of an integer r in this range is a simple event.
Definition 1.2: A probability function is any function Pr: F → R that satisfies the following conditions:

1. for any event E, 0 ≤ Pr(E) ≤ 1;
2. Pr(Ω) = 1; and
3. for any finite or countably infinite sequence of pairwise mutually disjoint events E_1, E_2, E_3, ...,

    Pr( ∪_{i≥1} E_i ) = Σ_{i≥1} Pr(E_i).
In most of this book we will use discrete probability spaces. In a discrete probability space the sample space Ω is finite or countably infinite, and the family F of allowable events consists of all subsets of Ω. In a discrete probability space, the probability function is uniquely defined by the probabilities of the simple events.

Again, in the randomized algorithm for verifying polynomial identities, each choice of an integer r is a simple event. Since the algorithm chooses the integer uniformly at random, all simple events have equal probability. The sample space has 100d simple events, and the sum of the probabilities of all simple events must be 1. Therefore each simple event has probability 1/100d.
Because events are sets, we use standard set theory notation to express combinations of events. We write E_1 ∩ E_2 for the occurrence of both E_1 and E_2 and write E_1 ∪ E_2 for the occurrence of either E_1 or E_2 (or both). For example, suppose we roll two dice. If E_1 is the event that the first die is a 1 and E_2 is the event that the second die is a 1, then E_1 ∩ E_2 denotes the event that both dice are 1 while E_1 ∪ E_2 denotes the event that at least one of the two dice lands on 1. Similarly, we write E_1 - E_2 for the occurrence of an event that is in E_1 but not in E_2. With the same dice example, E_1 - E_2 consists of the event where the first die is a 1 and the second die is not. We use the notation Ē as shorthand for Ω - E; for example, if E is the event that we obtain an even number when rolling a die, then Ē is the event that we obtain an odd number.
Definition 1.2 yields the following obvious lemma.
Lemma 1.1: For any two events E_1 and E_2,

Pr(E_1 ∪ E_2) = Pr(E_1) + Pr(E_2) − Pr(E_1 ∩ E_2).

Proof: From the definition,

Pr(E_1) = Pr(E_1 − (E_1 ∩ E_2)) + Pr(E_1 ∩ E_2);
Pr(E_2) = Pr(E_2 − (E_1 ∩ E_2)) + Pr(E_1 ∩ E_2);
Pr(E_1 ∪ E_2) = Pr(E_1 − (E_1 ∩ E_2)) + Pr(E_2 − (E_1 ∩ E_2)) + Pr(E_1 ∩ E_2).

The lemma easily follows. ∎
A consequence of Definition 1.2 is known as the union bound. Although it is very simple, it is tremendously useful.

Lemma 1.2: For any finite or countably infinite sequence of events E_1, E_2, ...,

Pr( ⋃_{i≥1} E_i ) ≤ Σ_{i≥1} Pr(E_i).
Notice that Lemma 1.2 differs from the third part of Definition 1.2 in that Definition 1.2 is an equality and requires the events to be pairwise mutually disjoint.
Lemma 1.1 can be generalized to the following equality, often referred to as the inclusion-exclusion principle.
Lemma 1.3: Let E_1, ..., E_n be any n events. Then

Pr( ⋃_{i=1}^{n} E_i ) = Σ_{i=1}^{n} Pr(E_i) − Σ_{i<j} Pr(E_i ∩ E_j) + Σ_{i<j<k} Pr(E_i ∩ E_j ∩ E_k)
    − ⋯ + (−1)^{ℓ+1} Σ_{i_1<i_2<⋯<i_ℓ} Pr(E_{i_1} ∩ E_{i_2} ∩ ⋯ ∩ E_{i_ℓ}) + ⋯ .
The proof of the inclusion-exclusion principle is left as Exercise 1.7.
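Lemma 1.1 and the union bound can be checked by exhaustive enumeration over the two-dice sample space used in the example above. The following sketch is illustrative; the helper names (pr, e1, e2) are ours, not the book's.

```python
from fractions import Fraction
from itertools import product

# Sample space for the text's example: all ordered outcomes of two dice.
omega = list(product(range(1, 7), repeat=2))

def pr(event):
    # Probability of an event (a set of outcomes) under the uniform distribution.
    return Fraction(len(event), len(omega))

e1 = {w for w in omega if w[0] == 1}  # first die lands on 1
e2 = {w for w in omega if w[1] == 1}  # second die lands on 1

# Lemma 1.1: Pr(E_1 ∪ E_2) = Pr(E_1) + Pr(E_2) − Pr(E_1 ∩ E_2).
assert pr(e1 | e2) == pr(e1) + pr(e2) - pr(e1 & e2)
# Union bound (Lemma 1.2): Pr(E_1 ∪ E_2) ≤ Pr(E_1) + Pr(E_2).
assert pr(e1 | e2) <= pr(e1) + pr(e2)
print(pr(e1 | e2))  # 11/36
```

Using exact fractions rather than floats keeps the equalities in Lemma 1.1 exact; the same enumeration extends directly to the n-event inclusion-exclusion of Lemma 1.3.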
1.2 AXIOMS OF PROBABILITY

We showed before that the only case in which the algorithm may fail to give the correct answer is when the two input polynomials F(x) and G(x) are not equivalent; the algorithm then gives an incorrect answer if the random number it chooses is a root of the polynomial F(x) − G(x). Let E represent the event that the algorithm failed to give the correct answer. The elements of the set corresponding to E are the roots of the polynomial F(x) − G(x) that are in the set of integers {1, ..., 100d}. Since the polynomial has no more than d roots it follows that the event E includes no more than d simple events, and therefore

Pr(algorithm fails) = Pr(E) ≤ d/100d = 1/100.
It may seem unusual to have an algorithm that can return the wrong answer. It may help to think of the correctness of an algorithm as a goal that we seek to optimize in conjunction with other goals. In designing an algorithm, we generally seek to minimize the number of computational steps and the memory required. Sometimes there is a trade-off: there may be a faster algorithm that uses more memory or a slower algorithm that uses less memory. The randomized algorithm we have presented gives a trade-off between correctness and speed. Allowing algorithms that may give an incorrect answer (but in a systematic way) expands the trade-off space available in designing algorithms. Rest assured, however, that not all randomized algorithms give incorrect answers, as we shall see.
For the algorithm just described, the algorithm gives the correct answer 99% of the time even when the polynomials are not equivalent. Can we improve this probability? One way is to choose the random number r from a larger range of integers. If our sample space is the set of integers {1, ..., 1000d}, then the probability of a wrong answer is at most 1/1000. At some point, however, the range of values we can use is limited by the precision available on the machine on which we run the algorithm.
Another approach is to repeat the algorithm multiple times, using different random values to test the identity. The property we use here is that the algorithm has a one-sided error. The algorithm may be wrong only when it outputs that the two polynomials are equivalent. If any run yields a number r such that F(r) ≠ G(r), then the polynomials are not equivalent. Thus, if we repeat the algorithm a number of times and find F(r) ≠ G(r) in at least one round of the algorithm, we know that F(x) and G(x) are not equivalent. The algorithm outputs that the two polynomials are equivalent only if there is equality for all runs.
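The repeated test with one-sided error can be sketched as follows. This is a minimal illustration, not the book's code: the polynomials (F in product form, G a deliberately wrong "canonical form"), the function name equivalent, and the default k = 10 are all illustrative assumptions.

```python
import random

d = 4  # degree of the illustrative polynomials below

def F(x):
    # F in product form: (x − 1)(x − 2)(x − 3)(x − 4); evaluated without expanding.
    p = 1
    for a in (1, 2, 3, 4):
        p *= x - a
    return p

def G(x):
    # A candidate "canonical form" that is deliberately wrong: the constant
    # term should be 24, so F − G is the nonzero constant −1.
    return x**4 - 10*x**3 + 35*x**2 - 50*x + 25

def equivalent(F, G, d, k=10):
    """One-sided test repeated k times, sampling with replacement from
    {1, ..., 100d}; declare 'equivalent' only if every run finds equality."""
    for _ in range(k):
        r = random.randint(1, 100 * d)
        if F(r) != G(r):
            return False  # witness r found: definitely not equivalent
    return True  # equality on all k runs; wrong with probability at most (1/100)**k

print(equivalent(F, G, d))  # False: here F − G has no roots at all
```

Note the asymmetry that the text describes: a single disagreement is conclusive, while k agreements only make equivalence likely.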
In repeating the algorithm we repeatedly choose a random number in the range {1, ..., 100d}. Repeatedly choosing random numbers according to a given distribution is generally referred to as sampling. In this case, we can repeatedly choose random numbers in the range {1, ..., 100d} in two ways: we can sample either with replacement or without replacement. Sampling with replacement means that we do not remember which numbers we have already tested; each time we run the algorithm, we choose a number uniformly at random from the range {1, ..., 100d} regardless of previous choices, so there is some chance we will choose an r that we have chosen on a previous run. Sampling without replacement means that, once we have chosen a number r, we do not allow the number to be chosen on subsequent runs; the number chosen at a given iteration is uniform over all previously unselected numbers.
Let us first consider the case where sampling is done with replacement. Assume that we repeat the algorithm k times, and that the input polynomials are not equivalent. What is the probability that in all k iterations our random sampling from the set {1, ..., 100d} yields roots of the polynomial F(x) − G(x), resulting in a wrong output by the algorithm? If k = 1, we know that this probability is at most d/100d = 1/100. If k = 2, it seems that the probability that the first iteration finds a root is 1/100 and the probability that the second iteration finds a root is 1/100, so the probability that both iterations find a root is at most (1/100)^2. Generalizing, for any k, the probability of choosing roots for k iterations would be at most (1/100)^k.
To formalize this, we introduce the notion of independence.
Definition 1.3: Two events E and F are independent if and only if

Pr(E ∩ F) = Pr(E) · Pr(F).

More generally, events E_1, E_2, ..., E_k are mutually independent if and only if, for any subset I ⊆ [1, k],

Pr( ⋂_{i∈I} E_i ) = ∏_{i∈I} Pr(E_i).
If our algorithm samples with replacement then in each iteration the algorithm chooses a random number uniformly at random from the set {1, ..., 100d}, and thus the choice in one iteration is independent of the choices in previous iterations. For the case where the polynomials are not equivalent, let E_i be the event that, on the ith run of the algorithm, we choose a root r_i such that F(r_i) − G(r_i) = 0. The probability that the algorithm returns the wrong answer is given by

Pr(E_1 ∩ E_2 ∩ ⋯ ∩ E_k).
Since Pr(E_i) is at most d/100d and since the events E_1, E_2, ..., E_k are independent, the probability that the algorithm gives the wrong answer after k iterations is

Pr(E_1 ∩ E_2 ∩ ⋯ ∩ E_k) = ∏_{i=1}^{k} Pr(E_i) ≤ ∏_{i=1}^{k} d/100d = (1/100)^k.
The probability of making an error is therefore at most exponentially small in the number of trials.
Now let us consider the case where sampling is done without replacement. In this case the probability of choosing a given number is conditioned on the events of the previous iterations.
Definition 1.4: The conditional probability that event E occurs given that event F occurs is

Pr(E | F) = Pr(E ∩ F) / Pr(F).

The conditional probability is well-defined only if Pr(F) > 0.
Intuitively, we are looking for the probability of E ∩ F within the set of events defined by F. Because F defines our restricted sample space, we normalize the probabilities by dividing by Pr(F), so that the sum of the probabilities of all events is 1. When Pr(F) > 0, the definition can also be written in the useful form

Pr(E | F) Pr(F) = Pr(E ∩ F).
Notice that, when E and F are independent and Pr(F) ≠ 0, we have

Pr(E | F) = Pr(E ∩ F) / Pr(F) = Pr(E) Pr(F) / Pr(F) = Pr(E).
This is a property that conditional probability should have; intuitively, if two events are independent, then information about one event should not affect the probability of the second event.
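Definition 1.4 and this independence property can be verified by enumeration. The two events below (first die even, second die a 1) are illustrative choices in the spirit of the text's dice examples; they happen to be independent.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # two dice, uniform distribution

def pr(event):
    return Fraction(len(event), len(omega))

def cond(e, f):
    # Definition 1.4: Pr(E | F) = Pr(E ∩ F) / Pr(F), defined when Pr(F) > 0.
    return pr(e & f) / pr(f)

e = {w for w in omega if w[0] % 2 == 0}  # first die is even
f = {w for w in omega if w[1] == 1}      # second die lands on 1

# The two events are independent ...
assert pr(e & f) == pr(e) * pr(f)
# ... so conditioning on F does not change the probability of E.
assert cond(e, f) == pr(e)
print(cond(e, f))  # 1/2
```

Replacing e with an event that depends on the second die (say, the sum being 7) makes the last assertion fail, which is exactly the distinction the definition captures.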
Again assume that we repeat the algorithm k times and that the input polynomials are not equivalent. What is the probability that in all the k iterations our random sampling from the set {1, ..., 100d} yields roots of the polynomial F(x) − G(x), resulting in a wrong output by the algorithm?
As in the analysis with replacement, we let E_i be the event that the random number r_i chosen in the ith iteration of the algorithm is a root of F(x) − G(x); again, the probability that the algorithm returns the wrong answer is given by

Pr(E_1 ∩ E_2 ∩ ⋯ ∩ E_k).
Applying the definition of conditional probability, we obtain

Pr(E_1 ∩ E_2 ∩ ⋯ ∩ E_k) = Pr(E_k | E_1 ∩ E_2 ∩ ⋯ ∩ E_{k-1}) · Pr(E_1 ∩ E_2 ∩ ⋯ ∩ E_{k-1}),

and repeating this argument gives

Pr(E_1 ∩ E_2 ∩ ⋯ ∩ E_k) = Pr(E_1) · Pr(E_2 | E_1) · Pr(E_3 | E_1 ∩ E_2) ⋯ Pr(E_k | E_1 ∩ E_2 ∩ ⋯ ∩ E_{k-1}).
Can we bound Pr(E_j | E_1 ∩ E_2 ∩ ⋯ ∩ E_{j-1})? Recall that there are at most d values for which F(r) − G(r) = 0; if trials 1 through j − 1 < d have found j − 1 of them, then when sampling without replacement there are only d − (j − 1) values out of the 100d − (j − 1) remaining choices for which F(r) − G(r) = 0. Hence

Pr(E_j | E_1 ∩ E_2 ∩ ⋯ ∩ E_{j-1}) ≤ (d − (j − 1)) / (100d − (j − 1)),
and the probability that the algorithm gives the wrong answer after k ≤ d iterations is bounded by

Pr(E_1 ∩ E_2 ∩ ⋯ ∩ E_k) ≤ ∏_{j=1}^{k} (d − (j − 1)) / (100d − (j − 1)) ≤ (1/100)^k.
Because (d − (j − 1))/(100d − (j − 1)) < d/100d when j > 1, our bounds on the probability of making an error are actually slightly better without replacement. You may also notice that, if we take d + 1 samples without replacement and the two polynomials are not equivalent, then we are guaranteed to find an r such that F(r) − G(r) ≠ 0. Thus, in d + 1 iterations we are guaranteed to output the correct answer. However, computing the value of the polynomial at d + 1 points takes Θ(d²) time using the standard approach, which is no faster than finding the canonical form deterministically.
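The two error bounds can be compared numerically. The degree d = 20 and the function names below are illustrative choices; the formulas are exactly the bounds derived above.

```python
from fractions import Fraction

d = 20  # an illustrative degree

def bound_with_replacement(k):
    # Error bound after k independent trials with replacement: (1/100)**k.
    return Fraction(1, 100) ** k

def bound_without_replacement(k):
    # Product of (d − (j − 1)) / (100d − (j − 1)) over j = 1, ..., k.
    p = Fraction(1)
    for j in range(1, k + 1):
        p *= Fraction(d - (j - 1), 100 * d - (j - 1))
    return p

# Without replacement is never worse, and at k = d + 1 the error is exactly 0:
# the factor for j = d + 1 is (d − d) / (100d − d) = 0.
assert all(bound_without_replacement(k) <= bound_with_replacement(k)
           for k in range(1, d + 2))
assert bound_without_replacement(d + 1) == 0
print(float(bound_without_replacement(5)), float(bound_with_replacement(5)))
```

The printed pair shows the without-replacement bound pulling slightly ahead after the first iteration, while the vanishing product at k = d + 1 mirrors the guarantee noted in the text.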
Since sampling without replacement appears to give better bounds on the probability of error, why would we ever want to consider sampling with replacement? In some