Probability and Computing
Randomized Algorithms and Probabilistic Analysis
Michael Mitzenmacher
Eli Upfal
Randomization and probabilistic techniques play an important role in modern computer science, with applications ranging from combinatorial optimization and machine learning to communication networks and secure protocols.

This textbook is designed to accompany a one- or two-semester course for advanced undergraduates or beginning graduate students in computer science and applied mathematics. It gives an excellent introduction to the probabilistic techniques and paradigms used in the development of probabilistic algorithms and analyses. It assumes only an elementary background in discrete mathematics and gives a rigorous yet accessible treatment of the material, with numerous examples and applications.

The first half of the book covers core material, including random sampling, expectations, Markov's inequality, Chebyshev's inequality, Chernoff bounds, balls-and-bins models, the probabilistic method, and Markov chains. In the second half, the authors delve into more advanced topics such as continuous probability, applications of limited independence, entropy, Markov chain Monte Carlo methods, coupling, martingales, and balanced allocations. With its comprehensive selection of topics, along with many examples and exercises, this book is an indispensable teaching tool.
Michael Mitzenmacher is John L. Loeb Associate Professor in Computer Science at Harvard University. He received his Ph.D. from the University of California, Berkeley, in 1996. Prior to joining Harvard in 1999, he was a research staff member at Digital Systems Research Laboratory in Palo Alto. He has received an NSF CAREER Award and an Alfred P. Sloan Research Fellowship. In 2002, he shared the IEEE Information Theory Society "Best Paper" Award for his work on error-correcting codes.
Eli Upfal is Professor and Chair of Computer Science at Brown University. He received his Ph.D. from the Hebrew University, Jerusalem, Israel. Prior to joining Brown in 1997, he was a research staff member at the IBM research division and a professor at the Weizmann Institute of Science in Israel. His main research interests are randomized computation and probabilistic analysis of algorithms, with applications to optimization algorithms, communication networks, parallel and distributed computing, and computational biology.
Probability and Computing
Randomized Algorithms and Probabilistic Analysis

Michael Mitzenmacher
Harvard University

Eli Upfal
Brown University

CAMBRIDGE UNIVERSITY PRESS
PUBLISHED BY THE PRESS SYNDICATE OF THE UNIVERSITY OF CAMBRIDGE
The Pitt Building, Trumpington Street, Cambridge, United Kingdom

CAMBRIDGE UNIVERSITY PRESS
The Edinburgh Building, Cambridge CB2 2RU, UK
40 West 20th Street, New York, NY 10011-4211, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
Ruiz de Alarcón 13, 28014 Madrid, Spain
Dock House, The Waterfront, Cape Town 8001, South Africa

http://www.cambridge.org

© Michael Mitzenmacher and Eli Upfal 2005

This book is in copyright. Subject to statutory exception and
to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without
the written permission of Cambridge University Press.

First published 2005

Printed in the United States of America

Typeface Times 10.5/13 pt.    System AMS-TEX    [FH]

A catalog record for this book is available from the British Library.

Library of Congress Cataloging in Publication data

Mitzenmacher, Michael, 1969-
Probability and computing : randomized algorithms and probabilistic
analysis / Michael Mitzenmacher, Eli Upfal.
    p.    cm.
Includes index.
ISBN 0-521-83540-2 (alk. paper)
1. Algorithms. 2. Probabilities. 3. Stochastic analysis. I. Upfal, Eli, 1954-. II. Title.
QA274.M574    2005
518'.1 - dc22    2004054540

ISBN 0 521 83540 2 hardback
Contents

Preface    page xiii

1  Events and Probability    1
   1.1  Application: Verifying Polynomial Identities    1
   1.2  Axioms of Probability    3
   1.3  Application: Verifying Matrix Multiplication    8
   1.4  Application: A Randomized Min-Cut Algorithm    12
   1.5  Exercises    14

2  Discrete Random Variables and Expectation    20
   2.1  Random Variables and Expectation    20
        2.1.1  Linearity of Expectations    22
        2.1.2  Jensen's Inequality    23
   2.2  The Bernoulli and Binomial Random Variables    25
   2.3  Conditional Expectation    26
   2.4  The Geometric Distribution    30
        2.4.1  Example: Coupon Collector's Problem    32
   2.5  Application: The Expected Run-Time of Quicksort    34
   2.6  Exercises    38

3  Moments and Deviations    44
   3.1  Markov's Inequality    44
   3.2  Variance and Moments of a Random Variable    45
        3.2.1  Example: Variance of a Binomial Random Variable    48
   3.3  Chebyshev's Inequality    48
        3.3.1  Example: Coupon Collector's Problem    50
   3.4  Application: A Randomized Algorithm for Computing the Median    52
        3.4.1  The Algorithm    53
        3.4.2  Analysis of the Algorithm    54
   3.5  Exercises    57

4  Chernoff Bounds    61
   4.1  Moment Generating Functions    61
   4.2  Deriving and Applying Chernoff Bounds    63
        4.2.1  Chernoff Bounds for the Sum of Poisson Trials    63
        4.2.2  Example: Coin Flips    67
        4.2.3  Application: Estimating a Parameter    67
   4.3  Better Bounds for Some Special Cases    69
   4.4  Application: Set Balancing    71
   4.5* Application: Packet Routing in Sparse Networks    72
        4.5.1  Permutation Routing on the Hypercube    73
        4.5.2  Permutation Routing on the Butterfly    78
   4.6  Exercises    83

5  Balls, Bins, and Random Graphs    90
   5.1  Example: The Birthday Paradox    90
   5.2  Balls into Bins    92
        5.2.1  The Balls-and-Bins Model    92
        5.2.2  Application: Bucket Sort    93
   5.3  The Poisson Distribution    94
        5.3.1  Limit of the Binomial Distribution    98
   5.4  The Poisson Approximation    99
        5.4.1* Example: Coupon Collector's Problem, Revisited    104
   5.5  Application: Hashing    106
        5.5.1  Chain Hashing    106
        5.5.2  Hashing: Bit Strings    108
        5.5.3  Bloom Filters    109
        5.5.4  Breaking Symmetry    112
   5.6  Random Graphs    112
        5.6.1  Random Graph Models    112
        5.6.2  Application: Hamiltonian Cycles in Random Graphs    113
   5.7  Exercises    118
   5.8  An Exploratory Assignment    124

6  The Probabilistic Method    126
   6.1  The Basic Counting Argument    126
   6.2  The Expectation Argument    128
        6.2.1  Application: Finding a Large Cut    129
        6.2.2  Application: Maximum Satisfiability    130
   6.3  Derandomization Using Conditional Expectations    131
   6.4  Sample and Modify    133
        6.4.1  Application: Independent Sets    133
        6.4.2  Application: Graphs with Large Girth    134
   6.5  The Second Moment Method    134
        6.5.1  Application: Threshold Behavior in Random Graphs    135
   6.6  The Conditional Expectation Inequality    136
   6.7  The Lovász Local Lemma    138
        6.7.1  Application: Edge-Disjoint Paths    141
        6.7.2  Application: Satisfiability    142
   6.8* Explicit Constructions Using the Local Lemma    142
        6.8.1  Application: A Satisfiability Algorithm    143
   6.9  Lovász Local Lemma: The General Case    146
   6.10 Exercises    148

7  Markov Chains and Random Walks    153
   7.1  Markov Chains: Definitions and Representations    153
        7.1.1  Application: A Randomized Algorithm for 2-Satisfiability    156
        7.1.2  Application: A Randomized Algorithm for 3-Satisfiability    159
   7.2  Classification of States    163
        7.2.1  Example: The Gambler's Ruin    166
   7.3  Stationary Distributions    167
        7.3.1  Example: A Simple Queue    173
   7.4  Random Walks on Undirected Graphs    174
        7.4.1  Application: An s-t Connectivity Algorithm    176
   7.5  Parrondo's Paradox    177
   7.6  Exercises    182

8  Continuous Distributions and the Poisson Process    188
   8.1  Continuous Random Variables    188
        8.1.1  Probability Distributions in R    188
        8.1.2  Joint Distributions and Conditional Probability    191
   8.2  The Uniform Distribution    193
        8.2.1  Additional Properties of the Uniform Distribution    194
   8.3  The Exponential Distribution    196
        8.3.1  Additional Properties of the Exponential Distribution    197
        8.3.2* Example: Balls and Bins with Feedback    199
   8.4  The Poisson Process    201
        8.4.1  Interarrival Distribution    204
        8.4.2  Combining and Splitting Poisson Processes    205
        8.4.3  Conditional Arrival Time Distribution    207
   8.5  Continuous Time Markov Processes    210
   8.6  Example: Markovian Queues    212
        8.6.1  M/M/1 Queue in Equilibrium    213
        8.6.2  M/M/1/K Queue in Equilibrium    216
        8.6.3  The Number of Customers in an M/M/∞ Queue    216
   8.7  Exercises    219

9  Entropy, Randomness, and Information    225
   9.1  The Entropy Function    225
   9.2  Entropy and Binomial Coefficients    228
   9.3  Entropy: A Measure of Randomness    230
   9.4  Compression    234
   9.5* Coding: Shannon's Theorem    237
   9.6  Exercises    245

10  The Monte Carlo Method    252
    10.1  The Monte Carlo Method    252
    10.2  Application: The DNF Counting Problem    255
          10.2.1  The Naïve Approach    255
          10.2.2  A Fully Polynomial Randomized Scheme for DNF Counting    257
    10.3  From Approximate Sampling to Approximate Counting    259
    10.4  The Markov Chain Monte Carlo Method    263
          10.4.1  The Metropolis Algorithm    265
    10.5  Exercises    267
    10.6  An Exploratory Assignment on Minimum Spanning Trees    270

11* Coupling of Markov Chains    271
    11.1  Variation Distance and Mixing Time    271
    11.2  Coupling    274
          11.2.1  Example: Shuffling Cards    275
          11.2.2  Example: Random Walks on the Hypercube    276
          11.2.3  Example: Independent Sets of Fixed Size    277
    11.3  Application: Variation Distance Is Nonincreasing    278
    11.4  Geometric Convergence    281
    11.5  Application: Approximately Sampling Proper Colorings    282
    11.6  Path Coupling    286
    11.7  Exercises    289

12  Martingales    295
    12.1  Martingales    295
    12.2  Stopping Times    297
          12.2.1  Example: A Ballot Theorem    299
    12.3  Wald's Equation    300
    12.4  Tail Inequalities for Martingales    303
    12.5  Applications of the Azuma-Hoeffding Inequality    305
          12.5.1  General Formalization    305
          12.5.2  Application: Pattern Matching    307
          12.5.3  Application: Balls and Bins    308
          12.5.4  Application: Chromatic Number    308
    12.6  Exercises    309

13  Pairwise Independence and Universal Hash Functions    314
    13.1  Pairwise Independence    314
          13.1.1  Example: A Construction of Pairwise Independent Bits    315
          13.1.2  Application: Derandomizing an Algorithm for Large Cuts    316
          13.1.3  Example: Constructing Pairwise Independent Values Modulo a Prime    317
    13.2  Chebyshev's Inequality for Pairwise Independent Variables    318
          13.2.1  Application: Sampling Using Fewer Random Bits    319
    13.3  Families of Universal Hash Functions    321
          13.3.1  Example: A 2-Universal Family of Hash Functions    323
          13.3.2  Example: A Strongly 2-Universal Family of Hash Functions    324
          13.3.3  Application: Perfect Hashing    326
    13.4  Application: Finding Heavy Hitters in Data Streams    328
    13.5  Exercises    333

14* Balanced Allocations    336
    14.1  The Power of Two Choices    336
          14.1.1  The Upper Bound    336
    14.2  Two Choices: The Lower Bound    341
    14.3  Applications of the Power of Two Choices    344
          14.3.1  Hashing    344
          14.3.2  Dynamic Resource Allocation    345
    14.4  Exercises    345

Further Reading    349
Index    350

Note: Asterisks indicate advanced material.
Preface
Why Randomness?

Why should computer scientists study and use randomness? Computers appear to behave far too unpredictably as it is! Adding randomness would seemingly be a disadvantage, adding further complications to the already challenging task of efficiently utilizing computers.
Science has learned in the last century to accept randomness as an essential component in modeling and analyzing nature. In physics, for example, Newton's laws led people to believe that the universe was a deterministic place; given a big enough calculator and the appropriate initial conditions, one could determine the location of planets years from now. The development of quantum theory suggests a rather different view; the universe still behaves according to laws, but the backbone of these laws is probabilistic. "God does not play dice with the universe" was Einstein's anecdotal objection to modern quantum mechanics. Nevertheless, the prevailing theory today for subparticle physics is based on random behavior and statistical laws, and randomness plays a significant role in almost every other field of science, ranging from genetics and evolution in biology to modeling price fluctuations in a free-market economy.
Computer science is no exception. From the highly theoretical notion of probabilistic theorem proving to the very practical design of PC Ethernet cards, randomness and probabilistic methods play a key role in modern computer science. The last two decades have witnessed a tremendous growth in the use of probability theory in computing. Increasingly more advanced and sophisticated probabilistic techniques have been developed for use within broader and more challenging computer science applications. In this book, we study the fundamental ways in which randomness comes to bear on computer science: randomized algorithms and the probabilistic analysis of algorithms.
Randomized algorithms: Randomized algorithms are algorithms that make random choices during their execution. In practice, a randomized program would use values generated by a random number generator to decide the next step at several branches of its execution. For example, the protocol implemented in an Ethernet card uses random numbers to decide when it next tries to access the shared Ethernet communication medium. The randomness is useful for breaking symmetry, preventing different cards from repeatedly accessing the medium at the same time. Other commonly used applications of randomized algorithms include Monte Carlo simulations and primality testing in cryptography. In these and many other important applications, randomized algorithms are significantly more efficient than the best known deterministic solutions. Furthermore, in most cases the randomized algorithms are also simpler and easier to program.
These gains come at a price; the answer may have some probability of being incorrect, or the efficiency is guaranteed only with some probability. Although it may seem unusual to design an algorithm that may be incorrect, if the probability of error is sufficiently small then the improvement in speed or memory requirements may well be worthwhile.
Probabilistic analysis of algorithms: Complexity theory tries to classify computation problems according to their computational complexity, in particular distinguishing between easy and hard problems. For example, complexity theory shows that the Traveling Salesman problem is NP-hard. It is therefore very unlikely that there is an algorithm that can solve any instance of the Traveling Salesman problem in time that is subexponential in the number of cities. An embarrassing phenomenon for the classical worst-case complexity theory is that the problems it classifies as hard to compute are often easy to solve in practice. Probabilistic analysis gives a theoretical explanation for this phenomenon. Although these problems may be hard to solve on some set of pathological inputs, on most inputs (in particular, those that occur in real-life applications) the problem is actually easy to solve. More precisely, if we think of the input as being randomly selected according to some probability distribution on the collection of all possible inputs, we are very likely to obtain a problem instance that is easy to solve, and instances that are hard to solve appear with relatively small probability. Probabilistic analysis of algorithms is the method of studying how algorithms perform when the input is taken from a well-defined probabilistic space. As we will see, even NP-hard problems might have algorithms that are extremely efficient on almost all inputs.
The Book
This textbook is designed to accompany one- or two-semester courses for advanced undergraduate or beginning graduate students in computer science and applied mathematics. The study of randomized and probabilistic techniques in most leading universities has moved from being the subject of an advanced graduate seminar meant for theoreticians to being a regular course geared generally to advanced undergraduate and beginning graduate students. There are a number of excellent advanced, research-oriented books on this subject, but there is a clear need for an introductory textbook. We hope that our book satisfies this need.
The textbook has developed from courses on probabilistic methods in computer science taught at Brown (CS 155) and Harvard (CS 223) in recent years. The emphasis in these courses and in this textbook is on the probabilistic techniques and paradigms, not on particular applications. Each chapter of the book is devoted to one such method or technique. Techniques are clarified through examples based on analyzing randomized algorithms or developing probabilistic analysis of algorithms on random inputs. Many of these examples are derived from problems in networking, reflecting a prominent trend in the networking field (and the taste of the authors).
The book contains fourteen chapters. We may view the book as being divided into two parts, where the first part (Chapters 1-7) comprises what we believe is core material. The book assumes only a basic familiarity with probability theory, equivalent to what is covered in a standard course on discrete mathematics for computer scientists. Chapters 1-3 review this elementary probability theory while introducing some interesting applications. Topics covered include random sampling, expectation, Markov's inequality, variance, and Chebyshev's inequality. If the class has sufficient background in probability, then these chapters can be taught quickly. We do not suggest skipping them, however, because they introduce the concepts of randomized algorithms and probabilistic analysis of algorithms and also contain several examples that are used throughout the text.

Chapters 4-7 cover more advanced topics, including Chernoff bounds, balls-and-bins models, the probabilistic method, and Markov chains. The material in these chapters is more challenging than in the initial chapters. Sections that are particularly challenging (and hence that the instructor may want to consider skipping) are marked with an asterisk. The core material in the first seven chapters may constitute the bulk of a quarter- or semester-long course, depending on the pace.
The second part of the book (Chapters 8-14) covers additional advanced material that can be used either to fill out the basic course as necessary or for a more advanced second course. These chapters are largely self-contained, so the instructor can choose the topics best suited to the class. The chapters on continuous probability and entropy are perhaps the most appropriate for incorporating into the basic course. Our introduction to continuous probability (Chapter 8) focuses on uniform and exponential distributions, including examples from queueing theory. Our examination of entropy (Chapter 9) shows how randomness can be measured and how entropy arises naturally in the context of randomness extraction, compression, and coding.

Chapters 10 and 11 cover the Monte Carlo method and coupling, respectively; these chapters are closely related and are best taught together. Chapter 12, on martingales, covers important issues on dealing with dependent random variables, a theme that continues in a different vein in Chapter 13's development of pairwise independence and derandomization. Finally, the chapter on balanced allocations (Chapter 14) covers a topic close to the authors' hearts and ties in nicely with Chapter 5's analysis of balls-and-bins problems.
The order of the subjects, especially in the first part of the book, corresponds to their relative importance in the algorithmic literature. Thus, for example, the study of Chernoff bounds precedes more fundamental probability concepts such as Markov chains. However, instructors may choose to teach the chapters in a different order. A course with more emphasis on general stochastic processes, for example, may teach Markov chains (Chapter 7) immediately after Chapters 1-3, following with the chapter on balls, bins, and random graphs (Chapter 5, omitting the Hamiltonian cycle example). Chapter 6 on the probabilistic method could then be skipped, following instead with continuous probability and the Poisson process (Chapter 8). The material from Chapter 4 on Chernoff bounds, however, is needed for most of the remaining material.
Most of the exercises in the book are theoretical, but we have included some programming exercises, including two more extensive exploratory assignments that require some programming. We have found that occasional programming exercises are often helpful in reinforcing the book's ideas and in adding some variety to the course.
We have decided to restrict the material in this book to methods and techniques based on rigorous mathematical analysis; with few exceptions, all claims in this book are followed by full proofs. Obviously, many extremely useful probabilistic methods do not fall within this strict category. For example, in the important area of Monte Carlo methods, most practical solutions are heuristics that have been demonstrated to be effective and efficient by experimental evaluation rather than by rigorous mathematical analysis. We have taken the view that, in order to best apply and understand the strengths and weaknesses of heuristic methods, a firm grasp of underlying probability theory and rigorous techniques, as we present in this book, is necessary. We hope that students will appreciate this point of view by the end of the course.
Acknowledgments
Our first thanks go to the many probabilists and computer scientists who developed the beautiful material covered in this book. We chose not to overload the textbook with numerous references to the original papers. Instead, we provide a reference list that includes a number of excellent books giving background material as well as more advanced discussion of the topics covered here.
The book owes a great deal to the comments and feedback of students and teaching assistants who took the courses CS 155 at Brown and CS 223 at Harvard. In particular we wish to thank Aris Anagnostopoulos, Eden Hochbaum, Rob Hunter, and Adam Kirsch, all of whom read and commented on early drafts of the book.
Special thanks to Dick Karp, who used a draft of the book in teaching CS 174 at Berkeley during fall 2003. His early comments and corrections were most valuable in improving the manuscript. Peter Bartlett taught CS 174 at Berkeley in spring 2004, also providing many corrections and useful comments.
We thank our colleagues who carefully read parts of the manuscript, pointed out many errors, and suggested important improvements in content and presentation: Artur Czumaj, Alan Frieze, Claire Kenyon, Joe Marks, Salil Vadhan, Eric Vigoda, and the anonymous reviewers who read the manuscript for the publisher.
We also thank Rajeev Motwani and Prabhakar Raghavan for allowing us to use some of the exercises in their excellent book Randomized Algorithms.
We are grateful to Lauren Cowles of Cambridge University Press for her editorial help and advice in preparing and organizing the manuscript.
Writing of this book was supported in part by NSF ITR Grant no. CCR-0121154.
CHAPTER ONE
Events and Probability
This chapter introduces the notion of randomized algorithms and reviews some basic concepts of probability theory in the context of analyzing the performance of simple randomized algorithms for verifying algebraic identities and finding a minimum cut-set in a graph.
1.1. Application: Verifying Polynomial Identities
Computers can sometimes make mistakes, due for example to incorrect programming or hardware failure. It would be useful to have simple ways to double-check the results of computations. For some problems, we can use randomness to efficiently verify the correctness of an output.

Suppose we have a program that multiplies together monomials. Consider the problem of verifying the following identity, which might be output by our program:

    (x + 1)(x - 2)(x + 3)(x - 4)(x + 5)(x - 6) ≟ x^6 - 7x^3 + 25.
There is an easy way to verify whether the identity is correct: multiply together the terms on the left-hand side and see if the resulting polynomial matches the right-hand side. In this example, when we multiply all the constant terms on the left, the result does not match the constant term on the right, so the identity cannot be valid. More generally, given two polynomials F(x) and G(x), we can verify the identity

    F(x) ≟ G(x)

by converting the two polynomials to their canonical forms (Σ_{i=0}^{d} c_i x^i); two polynomials are equivalent if and only if all the coefficients in their canonical forms are equal. From this point on let us assume that, as in our example, F(x) is given as a product F(x) = Π_{i=1}^{d} (x - a_i) and G(x) is given in its canonical form. Transforming F(x) to its canonical form by consecutively multiplying the ith monomial with the product of the first i - 1 monomials requires Θ(d²) multiplications of coefficients. We assume in what follows that each multiplication can be performed in constant time, although if the products of the coefficients grow large then it could conceivably require more than constant time to add and multiply numbers together.
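As a concrete illustration of this Θ(d²) cost, the following sketch (ours, not the book's) expands a product of monomials into canonical coefficient form; multiplying in the ith monomial touches all i coefficients computed so far, giving Θ(d²) coefficient multiplications overall:

```python
def to_canonical(roots):
    """Return coefficients c_0..c_d with F(x) = sum(c_i * x**i),
    where F(x) = (x - roots[0]) * ... * (x - roots[d-1])."""
    coeffs = [1]  # start with the constant polynomial 1
    for a in roots:
        # Multiply the current polynomial by (x - a).
        new = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            new[i + 1] += c      # contribution of c*x**i times x
            new[i] -= c * a      # contribution of c*x**i times -a
        coeffs = new
    return coeffs

# (x + 1)(x - 2) = x^2 - x - 2, i.e. coefficients [-2, -1, 1]
```

For the example identity, `to_canonical([-1, 2, -3, 4, -5, 6])` yields a constant term of -720, not 25, which is exactly the constant-term mismatch noted above.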
So far, we have not said anything particularly interesting. To check whether the computer program has multiplied monomials together correctly, we have suggested multiplying the monomials together again to check the result. Our approach for checking the program is to write another program that does essentially the same thing we expect the first program to do. This is certainly one way to double-check a program: write a second program that does the same thing, and make sure they agree. There are at least two problems with this approach, both stemming from the idea that there should be a difference between checking a given answer and recomputing it. First, if there is a bug in the program that multiplies monomials, the same bug may occur in the checking program. (Suppose that the checking program was written by the same person who wrote the original program!) Second, it stands to reason that we would like to check the answer in less time than it takes to try to solve the original problem all over again.
Let us instead utilize randomness to obtain a faster method to verify the identity. We informally explain the algorithm and then set up the formal mathematical framework for analyzing the algorithm.
Assume that the maximum degree, or the largest exponent of x, in F(x) and G(x) is d. The algorithm chooses an integer r uniformly at random in the range {1, ..., 100d}, where by "uniformly at random" we mean that all integers are equally likely to be chosen. The algorithm then computes the values F(r) and G(r). If F(r) ≠ G(r) the algorithm decides that the two polynomials are not equivalent, and if F(r) = G(r) the algorithm decides that the two polynomials are equivalent.

Suppose that in one computation step the algorithm can generate an integer chosen uniformly at random in the range {1, ..., 100d}. Computing the values of F(r) and G(r) can be done in O(d) time, which is faster than computing the canonical form of F(x). The randomized algorithm, however, may give a wrong answer.
How can the algorithm give the wrong answer?

If F(x) = G(x), then the algorithm gives the correct answer, since it will find that F(r) = G(r) for any value of r.

If F(x) ≠ G(x) and F(r) ≠ G(r), then the algorithm gives the correct answer since it has found a case where F(x) and G(x) disagree. Thus, when the algorithm decides that the two polynomials are not the same, the answer is always correct.

If F(x) ≠ G(x) and F(r) = G(r), the algorithm gives the wrong answer. In other words, it is possible that the algorithm decides that the two polynomials are the same when they are not. For this error to occur, r must be a root of the equation F(x) - G(x) = 0. The degree of the polynomial F(x) - G(x) is no larger than d and, by the fundamental theorem of algebra, a polynomial of degree up to d has no more than d roots. Thus, if F(x) ≠ G(x), then there are no more than d values in the range {1, ..., 100d} for which F(r) = G(r). Since there are 100d values in the range {1, ..., 100d}, the chance that the algorithm chooses such a value and returns a wrong answer is no more than 1/100.
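The algorithm just described can be sketched in a few lines (the function and variable names below are ours, not the book's); F is represented by its roots and G by its coefficients, and each evaluation takes O(d) arithmetic operations:

```python
import random

def eval_product(roots, r):
    """Evaluate F(r) where F(x) = (x - a_1) * ... * (x - a_d)."""
    result = 1
    for a in roots:
        result *= (r - a)
    return result

def eval_canonical(coeffs, r):
    """Evaluate G(r) by Horner's rule; coeffs[i] multiplies x**i."""
    result = 0
    for c in reversed(coeffs):
        result = result * r + c
    return result

def identity_holds(roots, coeffs, d):
    """One run of the randomized check: compare F and G at a random r."""
    r = random.randint(1, 100 * d)  # uniform over {1, ..., 100d}
    return eval_product(roots, r) == eval_canonical(coeffs, r)

# The identity from the text: F(x) = (x+1)(x-2)(x+3)(x-4)(x+5)(x-6),
# claimed equal to G(x) = x^6 - 7x^3 + 25 (it is not).
roots = [-1, 2, -3, 4, -5, 6]
claimed = [25, 0, 0, -7, 0, 0, 1]
```

A "not equivalent" answer is always correct; an "equivalent" answer is wrong only when the random r happens to be one of the at most d roots of F - G among the 100d candidates.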
1.2. Axioms of Probability
We turn now to a formal mathematical setting for analyzing the randomized algorithm. Any probabilistic statement must refer to the underlying probability space.

Definition 1.1: A probability space has three components:

1. a sample space Ω, which is the set of all possible outcomes of the random process modeled by the probability space;
2. a family of sets F representing the allowable events, where each set in F is a subset of the sample space Ω; and
3. a probability function Pr: F → R satisfying Definition 1.2.

An element of Ω is called a simple or elementary event.

In the randomized algorithm for verifying polynomial identities, the sample space is the set of integers {1, ..., 100d}. Each choice of an integer r in this range is a simple event.
Definition 1.2: A probability function is any function Pr: F → R that satisfies the following conditions:

1. for any event E, 0 ≤ Pr(E) ≤ 1;
2. Pr(Ω) = 1; and
3. for any finite or countably infinite sequence of pairwise mutually disjoint events E_1, E_2, E_3, ...,

    Pr( ∪_{i≥1} E_i ) = Σ_{i≥1} Pr(E_i).
In most of this book we will use discrete probability spaces. In a discrete probability space the sample space Ω is finite or countably infinite, and the family F of allowable events consists of all subsets of Ω. In a discrete probability space, the probability function is uniquely defined by the probabilities of the simple events.

Again, in the randomized algorithm for verifying polynomial identities, each choice of an integer r is a simple event. Since the algorithm chooses the integer uniformly at random, all simple events have equal probability. The sample space has 100d simple events, and the sum of the probabilities of all simple events must be 1. Therefore each simple event has probability 1/100d.
Because events are sets, we use standard set theory notation to express combinations of events. We write E_1 ∩ E_2 for the occurrence of both E_1 and E_2 and write E_1 ∪ E_2 for the occurrence of either E_1 or E_2 (or both). For example, suppose we roll two dice. If E_1 is the event that the first die is a 1 and E_2 is the event that the second die is a 1, then E_1 ∩ E_2 denotes the event that both dice are 1 while E_1 ∪ E_2 denotes the event that at least one of the two dice lands on 1. Similarly, we write E_1 - E_2 for the occurrence of an event that is in E_1 but not in E_2. With the same dice example, E_1 - E_2 consists of the event where the first die is a 1 and the second die is not. We use the notation Ē as shorthand for Ω - E; for example, if E is the event that we obtain an even number when rolling a die, then Ē is the event that we obtain an odd number.
Definition 1.2 yields the following obvious lemma.
Lemma 1.1: For any two events E_1 and E_2,

Pr(E_1 ∪ E_2) = Pr(E_1) + Pr(E_2) − Pr(E_1 ∩ E_2).

Proof: From the definition,

Pr(E_1) = Pr(E_1 − (E_1 ∩ E_2)) + Pr(E_1 ∩ E_2);
Pr(E_2) = Pr(E_2 − (E_1 ∩ E_2)) + Pr(E_1 ∩ E_2);
Pr(E_1 ∪ E_2) = Pr(E_1 − (E_1 ∩ E_2)) + Pr(E_2 − (E_1 ∩ E_2)) + Pr(E_1 ∩ E_2).

The lemma easily follows. ∎
A consequence of Definition 1.2 is known as the union bound. Although it is very simple, it is tremendously useful.

Lemma 1.2: For any finite or countably infinite sequence of events E_1, E_2, ...,

Pr( ⋃_{i≥1} E_i ) ≤ Σ_{i≥1} Pr(E_i).
Notice that Lemma 1.2 differs from the third part of Definition 1.2 in that Definition 1.2 is an equality and requires the events to be pairwise mutually disjoint.
Lemma 1.1 can be generalized to the following equality, often referred to as the inclusion-exclusion principle.
Lemma 1.3: Let E_1, ..., E_n be any n events. Then

Pr( ⋃_{i=1}^{n} E_i ) = Σ_{i=1}^{n} Pr(E_i) − Σ_{i<j} Pr(E_i ∩ E_j) + Σ_{i<j<k} Pr(E_i ∩ E_j ∩ E_k)
    − ⋯ + (−1)^{ℓ+1} Σ_{i_1<i_2<⋯<i_ℓ} Pr(E_{i_1} ∩ E_{i_2} ∩ ⋯ ∩ E_{i_ℓ}) + ⋯ .
The proof of the inclusion-exclusion principle is left as Exercise 1.7.
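Lemma 1.1 and the union bound can be checked by exhaustive enumeration over the two-dice sample space used in the example above. The following sketch is illustrative; the helper names (pr, e1, e2) are ours, not the book's.

```python
from fractions import Fraction
from itertools import product

# Sample space for the text's example: all ordered outcomes of two dice.
omega = list(product(range(1, 7), repeat=2))

def pr(event):
    # Probability of an event (a set of outcomes) under the uniform distribution.
    return Fraction(len(event), len(omega))

e1 = {w for w in omega if w[0] == 1}  # first die lands on 1
e2 = {w for w in omega if w[1] == 1}  # second die lands on 1

# Lemma 1.1: Pr(E_1 ∪ E_2) = Pr(E_1) + Pr(E_2) − Pr(E_1 ∩ E_2).
assert pr(e1 | e2) == pr(e1) + pr(e2) - pr(e1 & e2)
# Union bound (Lemma 1.2): Pr(E_1 ∪ E_2) ≤ Pr(E_1) + Pr(E_2).
assert pr(e1 | e2) <= pr(e1) + pr(e2)
print(pr(e1 | e2))  # 11/36
```

Using exact fractions rather than floats keeps the equalities in Lemma 1.1 exact; the same enumeration extends directly to the n-event inclusion-exclusion of Lemma 1.3.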
1.2 AXIOMS OF PROBABILITY

We showed before that the only case in which the algorithm may fail to give the correct answer is when the two input polynomials F(x) and G(x) are not equivalent; the algorithm then gives an incorrect answer if the random number it chooses is a root of the polynomial F(x) − G(x). Let E represent the event that the algorithm failed to give the correct answer. The elements of the set corresponding to E are the roots of the polynomial F(x) − G(x) that are in the set of integers {1, ..., 100d}. Since the polynomial has no more than d roots it follows that the event E includes no more than d simple events, and therefore

Pr(algorithm fails) = Pr(E) ≤ d/100d = 1/100.
It may seem unusual to have an algorithm that can return the wrong answer. It may help to think of the correctness of an algorithm as a goal that we seek to optimize in conjunction with other goals. In designing an algorithm, we generally seek to minimize the number of computational steps and the memory required. Sometimes there is a trade-off: there may be a faster algorithm that uses more memory or a slower algorithm that uses less memory. The randomized algorithm we have presented gives a trade-off between correctness and speed. Allowing algorithms that may give an incorrect answer (but in a systematic way) expands the trade-off space available in designing algorithms. Rest assured, however, that not all randomized algorithms give incorrect answers, as we shall see.
For the algorithm just described, the algorithm gives the correct answer 99% of the time even when the polynomials are not equivalent. Can we improve this probability? One way is to choose the random number r from a larger range of integers. If our sample space is the set of integers {1, ..., 1000d}, then the probability of a wrong answer is at most 1/1000. At some point, however, the range of values we can use is limited by the precision available on the machine on which we run the algorithm.
Another approach is to repeat the algorithm multiple times, using different random values to test the identity. The property we use here is that the algorithm has a one-sided error. The algorithm may be wrong only when it outputs that the two polynomials are equivalent. If any run yields a number r such that F(r) ≠ G(r), then the polynomials are not equivalent. Thus, if we repeat the algorithm a number of times and find F(r) ≠ G(r) in at least one round of the algorithm, we know that F(x) and G(x) are not equivalent. The algorithm outputs that the two polynomials are equivalent only if there is equality for all runs.
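The repeated test with one-sided error can be sketched as follows. This is a minimal illustration, not the book's code: the polynomials (F in product form, G a deliberately wrong "canonical form"), the function name equivalent, and the default k = 10 are all illustrative assumptions.

```python
import random

d = 4  # degree of the illustrative polynomials below

def F(x):
    # F in product form: (x − 1)(x − 2)(x − 3)(x − 4); evaluated without expanding.
    p = 1
    for a in (1, 2, 3, 4):
        p *= x - a
    return p

def G(x):
    # A candidate "canonical form" that is deliberately wrong: the constant
    # term should be 24, so F − G is the nonzero constant −1.
    return x**4 - 10*x**3 + 35*x**2 - 50*x + 25

def equivalent(F, G, d, k=10):
    """One-sided test repeated k times, sampling with replacement from
    {1, ..., 100d}; declare 'equivalent' only if every run finds equality."""
    for _ in range(k):
        r = random.randint(1, 100 * d)
        if F(r) != G(r):
            return False  # witness r found: definitely not equivalent
    return True  # equality on all k runs; wrong with probability at most (1/100)**k

print(equivalent(F, G, d))  # False: here F − G has no roots at all
```

Note the asymmetry that the text describes: a single disagreement is conclusive, while k agreements only make equivalence likely.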
In repeating the algorithm we repeatedly choose a random number in the range {1, ..., 100d}. Repeatedly choosing random numbers according to a given distribution is generally referred to as sampling. In this case, we can repeatedly choose random numbers in the range {1, ..., 100d} in two ways: we can sample either with replacement or without replacement. Sampling with replacement means that we do not remember which numbers we have already tested; each time we run the algorithm, we choose a number uniformly at random from the range {1, ..., 100d} regardless of previous choices, so there is some chance we will choose an r that we have chosen on a previous run. Sampling without replacement means that, once we have chosen a number r, we do not allow the number to be chosen on subsequent runs; the number chosen at a given iteration is uniform over all previously unselected numbers.
Let us first consider the case where sampling is done with replacement. Assume that we repeat the algorithm k times, and that the input polynomials are not equivalent. What is the probability that in all k iterations our random sampling from the set {1, ..., 100d} yields roots of the polynomial F(x) − G(x), resulting in a wrong output by the algorithm? If k = 1, we know that this probability is at most d/100d = 1/100. If k = 2, it seems that the probability that the first iteration finds a root is 1/100 and the probability that the second iteration finds a root is 1/100, so the probability that both iterations find a root is at most (1/100)^2. Generalizing, for any k, the probability of choosing roots for k iterations would be at most (1/100)^k.
To formalize this, we introduce the notion of independence.
Definition 1.3: Two events E and F are independent if and only if

Pr(E ∩ F) = Pr(E) · Pr(F).

More generally, events E_1, E_2, ..., E_k are mutually independent if and only if, for any subset I ⊆ [1, k],

Pr( ⋂_{i∈I} E_i ) = ∏_{i∈I} Pr(E_i).
If our algorithm samples with replacement then in each iteration the algorithm chooses a random number uniformly at random from the set {1, ..., 100d}, and thus the choice in one iteration is independent of the choices in previous iterations. For the case where the polynomials are not equivalent, let E_i be the event that, on the ith run of the algorithm, we choose a root r_i such that F(r_i) − G(r_i) = 0. The probability that the algorithm returns the wrong answer is given by

Pr(E_1 ∩ E_2 ∩ ⋯ ∩ E_k).
Since Pr(E_i) is at most d/100d and since the events E_1, E_2, ..., E_k are independent, the probability that the algorithm gives the wrong answer after k iterations is

Pr(E_1 ∩ E_2 ∩ ⋯ ∩ E_k) = ∏_{i=1}^{k} Pr(E_i) ≤ ∏_{i=1}^{k} d/100d = (1/100)^k.
The probability of making an error is therefore at most exponentially small in the number of trials.
Now let us consider the case where sampling is done without replacement. In this case the probability of choosing a given number is conditioned on the events of the previous iterations.
Definition 1.4: The conditional probability that event E occurs given that event F occurs is

Pr(E | F) = Pr(E ∩ F) / Pr(F).

The conditional probability is well-defined only if Pr(F) > 0.
Intuitively, we are looking for the probability of E ∩ F within the set of events defined by F. Because F defines our restricted sample space, we normalize the probabilities by dividing by Pr(F), so that the sum of the probabilities of all events is 1. When Pr(F) > 0, the definition can also be written in the useful form

Pr(E | F) Pr(F) = Pr(E ∩ F).
Notice that, when E and F are independent and Pr(F) ≠ 0, we have

Pr(E | F) = Pr(E ∩ F) / Pr(F) = Pr(E) Pr(F) / Pr(F) = Pr(E).
This is a property that conditional probability should have; intuitively, if two events are independent, then information about one event should not affect the probability of the second event.
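Definition 1.4 and this independence property can be verified by enumeration. The two events below (first die even, second die a 1) are illustrative choices in the spirit of the text's dice examples; they happen to be independent.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # two dice, uniform distribution

def pr(event):
    return Fraction(len(event), len(omega))

def cond(e, f):
    # Definition 1.4: Pr(E | F) = Pr(E ∩ F) / Pr(F), defined when Pr(F) > 0.
    return pr(e & f) / pr(f)

e = {w for w in omega if w[0] % 2 == 0}  # first die is even
f = {w for w in omega if w[1] == 1}      # second die lands on 1

# The two events are independent ...
assert pr(e & f) == pr(e) * pr(f)
# ... so conditioning on F does not change the probability of E.
assert cond(e, f) == pr(e)
print(cond(e, f))  # 1/2
```

Replacing e with an event that depends on the second die (say, the sum being 7) makes the last assertion fail, which is exactly the distinction the definition captures.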
Again assume that we repeat the algorithm k times and that the input polynomials are not equivalent. What is the probability that in all the k iterations our random sampling from the set {1, ..., 100d} yields roots of the polynomial F(x) − G(x), resulting in a wrong output by the algorithm?
As in the analysis with replacement, we let E_i be the event that the random number r_i chosen in the ith iteration of the algorithm is a root of F(x) − G(x); again, the probability that the algorithm returns the wrong answer is given by

Pr(E_1 ∩ E_2 ∩ ⋯ ∩ E_k).
Applying the definition of conditional probability, we obtain

Pr(E_1 ∩ E_2 ∩ ⋯ ∩ E_k) = Pr(E_k | E_1 ∩ E_2 ∩ ⋯ ∩ E_{k-1}) · Pr(E_1 ∩ E_2 ∩ ⋯ ∩ E_{k-1}),

and repeating this argument gives

Pr(E_1 ∩ E_2 ∩ ⋯ ∩ E_k) = Pr(E_1) · Pr(E_2 | E_1) · Pr(E_3 | E_1 ∩ E_2) ⋯ Pr(E_k | E_1 ∩ E_2 ∩ ⋯ ∩ E_{k-1}).
Can we bound Pr(E_j | E_1 ∩ E_2 ∩ ⋯ ∩ E_{j-1})? Recall that there are at most d values for which F(r) − G(r) = 0; if trials 1 through j − 1 < d have found j − 1 of them, then when sampling without replacement there are only d − (j − 1) values out of the 100d − (j − 1) remaining choices for which F(r) − G(r) = 0. Hence

Pr(E_j | E_1 ∩ E_2 ∩ ⋯ ∩ E_{j-1}) ≤ (d − (j − 1)) / (100d − (j − 1)),
and the probability that the algorithm gives the wrong answer after k ≤ d iterations is bounded by

Pr(E_1 ∩ E_2 ∩ ⋯ ∩ E_k) ≤ ∏_{j=1}^{k} (d − (j − 1)) / (100d − (j − 1)) ≤ (1/100)^k.
Because (d − (j − 1))/(100d − (j − 1)) < d/100d when j > 1, our bounds on the probability of making an error are actually slightly better without replacement. You may also notice that, if we take d + 1 samples without replacement and the two polynomials are not equivalent, then we are guaranteed to find an r such that F(r) − G(r) ≠ 0. Thus, in d + 1 iterations we are guaranteed to output the correct answer. However, computing the value of the polynomial at d + 1 points takes Θ(d²) time using the standard approach, which is no faster than finding the canonical form deterministically.
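The two error bounds can be compared numerically. The degree d = 20 and the function names below are illustrative choices; the formulas are exactly the bounds derived above.

```python
from fractions import Fraction

d = 20  # an illustrative degree

def bound_with_replacement(k):
    # Error bound after k independent trials with replacement: (1/100)**k.
    return Fraction(1, 100) ** k

def bound_without_replacement(k):
    # Product of (d − (j − 1)) / (100d − (j − 1)) over j = 1, ..., k.
    p = Fraction(1)
    for j in range(1, k + 1):
        p *= Fraction(d - (j - 1), 100 * d - (j - 1))
    return p

# Without replacement is never worse, and at k = d + 1 the error is exactly 0:
# the factor for j = d + 1 is (d − d) / (100d − d) = 0.
assert all(bound_without_replacement(k) <= bound_with_replacement(k)
           for k in range(1, d + 2))
assert bound_without_replacement(d + 1) == 0
print(float(bound_without_replacement(5)), float(bound_with_replacement(5)))
```

The printed pair shows the without-replacement bound pulling slightly ahead after the first iteration, while the vanishing product at k = d + 1 mirrors the guarantee noted in the text.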
Since sampling without replacement appears to give better bounds on the probability of error, why would we ever want to consider sampling with replacement? In some