T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
664.1 | | CLT::GILBERT | eager like a child | Mon Feb 09 1987 14:32 | 16 |
| I've always liked the following approach.
As in .0, let ran(m) return a random integer (with a uniform distribution)
in the range 0..m-1. Let swap(x,y) exchange the values of x and y.
! Algorithm to perform an in-place shuffle of an n-element array,
! x[1..n].
!
for i := n downto 1 do
swap (x[i], x[ran(i+1)]);
Note that care must be used in creating the ran(m) function -- it's
tempting to use a linear-congruential random number generator, and just
take that integer modulo m. Instead, you should convert the number to a
floating-point number in the range [0..1), multiply by m, and take the
integer part of the product. This has been discussed elsewhere.
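A sketch of this approach in Python (my own rendering: 0-based lists, with `random.random()` standing in for the floating-point conversion described above, so position i draws uniformly from positions 0..i):

```python
import random

def ran(m):
    # As suggested above: scale a float in [0, 1) up by m and truncate,
    # rather than taking a linear-congruential output modulo m (whose
    # low-order bits are often far from random).
    return int(random.random() * m)

def shuffle(x):
    # In-place shuffle of the list x.  With 0-based indexing, step i
    # swaps x[i] with a uniformly chosen element of the prefix x[0..i].
    for i in range(len(x) - 1, -1, -1):
        j = ran(i + 1)          # j is uniform in 0..i
        x[i], x[j] = x[j], x[i]
    return x
```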
|
664.2 | Peter, you're a card! | SQM::HALLYB | Are all the good ones taken? | Mon Feb 09 1987 15:01 | 7 |
| Any reason why you use ran(i+1) instead of ran(n+1) in the .1 algorithm?
It seems to me that restricting the target range would somehow bias
the deck -- e.g., card 1 has two chances to swap for card 51 but
only one chance to swap for card 52. But this is just off the top
of my head, it may all work out...
"double-down" John
|
664.3 | Ante Up! | COMET::ROBERTS | Dwayne Roberts | Mon Feb 09 1987 15:57 | 20 |
| (To borrow and modify Peter's example in .1:)
! Algorithm to perform an in-place shuffle of an n-element array,
! x[1..n].
!
for i := 1 to n do
swap (x[i], x[1+ran(n)]);
RE .2: A randomly shuffled deck of cards is not made more random by
re-shuffling it, so it shouldn't make any difference if a card is
swapped once or a thousand times.
Note that the APL language has a built-in dyadic operator "?"
(pronounced "deal") that does a shuffle. For example, the statement
DECK _ 52 ? 52
would create a 52-element array named "DECK" with the integers from
1 to 52 "randomly" distributed within it.
|
664.4 | Shuffle by sorting | JON::MORONEY | Legalize Liberty | Mon Feb 09 1987 17:15 | 13 |
| One shuffling algorithm I used follows. Create 2 arrays, A[N] and B[N].
N is the size of the "deck".
Step 1: Fill array A with integers from 1 to N. Doesn't matter if they're
in order.
Step 2: Fill array B with random numbers, using your favorite random number
generator that returns a real x such that 0 <= x < 1.0.
Step 3: Using your favorite sorting algorithm with 1 modification, sort array
B. The modification is, whenever you swap 2 numbers B[I] and B[J],
you also swap A[I] and A[J].
Step 4: When the numbers in B are sorted, those in A are shuffled.
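A sketch of the four steps in Python (my pairing of key and card stands in for the "modified sort" of steps 3 and 4 -- sorting the keys carries the cards along automatically):

```python
import random

def shuffle_by_sorting(n):
    a = list(range(1, n + 1))           # Step 1: the deck, 1..N
    b = [random.random() for _ in a]    # Step 2: one key in [0, 1) per card
    # Steps 3-4: sort by key, carrying the cards along; Python's built-in
    # sort plays the role of "your favorite sorting algorithm".
    return [card for _, card in sorted(zip(b, a))]
```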
-Mike
|
664.5 | | BEING::POSTPISCHIL | Always mount a scratch monkey. | Mon Feb 09 1987 19:26 | 12 |
| The method presented in .0 does not give all permutations an equal
chance of being selected. For example, the second number you move
has a 2/n chance of going immediately after the first one, when
this chance should be 1/(n-1).
The method in .4 requires unnecessary time and space, since the method
Gilbert presented in .1 achieves the same effect quickly and gives an
equal probability to each permutation, subject to the precision of the
random-number operation.
-- edp
|
664.6 | Skinning cats | CLT::GILBERT | eager like a child | Mon Feb 09 1987 21:35 | 25 |
| There was a typo in .1 -- it should read:
for i := n downto 1 do swap (x[i], x[1+ran(i)]);
I find this easiest to prove correct. If the loop order is changed:
for i := 1 to n do swap (x[i], x[1+ran(i)]);
it's still correct, and you need not know n before processing. Dwayne's
algorithm in .3:
for i := 1 to n do swap (x[i], x[1+ran(n)]);
may well produce a better shuffle for standard (linear-congruential based)
random number generators. I've used Mike's algorithm in .4 when shuffling
large data files (prepend random numbers, and feed them to a sort utility).
Mike's note "Shuffling by Sorting" reminds me of a sort algorithm proposed
by Phil Emma -- "Sorting by Shuffling": Check whether the array is in order,
and if not, swap two randomly selected elements, and repeat (an early version
of the algorithm swapped the two elements only if they were out of order, but
that bordered on the practical). Out of curiosity (and because we're getting
good at calculating expectations), what's the expected number of swaps to sort
three elements? How about 4 elements?
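For anyone who wants to experiment with the expectation question, a small Monte Carlo sketch in Python (whether the two chosen positions must be distinct is my assumption, not part of the algorithm as stated):

```python
import random

def swaps_to_sort(x):
    # "Sorting by shuffling": swap two randomly chosen distinct
    # positions until the array happens to be in order; return the
    # number of swaps it took.
    x = list(x)
    n = 0
    while x != sorted(x):
        i, j = random.sample(range(len(x)), 2)
        x[i], x[j] = x[j], x[i]
        n += 1
    return n

def average_swaps(x, trials=10000):
    # Monte Carlo estimate of the expected number of swaps.
    return sum(swaps_to_sort(x) for _ in range(trials)) / trials
```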
|
664.7 | Looks random, but... | CHOVAX::YOUNG | Back from the Shadows Again, | Mon Feb 09 1987 21:42 | 19 |
| It is easy to demonstrate that the method listed in .0 is not 'fair'.
This kind of 'lookaside' shuffling tends to result in clumps of
cards early on. Thus cards that are near each other in the beginning
and of the same residue modulo 2 tend to be near each other when
done.
The method mentioned in .1 was for years my preferred method of
doing shuffles, and all that time I believed it to be fair. Then
while trying to solve a related problem, I ran several tests to
'prove' that it was fair. Much to my chagrin, very clear patterns
in the redistribution showed up. I would now say that it is only
marginally fairer than the .0 method.
The method mentioned in .4 is new to me, and has great appeal.
It is clearly a fair shuffle, and I would bet that it has optimal
performance for any fair shuffle. Very clever, this is definitely
the method that I plan to use in the future.
-- Barry
|
664.8 | Efficiency Counts | COMET::ROBERTS | Dwayne Roberts | Mon Feb 09 1987 22:01 | 6 |
| .4 should shuffle very well; however, it is only as efficient as the
sort used. What's the fastest sort? O(n log n)?
The methods that Peter and I advance are _linear_. There could be a
significant difference with large values of n.
|
664.9 | random thoughts... | CHOVAX::YOUNG | Back from the Shadows Again, | Mon Feb 09 1987 22:54 | 23 |
| Re .8:
I would maintain that getting a correct answer was more important
than getting an incorrect one efficiently.
As for the efficiency of sorting, it is usually considered to be
O(N*log(N)), though there are some radix sorts that are arguably
O(N).
It has just occurred to me, though, that a fair algorithm with almost
O(N) time could be designed by taking the original algorithm in
.0 and using the more standard separate lookaside lists, instead
of the simpler 'in-deck' lookaside algorithm. This involves building
a separate list for every 1..N. Then scan up the main array, adding
lists with a single element to the output list, skipping those with
no elements, and sorting (.4) those with more than one element.
This is a fair algorithm, and runs in average time:
O(N + N + (N^.5)*log(N^.5)) --> O(N)
How's that? (I learn something new every day in this conference 8^)
-- Barry
|
664.10 | | BEING::POSTPISCHIL | Always mount a scratch monkey. | Tue Feb 10 1987 08:54 | 13 |
| Re .7:
I would like to know more about the problems you found with the method
Gilbert presented in .1. Given a truly random random-number generator,
Gilbert's procedure gives each possible permutation an exactly equal
chance of occurring.
The only chance for failure should be in the random-number generator.
What did you use, and what sensitivities do the various methods
presented in this topic have to the failings of common generators?
-- edp
|
664.11 | .1 visits small subset of permutations | MATRIX::ROTH | | Tue Feb 10 1987 12:28 | 12 |
| I agree with .10 in principle, that each permutation is equally
likely. However, in practice there are usually vastly fewer random number
states available than permutations. For example, shuffling 52 cards
there are around 8.066E+67 permutations, but with a 32-bit state
word there are only 4.295E+09 random numbers. So you can only
visit 1.0/(1.878E+58) of the possible permutations. Given that
multiplicative congruential generators are notorious for falling
on hyperplanes and so on, such use of the generator may cause problems.
Just a guess...
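The counting is easy to reproduce exactly, e.g. with Python's arbitrary-precision integers:

```python
import math

perms = math.factorial(52)      # orderings of a 52-card deck, ~8.066e67
states = 2 ** 32                # distinct values of a 32-bit state word
ratio = perms // states         # ~1.878e58 permutations per RNG state
bits = math.log2(perms)        # ~225.6 bits of state needed to reach all
```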
- Jim
|
664.12 | | MATRIX::ROTH | | Tue Feb 10 1987 12:31 | 5 |
| Of course the comments in .11 apply equally well to the idea of
sorting an array of random numbers, but there the 'bad behaviour'
of the generator may be less of a problem.
- Jim
|
664.13 | | CHOVAX::YOUNG | Back from the Shadows Again, | Tue Feb 10 1987 13:26 | 19 |
| Re .10:
I may be dense, but it is not apparent to me that Peter's algorithm
"gives each possible permutation an exactly equal chance of occurring."
If you could (or anyone else), would you post an argument or proof
of this?
My observations had nothing to do with the validity of the random
number generators. I came to my conclusions by running some simple
tests that measured the redistribution results. These tests used
several different random-number generators, and most came from Knuth
and had very good figures from the spectral test. I will try to
recreate these tests, and then submit the results here.
While the end result of Peter's method may indeed be subtle, that
does not mean that it is fair. Many subtle random-number generators,
for instance, turned out to have poor distributions.
-- Barry
|
664.14 | | SQM::HALLYB | Are all the good ones taken? | Tue Feb 10 1987 14:42 | 23 |
| > I may be dense, but it is not apparent to me that Peter's algorithm
> "gives each possible permutation an exactly equal chance of occurring."
> If you could (or anyone else), would you post an argument, or proof
One simple argument goes like this, using a 52-card deck as an example.
Let J represent any card in the deck. Clearly card J has a 1/52 chance
of ending up in position 52, since that would occur only at the outset
(n=52) and once placed there the card doesn't move. (Peter's algorithm)
Now we ask ourselves the chance that card J has of ending up in
position 51. For this to happen (a) J may not be card 52, and
(b) J must be selected for position 51. Recall from probability that
P(A & B) = P(A) * P(B|A). Here, P(A) = 51/52, clearly the probability
that card J is _not_ card 52, and P(B|A) = 1/51, since no matter where J
is in the remainder of the deck, it has an even chance of being
selected therefrom. Since there are 51 cards to choose from we get 1/51.
Multiplying those together gives us the probability 1/52 that J ends up
as card 51. Proceed inductively to demonstrate that all positions
have an even chance of ending up with card J. This easily generalizes
to decks of other than 52 cards. The algorithm appears to be as
reliable as the random number generator.
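The inductive argument can also be checked by brute force for a tiny deck. This sketch (Python, 0-based, my own construction) feeds the .1 loop every possible sequence of "random" draws for n = 3; there are 3*2*1 = 6 such sequences, and each one yields a different one of the 6 permutations:

```python
from itertools import product
from collections import Counter

def shuffle_with_draws(n, draws):
    # Run the .1 loop, but read the "random" values from a fixed tuple:
    # draws[k] plays the role of ran(i) at step i = n, n-1, ..., 1.
    x = list(range(1, n + 1))
    for i, j in zip(range(n - 1, -1, -1), draws):
        x[i], x[j] = x[j], x[i]
    return tuple(x)

n = 3
counts = Counter(
    shuffle_with_draws(n, d)
    for d in product(*[range(i + 1) for i in range(n - 1, -1, -1)])
)
# Each of the 6 permutations appears exactly once among the 6 equally
# likely draw sequences -- the shuffle is uniform for n = 3.
```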
John
|
664.15 | Just wiggle a little and new ones appear | MODEL::YARBROUGH | | Wed Feb 11 1987 09:08 | 14 |
| > ... each permutation is equally
> likely. However, in practice there are usually vastly fewer random number
> states available than permutations. For example, shuffling 52 cards
> there are around 8.066E+67 permutations, but with a 32-bit state
> word there are only 4.295E+09 random numbers. So you can only
> visit 1.0/(1.878E+58) of the possible permutations. Given that
> multiplicative congruential generators are notorious for falling
> on hyperplanes and so on, such use of the generator may cause problems.
>
You can get more permutations by NOT starting from the same original 'deck'
of numbers each time, but retaining the current permutation as the starter
for the next. There are any number of simple transformations of the
original 'deck' that will produce a different sequence, with a very high
probability that any two such sequences have no common permutations.
|
664.16 | Oopps | CHOVAX::YOUNG | Back from the Shadows Again, | Wed Feb 11 1987 12:51 | 19 |
| Re .14:
Well John, you have shown me the error of my obstinate ways. I
can now see that Peter's method using the expression:
Swap (Card[i], Card[1 + ran(i)]);
is indeed fair. I was misled because of its similarity to the method
that I had been using, which was:
Swap (Card[i], Card[1 + ran(n)]);
which is NOT fair. Thus Peter's method does seem to be the fastest
and simplest fair method.
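The same brute-force enumeration shows the asymmetry. For n = 3, the swap-with-ran(n) loop has 3^3 = 27 equally likely draw sequences spread over 3! = 6 permutations, and since 27 is not a multiple of 6, the counts cannot come out equal (a sketch in Python, 0-based):

```python
from itertools import product
from collections import Counter

def naive_shuffle(n, draws):
    # for i := 1 to n do swap(x[i], x[1 + ran(n)]) -- with the "random"
    # draws fixed in advance so every outcome can be enumerated.
    x = list(range(1, n + 1))
    for i, j in zip(range(n), draws):
        x[i], x[j] = x[j], x[i]
    return tuple(x)

n = 3
counts = Counter(naive_shuffle(n, d) for d in product(range(n), repeat=n))
# 27 equally likely outcomes over 6 permutations: the counts are
# necessarily uneven, so the shuffle is biased.
```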
Like I said, I learn something new every day ... 8^)
-- Barry
|
664.17 | | CLT::GILBERT | eager like a child | Wed Feb 11 1987 17:54 | 8 |
| Re .16:
> I was misled because of its similarity to the method
> that I had been using, which was:
> Swap (Card[i], Card[1 + ran(n)]);
> which is NOT fair.
Huh? It looks fair to me.
|
664.18 | I want 52! possibilites, not just 2^32 | VIDEO::OSMAN | and silos to fill before I feep, and silos to fill before I feep | Wed Feb 11 1987 23:17 | 14 |
| Dot something or other raised an interesting point: that there are "only"
2^32 maximum random numbers available from MTH$RANDOM (are there 2^32,
or are many unreachable? but that's a different topic), but there are
52! desired permutations.
So, how do we program VMS to "truly" shuffle a deck? That is, how
do we give every one of the 52! permutations an equal chance of
coming up.
Perhaps 2^32 is plenty enough shuffles that it's not worth pursuing,
but certainly from an academic perspective it's an interesting
question.
/Eric
|
664.19 | Carry results forward and repeat | MODEL::YARBROUGH | | Thu Feb 12 1987 09:32 | 19 |
| Starting from the sequence 1..52 and a different random seed each time, you
can in principle generate 2^31+1 permutations. (The period of the VAX RNG
is chosen that way because it is prime. It's not guaranteed that these
permutations are all distinct, although it is very probable.) Then it's
time to recycle the random numbers.
Next time, start each calculation with the last permutation generated by the
previous cycle. This gives a different set of 2^32 permutations. Again,
it's not guaranteed that there won't be two alike, but if there are, they
will have been generated by a different seed.
Repeat this process. It will duplicate a sequence of permutations only when
the last permutation of the cycle is 1..52.
I don't have the energy to figure out what the period of this process is.
A group theory expert could tell you. However, if the period of the RNG is
indeed prime as it is supposed to be, then there is a good chance that the
entire sequence of 52! permutations can be generated this way, and any
subsequence can be found from the appropriate seed and initial permutation.
|
664.20 | | BEING::POSTPISCHIL | Always mount a scratch monkey. | Thu Feb 12 1987 09:43 | 21 |
| Re .18:
Find a number, a, relatively prime to 52! such that (a-1) is divisible
by four and not eight, and by the first powers but not the second
powers of 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 39, 41, 43, and 47.
Take another number, c, relatively prime to 52!, and your random number
generator is Xnext = (a*Xcurrent + c) mod 52!.
The potency of this generator is 25, which is enormous, and I would bet
it passes the spectral test. However, since we would be using every
bit of the generated number, patterns would show up, so it might be
better to use "mod 52!^n", where n is some integer greater than one.
Use this generator in Gilbert's algorithm: Given a generated number,
r[0], let x[i] be the quotient and r[i+1] be the remainder of dividing
r[i] by 52!^n/P(52,i+1), as i goes from 0 to 51, and where P(m,n) is
the number of permutations of n objects selected from m objects.
Return the x's.
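The quotient/remainder decoding generalizes to any deck size; here is a small sketch in Python using the factorial number system, in the spirit of the scheme above (the helper name `decode_permutation` is mine):

```python
from math import factorial

def decode_permutation(r, n):
    # Decode an integer r in [0, n!) into a permutation of 1..n by
    # peeling off mixed-radix digits: at each step the quotient picks
    # one remaining card and the remainder carries to the next step.
    pool = list(range(1, n + 1))
    out = []
    for i in range(n - 1, -1, -1):
        q, r = divmod(r, factorial(i))
        out.append(pool.pop(q))     # q is in 0..i, selecting one card
    return out
```

Distinct values of r give distinct permutations, so a generator whose state space covers [0, n!) can in principle reach them all.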
-- edp
|
664.21 | A reference seen on this... | ENGINE::ROTH | | Wed Feb 25 1987 08:39 | 14 |
| Re .20 - that's pretty elegant. However, there are probably cheaper
ways of getting more 'state' into a random number generator, for
example an additive generator based on a primitive trinomial will
have an astronomical period and only requires a few integer adds
per number. I know such generators have been proposed for encryption,
but don't have the list handy (many are based on Mersenne primes).
What brings this to mind is that I happened to come across this
paper yesterday while browsing around for something else -
"Faster Methods for Random Sampling", J. Vitter
CACM, July, 1984
- Jim
|
664.22 | Typo alert! | ZFC::DERAMO | Daniel V. D'Eramo | Tue Dec 01 1987 12:40 | 11 |
| Re: .19
2^31 + 1 is not prime; however, 2^31 - 1 is.
Re: .20
You do not need to include 39 in the list of numbers, because 3
and 13 are already in the list (39 = 3 * 13). Its inclusion
is not incorrect, it is just redundant.
Dan
|
664.23 | Shake well before using. | ZFC::DERAMO | Daniel V. D'Eramo | Tue Dec 01 1987 18:51 | 29 |
| The last swap of X[1] with itself in .1 can be avoided by
for i := n downto 2 do
swap(X[i], X[random integer in {1,...,i}])
Instead of verifying this by calculating probabilities as in
.14, I like to think of this algorithm as putting X[1]
through X[n] into a bag, shaking them up, and WITHOUT
LOOKING reaching in and taking one. That goes into X[n].
Now repeat this, with the bag now holding i=n-1 elements,
and put the randomly selected one into X[i] = X[n-1].
Repeat for i=n-2,...,2 -- that is, until there is just one
element left in the bag. Put that one into X[1]. The "bag"
at any one point in the loop is {X[1],...,X[i]} and the swap
& decrement of i takes a random element out of the bag and
puts it into its final resting place in the array.
If you did this with a real deck of cards and a real paper
bag, then it seems obvious that all of the 52! possible
permutations are equally likely [if you shake the bag well
enough. :^)].
I would suggest that both this method from .1 [as well as
the fair method in .4] could be used to test your favorite
random number generator. Shuffle <1,2,3,4,5> a few thousand
times and record how many times each permutation comes up.
Then run your favorite statistical test on the results.
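A sketch of that test in Python, with the standard library's generator standing in for "your favorite" (the seed and trial count are arbitrary choices of mine):

```python
import random
from collections import Counter

def shake_the_bag(x, rng=random):
    # The .1/.23 shuffle: for i = n downto 2 (0-based: len-1 downto 1),
    # swap x[i] with a uniformly chosen element of x[0..i].
    x = list(x)
    for i in range(len(x) - 1, 0, -1):
        j = rng.randrange(i + 1)
        x[i], x[j] = x[j], x[i]
    return tuple(x)

random.seed(1987)                   # fixed seed so the run is repeatable
trials = 60000
counts = Counter(shake_the_bag(range(1, 6)) for _ in range(trials))
# 5! = 120 permutations, each expected about trials/120 = 500 times;
# a permutation that never appears, or counts far from 500, would
# indict the generator rather than the algorithm.
```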
Dan
|