[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference rusure::math

Title:	Mathematics at DEC

Moderator:	RUSURE::EDP

Created:	Mon Feb 03 1986
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	2083
Total number of notes:	14613

205.0. "Cryptography problem" by AURORA::HALLYB () Fri Jan 04 1985 10:53

Newsgroups: net.crypt
Path: decwrl!decvax!mcnc!unc!ulysses!mhuxr!ihnp4!ihuxi!trough
Subject: Crypt uniqueness
Posted: Wed Jan  2 19:25:43 1985

Suppose an encryption algorithm encodes an n byte string as another n byte
string, using an m byte key.  Further, suppose that the 2**(8*m) encodings are
distributed "uniformly" over the n byte output space (uniform as in "random
uniform", since the algorithm is evidently too complicated to analyze).  Now,
how likely is it that two keys yield the same encoded text?  The Birthday
Paradox (you only need about 24 people to have a 50% chance of shared
birthdays) applies here; I don't have the exact relation, but the 50%
probability point is roughly proportional to the square root of the size of the
space being sampled.  Thus, there should be 50% chance of duplicate keys if the
key length (m) is about half of the text length (n).  I'd be interested if
anyone can reproduce the math here accurately. 

				Chris Scussel
				AT&T Bell Labs

T.R	Title	User	Personal Name	Date	Lines
205.1		SPRITE::OSMAN		`Fri Jan 04 1985 13:32`	4
	As a side question: Is the VAX password encryption algorithm such that someone can demonstrate TWO passwords that both work ?
205.2		TURTLE::GILBERT		`Fri Jan 04 1985 17:16`	26
	I'm confused about what probability was being requested, so I've decided to rephrase the question. Given 2^(8m) random numbers [one per key] uniformly distributed in the range of 1 thru 2^(8n) [the encoded text strings, as numbers], what is the probability that some pair of random numbers are equal? That is, what chance is there for the existance of two keys that encrypt a given input string the same way? Also, for what value of m is this probability roughly 1/2? .. Note that the probability of a given pair of keys encrypting the given text the same way is 1 in 2^(8n). I believe the VMS password encryption algorithm can give the same encrypted value for two different passwords (even if the algorithm uses the same 'salt' in each). Seeing these questions together makes me think that someone is looking for two different passwords that both work. If this is the case, please note that VMS first compresses a password into an eight-byte string by adding bytes together, in a wrap-around fashion, and then does massive crunching on these 64 bits. It's fairly easy to find two passwords that produce the same wrap-around value; methinks something like "HHHHHHHH" and "$$$$$$$$$$$$$$$$" will do it.
205.3		ORPHAN::BRETT		`Sun Jan 06 1985 09:02`	5
	Obviously the VAX p/w generator has this property, since there is 2^64 results of encryption, and 38^31 p/w's (approx. 2^160). /Bevin
205.4		ADVAX::J_ROTH		`Sun Jan 06 1985 23:29`	31
	Let M = (28)m and N = (28)n. The probability of M things selected from a set of N things all being distinct is P = (N!/M!)/(N*M)... the standard 'birthday paradox'. For N and M large Stirling's asymptotic approximation for N! can be used: N! = SQRT(2PIN)(N/E)*N, LOG(N!) = .5LOG(2PIN) + NLOG(N) - N Also, for small X, LOG(1-X) = -X - X2/2 - X3/3 - ... where X will be equal to M/N. Plugging in, dropping terms in LOG(1-M/N) beyond second order, and simplifying brings LOG(P) = -(M2)/(2N) so for a probability of .5, M = SQRT(-2NLN(.5)) = 1.177*SQRT(N) ... very close to the SQRT(N) estimate mentioned. More terms could be included in the series for LOG and N! to give a more accurate answer for smaller N and M, or a ratio M/N not << 1, but this is the basic idea. - Jim
205.5		TAV02::NITSAN		`Mon Jan 07 1985 00:38`	4
	Ofcourse, if you use "block encryption" on long text, you can add some feed back mechanism (e.g. xor previous output with next input or so). This will prevent similar inputs from producing the same outputs, and is easy to reconstruct.
205.6		ORPHAN::BRETT		`Mon Jan 07 1985 16:11`	4
	HHHHHHHH and $$$$$$$$$$$$$$$$ do indeed act as the same p/w on VMS. /Bevin
205.7		SPRITE::OSMAN		`Tue Jan 08 1985 12:08`	22
	It turns out, it's very easy to find duplicate VMS passwords. Your password is stored encrypted into 64 bits, which is far less than necessary to uniquely distinguish between all possiblities. In particular, the password is first subjected to the following compaction into 8 bytes. Suppose the password is SWORDFISH_SCALES_FOR_SALE. Passwords are expanded by adding the ascii values into a quad_word, likewise: S W O R D F I S H _ S C A L E S _ F O R _ S A L + E _______________ a b c d e f g h In this example, d is "R" + "C" + "R". Hence with the above password, you could, for instance, successfully log in by making the R two letters less, namely P, and increase the C and R by one, making them D and S. Therefore with password SWORDFISH_SCALES_FOR_SALE, you can just as "easily" log in with "SWOPDFISH_SDALES_FOS_SALE". Said another way, these two passwords are equivalent, illustrating that there aren't as many passwords for a thief to try as one might think !
205.8		TURTLE::GILBERT		`Tue Jan 08 1985 18:41`	4
	Even after the simple reduction of the password to a quadword and the "salt' added in, I believe the hash function may still cause some different quadwords to hash to the same value, as there's no apparent reason for the hash function to just happen to provide a one-to-one mapping between quadwords.
205.9		SPRITE::OSMAN		`Wed Jan 09 1985 12:21`	11
	If you set your password to PATCHWORK, then you can alternatively type WATCHWORD and it will still work ! If you set your password to GEOLOGIST, then you can alternatively type NEOLOGISM and it will still work ! If you set your password to BLUNDERER, then you can alternatively type PLUNDERED and it will still work ! If you set your password to DENOUNCER, then you can alternatively type RENOUNCED and it will still work !
205.10		AURORA::HALLYB		`Wed Jan 09 1985 18:01`	4
	My goodness, Eric, you must have spent the whole morning pecking away! :-) John
205.11	interesting matching passwords ?	SIERRA::OSMAN	and silos to fill before I feep, and silos to fill before I feep	`Fri Apr 11 1986 17:07`	16
	So far, the only duplicate VMS passwords we've seen are ones that come out the same modulo quadword. Hence they'll necessarily come out the same when the quadword is then encoded. Can anyone find two strings that are DIFFERENT modulo quadword, but encode to the same thing ? I've processed my 80000-word on-line dictionary, and no two of those words fit the bill. However, it may be that longer passwords will prove more fruitful. /Eric p.s. Yeah, right John. As a matter of fact, I just got finished, so I didn't see your note until now. Phew, do my fingers ache ! :])
205.12		CLT::GILBERT	Juggler of Noterdom	`Fri Apr 11 1986 23:38`	14
	The hashing algorithm gives no guarantee that two different quadwords will definitely give two different hashed values. However, it does spread hash them fairly well. Suppose we started trying input quadwords in sequence; we get random quadwords out. Could someone familiar with the birthday problem determine how many random numbers in the range 0 to 2^64-1 should be chosen to have a 50% chance of a collision? Offhand, I'd guess it's rather large, and that over 2^32 random quadwords must be considered before there's a 50% chance of a collision. Note that it would only take about 32000 megabytes to store 2^32 quadwords. - Gilbert
205.13		BEING::POSTPISCHIL	Always mount a scratch monkey.	`Mon Apr 14 1986 16:58`	12
	Re .12: > Could someone familiar with the birthday problem determine how many > random numbers in the range 0 to 2^64-1 should be chosen to have a 50% > chance of a collision? Solve for n: gamma(2^64+1)/(gamma(2^64+1-n)*2^(64n)) = .5. By taking the natural log of both sides, it should be fairly simple to solve the problem with numerical methods and an approximation for ln(gamma(x)). -- edp
205.14		BEING::POSTPISCHIL	Always mount a scratch monkey.	`Tue Apr 15 1986 09:17`	49
	Re .12, .13: Unfortunately, the formula I gave involved 2^64+1-n which is difficult to solve numerically. The most significant digit of solution for n is in the position of the tenth digit of 2^64, which makes the numbers too far apart to handle on most calculators. However, another way to write the formula gives a very simple solution. The probability of no matches after one guess is (1-0/2^64). The probability of no matches after two guesses is (1-0/2^64)(1-1/2^64). After n guesses, the probability of no matches is (1-0/2^64)(1-1/2^64)(1-2/2^64) . . . (1-(n-1)/2^64). We want this probability to equal one-half, so i = n-1 product (1-i/2^64) = .5. i = 0 Taking the natural logarithm of both sides yields i = n-1 sum ln(1-i/2^64) = ln .5. i = 0 For small values of x, ln 1+x is approximately x. For the values we will be using, the approximation is good to at least ten places, according to my HP-41, so we can rewrite the equation as: i = n-1 sum -i/2^64 = ln .5. i = 0 The summation of integers from 0 to some number is well-known, so we have: -((n-1)^2+(n-1))/(2*2^64) = ln .5. With a little algebra, this is equivalent to: n^2 + n - 2^65 ln 2 = 0. (Note the change from ln .5 to ln 2, with a corresponding sign change.) The quadratic equation is simple to solve; it tells us n is 5,056,937,541. -- edp
205.15		ENGINE::ROTH		`Wed Apr 16 1986 22:25`	6
	See the analysis in .4; it reaches the same conclusion. Directly expanding the product as in .14 is a nice idea, for some strange reason it didn't occur to me at the time. Either way rests on using the behaviour of log near 1 for these small probabilities. - Jim
205.16	Used old problem to break in new CPU	SQM::HALLYB	Are all the good ones taken?	`Sat Jan 24 1987 16:25`	31
	I thought I'd verify edp's results (.14) since I was a little concerned about accumulated roundoffs, given the large number of trials required. After 6 days of CPU time (using H-floating arithmetic) on an 8700, I obtained the following numbers. The data below answers the question "how many (N) distinct guesses should I make in order to have probability (p) of guessing a given hashed password, assuming perfectly flat hashing: p N .... .............. .001 192,124,822 .01 608,926,881 .1 1,971,577,271 .2 2,869,241,009 .3 3,627,531,229 .4 4,341,214,012 .5 5,056,937,540 edp/J_ROTH estimated: 5,056,937,541 .6 5,814,220,606 .7 6,664,739,783 .8 7,705,697,797 .9 9,216,853,900 .95 10,512,992,586 .99 13,034,599,788 .999 15,964,059,241 Probably an off-by-1 in the code. With this many H-floating calculations there was the potential for a really gross amount of accumulated roundoff errors, too. Interesting that the results of both the H-floating multiplication and approximation of x ~= ln(1+x) came out so close, if not exact. John
205.17		ENGINE::ROTH		`Sun Jan 25 1987 10:02`	49
	I compared the other entries in John's table - close indeed! - Jim .001 192,124,822 -- 192124821.92 .01 608,926,881 -- 608926881.23 .1 1,971,577,271 -- 1971577271.03 .2 2,869,241,009 -- 2869241008.63 .3 3,627,531,229 -- 3627531228.91 .4 4,341,214,012 -- 4341214011.75 .5 5,056,937,540 -- 5056937540.69 .6 5,814,220,606 -- 5814220606.06 .7 6,664,739,783 -- 6664739783.83 .8 7,705,697,797 -- 7705697797.50 .9 9,216,853,900 -- 9216853901.24 .95 10,512,992,586 -- 10512992586.66 .99 13,034,599,788 -- 13034599789.54 .999 15,964,059,241 -- 15964059242.89 #include stdio #include math main() { double p[15] = { .001, .01, .1, .2, .3, .4, .5, .6, .7, .8, .9, .95, .99, .999 }; char jh[] = { ".001 192,124,822", ".01 608,926,881", ".1 1,971,577,271", ".2 2,869,241,009", ".3 3,627,531,229", ".4 4,341,214,012", ".5 5,056,937,540", ".6 5,814,220,606", ".7 6,664,739,783", ".8 7,705,697,797", ".9 9,216,853,900", ".95 10,512,992,586", ".99 13,034,599,788", ".999 15,964,059,241" }; FILE fp; int i; fp = fopen("crypto.dat", "w"); for (i=0; i<14; i++) fprintf(fp, "%s -- %15.2f\n", jh[i], sqrt(-pow(2.0, 65.0)*log(1.0-p[i]))); fclose(fp); }
205.18	for accuracy phreaks	CLT::GILBERT	eager like a child	`Sun Jan 25 1987 11:15`	16
	For additional accuracy, you could take (from note 205.14): n-1 sum ln(1-i/2^64) = ln .5 i=0 And since: ln(1+x) = x - x^2/2 + x^3/3 - x^4/4 + x^5/5 - x^6/6 + ... You could use Eric's technique with the better approximation: ln(1+x) = x - x^2/2 Has anybody tried to find two different quadwords that hash to the same result?
205.19		ENGINE::ROTH		`Sun Jan 25 1987 14:07`	7
	Re .-1, good point. The only reason I didn't use Eric's approx was because 2^-64 is already about .5E-19 and small compared to the digits of significance in JH's data. - Jim
205.20	How to generate congruent passwords	DENTON::AMARTIN	Alan H. Martin	`Fri Jan 30 1987 19:12`	11
	Re .18: >Has anybody tried to find two different quadwords that hash to the same result? Are you forgetting that topic 114 of CLOSET::HACKERS (q.v.) tells how to generate a congruent password from any other password of 16 characters or less? (Not to mention containing sufficient information to be able to solve the problem for longer passwords, no doubt)? After all, you participated in the discussion. /AHM
205.21		CLT::GILBERT	eager like a child	`Sat Jan 31 1987 23:15`	7
	>Has anybody tried to find two different quadwords that hash to the same result? >Are you forgetting that topic 114 of CLOSET::HACKERS (q.v.) tells how ... Of course not. I persuaded VMS to not use a CRC for the password hash algorithm (as was used in V1 and V2 of VMS), and am quite familiar with the the code. The persuasion, by the way, dumped passwords at 9600 baud.
205.22	you're confusing two different things	VIDEO::OSMAN	and silos to fill before I feep, and silos to fill before I feep	`Mon Feb 02 1987 11:26`	25
	You're confusing the issue here. The VMS password encryption scheme is done in TWO steps: 1) Take user's clear-text and fold it into eight bytesm merely adding each successive character into next byte, and wrap after every eight. 2) Use CRC to encode the resulting eight-byte value. It's only step 1) in which I demonstrated that any short password produces the same folded form as same password appended with UUUUUUUUUUUUUUUUVVVVVVVV. (16 U's andd 8 V's) As far as I know, no one has discovered any two DIFFERENT folded forms that the CRC in step 2 maps to the same thing. In fact, I vaguely recall reading that the CRC has the mathematical property that no two folded forms will CRC to the same thing. I verified (as mentioned either in here or in HACKERS, I forget which), that from an 80000-word english dictionary, no two words CRC to the same thing. /Eric
205.23		JON::MORONEY	Legalize Liberty	`Mon Feb 02 1987 15:55`	13
	Lets reword the challenge: Find 2 VMS passwords such that: 1) if you set your password to the first, you can sign on using the second; (that is, the encrypted form stored in the UAF file gives the same value) 2) "folding" the passwords (step 1 of .22) gives 2 different values. This reduces the problem to the following: Find 2 64 bit values, which when run through the second part of the VMS password encryption algorithm (part 2 of .22) give the same 64 bit value as output. -Mike
205.24	"insufficient data"	SSDEVO::LARY		`Mon Feb 02 1987 16:28`	12
	Interesting to note that in all this discussion the actual hashing function has never been given. I assume that it isn't CRC, since the CRC instruction only produces 32 bits. I also assume that it isn't in the nature of a closely held secret, since depending on that kind of stuff for security never works. My best guess would be that it is the DES 8-byte-munge algorithm, in which case the compute and memory requirements of the problem are more than I can deal with. What is it?
205.25	Not CRC (anymore), Not DES (too slow)	TLE::BRETT		`Mon Feb 02 1987 22:16`	4
	Its a Purdy polynomial - basically a polynomial of order 2^64 with only a few coefficients that are large - computed modulo 2^64. /Bevin
205.26	Background on password encryption	ENGINE::ROTH		`Tue Feb 03 1987 09:45`	5
	See the August 1974 issue of Communications of the ACM, particularly A High Security Login Procedure, by G. B. Purdy - Jim
205.27		TLE::BRETT		`Wed Feb 04 1987 13:07`	4
	Thanks Jim, I didn't know such an article existed and VMS's p/w hash has always intrigued me! /Bevin
205.28	I've got a copy of the VMS code	VIDEO::OSMAN	and silos to fill before I feep, and silos to fill before I feep	`Mon Feb 09 1987 10:29`	5
	I've got the VMS source code for the password encoding. It's only a page or two at the most. I'll be glad to post it or forward it if folks think that's "kosher". /Eric
205.29		CLT::GILBERT	eager like a child	`Mon Feb 09 1987 14:20`	4
	I'd say it's "kosher" to distribute the sources -- the security does not rely on a 'secret algorithm'. However, I'll ask that you not post the source code here (it's a bit largish) -- the hashing of the quadword can be described in just a couple equations.