T.R | Title | User | Personal Name | Date | Lines |
---|
900.1 | | CTCADM::ROTH | If you plant ice you'll harvest wind | Fri Jul 08 1988 16:59 | 4 |
| I had mailed a routine to you some time ago, but your node does seem
notoriously hard to reach.
- Jim
|
900.2 | | ZFC::DERAMO | For all you do, disk bugs for you. | Fri Jul 08 1988 20:12 | 2 |
| Re .-1, is it short enough to post here? Or maybe a
few of us can forward it using NMAIL. :-)
|
900.3 | | CLT::GILBERT | | Tue Jul 12 1988 12:52 | 26 |
| It sounds like an interesting problem. I'd be tempted to try the
following approach:
Choose 16 (or 256) initially selected points as follows:
Choose a point (from among the million) that is farthest from
the points already selected (i.e. whose distance to the nearest
selected point is largest). Repeat.
Now 'relax' the selected points.
For each of the 16 (256) points, determine the 'force' on it by
summing the 'forces' from each of the points (among the million)
that map to it. Each of these forces is simply the distance
(or distance squared) between the points, expressed as a vector
pointing away from the 'selected' point.
Use this total force to determine the direction (and amount)
by which each selected point should move.
Move the 16 (256) selected points, and redetermine which selected
point each of the million points is closest to.
Repeat the relaxation & movement steps.
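For concreteness, a rough sketch of those two steps (illustrative NumPy, made-up
names, nobody's production code; fine for a few thousand points, though the
nearest-seed search needs the tricks discussed later to scale to a million):

    import numpy as np

    def farthest_point_seeds(points, k):
        # Start from an arbitrary point, then repeatedly take the point
        # whose distance to the nearest seed chosen so far is largest.
        seeds = [points[0].astype(float)]
        dist = np.linalg.norm(points - seeds[0], axis=1)
        for _ in range(k - 1):
            far = np.argmax(dist)
            seeds.append(points[far].astype(float))
            dist = np.minimum(dist, np.linalg.norm(points - points[far], axis=1))
        return np.array(seeds)

    def relax(points, seeds, iters=10, step=1.0):
        seeds = np.asarray(seeds, dtype=float)
        for _ in range(iters):
            # Assign each point to its nearest seed (brute force here).
            d = np.linalg.norm(points[:, None, :] - seeds[None, :, :], axis=2)
            owner = np.argmin(d, axis=1)
            for j in range(len(seeds)):
                mine = points[owner == j]
                if len(mine):
                    # Net 'force' divided by the count is a step toward the
                    # centroid of the points that map to seed j.
                    seeds[j] += step * (mine - seeds[j]).sum(axis=0) / len(mine)
        return seeds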
Surely, there's a better way.
|
900.4 | color map algorithm | CTCADM::ROTH | If you plant ice you'll harvest wind | Tue Jul 12 1988 16:33 | 55 |
| The technique I actually used was to take robust statistics on
the gamut of colors by building a KD tree of hierarchically nested
bounding boxes in R3 around the image colors.
When 16 (or 256) of these have been allocated, you can take the
centroid of the points within each box as a first estimate of the
color map.
To quickly map image colors (a 512**2 image has 1/4 million pixels)
to the nearest neighbors you can subdivide the color cube into 32**3
subcubes, initially unoccupied.
Then each image pixel is quantized into one of the subcubes (which gives
resolution of 15 bits by itself), and a list of 'reachable colors'
is created for that cube if one doesn't already exist for it.
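In case the packing isn't obvious, the subcube index is just the top 5 bits of
each component glued together (a sketch assuming 8-bit R, G, B, not the actual
code):

    def subcube_index(r, g, b):
        # 32**3 = 2**15 subcubes: keep the top 5 bits of each 8-bit component.
        return ((r >> 3) << 10) | ((g >> 3) << 5) | (b >> 3)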
A color map entry is reachable from within a subcube if it is closer to
some point within the cube than any other color. The set of reachable
colors includes the color that is closest to the center of the subcube,
and in addition any other colors such that the plane bisecting the space
between any pair of colors in the list intersects the subcube. On account
of the convexity of a cube this happens when not all vertices of the
subcube are on the same side of the bisecting plane.
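That vertex test is easy to state in code. A sketch (made-up names, not the
routine itself): c0 is the colormap entry closest to the subcube's center, c is
a candidate entry, and the cube is given by its low and high corners.

    import itertools
    import numpy as np

    def is_reachable(c, c0, cube_lo, cube_hi):
        c, c0 = np.asarray(c, float), np.asarray(c0, float)
        normal = c - c0                  # normal of the bisecting plane
        mid = 0.5 * (c + c0)             # a point on the plane
        corners = itertools.product(*zip(cube_lo, cube_hi))
        sides = [np.dot(np.asarray(v, float) - mid, normal) for v in corners]
        # If every corner is strictly on the c0 side, the plane misses the
        # (convex) cube, so no point inside can be closer to c than to c0.
        return not all(s < 0 for s in sides)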
The KD tree is used to select an initial set of candidates within a
spherical shell around the center of the cube; this search takes
O(log n). Then the cutting plane is used to quickly reduce
the tentative set to the true reachable points. The spherical shell
has only to be the thickness of the length of the major diagonal; this
is easy to incorporate into the basic KD tree search.
Statistically only 2 or 3 points are reachable from each subcube,
so the technique is an economical way of doing the nearest neighbor
search.
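Roughly how the pieces fit together, sketched with SciPy's kd-tree in place of
the hand-built one (is_reachable is the test above; none of this is the actual
routine):

    import numpy as np
    from scipy.spatial import cKDTree

    def reachable_colors(cmap, cube_lo, cube_hi):
        cmap = np.asarray(cmap, dtype=float)
        lo, hi = np.asarray(cube_lo, float), np.asarray(cube_hi, float)
        center, diag = 0.5 * (lo + hi), np.linalg.norm(hi - lo)
        tree = cKDTree(cmap)
        d0, i0 = tree.query(center)           # entry closest to the cube center
        # Candidates lie within a shell of thickness 'diag' beyond d0.
        candidates = tree.query_ball_point(center, d0 + diag)
        return [i for i in candidates
                if i == i0 or is_reachable(cmap[i], cmap[i0], lo, hi)]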
Relaxation can optionally be used to find a local optimum for the color
map.
To do this, a pass thru the image is made and the centroid of all image
colors that hit each color map entry is accumulated. Then the color
map entry is updated to that centroid and the process is repeated.
This is referred to as the 'Basic Isodata Algorithm' by pattern
recognition workers, but has been rediscovered many times.
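One relaxation pass, as described (a sketch only; 'nearest' stands in for
whatever fast lookup is used, e.g. the subcube scheme above):

    import numpy as np

    def isodata_pass(pixels, cmap, nearest):
        # Accumulate per-entry sums and counts in a single pass over the
        # image, then move each entry to the centroid of its pixels.
        sums = np.zeros((len(cmap), 3), dtype=float)
        counts = np.zeros(len(cmap), dtype=int)
        for p in pixels:
            j = nearest(p, cmap)
            sums[j] += p
            counts[j] += 1
        new_cmap = np.asarray(cmap, dtype=float).copy()
        hit = counts > 0
        new_cmap[hit] = sums[hit] / counts[hit, None]
        return new_cmap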
Diffusive dither is used on the final image mapping; it adds the
benefits of halftoning to the process. It simulates the heat (diffusion)
equation by dispersing any mapping error on a pixel to the adjacent
pixels below and to the side. Some precautions have to be taken because
the image colors don't necessarily lie in the convex hull of the
assigned color map when doing this.
Diffusive dither is very effective - even with only 16 colors, images
for the VT340 come out amazingly well.
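A bare-bones error-diffusion sketch in the same spirit (Floyd-Steinberg weights
and 8-bit channels assumed, not necessarily what was used here; the clip guards
against colors drifting out of gamut, the convex-hull caveat above):

    import numpy as np

    def diffuse_dither(img, cmap):
        # img: (h, w, 3) array of colors, cmap: (k, 3); returns an index image.
        work = img.astype(float).copy()
        cmap = np.asarray(cmap, dtype=float)
        h, w, _ = work.shape
        out = np.zeros((h, w), dtype=int)
        for y in range(h):
            for x in range(w):
                old = np.clip(work[y, x], 0.0, 255.0)
                j = int(np.argmin(((cmap - old) ** 2).sum(axis=1)))
                out[y, x] = j
                err = old - cmap[j]
                # Push the quantization error to the right and to the row below.
                if x + 1 < w:
                    work[y, x + 1] += err * 7 / 16
                if y + 1 < h:
                    if x > 0:
                        work[y + 1, x - 1] += err * 3 / 16
                    work[y + 1, x] += err * 5 / 16
                    if x + 1 < w:
                        work[y + 1, x + 1] += err * 1 / 16
        return out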
- Jim
|
900.5 | sounds like a job for ... cluster analysis | ZFC::DERAMO | Supersedes all previous personal names. | Tue Jul 12 1988 20:17 | 18 |
| >> The problem is mapping in an optimal way a lot of points (about
>> one million at most) of a tri-dimensional space to a smaller set
>> (typically 16 or 256) of points. Optimal could mean that the sum
>> of the distance squared from each point to the point that approximate
>> it is minimal.
This sounds a lot like something called "Cluster Analysis".
It usually starts with the values of k variables for n
objects and clusters them into groups. One technique tries
to minimize the sum of squared distances of each point from
its cluster's centroid. Another tries to find tree-like
hierarchies like you would see on a chart of species
evolution. There are books (and statistical packages,
probably not public domain, but who knows) available on the
subject.
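That first criterion, written out (illustrative only): points is (n, 3), mu
holds the k centroids, and a[i] is the cluster assigned to point i.

    import numpy as np

    def within_cluster_sse(points, mu, a):
        # Sum of squared distances from each point to its cluster's centroid.
        return float(((points - mu[a]) ** 2).sum())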
Dan
|
900.6 | oops, forgot to add | ZFC::DERAMO | Supersedes all previous personal names. | Tue Jul 12 1988 20:19 | 5 |
| I forgot to add that I don't think the methods I studied
were intended for one million by three data sets. (The
three is okay, the one million is too big.)
Dan
|
900.7 | | ISTG::GOKHMAN | Boris the Bear | Thu Aug 04 1988 18:12 | 14 |
|
Let me venture a guess: it IS a lot like a typical problem of pattern
recognition (cluster analysis is one of the terms often used). If
so, there are many algorithms to solve it. There has been a good
deal of research in Computational Geometry, devoted to just this
subject, partitioning of the multidimensional point set into the
"nearest neighbor" clusters. In 2d look for the term(and algorithms
on) Voronoy Polygons.
Look for publications by Shamis from CMU, he published proofs that
such clustering may be done in NlogN time for a set of N points
by "divide and conquer" algorithms.
Boris
|
900.8 | | CTCADM::ROTH | If you plant ice you'll harvest wind | Sun Aug 07 1988 13:08 | 36 |
| Re .-1
It is indeed a cluster analysis problem. I found pattern
recognition textbooks to be helpful, and used some of the tools
mentioned (kd trees, computational geometry, etc.) in the colormap
routine.
The following books give a good survey of useful results:
Duda, R. O. and Hart, P. E.
Pattern Classification and Scene Analysis
Wiley, 1973
Tou, J. T. and Gonzalez, R. C.
Pattern Recognition Principles
Addison Wesley, 1974
For color map quantization proper, Heckbert's SIGGRAPH paper is
excellent. In addition, a recent ACM TOMS paper discusses a
variance-based refinement in the initial clustering phase that looks good.
I experimented with something similar to their idea also.
Heckbert, P.
Color image quantization for frame buffer display
July 1982 SIGGRAPH proceedings
Wan, S. J., Wong, K. M., and Prusinkiewicz, P.
An algorithm for multidimensional data clustering
ACM TOMS June 1988
Clustering is important, but proper dithering and perceptually based
error metrics are also very important. It is not necessarily true that
an algorithm that minimizes the mean-squared error will give the best
looking picture.
- Jim
|
900.9 | | CTCADM::ROTH | If you plant ice you'll harvest wind | Sun Aug 07 1988 13:18 | 18 |
> Look for publications by Shamis from CMU, he published proofs that
> such clustering may be done in NlogN time for a set of N points
> by "divide and conquer" algorithms.
I believe it's Michael Shamos; his Yale PhD thesis was a starting
point for much of the recent 'computational geometry' literature.
Very few of the computational geometry papers are actually useful
in the real world. Most are exercises in asymptotic analysis of algorithms
and can't be easily applied in practice. For example, so-called
"plane sweep" techniques look really nice on paper, but it is quite
difficult to make such techniques work when any lines or faces
are coincident, parallel to a coordinate axis, or when many lines,
faces, etc. intersect at a point.
Still, there are a lot of good ideas to try...
- Jim
|
900.10 | Turbo Vector Quantization | MAMIE::BOTTOMS | | Tue Oct 18 1988 12:12 | 12 |
| If I understand the problem, it is also referred to as Vector
Quantization. The approach suggested in the book on Image Processing
by Gonzalez and Wintz is to take the log of the histogram of points
and then assign new vectors to the resulting histogram such that
the greatest area under the curve is covered by at least one vector.
Again, this does not cover the psycho-visual question totally but
the log compression does help. It also helps in dealing with the
busy work of making sure multiple new vectors are not assigned to
the same original vector. Other variations of this approach use a
"bin seeding" algorithm, but it is essentially the same math with
an easier algorithm to implement.
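A guess at what that amounts to (my reading, not the book's algorithm):
histogram the colors into coarse bins, log-compress the counts, and hand the
k code vectors to distinct bins in proportion to the compressed counts, so no
two seeds land on the same bin.

    import numpy as np

    def seed_from_log_histogram(points, k, bins=32, lo=0.0, hi=256.0, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        # Quantize each 3-D point into one of bins**3 cells and count them.
        idx = np.clip(((points - lo) / (hi - lo) * bins).astype(int), 0, bins - 1)
        cell = (idx[:, 0] * bins + idx[:, 1]) * bins + idx[:, 2]
        counts = np.bincount(cell, minlength=bins ** 3)
        weight = np.log1p(counts)          # log compression of the histogram
        # Draw k distinct cells, weighted by the compressed counts
        # (requires at least k occupied cells).
        chosen = rng.choice(bins ** 3, size=k, replace=False, p=weight / weight.sum())
        centers = []
        for c in chosen:
            i, rem = divmod(int(c), bins * bins)
            j, m = divmod(rem, bins)
            centers.append((np.array([i, j, m]) + 0.5) * (hi - lo) / bins + lo)
        return np.array(centers)

The seeds would then typically be refined with the relaxation passes discussed
earlier in this topic.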
|