T.R | Title | User | Personal Name | Date | Lines |
---|
720.1 | | CSC32::HOEPNER | A closed mouth gathers no feet | Fri Mar 28 1997 18:14 | 9 |
|
They said they would have a DLM for WOLFPACK?
Hmmmm. The last presentation I heard was they had done an industry
analysis and determined there was no need for a DLM. They indicated
that with 'shared nothing' DLM is not necessary...
Mary Jo
|
720.2 | Vendor driven DLM. | ZPOPPN::CSNMURALI | | Sun Mar 30 1997 22:16 | 7 |
| I agree with Mary.. that was what they said in the last presentation I
attended...
They also mentioned that software vendors might build DLM on their own..
(Oracle is doing so for their Oracle Parallel Server product, which
runs on top of WOLFPACK...)
- Murali
|
720.3 | Some more info on Wolfpack, Compaq and Tandem... | ACISS1::TSUCHIYAMA | Gary Tsuchiyama @CPO 447-2812 | Tue Apr 01 1997 15:39 | 29 |
| I double checked my info with the Microsoft presenter and he claims
there will be a DLM in the first release of Wolfpack. I was very
surprised to hear this myself.
BTW, Compaq and Tandem also did short presentations on their cluster
solutions. Compaq stressed low cost and ease of deployment. They will
have something called an Integrator which supposedly goes out and
probes the hardware to make sure it's supported on their cluster and at
the right rev levels. They will also have fibre channel by the end of
summer '97.
Tandem talked about several interesting contributions to clustering.
They briefly mentioned "heartbeat" technology which just sounded like
"I'm alive" polling. There was also discussion of ServerWare which
automatically clones executables to be run on each processor as load
demands. ServerWare is supposed to ship in a year. The most
interesting technology from Tandem is ServerNet. With ServerNet Tandem
claims a 300 mbyte/sec throughput with a 300 ns latency. The ServerNet
architecture will be implemented on a single chip. The presenter also
claims that ServerNet will be able to achieve 1.2gbyte/sec throughput
"soon" after initial release. Tandem will also ship a fault-tolerant
4-processor 200 MHz Pentium Pro server based on their S1000RM server.
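The "heartbeat" scheme mentioned above really is just periodic "I'm alive" polling. A minimal sketch of the idea (the node names and the 5-second timeout are invented for illustration):

```python
import time

# Hypothetical heartbeat monitor: each node records the time of its last
# "I'm alive" message; a node is declared failed once its last heartbeat
# is older than the timeout. The timeout value is illustrative.
HEARTBEAT_TIMEOUT = 5.0

class HeartbeatMonitor:
    def __init__(self, nodes):
        now = time.time()
        self.last_seen = {node: now for node in nodes}

    def heartbeat(self, node):
        """Called whenever an "I'm alive" message arrives from a node."""
        self.last_seen[node] = time.time()

    def failed_nodes(self):
        """Nodes whose last heartbeat is older than the timeout."""
        cutoff = time.time() - HEARTBEAT_TIMEOUT
        return [n for n, t in self.last_seen.items() if t < cutoff]

monitor = HeartbeatMonitor(["nodeA", "nodeB"])
monitor.heartbeat("nodeA")
print(monitor.failed_nodes())  # [] -- both nodes reported recently
```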
I would like to know how much of all of this (ie Microsoft, Compaq and
Tandem) is marketeering and how much is reality.
gary
|
720.4 | Thanks for the info | DECWET::CAPPELLOF | My other brain is a polymer | Tue Apr 01 1997 20:01 | 25 |
| We have talked to the Microsoft Wolfpack program and product managers,
and both say that Wolfpack V1.0 will not ship a DLM. At this point,
their attitude is that someone will have to present a very good
business case for WHY someone would want a DLM. (What applications
would use it?)
> Compaq stressed low cost and ease of deployment. They will
> have something called an Integrator which supposedly goes out and
> probes the hardware to make sure it's supported on their cluster and at
> the right rev levels.
We already have a similar tool for Digital Clusters called "cluivp"
(CLUster Installation Verification Procedure).
We will probably ship a version of that for wolfpack too.
> With ServerNet Tandem
> claims a 300 mbyte/sec throughput with a 300 ns latency. The ServerNet
> architecture will be implemented on a single chip.
^^^^^^^
Notice that phrase "will be". Not quite ready yet, but it does look
promising. Digital's Memory Channel provides similar throughput and
latency. Memory Channel has been used by Digital's UNIX and VMS
clusters for over a year now, and we're working on drivers for NT right
now.
|
720.5 | Thanks for the feedback. Who are the Wolfpack contacts? | ACISS1::TSUCHIYAMA | Gary Tsuchiyama @CPO 447-2812 | Thu Apr 03 1997 14:22 | 14 |
| Hi Carl,
Thanks for your reply. It's interesting how much message distortion
occurs between corporate headquarters of any organization, in this case
Microsoft, and the field. I'll make sure my local Microsoft contact is
aware of the discrepancy and I'll suggest he verify his DLM claim.
Could you perhaps give me the names of the Microsoft cluster people he
should contact? He is presenting from a Microsoft corporate cluster
pitch, so I would assume Microsoft corporate will have to issue a
correction to all their field offices.
Regards,
gary
|
720.6 | Why a DLM? | DECWET::CAPPELLOF | My other brain is a polymer | Thu Apr 03 1997 14:46 | 8 |
| The Product Manager for Wolfpack at Microsoft is Mark Wood.
Early in the product cycle (August 1996), MS said they WOULD ship a DLM
with Wolfpack V1.0. Now they have changed their message to say they
will NOT ship a DLM.
If you have customers who express a need for a DLM, can you find out
why they need one? What would they use it for?
|
720.7 | Why not a DLM? | 42376::BARKER | Careful with that AXP Eugene | Mon Apr 07 1997 09:20 | 15 |
| > Early in the product cycle (August 1996), MS said they WOULD ship a DLM
> with Wolfpack V1.0. Now they have changed their message to say they
> will NOT ship a DLM.
>
> If you have customers who express a need for a DLM, can you find out
> why they need one? What would they use it for?
Presumably so that the servers could access all disks simultaneously just like
a VMScluster. Far better for load balancing, and simpler for management too, as
there would be no need or concept of failing over a disk from one server to
another, as it would already be accessible.
A better question might be: "What customers don't need a DLM?"
Nigel
|
720.8 | | CSC32::HOEPNER | A closed mouth gathers no feet | Mon Apr 07 1997 15:27 | 40 |
|
The following is a description of shared nothing -- a reason why
Microsoft doesn't consider DLM a requirement.
Remember that load balancing is not intended to be a feature until
a later phase of Wolfpack.
Shared Nothing Model
In the shared nothing software model, each system within the cluster owns a
subset of the resources of the cluster. Only one system may own and access a
particular resource at a time, although, on a failure, another dynamically
determined system may take ownership of the resource. In addition, requests
from clients are automatically routed to the system that owns the resource.
For example, if a client request requires access to resources owned by multiple
systems, one system is chosen to host the request. The host system analyzes
the client request and ships sub-requests to the appropriate systems. Each
system executes the sub-request and returns only the required response to the
host system. The host system assembles a final response and sends it to the
client.
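The host/sub-request flow described above can be sketched in a few lines. The ownership map, node names, and record layout below are invented for illustration; the point is that only the sub-request results cross the interconnect:

```python
# Illustrative shared-nothing routing: each resource has exactly one
# owning node, and the host ships sub-requests to owners. All names and
# data here are hypothetical.
OWNER = {"accounts": "node1", "orders": "node2"}

DATA = {
    "node1": {"accounts": {"alice": 100}},
    "node2": {"orders": {"alice": ["widget"]}},
}

def execute_subrequest(node, resource, key):
    """Runs on the owning node; only the result crosses the interconnect."""
    return DATA[node][resource].get(key)

def host_request(key, resources):
    """The host node ships sub-requests to owners and assembles the reply."""
    response = {}
    for resource in resources:
        node = OWNER[resource]          # route to the single owner
        response[resource] = execute_subrequest(node, resource, key)
    return response

print(host_request("alice", ["accounts", "orders"]))
# {'accounts': 100, 'orders': ['widget']}
```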
A single system request on the host system describes a high-level function
(e.g., a multiple data record retrieve) that generates a great deal of system
activity (e.g., multiple disk reads) and the associated traffic does not appear
on the cluster interconnect until the final desired data is found. By
utilizing an application that is distributed over multiple clustered systems,
such as a database, overall system performance is not limited by a single
machine's hardware limitations.
The shared disk and shared nothing models can be supported within the same
cluster. Some software can most easily exploit the capabilities of the
cluster through the shared disk model. This software includes applications and
services which require only modest (and read intensive) shared access to data
as well as applications or workloads that are very difficult to partition.
Applications which require maximum scalability should use the cluster's shared
nothing support.
|
720.9 | DLM is largely a relic of the past | DECWET::LEES | Will, NTSG DECwest, Seattle | Mon Apr 07 1997 16:28 | 54 |
| >Presumably so that the servers could access all disks simultaneously just like
>a VMScluster. Far better for load balancing and simpler for management too as
>there would be no need or concept of failing over a disk from one server to
>another as it would already be accessible.
>A better question might be. "What customers don't need a DLM?"
| Let's not confuse the disk access mechanism with the data synchronization
mechanism, although they are related.
For the disk access models, there is the shared approach, where the disk is
physically connected to the systems via a storage bus such as SCSI or Fibre
Channel. There is also the shared nothing approach, where the disk is only
connected to one or a small subset of the nodes, and the disk blocks are served
out to the remaining nodes. Data access performance is important in shared
nothing systems, so a high speed bus of some kind may be used, such as FDDI or
Memory Channel, to ship the blocks around. The benefit of shared nothing in my
mind is that generic hardware (servers, network interconnects) scales better
and less expensively than dedicated storage buses (disk controller subsystems,
storage interconnects) at some cost in performance.
The two models are equivalent, just implemented at different levels. A server
provides blocks to clients over an interconnect. The server may be redundant
and may support raid algorithms.
Assuming that low level data sharing is available, what is the best way to
synchronize access to those blocks? A DLM is certainly one way to approach this.
Partitioning the data might be another. Changing the kind of data being
distributed might simplify the locking requirements depending on the
application.
Perhaps we need to rephrase the question. Is a shared data model cluster where
all nodes can access the storage better than a two node failover based cluster?
The answer is yes, assuming you want to change your filesystem, database and
other disk intensive applications.
Is a user-visible DLM the best model for distributed applications? Debatable.
How many applications maintain disk block caches? Parallelizing a database or
a filesystem is a major effort. Such a thing is difficult to implement,
maintain and prove correct. Performance is also an elusive thing since poor
lock design will result in lock contention. For some applications, a DLM is
the right thing. Oracle Parallel Server comes with its own. What other
application needs one? Distributed system toolkits such as ISIS may be a better
choice for general application coarse grained parallel programming.
Is the question, why shouldn't we make NT clusters be just like VMSclusters? I
think the answer is that NT does most of what VMS did in a cheaper, simpler and
more extensible way. There is no doubt that VMSclusters was state of the art
for its time, but it also required proprietary hardware, did not scale well, and
was very intertwined with the lowest level operation of the system. True, NT
does not have a shared system disk, but it has domains for a common
administration environment and LAN-based protocols for file sharing. Loosely
coupled client-server architecture is much easier to extend than tightly
coupled low-level synchronization protocols.
|
720.10 | | CSC32::HOEPNER | A closed mouth gathers no feet | Mon Apr 07 1997 16:57 | 4 |
|
VMSClusters are great. However, when we start talking terabyte and
petabyte databases in a shared disk environment, the overhead from
disk I/O in a cluster could be overwhelming.
|
720.11 | | MOVIES::WIDDOWSON | Rod OpenVMS Engineering. Project Rock | Tue Apr 08 1997 04:47 | 43 |
| ho hum.
Microsoft have all their heavyweights lined up against OpenVMS-type
filesystems. Cutler (who left VMS before V4) goes (sic) 'nonlinear'
when they are discussed.
Jim Gray, who has huge credibility in the field of databases, is
proselytizing for shared nothing. For instance, when I challenged him
about Oracle's OPS (the flagship shared everything model) he asserted
that they actually prefer an explicit load balancing (eg shared nothing)
mechanism for their TPC results. That week, someone else from
Microsoft told me that the CTO of Oracle was a believer in shared
nothing (I have not validated that, I just offer it as an example of
the hearts and minds offensive)
Technically, there are real concerns in the NT kernel group about the
interaction of NT's Cachemanager and a shared everything filesystem.
Indeed, the sharing mechanism needs to be very carefully thought through
so that you don't thrash your caches. I have to admit that the
Wolfpack Cluster people know their stuff and have really examined and
thought through what clusters actually mean.
So, politically and in marketing terms a shared everything environment
is hard. Who has the better marketing, DIGITAL or Microsoft ?
Now, personally speaking, I think that there is a place for shared
everything, between the ultimate top end cluster and SMP. The
important characteristic seems to be the immediate transportability of
an environment from Node to Node (dynamic reconfiguration) and the
flexibility that this gives the system manager.
However, in an environment where every penny of expenditure has to be
carefully examined, if Digital does this, what will not get done?
This is why way back in .-n, Carl is asking for which customers would
be able to take advantage of DLMs and of shared everything storage.
So, this is not about the technology, but about the market.
You, the people reading this, you are the one with the customer
contacts and information which will or will not drive this forward.
If you have information on the commercial potential of this, I urge you
to contact Carl..
|
720.12 | more ramblings | MPOS01::naiad.mpo.dec.com::mpos01::cerling | I'[email protected] | Tue Apr 08 1997 11:27 | 24 |
|
I think, too, that much of the wailing over the lack of shared
everything comes from the VMScluster-knowledgeable people because
that is what they know. I have worked with all 4 clustering
technologies from Digital (TOPS-20, VMS, UNIX, NT). I had to be
persuaded because I understood shared everything, and shared
nothing was new. But having talked to hundreds of
customers about everything vs. nothing, about the only people who
bring up the idea of shared everything are the VMS people. Then
I ask them how they arrange their data for maximum performance
in a VMScluster. Except for rare cases, they always assign the
data to a non-shared disk. In other words, they make it shared
nothing so they don't have to deal with the overhead of DLM.
There are places for both; I'm not arguing that. I think, though,
that the shared nothing model is being lambasted too quickly. To
expect V1 products to be as robust as VMSclusters is a little too
much to expect. Shared nothing was the easiest way to introduce
clusters to NT in a manner that is very effective for what it is
trying to do - providing high-availability to data and services.
If the market (other than just former VMScluster people) sees a
need for shared everything, I am sure MS will do something about it.
tgc
|
720.13 | DLM and shared filesystem are different | DECWET::CAPPELLOF | My other brain is a polymer | Tue Apr 08 1997 14:18 | 14 |
| I hope that everyone contributing to this discussion also understands
that we're talking about 2 separate things:
A Distributed Lock Manager, which is one way to enable cooperating
applications among nodes in a cluster.
A shared filesystem, where several nodes in a cluster access
common bits on common disks, and somehow coordinate access among
themselves. (One way to coordinate access is to use a Distributed
Lock Manager.)
I think that eventually, you'll see Microsoft produce an API that
allows applications on cluster nodes to cooperate among themselves.
I doubt they'll do a shared filesystem, however.
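A toy illustration of the coordination role a DLM plays, reduced to two lock modes. (A real DLM, such as the VMS one, has six modes, lock conversion, and blocking notifications; this sketch only shows the grant/deny decision, and the resource names are made up.)

```python
class LockManager:
    """Grants shared (read) and exclusive (write) locks on named resources."""
    def __init__(self):
        # resource -> [mode, holder count]
        self.holders = {}

    def acquire(self, resource, mode):
        held = self.holders.get(resource)
        if held is None:
            self.holders[resource] = [mode, 1]
            return True
        if held[0] == "shared" and mode == "shared":
            held[1] += 1                # readers can share the lock
            return True
        return False  # incompatible; a real DLM would queue the request

    def release(self, resource):
        held = self.holders[resource]
        held[1] -= 1
        if held[1] == 0:
            del self.holders[resource]

lm = LockManager()
assert lm.acquire("disk1:block42", "shared")
assert lm.acquire("disk1:block42", "shared")        # second reader: OK
assert not lm.acquire("disk1:block42", "exclusive")  # writer must wait
```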
|
720.14 | Microsoft's response-No DLM now or ever. | ACISS1::TSUCHIYAMA | Gary Tsuchiyama @CPO 447-2812 | Wed Apr 16 1997 10:59 | 10 |
|
Gary:
I exchanged email with Tom Phillips, the Wolfpack Product Manager, and you
are correct: we will not be supporting DLM in the first release of
Wolfpack, nor in any future version. We will be removing all
references to DLM from our presentations.
Dennis
|
720.15 | | CSC32::HOEPNER | A closed mouth gathers no feet | Wed Apr 16 1997 15:54 | 6 |
|
I just love being right.
Mary Jo
|
720.16 | gloat on ... | BALTMD::LEARY | | Thu Apr 17 1997 22:01 | 1 |
|
|
720.17 | | DECWET::CAPPELLOF | My other brain is a polymer | Tue Apr 22 1997 21:29 | 3 |
| re .15
Have you ever been wrong?
|
720.18 | | CSC32::HOEPNER | A closed mouth gathers no feet | Wed Apr 23 1997 15:48 | 4 |
|
I thought so once, but I was mistaken... ;-}
Mary Jo
|
720.19 | | DECWET::KOWALSKI | Spent Time. Borrowed time. Stole time. Did time. | Wed Apr 23 1997 18:32 | 4 |
| >I thought so once, but I was mistaken... ;-}
Good thing we don't have to have an AI program
analyze THAT one. /m
|
720.20 | | CSC32::HOEPNER | A closed mouth gathers no feet | Wed Apr 23 1997 20:08 | 6 |
|
AI. Don't get me started on AI. Having to ship those containers
by FEDEX to make sure they get there overnight...
Mary Jo
|
720.21 | A i is better than no i at all! | DECWET::KOWALSKI | Spent Time. Borrowed time. Stole time. Did time. | Thu Apr 24 1997 11:25 | 1 |
|
|
720.22 | The true meaning of AI?? | MPOS01::lurelap.mpo.dec.com::Cerling | Call on the Calvary | Fri Apr 25 1997 12:03 | 15 |
|
I always had problems with the acronym AI. I was raised on a
farm. AI (the farmers will know) has a significantly different
meaning. The Artificial is the same, but the 'I' part never made
any sense, until I started hearing about 'hybrid' computers.
tgc
p.s. for the non-farmers - AI is artificial insemination.
|
720.23 | We got it, already | BALTMD::LEARY | | Fri Apr 25 1997 23:00 | 1 |
|
|
720.24 | What about cpu scalability ? | OTOU01::MAIN | Systems Integration-Canada,621-5078 | Thu May 01 1997 17:05 | 24 |
|
Perhaps someone can clarify something for me:
Wolf Pack addresses availability (2 systems) .. no problem.
Now, given a single large database with a cpu intensive application.
With Wolf Pack and no DLM, all users access the db via a single node.
Now after a few months, the cpu gets peaked out (say a quad P6)
..what are the options?
How does one address cpu scalability in a Wolf Pack environment ?
In an OpenVMS environment with DLM, one simply drops a new system into
the cluster and load sharing begins ie. cpu resources are shared. You
do not even need to reboot any other systems.
Am I missing something here ?
Thx in advance,
/ Kerry
|
720.25 | | TARKIN::LIN | Bill Lin | Thu May 01 1997 17:21 | 6 |
| re: .24 by OTOU01::MAIN
>> What about cpu scalability ?
>> Am I missing something here ?
Are these rhetorical questions? ;-)
|
720.26 | | MOVIES::WIDDOWSON | Rod OpenVMS Engineering. Project Rock | Fri May 02 1997 08:31 | 45 |
| Words you would not expect to hear from OpenVMS engineering (that's me):
"Shared Nothing is intrinsically more scalable than shared everything"
I don't know whether I believe this* but I have had enough conversations
with enough people in enough companies to be able to talk the talk on
both sides, today I shall talk for Microsoft:
The conceptual difference between shared everything and shared nothing
when applied to scaling is one of data-division as opposed to
algorithmic division. The argument is that all data is susceptible to
data division, so as your database scales to a terabyte and beyond you
apply an a-priori split and use a TP monitor (& Viper) to channel the
job to the processor needed. VI/SAM will serve to pass messages
between nodes which are involved in a custom transaction.
In a shared everything environment you are primarily constrained by the
hardware. Right now the best general purpose (hence affordable)
hardware means that in order to take advantage of clusters you can
only have 16 member clusters (I know that OpenVMS supports 96, but they
are serving out data and so you could claim that it is a software
shared everything). The appeal is that by applying algorithmic
decomposition you can deal with all the data on all the nodes.
Shared nothing means that caching is easier (especially a write-back
environment), is easier to program to and has less communication costs
than shared everything. But it does rely on up front static data
division to work.
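The a-priori split works by statically mapping each key to an owning node, then channeling every job to that owner. A sketch of the idea (the node count and key names are made up; a real system would partition whole tables, not hash individual keys):

```python
import zlib

# Hypothetical four-node cluster with a static, a-priori data split.
NODES = ["node0", "node1", "node2", "node3"]

def owner_of(key):
    """Static partitioning: the same key always maps to the same node.

    crc32 is used here only because it is deterministic across runs.
    """
    return NODES[zlib.crc32(key.encode()) % len(NODES)]

def route(key, operation):
    """Channel the job to the owning node (here, just report the routing)."""
    return f"{operation}({key}) runs on {owner_of(key)}"

print(route("customer:1042", "update"))
```

The weakness Wolfpack's critics point at is visible in the sketch: if one partition gets hot, rebalancing means changing `owner_of` and moving data, not just adding a node.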
Finally, Microsoft have won the marketing campaign and have some of the
world's greatest scalability experts working for them. And the technical
considerations are extremely hard, despite being `immediately obvious'.
*I really *am* agnostic in this battle (these days). I have fought this
battle for too long to be anything else. Personally I do hope that
there's a market in it for our (proven) skills in this field.
I urge all DIGITAL people who are interested to:
1) Keep an open mind and
2) Read Greg Pfister's book
3) Not get caught in techno-religious wars, but only in marketing ones.
I shall enjoy watching this develop
rod
|
720.27 | But, but ... | OTOU01::MAIN | Systems Integration-Canada,621-5078 | Sun May 04 1997 09:47 | 120 |
| Rod,
Thx for the info - I have a few additional issues which hopefully either
yourself or someone else here can expand on for me ..
This whole debate seems to come down to "does a DLM add value to both
the DB and timesharing (thin client/fat server) environments or just
the DB environment or neither ?
I would suggest both, but for the purposes of this note, will only address
the DB environment.
>>>
The conceptual difference between shared everything and shared nothing
when applied to scaling is one of data-division as opposed to
algorithmic division. The argument is that all data is susceptible to
data division, so as your database scales to a terabyte and beyond you
apply an a-priori split and use a TP monitor (& Viper) to channel the
job to the processor needed. VI/SAM will serve to pass messages
between nodes which are involved in a custom transaction.
....
Shared nothing means that caching is easier (especially a write-back
environment), is easier to program to and has less communication costs
than shared everything. But it does rely on up front static data
division to work.
>>>
I am having a really hard time trying to understand why Microsoft says
they don't need a DLM, yet Oracle is building a DLM into their new parallel
server. Is Microsoft really not planning a DLM for SQL Server ?
Note - arguably Oracle has many of the best DB type folks (who understand
real production environments) in the world - not Microsoft, who only recently
added row level locking to SQL Server..
It appears that the marketing folks from Microsoft are trying to convince
the world they know production db systems better than anyone else, but, in an
attempt to keep a somewhat open mind, perhaps someone could expand on the
following concerns that I have with the "shared nothing" cluster:
1. Very little flexibility to adapt to reality of changing db work loads.
As you stated, "But, it does rely on up front static data division
to work".
Now, how many real life Cust db's or work loads remain static ?
New tables, indexes are constantly being added, modified, dropped etc,
depending on the changing requirements of their business. What works well in
the lab, at installation and in theory are usually totally different from what
happens when a ton of "no mercy" users are thrown into the equation.
Soooo, if the workload changes, and a single node becomes a bottleneck, does
this mean a cluster reconfig with down time ? Does it mean a db reorg ? The
DB queries are at the mercy of the internal db optimizer, so does this not
further complicate the real life performance expectations and trying to
assign specific nodes to specific resources?
Bottom line - How will the shared nothing cluster be adapted to changing
workloads?
2. Load balancing is not part of Wolf Pack phase 1. Fine, but that in effect
is saying that it is not scalable. Has Microsoft stated how they will do
future load balancing (add scalability) without a DLM ?
To expand, if a single cpu is designated to handle one type of workload,
what happens when that one workload exceeds the CPU capacity of a single
system?
Buy a bigger system, ok, but what happens in an Intel Cluster when a quad
system gets maxed out on cpu resources - very easy to do with complex db
queries and/or thin client / fat server environment ?
Adding more systems means nothing - you must rewrite or adapt the
application ! Just ask any Customer which would be easier - rewrite an
application (especially with the heat on) or just drop another system
into the cluster. Remember that large prod environments are usually
concerned primarily (not exclusively) with performance rather than cost ...
SW development and changes are big time cost (time and $'s), while HW is
cheap - even with proprietary solutions. $10k are rounding errors in these
environments ...
Also, keep in mind that while there is overhead in a shared everything
environment, it typically is only a few % points of the overall load and
the systems we have today are hugely more powerful than in days gone by.
3. Does the shared nothing cluster consider the implications of VLM (very
large memory) DB's and high speed memory interconnects such as memory
channel ? It appears that the shared nothing approach is designed to solve
VLDB (big db's on disk) only. Big difference between VLM and VLDB.
Using a DLM and VLM db (what Oracle is doing) and being able to directly
address other systems memory, does this not seem a much more scalable
db solution than shared nothing approaches?
As a general statement, could it be stated :
For the DB environment, the shared nothing proponents assume that Customers
have a good understanding of their workload environments and how they will
change. It assumes Customers have the expertise to change their application
should a single CPU become a bottleneck. For these environments, shared
nothing clusters are a good fit.
The Oracle and DLM approach is that the workload will be shared across all
systems - regardless of how the business workload changes.
[I think Oracle are licking their chops right now ..]
:-)
Thx in advance,
Regards,
/ Kerry
|
720.28 | | DECALP::KLAVINS | Ed Klavins, RTR Engineering | Mon May 05 1997 04:13 | 8 |
| >
> I urge all DIGITAL people who are interested to:
> 1) Keep an open mind and
> 2) Read Greg Pfeister's book
Certainly interested -- what is this book?
ed
|
720.29 | Still need applications | DECWET::CAPPELLOF | My other brain is a polymer | Mon May 05 1997 16:49 | 24 |
| .24 says:
> Now after a few months, the cpu gets peaked out (say a quad P6)
>
> ..what are the options?
Easy, drop in a quad AlphaServer 4100 instead. That's what Microsoft
datacenter is thinking about. Good for us!
> In an OpenVMS environment with DLM, one simply drops a new system into
> the cluster and load sharing begins ie. cpu resources are shared.
True for a timesharing load. How does it work with a client/server
database setup? (NT's typical use.) Can you simply drop a CPU into a
VMS cluster and have it start processing DB requests? What software is
used? (RDB? Oracle?). The point is that VMS DB applications like RDB
and Oracle have been taught to use DLM to coordinate access to shared
data. There aren't any such NT applications (yet). SO at the moment,
it's sort of a chicken and egg situation. No DLM -> no applications to
take advantage of it. No DLM-enabled applications -> no need for DLM.
We in NT Cluster engineering continue to consider the possibilities of
DLM. But it's not like we can write one in a week. (And the VMS one
is too "vms-like" to port easily.)
|
720.30 | Pfister's book details | BIGUN::grange.cao.dec.com::eine::haverfieldp | Paul Haverfield @CAO | Mon May 05 1997 22:35 | 18 |
| In reply to .28:
In Search of Clusters - The coming battle in lowly parallel computing
by Gregory F. Pfister
published by Prentice-Hall
ISBN: 0-13-437625-0
and on the cover...
'It was originally conceived as a cluster of dogs battling a Savage
Multiheaded Pooch and, of course, winning. However, it could also
represent a system administrator's view of many clusters...'
sound familiar ??
|
720.31 | | MOVIES::WIDDOWSON | Rod OpenVMS Engineering. Project Rock | Tue May 06 1997 05:13 | 38 |
| >I am having a really hard time trying to understand why Microsoft says
>they dont need a DLM, yet Oracle is building a DLM into their new parallel
>server.
At a top level, by keeping all the information within one processor, all
conflict of access is kept within that processor. Hence any `lock
management' is non-distributed. So a (relatively) heavyweight
distributed lock is replaced by a lightweight semaphore/mutex/whatnot.
> Is Microsoft really not planning a DLM for SQL Server ?
I would be very surprised.
> Adding more systems means nothing - you must rewrite or adapt the
> application ! Just ask any Customer which would be easier - rewrite an
> application (especially with the heat on) or just drop another system
> into the cluster. Remember that while large prod environments are usually
> concerned primarily (not exclusively) with performance than cost ...
This is a two edged sword. In order to take advantage of `just
slotting another node in', your application needs to have been written
in a shared everything friendly fashion. If you are just taking an
application which runs on one node, it is (conceptually) easier to just
split the data up.
Of course Microsoft have historically been happy to make application
writers change their code and seem enthusiastic about making people
write Viper in Visual Basic...
Finally, VLM. Interesting arguments to be made again on both sides.
But I think that VLM tends to be addressable by a single node, and hence
is intrinsically shared nothing.
When memory becomes sharable between processors we usually think of SMP
as being the solution, not clusters..
Just some thoughts.
rod
|
720.32 | The plot thickens .. | OTOU01::MAIN | Systems Integration-Canada,621-5078 | Tue May 06 1997 10:45 | 70 |
| re: .29 -
>>>
Easy, drop in a quad AlphaServer 4100 instead. That's what Microsoft
datacenter is thinking about. Good for us!
>>>
Not possible in an Intel based Cluster. You would have to replace all
nodes. Of course that is ok too, assuming Cust wants to do this, simply
because one node is overloaded cpu wise.
The point remains that the application would have to be re-written or
db reorganized should a single SMP node resources be exceeded. Given
large complex queries in data warehousing environment, topping out a
single Intel box will be relatively easy to do.
>>>
True for a timesharing load. How does it work with a client/server
database setup? (NT's typical use.)
>>
Microsoft is rapidly moving towards adopting thin client - fat server
environments. This is not unlike the timesharing of the old days.
Citrix has gained a lot of attention recently by allowing thin clients
to run app's ON THE SERVER and only send display updates to client.
>>>
Can you simply drop a CPU into a VMS cluster and have it start processing
DB requests? What software is used? (RDB? Oracle?).
>>>
Yes to both. Numerous other app's also plug-n-play in this environment.
>>> The point is that VMS DB applications like RDB and Oracle have been
taught to use DLM to coordinate access to shared data. There aren't any
such NT applications (yet). So at the moment, it's sort of a chicken and
egg situation. No DLM -> no applications to take advantage of it.
>>>
Bottom line is in the delivery of performance and availability. You can
bet Oracle's marketing will be promoting large DB VLM numbers very soon
after NT5 VLM Beta becomes available (even if informal) .. they will
also likely be pointing out the advantages of their DLM.
How will Wolf Pack scale in thin client / fat server cluster
environment ? Answer - tbd .. you need to spread load across all
systems and there is no database to partition, only a traditional
timesharing load to distribute across all nodes - all of which can't
be allowed to write over each other, so ?
>>>
No DLM-enabled applications -> no need for DLM.
>>>
Most LAN and UNIX based customers and vendors do not understand the
implications of clusters yet, so their simple answer is why? Just
mention the word "data corruption" and see how fast they start to
listen.. :-)
The big guy's who have been there, done that, simply smile and continue
on their merry way ..
:-)
Should be interesting .. especially as thin client, fat server really
starts to heat up ..
Regards,
/ Kerry
|
720.33 | Is the emperor wearing clothes ? :-) | OTOU01::MAIN | Systems Integration-Canada,621-5078 | Tue May 06 1997 11:12 | 36 |
| re: .31 -
>>>
Finally, VLM. Interesting arguments to be made again on both sides.
But I think that VLM tends to be addressable by a single node, and hence
is intrinsically shared nothing.
>>>
Big data warehousing applications with real scalability issues to
consider are using multiple VLM nodes and memory channel technologies
to directly access each other's physical memory. This not only speeds up
complex db queries, but also drastically reduces the amount of expensive
IO required to do lock management.
At least that's my understanding of what the big guys are up to .. I
suspect (not sure) that this is how Oracle is planning to do their
100,000 tpmC benchmark later this year (announced at the NT symposium).
>>>
When memory becomes sharable between processors we usually think of SMP
as being the solution, not clusters..
>>>
See the previous paragraph on VLM, but you raise an interesting point,
i.e. in a Wolf Pack cluster where each node is handling a specific set
of resources, an SMP node may not be a good solution. Reason ? In an SMP
environment, if each CPU is accessing/updating a common area of
memory, then there is going to be a lot of cache invalidation activity
between CPUs. This means the local CPUs will be going to main memory
and/or disk much more often.
We shall see ..
Regards,
/ Kerry
|
720.34 | Check the latest TechNet | MPOS01::naiad.mpo.dec.com::mpos01::cerling | I'[email protected] | Tue May 06 1997 16:00 | 14 |
|
Just read the latest TechNet from MS re: SQL in a clustered environment.
It isn't clear, but when they start talking about SQL becoming highly
parallelized (ugly word, isn't it?) it seems to imply that they will
be accessing a single SQL database from multiple nodes in a cluster.
Maybe I misread, and of course it is way out in the future, but if you
haven't read the latest TechNet, it might shed some other light on
the situation. I tend to agree with Kerry; if MS expects to compete
with Oracle on its ability to scale, MS will have to come up with some
kind of DLM. I am really interested to see how MS will rise to the
competition that Oracle will provide when NT5.0 comes out on Alpha.
tgc
|
720.35 | | WIBBIN::NOYCE | Pulling weeds, pickin' stones | Wed May 07 1997 15:47 | 11 |
| >>> The point is that VMS DB applications like RDB and Oracle have been
taught to use DLM to coordinate access to shared data. There aren't any
such NT applications (yet)...
Actually, VMS applications simply know how to cope with multiple processes
accessing the same data. They use the lock manager to cooperate. When
you run them on a cluster, the lock manager becomes distributed, but the
applications don't need to know anything about it.
How do separate processes cooperate on a single node on NT? Why can't that
scale to a cluster the same way it does in VMS?
|
720.36 | | MOVIES::WIDDOWSON | Rod OpenVMS Engineering. Project Rock | Thu May 08 1997 01:05 | 58 |
| >Actually, VMS applications simply know how to cope with multiple processes
>accessing the same data. They use the lock manager to cooperate. When
>you run them on a cluster, the lock manager becomes distributed, but the
>applications don't need to know anything about it.
In my experience the large majority of applications actually don't know
or care about the locking when accessing data, but just use RMS, which
does the locking for them - effectively the application punts
responsibility to the filesystem/programming API.
> How do separate processes cooperate on a single node on NT?
That I cannot say. NT filesystems do export primitives to allow this
to take place (range locks and oplocks), but I do not know whether they
are used at all or whether people rely on general synchronisation
primitives like semaphores and whatnot..
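For completeness, the range-lock primitive is easy to show with its POSIX analogue. In this Python sketch, `fcntl.lockf` stands in for Win32's `LockFileEx`; the file name and offsets are made up purely for illustration:

```python
import fcntl
import os
import tempfile

# Illustrative sketch of byte-range (record) locking, the POSIX
# analogue of the Win32 LockFileEx range locks mentioned above.
# Cooperating processes can hold exclusive locks on different
# regions of the same file at the same time.
path = os.path.join(tempfile.gettempdir(), "records.dat")
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)

# Lock only bytes 4096..4607; other regions stay available to other
# processes, which is how record-oriented apps share one file safely.
fcntl.lockf(fd, fcntl.LOCK_EX, 512, 4096)
# ... update the record in place here ...
fcntl.lockf(fd, fcntl.LOCK_UN, 512, 4096)
os.close(fd)
```

These locks are advisory, like the VMS lock manager's: they only work if every accessor agrees to take them, which loops back to the chicken-and-egg point earlier in the thread.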
> Why can't that scale to a cluster the same way it does in VMS?
In a word, because of write-behind caching (actually any caching).
When OpenVMS clusters came out, the world was used to write-through and
no caching, and so everyone knew that every $QIO was going to the oxide,
and coded and planned accordingly. Hence the delay in a QIO being
serviced was predictable and large.
These days every operating system relies on caching to provide
performance. Application writers may be only vestigially aware of
caching (one hopes that they understand the crash-consistency
constraints). As soon as write-behind caches hit OpenVMS clusters, the
time to service an IO changes from being predictable and slow to being
unpredictable and anything between lightning fast (returned from
memory) and excruciatingly slow (remote flush and read from the oxide).
Trust me, I was there during Spiralog development.
The unpredictability depends on whether the last write was on this
node or another one. So adding an n'th application to a second node will
not give anything like the scaling advantage of adding it to the same
node (in which case all writes are to cache).
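That scaling argument can be put in back-of-envelope numbers. The latency figures and the remote-write fractions below are illustrative assumptions of my own, not measurements:

```python
# Back-of-envelope model of the point above: IO service time depends
# on where the last write happened. All numbers are made up to show
# the shape of the effect, not to describe any real system.
CACHE_HIT_US = 5           # write absorbed by the local write-behind cache
REMOTE_FLUSH_US = 20_000   # other node must flush, then we go to the oxide

def avg_service_time(writes, remote_fraction):
    """Mean per-IO cost when `remote_fraction` of writes follow a write
    on the other node (forcing a flush) and the rest hit local cache."""
    remote = int(writes * remote_fraction)
    local = writes - remote
    return (local * CACHE_HIT_US + remote * REMOTE_FLUSH_US) / writes

# Second app on the same node: every write still lands in cache.
same_node = avg_service_time(10_000, 0.0)
# App added on a second node: say half the writes chase a remote one.
two_nodes = avg_service_time(10_000, 0.5)
print(same_node, two_nodes)
```

Even a modest fraction of remote-flush IOs dominates the average, which is why the second node buys far less than the second process on the same node.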
This is not my argument; it is Tom Miller's (NT FS architect) and
Rod Gamache's (NT cluster architect). Yes, there are lots of arguments
against it, but you have to go down to the next level of detail and to
some degree assume that one has control over WIN32.
[To restate my position, I believe in shared everything, but I
would not like to see us careening down a path which is "obvious to
all of us", unprepared to argue through a well thought out and
differing view. The need for `dynamic load balancing' as enunciated
earlier is (I hope) unassailable, but `it's obviously scalable' is
a harder sell, as is the as yet unmentioned `more obvious
availability paradigm'.
As it turns out, tomorrow is my last day here, so it's not important to
me, but for DEC's (sic) sake I urge that this high quality conversation
continue.
rod]
|