WHAT IS LSF JobScheduler?
-------------------------
Introduction
------------
Production job scheduling has been an integral part of mainframe data
processing operations for decades. With the emergence of distributed
computing along with UNIX and NT workstations and fileservers, the system
architecture has changed drastically, calling for a new approach to
production job scheduling.
LSF JobScheduler is a distributed production Job Scheduling product from
Platform Computing Corporation. It is a separately licensed and separately
priced component of LSF, the Load Sharing Facility, a general purpose
distributed computing system that unites a group of computers into a single
system in order to make better use of the resources on a network.
JobScheduler integrates heterogeneous servers into a virtual mainframe to
deliver high availability, robustness and ease-of-use. It provides the
functions of a traditional mainframe job scheduler with transparent operation
across a network of heterogeneous UNIX and NT systems.
JobScheduler offers GUI input tools in addition to the standard command line
interface.
FEATURES
--------
Calendar and Event-Driven Scheduling
------------------------------------
In production data processing environments, jobs often need to be processed
repetitively and periodically according to user-defined calendars. Job
processing may also be conditional upon the occurrence of certain events
such as the arrival of a specific file or the availability of a data set.
Calendars can be defined in JobScheduler to drive periodic job processing.
Basic calendars can be combined using logic expressions to form more
sophisticated calendars. Calendars are independent of jobs; jobs can be
associated with calendars.
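As a rough illustration of combining basic calendars with logic
expressions, the sketch below models a calendar as a named predicate over
dates. The Calendar class and the calendar names are hypothetical and are
not part of JobScheduler; the product's own calendar syntax is not shown
here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class Calendar:
    """Hypothetical model: a calendar is a named predicate over dates."""
    name: str
    is_active: Callable[[date], bool]

    def __and__(self, other: "Calendar") -> "Calendar":
        # Active only on days when both calendars are active.
        return Calendar(f"({self.name} && {other.name})",
                        lambda d: self.is_active(d) and other.is_active(d))

    def __or__(self, other: "Calendar") -> "Calendar":
        # Active on days when either calendar is active.
        return Calendar(f"({self.name} || {other.name})",
                        lambda d: self.is_active(d) or other.is_active(d))

    def __invert__(self) -> "Calendar":
        # Active exactly on the days when the original is not.
        return Calendar(f"!{self.name}", lambda d: not self.is_active(d))


# Two basic calendars (illustrative definitions).
weekdays = Calendar("weekdays", lambda d: d.weekday() < 5)
first_of_month = Calendar("first_of_month", lambda d: d.day == 1)

# A more sophisticated calendar built from the two basic ones.
weekday_month_start = weekdays & first_of_month
```

Because the combined calendar is itself a Calendar, it can be shared by
several jobs or combined again, which mirrors the independence of
calendars and jobs described above.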
Job scheduling in JobScheduler can also be driven by arbitrarily configured
network-wide events. This can be used, for example, to detect a change in
the size of a file or the mount of a tape in order to trigger production
jobs.
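To make the file-size example concrete, the following sketch compares the
current size of a file with the size seen at the previous check. The
function name size_event and the polling approach are assumptions made for
illustration; the source does not describe how JobScheduler detects file
events internally.

```python
import os
import tempfile
from typing import Tuple


def size_event(path: str, last_size: int) -> Tuple[bool, int]:
    """Return (changed, current_size) for a hypothetical size check."""
    current = os.path.getsize(path)
    return current != last_size, current


# Small demonstration using a temporary file.
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "input.dat")
    with open(path, "wb") as fh:
        fh.write(b"abc")
    first = size_event(path, last_size=0)          # size went from 0 to 3
    second = size_event(path, last_size=first[1])  # nothing changed since
```

A scheduler built this way would trigger the dependent job whenever the
first element of the result is true.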
Fault Tolerance
---------------
JobScheduler is designed to continue operating even if some of the servers
in the system are unavailable. A dynamic master succession algorithm
ensures that as long as one server is up the jobs will continue to be
scheduled on the remaining hosts. Even if the entire network goes down, no
jobs will be lost because all calendars, job records and events are logged
in a configured filesystem. When the system comes back up, it will recover
the state of the JobScheduler and continue operation.
If a server running a job goes down, JobScheduler can be configured to rerun
the job on another server. Additionally, a job that is terminated under
certain configured conditions can be automatically restarted by
JobScheduler. This allows transient error conditions to be overcome without
operator intervention.
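One simple way to picture master succession is a fixed preference order in
which the first reachable server takes over as master. This is only an
assumption made for illustration; the source does not describe the actual
algorithm, and the function and host names below are hypothetical.

```python
from typing import Optional, Sequence, Set


def elect_master(candidates: Sequence[str], up_hosts: Set[str]) -> Optional[str]:
    """Return the first candidate that is currently up, or None."""
    for host in candidates:
        if host in up_hosts:
            return host
    return None


# Hypothetical preference order for a three-server cluster.
preference = ["hostA", "hostB", "hostC"]
```

With this model, scheduling continues as long as at least one server in
the preference list is up.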
Job Dependency, Pre-Processing, and Post-Processing
---------------------------------------------------
JobScheduler allows you to control a job's execution upon the completion,
failure, or start of other jobs. For example, you can configure the system
to start several main processing jobs only after a data preparation job has
completed, then to start the post-processing job after all the main
processing jobs are done.
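The data preparation example above can be sketched as a small dependency
graph. The job names are hypothetical, and the code uses Python's standard
graphlib module to show the ordering idea only; it is not JobScheduler's
own mechanism.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each job maps to the set of jobs that must complete before it starts.
dependencies: Dict[str, Set[str]] = {
    "prepare_data": set(),
    "main_a": {"prepare_data"},
    "main_b": {"prepare_data"},
    "post_process": {"main_a", "main_b"},
}


def run_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group jobs into waves; every job in a wave may start together."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # pretend the whole wave completed successfully
    return waves


waves = run_waves(dependencies)
```

The main processing jobs land in the same wave, so they could run in
parallel on different servers once the preparation job has completed.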
You can also specify tasks to be executed before or after executing a job.
This could be used to check the availability of a tape drive or the status
of a data set before starting a job.
Command Set and GUI Tools
-------------------------
JobScheduler provides a rich set of command line and GUI tools to define,
monitor and manage the workload using any desktop as the system console.
Typically you define your calendars and jobs, together with any
interdependencies, using the GUI tools xbcal and xbsub. Once these are set up,
JobScheduler will ensure that jobs are run in accordance with the conditions
and policies specified.
You can keep close track of your jobs with JobScheduler using the GUI
program xlsbatch. In addition to monitoring the status of jobs, you can
perform various operations on them:
o Terminate, suspend, and resume each run of a job, or remove the
entire job from the system.
o Inspect the output of a running job.
o Look at the history of a repetitive job for all its run instances.
o Change any parameter of a job, including switching it from one
queue to another even while it is running.
o Inquire why a job has not been scheduled.
Automatic Load Balancing and Queues
-----------------------------------
With JobScheduler, you can target jobs to specific servers or you can allow
the system to match resource requirements of your jobs to the capabilities
of the servers. Jobs are dynamically scheduled to run on the best server
available. For example, you can submit a job indicating it requires 100
megabytes of temporary storage space before it starts. JobScheduler will
ensure that the server the job is run on satisfies the condition.
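The matching step can be pictured as a filter followed by a ranking. In
the sketch below, the Server fields and the choice of "lowest load wins"
are assumptions made for illustration; the source does not say how
JobScheduler ranks eligible servers.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Server:
    name: str
    free_tmp_mb: int  # free temporary storage, in megabytes
    load: float       # hypothetical load figure; lower is better


def pick_server(servers: Sequence[Server], min_tmp_mb: int) -> Optional[Server]:
    """Pick the least loaded server that satisfies the storage requirement."""
    eligible = [s for s in servers if s.free_tmp_mb >= min_tmp_mb]
    return min(eligible, key=lambda s: s.load, default=None)


servers = [
    Server("alpha", free_tmp_mb=50, load=0.1),
    Server("beta", free_tmp_mb=200, load=0.9),
    Server("gamma", free_tmp_mb=500, load=0.4),
]
chosen = pick_server(servers, min_tmp_mb=100)
```

Here "alpha" is the least loaded server but lacks the required 100
megabytes, so it is filtered out before ranking.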
JobScheduler allows you to define various types of services by configuring
different queues. For each queue, you can specify a number of parameters
such as priority, load thresholds for job scheduling, limits on the number
of running jobs, time windows for job processing, and limits on job resource
consumption. You can also specify which queue a job should target or allow
JobScheduler to automatically select a queue based on the job's
requirements.
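The queue parameters listed above can be modelled as a small record, with
automatic queue selection as a search over those records. All field names,
the sample queues, and the rule "highest priority among accepting queues"
are hypothetical; they illustrate the idea and do not reproduce
JobScheduler's configuration format or selection policy.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional, Sequence


@dataclass
class Queue:
    name: str
    priority: int
    max_running_jobs: int   # listed for completeness; unused in this sketch
    window_start: time      # start of the processing time window
    window_end: time        # end of the window (assumed not to cross midnight)
    cpu_limit_minutes: int  # limit on job resource consumption

    def accepts(self, cpu_minutes: int, at: time) -> bool:
        in_window = self.window_start <= at <= self.window_end
        return in_window and cpu_minutes <= self.cpu_limit_minutes


def select_queue(queues: Sequence[Queue], cpu_minutes: int,
                 at: time) -> Optional[Queue]:
    """Return the highest-priority queue that accepts the job, if any."""
    candidates = [q for q in queues if q.accepts(cpu_minutes, at)]
    return max(candidates, key=lambda q: q.priority, default=None)


queues = [
    Queue("short", 50, 10, time(0, 0), time(23, 59), cpu_limit_minutes=15),
    Queue("night", 20, 4, time(0, 0), time(6, 0), cpu_limit_minutes=600),
]
```

A five-minute job submitted at noon would go to "short", while a two-hour
job would only be accepted by "night" and only inside its time window.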
DEFINITIONS
-----------
Clusters
--------
Production job scheduling and load sharing in JobScheduler are based on
clusters.
A cluster is a named group of machines running the LSF server daemons. It
may contain a mixture of server types. One machine is configured as the
master for each cluster. It runs the master scheduler daemon, mbatchd.
The other servers run the slave execution server, sbatchd, which manages
jobs dispatched by the master scheduler. Each server also runs a Load
Information Manager daemon, lim. It monitors the availability of resources
and makes this information available to the master scheduler.
Each cluster has one or more JobScheduler administrators. An administrator
is a user account that has permission to change the JobScheduler
configuration and perform other maintenance functions. The administrator
decides how the servers are grouped together, including determining the
master server.
The master scheduler maintains the status of all entities defined in the
system including jobs, events, calendars, and queues.
Jobs
----
A job is a program or command that is scheduled to run in a specific
environment. A job may consist of a group of related jobs, for example, all
the programs and commands of a payroll process. You can run a job on any
server in the cluster. This allows you easy access to the resources
available on all server types. JobScheduler ensures that the executables
that make up your job are able to run on the architecture of the target
server.
Each job is assigned a unique job identification number by the system. You
can associate your own job names to make referencing easier.
For a more comprehensive description of jobs, see the "DEFINING JOBS"
section below.
Events
------
Events drive the Production Job Scheduling system.
An event is a change or occurrence in the system that can be used to
trigger jobs, such as the arrival (creation) of a specific file, a tape
coming on-line, a prior job completing successfully, or a particular time
being reached. The system responds to four types of events:
o Time events are defined by calendars.
o Job events are the starting and completion of other jobs.
o File events are changes in the files residing in accessible
filesystems.
o Site events are specific occurrences, such as a tape mount,
defined by the JobScheduler administrators for your system.
When defining a job, it is possible to specify any combination of events
that must be satisfied before the job is considered as eligible for
execution.
Calendars
---------
A calendar consists of a sequence of time events during which a job can be
scheduled. Calendars are defined and manipulated independently of jobs so
that multiple jobs can share the same calendar. Each user can maintain a
private set of calendars, reference calendars of other users, or use the
calendars configured into the system. A calendar can be modified after it
has been created. Any new jobs which are associated with it will
automatically run according to the new definition.
Queues
------
Production job scheduling provides efficient, timely execution of resource
intensive jobs. When you submit a job, it is placed on a list of jobs called
a queue.
The JobScheduler system runs jobs from the queue based on the scheduled time
and when the appropriate resources are available. In JobScheduler, the
queues can have access to all the servers in your cluster.
Your job can run as soon as any suitable server becomes available. You do
not need to hunt around your network to find an idle server.
DEFINING JOBS
-------------
What is a Job?
--------------
A job is one or more processes which have been scheduled to execute on a
specific host or hosts within the cluster once certain conditions have been
satisfied. There are three basic types of jobs in a JobScheduler system:
o A calendar-dependent job. It runs periodically dependent on one of
the calendars defined in the system.
o A job dependent on another repetitive job. This job runs if the
job it depends upon has satisfied the starting conditions.
o A file or site (external) event-dependent job. It runs when a
predefined event occurs, such as the size of a file exceeding a
certain limit or a tape being mounted.
When you submit a job to the system, you can make it dependent on one or
more of the conditions above.
JOB ATTRIBUTES
--------------
Job Name and JobID
------------------
You can assign a jobName to your job. It is used to identify a job for
manipulation purposes, such as job dependencies and job grouping. This name
does not have to be unique. Completely different jobs can be assigned the
same name. If you do not supply a name, the system uses the name of the
submitted command as the jobName. When the job is submitted, an integer
called the jobID is assigned to it. The jobID is unique throughout the
JobScheduler cluster.
Dependency Condition
--------------------
A job starts to run if and only if its dependency condition is TRUE. The
dependency condition is specified by a logical (Boolean) expression. The
minimum expression is jobID or jobName, which is equivalent to done(jobID).
An expression can consist of one or more of the reserved keywords,
identifiers, and logic operators.
Keywords
--------
You can select from seven reserved keywords:
calendar - To depend on a calendar, use the calendar condition.
started - To depend on a previous job that has started running or has already
finished, use the started condition.
done - To depend on a previous job that has finished successfully in the
DONE state, use the done condition.
exit - To depend on a previous job that has finished in the EXIT state,
use the exit condition.
ended - To depend on a previous job that has finished, use the ended
condition.
file - To depend on a change in the filesystem, use the file condition.
There are four file event functions defined:
o age
o arrival
o exist
o size
event - To depend on a site (external) event, use the event condition.
Logic Operators
---------------
To create a more flexible dependency condition, you can use the
operators:
&& defined as the logical "AND"
|| defined as the logical "OR"
! defined as the logical "NOT"
( ) defined as a grouping operator
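The sketch below evaluates a dependency condition built from the job
keywords and the logic operators. It assumes a function-call style such as
done(jobName), conventional precedence (! binds tighter than &&, which
binds tighter than ||), and a plain dictionary of job states; all of these
are assumptions for illustration rather than the product's documented
grammar. The calendar, file, and event keywords are left out.

```python
import re
from typing import Dict, List

TOKEN = re.compile(r"\s*(&&|\|\||!|\(|\)|\w+)")

# Job states that satisfy each keyword, following the descriptions above.
SATISFIED_BY = {
    "started": {"RUN", "DONE", "EXIT"},
    "done": {"DONE"},
    "exit": {"EXIT"},
    "ended": {"DONE", "EXIT"},
}


def tokenize(text: str) -> List[str]:
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(expression: str, states: Dict[str, str]) -> bool:
    """Evaluate a dependency condition against known job states."""
    tokens = tokenize(expression)
    index = 0

    def peek():
        return tokens[index] if index < len(tokens) else None

    def take(expected=None):
        nonlocal index
        token = peek()
        if token is None or (expected is not None and token != expected):
            raise ValueError(f"expected {expected!r}, got {token!r}")
        index += 1
        return token

    def parse_or():
        value = parse_and()
        while peek() == "||":
            take()
            rhs = parse_and()  # always parse the right side to consume tokens
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        if peek() == "(":
            take("(")
            value = parse_or()
            take(")")
            return value
        word = take()
        if not re.fullmatch(r"\w+", word):
            raise ValueError(f"unexpected token {word!r}")
        if word in SATISFIED_BY and peek() == "(":
            take("(")
            job = take()
            take(")")
            return states[job] in SATISFIED_BY[word]
        # A bare job name or ID is treated like done(name).
        return states[word] in SATISFIED_BY["done"]

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return result


# Hypothetical job names and states.
states = {"prep": "DONE", "load": "PEND", "audit": "EXIT"}
```

For example, evaluate("ended(audit) && !done(load)", states) holds for the
states above, because the audit job has finished and the load job has not
yet completed.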
Job Queues
----------
You can obtain various types of services by selecting a specific queue. The
queue specifies a number of parameters such as priority, load thresholds for
job scheduling, limits on the number of running jobs, time windows for job
processing, and limits on job resource consumption. The default queues are
normally suitable to run most jobs for most users, but they may have a very
low priority or restrictive execution conditions to minimize interference
with other jobs.
If automatic queue selection is not satisfactory, you should choose the most
suitable queue for each job. The factors affecting your decision are user
access restrictions, size of the job, resource limits of the queue,
scheduling priority of the queue, active time windows of the queue, hosts
used by the queue, and the scheduling load conditions.
Job Status
----------
After a job is submitted to JobScheduler, it goes through a series
of state transitions until it completes its task.
Most jobs progress through only three states:
PEND: waiting in the queue.
RUN: dispatched to a host and running.
DONE: terminated normally.
A job remains pending until all conditions for its execution are met. The
basic conditions are defined when you submit the job. They can be one or a
combination of the following:
o The calendar the job depends on is active.
o The prior job is running or has completed in the manner expected.
o The file or site event the job depends on is active.
Other conditions may include queue and scheduling policies within the
cluster. A job may terminate abnormally for a number of reasons, and
termination may happen from any state. An abnormally terminated job goes
into the EXIT state.
Jobs may also be suspended at any time. A job can be suspended by its owner,
by the JobScheduler administrator, or by the system. There are three
different states for suspended jobs:
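PSUSP: suspended by its owner or an administrator while pending.
USUSP: suspended by its owner or an administrator while running.
SSUSP: suspended by the system while running.
In the last case, SSUSP, the queue policy is the most important factor
determining whether the system will suspend your job.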
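The job states can be summarized as a small state machine. The table of
allowed moves below is a simplified assumption based on the description in
this section (in particular, the resume transitions are guesses made for
illustration); it is not taken from JobScheduler's implementation.

```python
from enum import Enum


class JobState(Enum):
    PEND = "PEND"    # waiting in the queue
    RUN = "RUN"      # dispatched to a host and running
    DONE = "DONE"    # terminated normally
    EXIT = "EXIT"    # terminated abnormally
    PSUSP = "PSUSP"  # suspended while pending
    USUSP = "USUSP"  # suspended by the owner or an administrator
    SSUSP = "SSUSP"  # suspended by the system


# Assumed transitions: EXIT is reachable from every non-final state,
# because termination may happen from any state.
TRANSITIONS = {
    JobState.PEND: {JobState.RUN, JobState.PSUSP, JobState.EXIT},
    JobState.PSUSP: {JobState.PEND, JobState.EXIT},
    JobState.RUN: {JobState.DONE, JobState.USUSP, JobState.SSUSP,
                   JobState.EXIT},
    JobState.USUSP: {JobState.RUN, JobState.EXIT},
    JobState.SSUSP: {JobState.RUN, JobState.EXIT},
    JobState.DONE: set(),
    JobState.EXIT: set(),
}


def can_move(current: JobState, target: JobState) -> bool:
    """Report whether the assumed table allows current -> target."""
    return target in TRANSITIONS[current]


def advance(current: JobState, target: JobState) -> JobState:
    """Return the new state, or raise if the move is not allowed."""
    if not can_move(current, target):
        raise ValueError(f"{current.value} -> {target.value} is not allowed")
    return target
```

The common path is advance(PEND, RUN) followed by advance(RUN, DONE);
DONE and EXIT have no outgoing moves because they are final.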
--------------------------------------------------------------------------
Brian MacDonald ([email protected]) WWW : http://www.platform.com
Platform Computing Corporation Phone : (416) 512-9587 ext. 306
--------------------------------------------------------------------------
LSF PARTNERS
Value Added Resellers (VARs)
China:
Platform Software Ltd.
Suite 14C, Zhi Chun Mansion
118 Zhi Chun Road, Beijing 100083
China
Contact: Dr. Michael Wang
Tel: +86 10 6464-1493
Fax: +86 10 6464-1493
[email protected]
www.sp.net.edu.cn/platform or
www.platform.com
France:
Tethys S.A.
22-24, Rue Debertrand
91410 - Dourdan
France
Contact: Mr. William Chardin
Tel: +33 1 64 59 21 21
Fax: +33 1 64 59 21 20
[email protected]
Germany:
science+computing gmbh
Hagellocher Weg 71
D-72070 Tuebingen
Germany
Contact: Dr. Karsten Gaier
Tel: +49 7071-9457-0
Fax: +49 7071-945727
www.science-computing.uni-tuebingen.de
[email protected]
Italy:
Nice srl
Via Serra 33
14020 Camerano(AT)
Italy
Contact: Dr. Beppe Ugolotti
Tel: +39 141 992400
Fax: +39 141 992400
[email protected]
Japan:
Daikin Industries, Ltd.
Electronic Systems Division
Tokyo Operacity-tower 12F, 20-1, 3-CHOME
Nishi-Shinjuku, Shinjuku-ku
Tokyo, 163-14, Japan
Contact: Mr. Shingo Tanaka
Tel: +81 3-5353-7812
Fax: +81 3-5353-7809
www.comtec.daikin.co.jp
[email protected]
Japan:
Platform Computing Japan
3-20-180 Nishiuraga-cho
Yokosuka-shi 239
Japan
Contact: Mr. Toshi Ikeda
Tel: +81 468 44 6515
Fax: +81 468 41 8441
[email protected]
Korea:
GreenBell Systems Inc.
121-210 4FI, C's Bldg 404-5
Seokyo-Dong, Mapo-Ku, Seoul
Korea
Contact: Mr. D.J. Choi
Tel: +82 2 325-2340
Fax: +82 2 325-9488
[email protected]
Norway:
Open Systems Consultants a.s.
St. Olavsgt. 24
N-0166 Oslo, Norway
Contact: Mr. Knut Vidar
Tel: +47 2220-4050
Fax: +47 2220-0285
www.osc.no
[email protected]
Singapore:
PTC System Singapore PTE Ltd
No. 3 Irving Road #06-03
Irving Industrial Building, 369522
Singapore
Contact: Mr. Ken Chua
Tel: +65 28202555
Fax: +65 2823126
[email protected]
Sweden:
M&P HiTech Computing AB
Drottninggatan 33
S-111 51 Stockholm
Sweden
Contact: Dr. Magnus Persson
Tel: +46 8 10 67 76
Fax: +46 8 10 67 77
[email protected]
Switzerland:
Innovation GmbH
Carmenstrasse 45
Postfach
8030 Zurich
Switzerland
Contact: Mr. Brian Rees
Tel: +41 1 260 49 91
Fax: +41 1 260 49 99
www.innovation.ch/lsf/
[email protected]
Taiwan:
HwaCom Systems Inc.
8F-1, No. 81, Cheng-Teh Road, Sec.2
Taipei, Taiwan
ROC
Contact: Mr. Robert Liu
Tel: +886 02 558-7575
Fax: +886 02 559-4407
[email protected]
United Kingdom:
Platform Computing Ltd.
16 City Business Centre, Hyde Street
Winchester, SO23 7TA
United Kingdom
Contact: Mr. John Pickup
Tel: +44 1962 844041
Fax: +44 1962 844043
[email protected]
Northern California, U.S.A.:
Chord Systems, Inc.
2155 S. Bascom Avenue, Suite 106
Campbell, CA 95008
Contact: Mr. Scott McDonald
Tel: +1 (408) 866-4100
Fax: +1 (408) 559-2090
[email protected]
------------------------------------------------------------------------
Computer System Vendors
Digital Equipment Corporation
110 Spit Brook Road ZKO3-2/U20
Nashua, NH 03062
U.S.A.
Contact: Ms. Beth Despres
Tel: +1 603 881-6004
Fax: +1 603 881-6059
[email protected]
Hewlett-Packard Company
3000 Waterview Parkway
Richardson, TX 75080
U.S.A.
Contact: Ms. Betty Van Houten
Tel: +1 214 497-4577
Fax: +1 214 497-3123
[email protected]
Silicon Graphics Inc
Mail Stop 580
2011 N. Shoreline Boulevard
Mountain View, CA 94043-1389
U.S.A.
Contact: Mr. Kumar Srikantan
Tel: +1 415 390-3122
Fax: +1 415 390-3562
[email protected]
Sun Microsystems Computer Company
Product Marketing Manager for HPC
2550 Garcia Avenue, MS MPK10-108
Mountain View, CA 94043-1100
Contact: Mr. Jamie Enns
Tel: +1 415 786-8068
Fax: +1 415 786-8390
[email protected]