Conference hydra::axp-developer

Title:Alpha Developer Support
Notice:[email protected], 800-332-4786
Moderator:HYDRA::SYSTEM
Created:Mon Jun 06 1994
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:3722
Total number of notes:11359

3057.0. "Scandinavian Softline Technology Oy" by HYDRA::DORHAMER () Mon Jan 20 1997 17:02

T.R     Title                        User             Personal Name  Date                   Lines
3057.1  sent pointer to online docs  HYDRA::DORHAMER                 Mon Jan 20 1997 17:04  26
3057.2  more questions               HYDRA::DORHAMER                 Wed Jan 22 1997 10:52  38
3057.3  use SO_KEEPALIVE ?           HYDRA::DORHAMER                 Wed Jan 22 1997 13:04  24
3057.4  SO_KEEPALIVE not sufficient  HYDRA::DORHAMER                 Thu Jan 23 1997 09:59  33
3057.5  checking with engineering    HYDRA::DORHAMER                 Fri Jan 24 1997 10:26  4
    I have sent Hu Rui's questions to John Dustin in the UNIX networking
    group and also posted them in the Digital UNIX notes file (note 8578).
    
    Karen
3057.6  response from engineering    HYDRA::DORHAMER                 Fri Jan 24 1997 12:09  42
        #1          24-JAN-1997 12:07:16.05                                 
    NEWMAIL
    From:   HYDRA::AXPDEVELOPER "[email protected]"
    To:     US3RMC::"[email protected]"
    CC:     AXPDEVELOPER
    Subj:   RE: Socket Read Return Value
    
    Hu Rui,
    
    I received the following response to your questions from one of our
    engineers.  Please let me know if this resolves your problem.
    
    Karen Dorhamer
    Alpha Developer Support
    
    > If I read from a non_blocking TCP socket. How can I know the remote side
    > has closed connection?
    
            You'll get back a return value of 0 from read(2) if TCP has
            determined the connection has been explicitly closed by the
            remote side.  You will never get back 0 from read(2) on a TCP
            socket for any other reason (other than a request to read
            zero bytes).  If it's non-blocking and the connection is
            still open, but there is simply no data, you'll get back a
            return value of -1, and errno will have been set to
            EWOULDBLOCK.  If there was data, then read(2) will return
            the number of octets copied to your buffer.
    
            If the socket is blocking, you will block as long as the
            connection is still open and there is no data to read.
            Otherwise it's the same as for non-blocking (0 will be
            returned when connection is closed *and* you've already
            read all data received on the socket prior to the close).
            You could also get back -1 and errno EINTR if you are
            playing with signals in the process.
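
    The return values described above can be sketched with Python's
    socket module, where recv() wraps the same kind of read: an empty
    result corresponds to a return value of 0, and BlockingIOError
    corresponds to -1 with errno set to EWOULDBLOCK.  This is only an
    illustration, not Digital UNIX sample code.  The helper names
    classify_read, loopback_pair and demo are made up for the sketch,
    and it assumes a loopback TCP connection can be opened.

```python
import select
import socket


def classify_read(sock, bufsize=4096):
    """Do one read on a non-blocking socket and classify the outcome."""
    try:
        chunk = sock.recv(bufsize)
    except BlockingIOError:
        # Equivalent of read(2) returning -1 with errno EWOULDBLOCK:
        # the connection is still open but no data has arrived.
        return ("no-data", b"")
    if chunk == b"":
        # Equivalent of read(2) returning 0: the peer closed the connection
        # and everything it sent earlier has already been read.
        return ("closed", b"")
    return ("data", chunk)


def loopback_pair():
    """Return a connected (client, server) pair of TCP sockets on 127.0.0.1."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the system pick one
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()
    return client, server


def demo():
    """Walk through the three outcomes: no data, data, closed."""
    client, server = loopback_pair()
    server.setblocking(False)
    outcomes = [classify_read(server)]        # nothing has been sent yet
    client.sendall(b"hello")
    select.select([server], [], [], 5.0)      # wait until the data arrives
    outcomes.append(classify_read(server))
    client.close()                            # explicit close by the peer
    select.select([server], [], [], 5.0)      # a closed socket selects readable
    outcomes.append(classify_read(server))
    server.close()
    return outcomes
```

    Calling demo() exercises the "no data", "data" and "closed" cases
    in that order on one connection.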
    
    > From read's return value or signal.
    
            If you have turned on async notification for the socket, you
            should also get a SIGIO when the connection is closed.  And
            if you are select(2)'ing on the socket, the socket will be
            both readable and writable.
    
3057.7  TCP/IP FAQ                   HYDRA::DORHAMER                 Fri Jan 24 1997 12:57  978
        #2          24-JAN-1997 12:48:47.41                                 
    NEWMAIL
    From:   HYDRA::AXPDEVELOPER "[email protected]"
    To:     US3RMC::"[email protected]"
    CC:     AXPDEVELOPER
    Subj:   more socket info
    
    Hu Rui,
    
    One of the engineers from our Digital UNIX engineering group sent me
    the attached info sheet on TCP/IP.  Please see question 5 for more
    info.
    
    Karen Dorhamer
    Alpha Developer Support
    
    Subj:   Re:  socket question
    
    Karen,
    
    He is probably out of luck in many of the cases he has described,
    however, there are a few things he can do which may help, and which
    are outlined in the TCP/IP FAQ.   I have enclosed a copy from last
    March from the comp.protocols.tcp-ip usenet group.  It hasn't changed
    much recently so is as good as the latest version.
    
    Question 5 covers connections that have gone away.
    
    John
    ---------
    From nntpd.lkg.dec.com!pa.dec.com!decuac.dec.com!haven.umd.edu!purdue!
      lerc.nasa.gov!magnus.acs.ohio-state.edu!math.ohio-state.edu!
      howland.reston.ans.net!newsfeed.internetmci.com!ns.pilot.net!
      news2.pilot.net!wrs.com!wrs.com!gnn Thu Mar  7 16:57:47 1996
    Article 48463 of comp.protocols.tcp-ip:
    Path: nntpd.lkg.dec.com!pa.dec.com!decuac.dec.com!haven.umd.edu!purdue!
      lerc.nasa.gov!magnus.acs.ohio-state.edu!math.ohio-state.edu!
      howland.reston.ans.net!newsfeed.internetmci.com!ns.pilot.net!
      news2.pilot.net!wrs.com!wrs.com!gnn
    From: [email protected] (George Neville-Neil)
    Newsgroups: comp.protocols.tcp-ip
    Subject: FAQ for March 1996
    Date: 1 Mar 96 16:32:54 GMT
    Organization: Wind River Systems, Inc.
    Lines: 877
    Message-ID: <[email protected]>
    NNTP-Posting-Host: loire.wrs.com
    Summary: FAQ
    Keywords: FAQ
    
    
    Hi Folks,
    
            Here is the latest FAQ.  Not many changes this month.
    
    Later,
    George
    
    Archive-name:tcp-ip/FAQ
    Last-modified:  1996/3/1
    Internet Protocol Frequently Asked Questions
    
    Maintained by: George V. Neville-Neil ([email protected])
    Contributions from:
    Ran Atkinson
    Mark Bergman
    Stephane Bortzmeyer
    Rodney Brown
    Dr. Charles E. Campbell Jr.
    Phill Conrad
    Alan Cox
    Rick Jones
    Jon Kay
    Jay Kreibrich
    William Manning
    Barry Margolin
    Jim Muchow
    Subu Rama
    W. Richard Stevens
    Version 3.2
    
    
    ************************************************************************
    
            The following is a list of Frequently Asked Questions, and
    their answers, for people interested in the Internet Protocols,
    including TCP, UDP, ICMP and others.  Please send all additions,
    corrections, complaints and kudos to the above address.  This FAQ will
    be posted on or about the first of every month.
    
            This FAQ is available for anonymous ftp from:
    ftp.netcom.com:/pub/gnn/tcp-ip.faq .  You may get it from my home page
    at ftp://ftp.netcom.com/pub/gnn/gnn.html
            You can read the FAQ in HTML format on Netcom or from the
    mirror site http://web.cnam.fr/Network/TCP-IP/tcp-ip.html
    
    ************************************************************************
    Table of Contents:
    Glossary
    1) Are there any good books on IP?
    2) Where can I find example source code for TCP/UDP/IP?
    3) Are there any public domain programs to check the performance of an
    IP link?
    4) Where do I find RFCs?
    5) How can I detect that the other end of a TCP connection has
    crashed?  Can I use "keepalives" for this?
    6) Can the keepalive timeouts be configured?
    7) Can I set up a gateway to the Internet that translates IP
    addresses, so that I don't have to change all our internal addresses
    to an official network?
    8) Are there object-oriented network programming tools?
    9) What other FAQs are related to this one?
    10) What newsgroups contain information on networks/protocols?
    11) Van Jacobson explains TCP congestion avoidance.
    12) Can I use a single bit subnet?
    
    Glossary:
    
    I felt this should be first given the plethora of acronyms used in the
    rest of this FAQ.
    
    IP: Internet Protocol.  The lowest layer protocol defined in TCP/IP.
    This is the base layer on which all other protocols mentioned herein
    are built.  IP is often referred to as TCP/IP as well.
    
    UDP: User Datagram Protocol.  This is a connectionless protocol
    layered on top of IP.  It does not provide any guarantees on the
    ordering or delivery of messages.
    
    TCP: Transmission Control Protocol.  TCP is a connection oriented
    protocol that guarantees that messages are delivered in the order in
    which they were sent and that all messages are delivered.  If a TCP
    connection cannot deliver a message it closes the connection and
    informs the entity that created it.  This protocol is layered on top
    of IP.
    
    ICMP:  Internet Control Message Protocol.  ICMP is used for
    diagnostics in the network.  The Unix program, ping, uses ICMP
    messages to detect the status of other hosts in the net.  ICMP
    messages can either be queries (in the case of ping) or error reports,
    such as when a network is unreachable.
    
    RFC: Request For Comment.  RFCs are documents that define the
    protocols used in the IP Internet.  Some are only suggestions, some
    are even jokes, and others are published standards.  Several sites in
    the Internet store RFCs and make them available for anonymous ftp.
    
    SLIP:  Serial Line IP.  An implementation of IP for use over a serial
    link (modem).  CSLIP is an optimized (compressed) version of SLIP that
    gives better throughput.
    
    Bandwidth:  The amount of data that can be pushed through a link in
    unit time.  Usually measured in bits or bytes per second.
    
    Latency:  The amount of time that a message spends in a network going
    from point A to point B.
    
    Jitter:  The effect seen when latency is not a constant.  That is, if
    messages experience different latencies between two points in a
    network.
    
    RPC:  Remote Procedure Call.  RPC is a method of making network access
    to resources transparent to the application programmer by supplying a
    "stub" routine that is called in the same way as a regular procedure
    call.  The stub actually performs the call across the network to
    another computer.
    
    Marshalling:  The process of taking arbitrary data (characters,
    integers, structures) and packing them up for transmission across a
    network.
    
    MBONE: A virtual network that is a Multicast backBONE.  It is still a
    research prototype, but it extends through most of the core of the
    Internet (including North America, Europe, and Australia).  It uses IP
    Multicasting which is defined in RFC-1112.  An MBONE FAQ is available
    via anonymous ftp from: ftp.isi.edu.  There are frequent broadcasts of
    multimedia programs (audio and low bandwidth video) over the MBONE.
    Though the MBONE is used for multicasting, the long haul parts of the
    MBONE use point-to-point connections through unicast tunnels to
    connect the various multicast networks worldwide.
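
    To make the Marshalling entry above concrete, here is a small
    sketch using Python's struct module.  It is an illustration only:
    the record layout (a 16-bit id, a 32-bit count and a
    length-prefixed name, all in network byte order) and the function
    names marshal and unmarshal are assumptions chosen for the example,
    not a format defined by this FAQ or by any RFC.

```python
import struct

# Fixed-size header: unsigned 16-bit id, unsigned 32-bit count,
# unsigned 16-bit name length.  "!" selects network (big-endian)
# byte order with no padding.
HEADER = "!HIH"


def marshal(msg_id, count, name):
    """Pack an id, a count and a text name into bytes for transmission."""
    encoded = name.encode("utf-8")
    return struct.pack(HEADER, msg_id, count, len(encoded)) + encoded


def unmarshal(data):
    """Reverse marshal(): recover (msg_id, count, name) from the bytes."""
    msg_id, count, length = struct.unpack_from(HEADER, data)
    offset = struct.calcsize(HEADER)
    name = data[offset:offset + length].decode("utf-8")
    return msg_id, count, name
```

    Both ends must agree on the layout and byte order; that agreement
    is what lets an RPC stub hide the network from its caller.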
    
    
    1) Are there any good books on IP?
    
    A) Yes.  Please see the following:
    
    Internetworking with TCP/IP Volume I
    (Principles, Protocols, and Architecture)
    Douglas E. Comer
    Prentice Hall 1991 ISBN 0-13-468505-9
    This volume covers all of the protocols, including IP, UDP, TCP, and
    the gateway protocols.  It also includes discussions of higher level
    protocols such as FTP, TELNET, and NFS.
    
    Internetworking with TCP/IP Volume II
    (Design, Implementation, and Internals)
    Douglas E. Comer / David L. Stevens
    Prentice Hall 1991  ISBN 0-13-472242-6
    
    Discusses the implementation of the protocols and gives numerous code
    examples.
    
    Internetworking with TCP/IP Volume III (BSD Socket Version)
    (Client - Server Programming and Applications)
    Douglas E. Comer / David L. Stevens
    Prentice Hall 1993  ISBN 0-13-474222-2
    
    This book discusses programming applications that use the internet
    protocols.  It includes examples of telnet, ftp clients and servers.
    Discusses RPC and XDR at length.
    
    TCP/IP Illustrated, Volume 1: The Protocols,
    W. Richard Stevens
    (c) Addison-Wesley, 1994  ISBN 0-201-63346-9
    
    An excellent introduction to the entire TCP/IP protocol suite,
    covering all the major protocols, plus several important applications.
    
    "TCP/IP Illustrated, Volume 2: The Implementation",
    by Gary R. Wright and W. Richard Stevens
    (c) Addison-Wesley, 1995
    ISBN 0-201-63354-X
    
    This is a complete, and lengthy, discussion of the internals of TCP/IP
    based on the Net/2 release of BSD.
    
    Unix Network Programming
    W. Richard Stevens
    Prentice Hall 1990  ISBN 0-13-949876
    
    An excellent introduction to network programming under Unix.
    
    The Design and Implementation of the 4.3 BSD Operating System
    Samuel J. Leffler, Marshall Kirk McKusick, Michael J. Karels, John S.
    Quarterman
    Addison-Wesley 1989  ISBN 0-201-06196-1
    
    Though this book is a reference for the entire operating system, the
    eleventh and twelfth chapters completely explain how the networking
    protocols are implemented in the kernel.
    
    Stevens, W. Richard, Unix Network Programming.  1990, Prentice-Hall.
    
    An excellent introduction to network programming under Unix.   Widely
    cited on the Usenet bulletin boards as the "best place to start" if you
    want to actually learn how to write Unix programs that communicate over
    a network.
    
    Rago, Steven A.  Unix System V Network Programming.  1993,
    Addison-Wesley.
    
    A book that covers the same kinds of topics as W. Richard Stevens'
    Unix Network Programming, but is more specific to Unix System V
    Release 4 (SVR4), and so perhaps is more useful and up to date if
    you are working specifically with that implementation.  (Stevens'
    book covers Unix System V release 3.x).  There is much more
    extensive coverage of Streams in Rago's book: 4 chapters, where
    Stevens only provides a couple of subsections.  The design project
    at the end of the book is an implementation of SLIP.
    
    
    2)  Where can I find example source code for TCP/UDP/IP?
    
    A)  Code from the Internetworking with TCP/IP Volume III is available
    for anonymous ftp from:
    
    arthur.cs.purdue.edu:/pub/dls
    
    Code used in the Net-2 version of Berkeley Unix is available for
    anonymous ftp from:
    
    ftp.uu.net:systems/unix/bsd-sources/sys/netinet
    
    and
    
    gatekeeper.dec.com:/pub/BSD/net2/sys/netinet
    
    Code from Richard Stevens' book is available on:
    ftp.uu.net:/published/books/stevens.*
    
    Example source code and libraries that make coding quicker are
    available in the Simple Sockets Library written at NASA.  The Simple
    Sockets Library makes sockets easy to use, and it comes as source
    code.  It has been tested on: Unix (SGI, DecStation, AIX, Sun 3,
    Sparcstation; version 2.02+: Solaris 2.1, SCO), VMS, and MSDOS
    (client only since there's no background there).  It sits atop
    Berkeley sockets and tcp/ip.
    
    You can order the "Simple Sockets Library" from
    
                               Austin Code Works
                              11100 Leafwood Lane
                           Austin, TX 78750-3464 USA
                             Phone (512) 258-0785
    
    Ask for the "SSL - The Simple Sockets Library".  Last I checked, they
    were asking $20 US for it.
    
    
    For DOS there is WATTCP.ZIP (numerous sites):
    
    WATTCP is a DOS TCP/IP stack derived from the NCSA Telnet program and
    much enhanced. It comes with some example programs and complete source
    code. The interface isn't BSD sockets but is well suited to PC type
    work. It is also written so that it can be used and memory
    allocation).
    
    3)  Are there any public domain programs to check the performance of
    an IP link?
    
    A)
    
    TTCP:  Available for anonymous ftp from....
    
    wuarchive.wustl.edu:/graphics/graphics/mirrors/sgi.com/sgi/src/ttcp
    
    On ftp.sgi.com are netperf (from Rick Jones at HP) and nettest
    (from Dave Borman at Cray).  ttcp is also available at ftp.sgi.com.
    
    You can get to the NetPerf home page via:
    
    http://www.cup.hp.com/netperf/NetperfPage.html
    
    There is a suite of Bandwidth Measuring programs from [email protected].
    Available for anonymous ftp from ftp.netcom.com in
    ~ftp/gnn/bwmeas-0.3.tar.Z.  These are several programs that measure
    bandwidth and jitter over several kinds of IPC links, including TCP
    and UDP.
    
    
    4) Where do I find RFCs?
    
    A)  This is the latest info on obtaining RFCs:
    Details on obtaining RFCs via FTP or EMAIL may be obtained by sending
    an EMAIL message to [email protected] with the message body
    help: ways_to_get_rfcs.  For example:
    
            To: [email protected]
            Subject: getting rfcs
    
            help: ways_to_get_rfcs
    
    The response to this mail query is quite long and has been omitted.
    
    RFCs can be obtained via FTP from DS.INTERNIC.NET, NIS.NSF.NET,
    NISC.JVNC.NET, FTP.ISI.EDU, WUARCHIVE.WUSTL.EDU, SRC.DOC.IC.AC.UK,
    FTP.CONCERT.NET, or FTP.SESQUI.NET.
    
    
    Using Web, WAIS, and gopher:
    
    Web:
    
    http://web.nexor.co.uk/rfc-index/rfc-index-search-form.html
    
    WAIS access by keyword:
    
    wais://wais.cnam.fr/RFC
    
    Excellent presentation with a full-text search too:
    
    http://www.cis.ohio-state.edu/hypertext/information/rfc.html
    
    With Gopher:
    
    gopher://r2d2.jvnc.net/11/Internet%20Resources/RFC
    gopher://muspin.gsfc.nasa.gov:4320/1g2go4%20ds.internic.net%2070%201%201/.ds/.internetdocs
    
    
    
    5) How can I detect that the other end of a TCP connection has crashed?
    Can I use "keepalives" for this?
    
    A) Detecting crashed systems over TCP/IP is difficult.  TCP doesn't
    require any transmission over a connection if the application isn't
    sending anything, and many of the media over which TCP/IP is used
    (e.g. ethernet) don't provide a reliable way to determine whether a
    particular host is up.  If a server doesn't hear from a client, it
    could be because it has nothing to say, some network between the
    server and client may be down, the server or client's network
    interface may be disconnected, or the client may have crashed.
    Network failures are often temporary (a thin ethernet will appear
    down while someone is adding a link to the daisy chain, and it often
    takes a few minutes for new routes to stabilize when a router goes
    down), and TCP connections shouldn't be dropped as a result.
    
    Keepalives are a feature of the sockets API that requests that an
    empty packet be sent periodically over an idle connection; this
    should evoke an acknowledgement from the remote system if it is
    still up, a reset if it has rebooted, and a timeout if it is down.
    These are not normally sent until the connection has been idle for
    a few hours.  The purpose isn't to detect a crash immediately, but
    to keep unnecessary resources from being allocated forever.
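
    As an illustration of requesting keepalives through the sockets
    API, the sketch below turns on SO_KEEPALIVE with Python's socket
    module and reads the option back.  It is a minimal example, not
    part of the original FAQ; the function names enable_keepalive and
    keepalive_demo are made up here, and the idle time before the first
    probe is left at whatever the operating system uses.

```python
import socket


def enable_keepalive(sock):
    """Ask the TCP stack to probe this connection when it has been idle."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Read the option back; a non-zero value means keepalives are on.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


def keepalive_demo():
    """Create a TCP socket, enable keepalives on it, and report the result."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        return enable_keepalive(sock)
    finally:
        sock.close()
```

    The option only asks for probes on an idle connection; it does not
    by itself shorten the multi-hour default idle time mentioned above.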
    
    If more rapid detection of remote failures is required, this should
    be implemented in the application protocol.  There is no standard
    mechanism for this, but an example is requiring clients to send a
    "no-op" message every minute or two.  An example protocol that uses
    this is X Display Manager Control Protocol (XDMCP), part of the X
    Window System, Version 11; the XDM server managing a session
    periodically sends a Sync command to the display server, which
    should evoke an application-level response, and resets the session
    if it doesn't get a response (this is actually an example of a poor
    implementation, as a timeout can occur if another client "grabs"
    the server for too long).
    
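    The application-level "no-op" approach described in the answer to
    question 5 can be sketched as a small bookkeeping class.  This is a
    hypothetical illustration rather than code from XDMCP or any other
    real protocol: the class name PeerLiveness, its methods, and the
    two-minute timeout in the usage note are assumptions.  Times are
    passed in explicitly so the logic does not depend on a particular
    clock.

```python
class PeerLiveness:
    """Track when a peer was last heard from and decide if it looks dead."""

    def __init__(self, timeout_seconds, now):
        self.timeout_seconds = timeout_seconds
        self.last_heard = now

    def record_message(self, now):
        # Call this for every message received, including "no-op" messages.
        self.last_heard = now

    def is_presumed_dead(self, now):
        # The peer is presumed dead once it has been silent for longer
        # than the timeout.  This cannot tell a crash from a network
        # outage; it only bounds how long resources are held.
        return (now - self.last_heard) > self.timeout_seconds
```

    A server would create one PeerLiveness(120, now) per client, call
    record_message() on every received message, and drop the session
    when is_presumed_dead() becomes true.
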
    6) Can the keepalive timeouts be configured?
    
    A) This varies by operating system.  There is a program that works on
    many Unices (though not Linux or Solaris), called netconfig, that
    allows one to do this and documents many of the variables.  It is
    available by anonymous FTP from
    
        cs.ucsd.edu:pub/csl/Netconfig/netconfig2.2.tar.Z
    
    In addition, Richard Stevens' TCP/IP Illustrated, Volume 1 includes a
    good discussion of setting the most useful variables on many
    platforms.
    
    7) Can I set up a gateway to the Internet that translates IP
    addresses, so that I don't have to change all our internal addresses
    to an official network?
    
    A) There's no general solution to this.  Many protocols include IP
    addresses in the application-level data (FTP's "PORT" command is the
    most notable), so it isn't simply a matter of translating addresses
    in the IP header.  Also, if the network number(s) you're using match
    those assigned to another organization, your gateway won't be able
    to communicate with that organization (RFC 1597 proposes network
    numbers that are reserved for private use, to avoid such conflicts,
    but if you're already using a different network number this won't
    help you).
    
    However, if you're willing to live with limited access to the
    Internet from internal hosts, the "proxy" servers developed for
    firewalls can be used as a substitute for an address-translating
    gateway.  See the firewall FAQ.
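
    To check whether an internal address already falls inside the
    private-use network numbers of RFC 1597 mentioned above, a sketch
    along the following lines may help.  It is an illustration, not
    part of the original FAQ: the three address blocks are the ones
    RFC 1597 reserves, while the names RFC1597_NETWORKS and
    is_rfc1597_private are made up for the example.

```python
import ipaddress

# The three blocks RFC 1597 sets aside for private internets.
RFC1597_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1597_private(address):
    """Return True if the dotted-quad address lies in a private-use block."""
    addr = ipaddress.ip_address(address)
    return any(addr in network for network in RFC1597_NETWORKS)
```

    Addresses for which this returns False may collide with a network
    number assigned to some other organization.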
    
    8) Are there object-oriented network programming tools?
    
    A) Yes, and one such system is called ACE (ADAPTIVE Communication
    Environment).  Here is how to get more information and the software:
    
    OBTAINING ACE
    
    An HTML version of this README file is available at URL
    http://www.cs.wustl.edu/~schmidt/ACE.html.  All software and
    documentation is available via both anonymous ftp and the Web.
    
    ACE is available for anonymous ftp from the ics.uci.edu (128.195.1.1)
    host in the gnu/C++_wrappers.tar.Z file (approximately .5 meg
    compressed).  This release contains the source code,
    documentation, and example test drivers for C++ wrapper libraries.
    
    9) What other FAQs might you want to look in?
    comp.protocols.tcp-ip.ibmpc
       Aboba, Bernard D.(1994) "comp.protocols.tcp-ip.ibmpc Frequently
        Asked Questions (FAQ)" Usenet news.answers, available via
        file://ftp.netcom.com/pub/ma/mailcom/IBMTCP/ibmtcp.zip,
        57 pages.
    
    comp.protocols.ppp
       Archive-name: ppp-faq/part[1-8]
       URL: http://cs.uni-bonn.de/ppp/part[1-8].html
    
    comp.dcom.lans.ethernet
       ftp site: dorm.rutgers.edu, pub/novell/DOCS
       Ethernet Network Questions and Answers
       Summarized from UseNet group comp.dcom.lans.ethernet
    
    10) What other newsgroups deal with networking?
    
    comp.dcom.cabling       Cabling selection, installation and use.
    comp.dcom.isdn          The Integrated Services Digital Network
                            (ISDN).
    comp.dcom.lans.ethernet Discussions of the Ethernet/IEEE 802.3
                            protocols.
    comp.dcom.lans.fddi     Discussions of the FDDI protocol suite.
    comp.dcom.lans.misc     Local area network hardware and software.
    comp.dcom.lans.token-ring       Installing and using token ring
                                    networks.
    comp.dcom.servers       Selecting and operating data communications
                            servers.
    comp.dcom.sys.cisco     Info on Cisco routers and bridges.
    comp.dcom.sys.wellfleet Wellfleet bridge & router systems hardware &
                            software.
    comp.protocols.ibm      Networking with IBM mainframes.
    comp.protocols.iso      The ISO protocol stack.
    comp.protocols.kerberos The Kerberos authentication server.
    comp.protocols.misc     Various forms and types of protocol.
    comp.protocols.nfs      Discussion about the Network File System
                            protocol.
    comp.protocols.ppp      Discussion of the Internet Point to Point
                            Protocol.
    comp.protocols.smb      SMB file sharing protocol and Samba SMB
                            server/client.
    comp.protocols.tcp-ip   TCP and IP network protocols.
    comp.protocols.tcp-ip.ibmpc     TCP/IP for IBM(-like) personal
                                    computers.
    comp.security.misc      Security issues of computers and networks.
    comp.sys.ibm.pc.hardware.networking     Network hardware & equipment
                                            for the PC.
    comp.os.ms-windows.networking.misc      Windows and other networks.
    comp.os.ms-windows.networking.tcp-ip    Windows and TCP/IP networking.
    comp.os.ms-windows.networking.windows   Windows' built-in networking.
    comp.os.os2.networking.misc     Miscellaneous networking issues of
                                    OS/2.
    comp.os.os2.networking.tcp-ip   TCP/IP under OS/2.
    comp.sys.novell         Discussion of Novell Netware products.
    
    11) Van Jacobson explains TCP congestion avoidance.
    
    I've attached Van J's original posting on it (I seem to repost this
    every 6 months or so).  If you want to see some real examples of
    this in action, take a look at Chapter 21 of my "TCP/IP Illustrated,
    Volume 1".
    
            Rich Stevens
    ---------------------------------------------------------------------------
    >From [email protected] Mon Apr 30 01:44:05 1990
    To: [email protected]
    Subject: modified TCP congestion avoidance algorithm
    Date: Mon, 30 Apr 90 01:40:59 PDT
    From: Van Jacobson <[email protected]>
    Status: RO
    
    This is a description of the modified TCP congestion avoidance
    algorithm that I promised at the teleconference.
    
    BTW, on re-reading, I noticed there were several errors in
    Lixia's note besides the problem I noted at the teleconference.
    I don't know whether that's because I mis-communicated the
    algorithm at dinner (as I recall, I'd had some wine) or because
    she's convinced that TCP is ultimately irrelevant :).  Either
    way, you will probably be disappointed if you experiment with
    what's in that note.
    
    First, I should point out once again that there are two
    completely independent window adjustment algorithms running in
    the sender:  Slow-start is run when the pipe is empty (i.e.,
    when first starting or re-starting after a timeout).  Its goal
    is to get the "ack clock" started so packets will be metered
    into the network at a reasonable rate.  The other algorithm,
    congestion avoidance, is run any time *but* when (re-)starting
    and is responsible for estimating the (dynamically varying)
    pipesize.  You will cause yourself, or me, no end of confusion
    if you lump these separate algorithms (as Lixia's message did).
    
    The modifications described here are only to the congestion
    avoidance algorithm, not to slow-start, and they are intended to
    apply to large bandwidth-delay product paths (though they don't
    do any harm on other paths).  Remember that with regular TCP (or
    with slow-start/c-a TCP), throughput really starts to go to hell
    when the probability of packet loss is on the order of the
    reciprocal of the bandwidth-delay product.  E.g., you might expect
    a 1% packet loss rate to translate into a 1% lower throughput but
    for, say, a TCP connection with a 100 packet b-d p. (= window), it
    results in a 50-75% throughput loss.  To make TCP effective on fat
    pipes, it would be nice if throughput degraded only as a function
    of loss probability rather than as the product of the loss
    probability and the b-d p.  (Assuming, of course, that we can do
    this without sacrificing congestion avoidance.)
    
    These mods do two things: (1) prevent the pipe from going empty
    after a loss (if the pipe doesn't go empty, you won't have to
    waste round-trip times re-filling it) and (2) correctly account
    for the amount of data actually in the pipe (since that's what
    congestion avoidance is supposed to be estimating and adapting to).
    
    For (1), remember that we use a packet loss as a signal that the
    pipe is overfull (congested) and that packet loss can be
    detected one of two different ways:  (a) via a retransmit
    timeout or (b) when some small number (3-4) of consecutive
    duplicate acks has been received (the "fast retransmit"
    algorithm).  In case (a), the pipe is guaranteed to be empty so
    we must slow-start.  In case (b), if the duplicate ack
    threshold is small compared to the bandwidth-delay product, we
    will detect the loss with the pipe almost full.  I.e., given a
    threshold of 3 packets and an LBL-MIT bandwidth-delay of around
    24KB or 16 packets (assuming 1500 byte MTUs), the pipe is 75%
    full when fast-retransmit detects a loss (actually, until
    gateways start doing some sort of congestion control, the pipe
    is overfull when the loss is detected so *at least* 75% of the
    packets needed for ack clocking are in transit when
    fast-retransmit happens).  Since the pipe is full, there's no
    need to slow-start after a fast-retransmit.
    
    For (2), consider what a duplicate ack means:  either the
    network duplicated a packet (i.e., the NSFNet braindead IBM
    token ring adapters) or the receiver got an out-of-order packet.
    The usual cause of out-of-order packets at the receiver is a
    missing packet.  I.e., if there are W packets in transit and one
    is dropped, the receiver will get W-1 out-of-order and
    (4.3-tahoe TCP will) generate W-1 duplicate acks.  If the
    `consecutive duplicates' threshold is set high enough, we can
    reasonably assume that duplicate acks mean dropped packets.
    
    But there's more information in the ack:  The receiver can only
    generate one in response to a packet arrival.  I.e., a duplicate
    ack means that a packet has left the network (it is now cached
    at the receiver).  If the sender is limited by the congestion
    window, a packet can now be sent.  (The congestion window is a
    count of how many packets will fit in the pipe.  The ack says a
    packet has left the pipe so a new one can be added to take its
    place.)  To put this another way, say the current congestion
    window is C (i.e, C packets will fit in the pipe) and D
    duplicate acks have been received.  Then only C-D packets are
    actually in the pipe and the sender wants to use a window of C+D
    packets to fill the pipe to its estimated capacity (C+D sent -
    D received = C in pipe).
    
    So, conceptually, the slow-start/cong.avoid/fast-rexmit changes
    are:
    
      - The sender's input routine is changed to set `cwnd' to `ssthresh'
        when the dup ack threshold is reached.  [It used to set cwnd to
        mss to force a slow-start.]  Everything else stays the same.
    
      - The sender's output routine is changed to use an effective window
        of min(snd_wnd, cwnd + dupacks*mss)  [the change is the addition
        of the `dupacks*mss' term.]  `Dupacks' is zero until the rexmit
        threshold is reached and zero except when receiving a sequence
        of duplicate acks.
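
    As an illustration that is not part of the original posting, the
    two conceptual changes listed above can be modelled in a few lines
    of Python.  This is a toy sketch of the bookkeeping only, not the
    BSD code: the class name FastRexmitSender, the threshold of 3, and
    the halving of ssthresh at the moment the threshold is reached are
    assumptions made for the example (the posting gives the threshold
    only as "3-4" and mentions separately that the window is reduced
    by 50% on loss).  All sizes are in bytes.

```python
DUPACK_THRESHOLD = 3  # assumption: the text only says "some small number (3-4)"


class FastRexmitSender:
    """Toy model of the sender's window bookkeeping during fast retransmit."""

    def __init__(self, mss, cwnd, snd_wnd):
        self.mss = mss
        self.cwnd = cwnd
        self.ssthresh = cwnd
        self.snd_wnd = snd_wnd        # window offered by the receiver
        self.consecutive_dups = 0     # duplicate acks seen in a row
        self.dupacks = 0              # zero until the threshold is reached

    def on_duplicate_ack(self):
        self.consecutive_dups += 1
        if self.consecutive_dups == DUPACK_THRESHOLD:
            # Assumption: halve ssthresh here, then set cwnd to ssthresh
            # instead of to one mss (which would force a slow-start).
            self.ssthresh = self.cwnd // 2
            self.cwnd = self.ssthresh
        if self.consecutive_dups >= DUPACK_THRESHOLD:
            self.dupacks = self.consecutive_dups

    def on_new_ack(self):
        # A non-duplicate ack ends the sequence and removes the inflation.
        self.consecutive_dups = 0
        self.dupacks = 0

    def effective_window(self):
        return min(self.snd_wnd, self.cwnd + self.dupacks * self.mss)
```

    With mss 1000 and cwnd 16000, the third duplicate ack drops cwnd to
    8000 and the effective window becomes 11000; each further duplicate
    ack adds one mss until a new ack arrives.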
    
    The actual implementation is slightly different than the above
    because I wanted to avoid the multiply in the output routine
    (multiplies are expensive on some risc machines).  A diff of the
    old and new fastrexmit code is attached (your line numbers will
    vary).
    
    Note that we still do congestion avoidance (i.e., the window is
    reduced by 50% when we detect the packet loss).  But, as long as
    the receiver's offered window is large enough (it needs to be at
    most twice the bandwidth-delay product), we continue sending
    packets (at exactly half the rate we were sending before the
    loss) even after the loss is detected so the pipe stays full at
    exactly the level we want and a slow-start isn't necessary.
    
    Some algebra might make this last clear:  Say U is the sequence
    number of the first un-acked packet and we are using a window
    size of W when packet U is dropped.  Packets [U..U+W) are in
    transit.  When the loss is detected, we send packet U and pull
    the window back to W/2.  But in the round-trip time it takes
    the U retransmit to fill the receiver's hole and an ack to get
    back, W-1 dup acks will arrive (one for each packet in transit).
    The window is effectively inflated by one packet for each of
    these acks so packets [U..U+W/2+W-1) are sent.  But we don't
    re-send packets unless we know they've been lost so the amount
    actually sent between the loss detection and the recovery ack is
    (U+W/2+W-1) - (U+W) = W/2-1 which is exactly the amount congestion
    avoidance allows us to send (if we add in the rexmit of U).  The
    recovery ack is for packet U+W so when the effective window is
    pulled back from W/2+W-1 to W/2 (which happens because the
    recovery ack is `new' and sets dupack to zero), we are allowed
    to send up to packet U+W+W/2 which is exactly the first packet
    we haven't yet sent.  (I.e., there is no sudden burst of packets
    as the `hole' is filled.)  Also, when sending packets between
    the loss detection and the recovery ack, we do nothing for the
    first W/2 dup acks (because they only allow us to send packets
    we've already sent) and the bottleneck gateway is given W/2
    packet times to clean out its backlog.  Thus when we start
    sending our W/2-1 new packets, the bottleneck queue is as empty
    as it can be.
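    
    The counting in this paragraph can be checked with a few lines of C.
    The sketch below is only a bookkeeping exercise under the paragraph's
    own assumptions (a single loss, W even, every one of the W-1 dup acks
    inflating the window by one packet); the struct and function names are
    made up for the example.

```c
#include <assert.h>

struct fr_count {
    unsigned dupacks;      /* dup acks received before the recovery ack */
    unsigned idle_dupacks; /* dup acks that released no new packet */
    unsigned new_packets;  /* new packets sent before the recovery ack */
};

/* Walk through the W-1 dup acks that follow the loss of packet U when
 * the window was W packets.  Offsets are counted from U. */
static struct fr_count fr_count_recovery(unsigned w)
{
    struct fr_count c = { 0, 0, 0 };
    unsigned sent_end = w;   /* packets [U..U+W) are already sent */
    unsigned i;

    assert(w >= 2 && w % 2 == 0);
    for (i = 1; i <= w - 1; i++) {
        /* window pulled back to W/2, then inflated by one per dup ack */
        unsigned window_end = w / 2 + i;

        c.dupacks++;
        if (window_end > sent_end) {
            c.new_packets += window_end - sent_end;
            sent_end = window_end;
        } else {
            c.idle_dupacks++;   /* only covers packets already sent */
        }
    }
    return c;
}
```

    With W = 8 this gives 7 dup acks, of which the first 4 (W/2) release
    nothing and the remaining 3 (W/2-1) each release one new packet.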
    
    [I don't know if you can get the flavor of what happens from
    this description -- it's hard to see without a picture.  But I
    was delighted by how beautifully it worked -- it was like
    watching the innards of an engine when all the separate motions
    of crank, pistons and valves suddenly fit together and
    everything appears in exactly the right place at just the right
    time.]
    
    Also note that this algorithm interoperates with old tcp's:  Most
    pre-tahoe tcp's don't generate the dup acks on out-of-order packets.
    If we don't get the dup acks, fast retransmit never fires and the
    window is never inflated so everything happens in the old way (via
    timeouts).  Everything works just as it did without the new algorithm
    (and just as slow).
    
    If you want to simulate this, the intended environment is:
    
        - large bandwidth-delay product (say 20 or more packets)
    
        - receiver advertising window of two b-d p (or, equivalently,
          advertised window of the unloaded b-d p but two or more
          connections simultaneously sharing the path).
    
        - average loss rate (from congestion or other source) less than
          one lost packet per round-trip-time per active connection.
          (The algorithm works at higher loss rate but the TCP selective
          ack option has to be implemented otherwise the pipe will go empty
          waiting to fill the second hole and throughput will once again
          degrade at the product of the loss rate and b-d p.  With
          selective ack, throughput is insensitive to b-d p at any loss
          rate.)
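    
    When setting up such a simulation, the bandwidth-delay product and the
    matching receiver window can be worked out as in the following C
    sketch.  The helper names are hypothetical, and any link speed,
    round-trip time or segment size fed to them is an assumption of
    whoever runs the simulation, not a figure from the text.

```c
#include <assert.h>

/* Bandwidth-delay product in bytes for a link of bits_per_sec and a
 * round-trip time of rtt_ms milliseconds. */
static unsigned long bdp_bytes(unsigned long bits_per_sec,
                               unsigned long rtt_ms)
{
    /* bits/s * ms / 1000 = bits in flight; / 8 = bytes in flight */
    return (unsigned long)((unsigned long long)bits_per_sec * rtt_ms
                           / 8000ULL);
}

/* The same product expressed in full-sized packets of mss bytes. */
static unsigned long bdp_packets(unsigned long bits_per_sec,
                                 unsigned long rtt_ms, unsigned long mss)
{
    assert(mss > 0);
    return bdp_bytes(bits_per_sec, rtt_ms) / mss;
}

/* Receiver window for the intended environment: two b-d p. */
static unsigned long sim_recv_window(unsigned long bits_per_sec,
                                     unsigned long rtt_ms)
{
    return 2 * bdp_bytes(bits_per_sec, rtt_ms);
}
```

    As an example with made-up numbers, an 8 Mbit/s path with a 100 ms
    round trip and 1000-byte segments has a b-d p of 100000 bytes (100
    packets), so the receiver would advertise 200000 bytes.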
    
    And, of course, we should always remember that good engineering
    practise suggests a b-d p worth of buffer at each bottleneck --
    less buffer and your simulation will exhibit the interesting
    pathologies of a poorly engineered network but will probably
    tell you little about the workings of the algorithm (unless the
    algorithm misbehaves badly under these conditions but my
    simulations and measurements say that it doesn't).  In these
    days of $100/megabyte memory, I dearly hope that this particular
    example of bad engineering is of historical interest only.
    
     - Van
    -----------------
    *** /tmp/,RCSt1a26717   Mon Apr 30 01:35:17 1990
    --- tcp_input.c Mon Apr 30 01:33:30 1990
    ***************
    *** 834,850 ****
                                     * Kludge snd_nxt & the congestion
                                     * window so we send only this one
    !                                * packet.  If this packet fills the
    !                                * only hole in the receiver's seq.
    !                                * space, the next real ack will fully
    !                                * open our window.  This means we
    !                                * have to do the usual slow-start to
    !                                * not overwhelm an intermediate gateway
    !                                * with a burst of packets.  Leave
    !                                * here with the congestion window set
    !                                * to allow 2 packets on the next real
    !                                * ack and the exp-to-linear thresh
    !                                * set for half the current window
    !                                * size (since we know we're losing at
    !                                * the current window size).
                                     */
                                    if (tp->t_timer[TCPT_REXMT] == 0 ||
    --- 834,850 ----
                                     * Kludge snd_nxt & the congestion
                                     * window so we send only this one
    !                                * packet.
    !                                *
    !                                * We know we're losing at the current
    !                                * window size so do congestion avoidance
    !                                * (set ssthresh to half the current window
    !                                * and pull our congestion window back to
    !                                * the new ssthresh).
    !                                *
    !                                * Dup acks mean that packets have left the
    !                                * network (they're now cached at the receiver)
    !                                * so bump cwnd by the amount in the receiver
    !                                * to keep a constant cwnd packets in the
    !                                * network.
                                     */
                                    if (tp->t_timer[TCPT_REXMT] == 0 ||
    ***************
    *** 853,864 ****
                                    else if (++tp->t_dupacks == tcprexmtthresh) {
                                            tcp_seq onxt = tp->snd_nxt;
    !                                       u_int win =
    !                                           MIN(tp->snd_wnd, tp->snd_cwnd) / 2 /
    !                                               tp->t_maxseg;
    
                                            if (win < 2)
                                                    win = 2;
                                            tp->snd_ssthresh = win * tp->t_maxseg;
    -
                                            tp->t_timer[TCPT_REXMT] = 0;
                                            tp->t_rtt = 0;
    --- 853,864 ----
                                    else if (++tp->t_dupacks == tcprexmtthresh) {
                                            tcp_seq onxt = tp->snd_nxt;
    !                                       u_int win = MIN(tp->snd_wnd,
    !                                                       tp->snd_cwnd);
    
    +                                       win /= tp->t_maxseg;
    +                                       win >>= 1;
                                            if (win < 2)
                                                    win = 2;
                                            tp->snd_ssthresh = win * tp->t_maxseg;
                                            tp->t_timer[TCPT_REXMT] = 0;
                                            tp->t_rtt = 0;
    ***************
    *** 866,873 ****
                                            tp->snd_cwnd = tp->t_maxseg;
                                            (void) tcp_output(tp);
    !
                                            if (SEQ_GT(onxt, tp->snd_nxt))
                                                    tp->snd_nxt = onxt;
                                            goto drop;
                                    }
                            } else
    --- 866,879 ----
                                            tp->snd_cwnd = tp->t_maxseg;
                                            (void) tcp_output(tp);
    !                                       tp->snd_cwnd = tp->snd_ssthresh +
    !                                                      tp->t_maxseg *
    !                                                      tp->t_dupacks;
                                            if (SEQ_GT(onxt, tp->snd_nxt))
                                                    tp->snd_nxt = onxt;
                                            goto drop;
    +                               } else if (tp->t_dupacks > tcprexmtthresh) {
    +                                       tp->snd_cwnd += tp->t_maxseg;
    +                                       (void) tcp_output(tp);
    +                                       goto drop;
                                    }
                            } else
    ***************
    *** 874,877 ****
    --- 880,890 ----
                                    tp->t_dupacks = 0;
                            break;
    +               }
    +               if (tp->t_dupacks) {
    +                       /*
    +                        * the congestion window was inflated to account for
    +                        * the other side's cached packets - retract it.
    +                        */
    +                       tp->snd_cwnd = tp->snd_ssthresh;
                    }
                    tp->t_dupacks = 0;
    *** /tmp/,RCSt1a26725   Mon Apr 30 01:35:23 1990
    --- tcp_timer.c Mon Apr 30 00:36:29 1990
    ***************
    *** 223,226 ****
    --- 223,227 ----
                    tp->snd_cwnd = tp->t_maxseg;
                    tp->snd_ssthresh = win * tp->t_maxseg;
    +               tp->t_dupacks = 0;
                    }
                    (void) tcp_output(tp);
    
    >From [email protected] Mon Apr 30 10:37:36 1990
    To: [email protected]
    Subject: modified TCP congestion avoidance algorithm (correction)
    Date: Mon, 30 Apr 90 10:36:12 PDT
    From: Van Jacobson <[email protected]>
    Status: RO
    
    I shouldn't make last minute 'fixes'.  The code I sent out last
    night had a small error:
    
    *** t.c Mon Apr 30 10:28:52 1990
    --- tcp_input.c Mon Apr 30 10:30:41 1990
    ***************
    *** 885,893 ****
                             * the congestion window was inflated to account for
                             * the other side's cached packets - retract it.
                            */
    !                       tp->snd_cwnd = tp->snd_ssthresh;
                    }
    -               tp->t_dupacks = 0;
                    if (SEQ_GT(ti->ti_ack, tp->snd_max)) {
                            tcpstat.tcps_rcvacktoomuch++;
                            goto dropafterack;
    --- 885,894 ----
                             * the congestion window was inflated to account for
                             * the other side's cached packets - retract it.
                             */
    !                       if (tp->snd_cwnd > tp->snd_ssthresh)
    !                               tp->snd_cwnd = tp->snd_ssthresh;
    !                       tp->t_dupacks = 0;
                    }
                    if (SEQ_GT(ti->ti_ack, tp->snd_max)) {
                            tcpstat.tcps_rcvacktoomuch++;
                            goto dropafterack;
    
    12) Can I use a single bit subnet?
    
    A)  It would seem that the consensus is no.  The best citable answer
    follows.
    
    From RFC1122:
          "3.3.6  Broadcasts
             Section 3.2.1.3 defined the four standard IP broadcast address
             forms:
               Limited Broadcast:  {-1, -1}
               Directed Broadcast:  {<Network-number>,-1}
               Subnet Directed Broadcast:
                                  {<Network-number>,<Subnet-number>,-1}
               All-Subnets Directed Broadcast: {<Network-number>,-1,-1}"
    
    All-Subnets Directed broadcasts are being deprecated in favor of IP
    multicast, but were very much defined at the time RFC1122 was written.
    Thus a Subnet Directed Broadcast to a subnet of all ones is not
    distinguishable from an All-Subnets Directed Broadcast.
    
    For those old systems that used all zeros for broadcast in IP
    addresses, a similar argument can be made against the subnet of all
    zeros.
    
    Also, for old routing protocols like RIP, a route to subnet zero
    is not distinguishable from the route to the entire network number
    (except possibly by context).
    
    Most of today's systems don't support variable length subnet masks
    (VLSM), and for such systems the above is true. However, all the major
    router vendors and *some* Unix systems (BSD 4.4 based ones) support
    VLSMs, and in that case the situation is more complicated :-)
    
    With VLSMs (necessary to support CIDR, see RFC 1519), you can utilize
    the address space more efficiently. Routing lookups are based on
    *longest* match, and this means that you can for instance subnet the
    class C net with a mask of 255.255.255.224 (27 bits) in addition to
    the subnet mask of 255.255.255.192 (26 bits) given above. You will
    then be able to use the addresses x.x.x.33 through x.x.x.62 (first
    three bits 001) and the addresses x.x.x.193 through x.x.x.222 (first
    three bits 110) with this new subnet mask. And you can continue with
    a subnet mask of 28 bits, etc.  (Note also, by the way, that
    non-contiguous subnet masks are deprecated.)
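    
    The address ranges and the longest-match rule can be illustrated with
    a short C sketch.  Everything here is hypothetical example code: the
    struct and function names are invented, addresses are plain 32-bit
    host-order integers, and the class C network used to try it out
    (192.0.2.0 standing in for x.x.x.0) is an assumption.

```c
#include <assert.h>
#include <stdint.h>

struct host_range { uint32_t first, last; };          /* usable hosts */
struct route { uint32_t net; unsigned prefix_len; };  /* one table entry */

/* Contiguous mask with prefix_len leading one bits (1..32). */
static uint32_t prefix_mask(unsigned prefix_len)
{
    assert(prefix_len >= 1 && prefix_len <= 32);
    return (uint32_t)0xFFFFFFFFu << (32 - prefix_len);
}

/* Usable host addresses of the subnet that contains addr: everything
 * between the all-zeros and the all-ones host part. */
static struct host_range subnet_hosts(uint32_t addr, unsigned prefix_len)
{
    uint32_t mask = prefix_mask(prefix_len);
    struct host_range r;

    assert(prefix_len <= 30);
    r.first = (addr & mask) + 1;
    r.last = ((addr & mask) | ~mask) - 1;
    return r;
}

/* Index of the matching route with the longest prefix, or -1. */
static int longest_match(const struct route *tab, int n, uint32_t addr)
{
    int best = -1;
    int i;

    for (i = 0; i < n; i++) {
        if ((addr & prefix_mask(tab[i].prefix_len)) != tab[i].net)
            continue;
        if (best < 0 || tab[i].prefix_len > tab[best].prefix_len)
            best = i;
    }
    return best;
}
```

    Calling subnet_hosts() with a 27-bit prefix on an address in the 001
    subnet yields hosts .33 through .62, and on an address in the 110
    subnet yields .193 through .222, matching the ranges quoted above.
    With both a 24-bit and a 27-bit route in the table, longest_match()
    picks the 27-bit one for an address such as .40.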
    
    This is all very nicely covered in the paper by Havard Eidnes:
    
      Practical Considerations for Network Addressing using a CIDR Block
      Allocation
      Proceedings of INET '93
    
    This paper is available with anonymous FTP from
    
            aun.uninett.no:/pub/misc/eidnes-cidr.ps
    
    The same paper, with minor revisions, is one of the articles in the
    special Internetworking issue of Communications of the ACM (last month,
    I believe).
    
    > I have been told that some network equipment (Cisco I think was the
    > vendor named) will not correctly handle subnets that violate that
    > standard.
    
    As far as I know cisco is one of the router vendors that *do* handle
    VLSMs correctly. Could you substantiate this claim?
    
    Steinar Haug, SINTEF RUNIT, University of Trondheim, NORWAY
    Email: [email protected]
    
    --
    George V. Neville-Neil          work: [email protected]
                                    home: [email protected]
    NIC: GN82
    
    This signature kept blank due to the CDA.
3057.8more tcp and socket info from UNIX notes fileHYDRA::DORHAMERMon Jan 27 1997 10:2627
        #1          27-JAN-1997 10:24:19.80                                 
    NEWMAIL
    From:   HYDRA::AXPDEVELOPER "[email protected]"
    To:     US3RMC::"[email protected]"
    CC:     AXPDEVELOPER
    Subj:   more tcp and socket info
    
    Hu Rui,
    
    Here's another response to some of your questions:
    
    ---------------------------------------------------------------------------
    >We have not found any good detailed material that can answer my question.
    >How the TCP is implemented in Digital Unix?
    
    Digital follows the RFCs. See the SPD, the man pages netintro(7),
    inet(7), ip(7), tcp(7), udp(7), etc., and also the Network and
    Communications Overview.
    
    >How can I tell what state my socket is currently in?
    
    You could run "netstat -a[n]" from the shell, or even from a program.
    
    ---------------------------------------------------------------------------
    
    Karen Dorhamer
    Alpha Developer Support
    
3057.9need strlen for XDRHYDRA::DORHAMERThu Jan 30 1997 09:0127
    From:   SMTP%"[email protected]"   30-JAN-1997 02:51:19.53
    To:     [email protected]
    CC:
    Subj:   Thanks for TCP socket help!
    
    Thank you very much for your continuing help with my TCP socket
    questions. Now we are able to deliver the products with confidence.
    Your confirmation of the return value of read on a non-blocking TCP
    socket is the most important reply I got. My code depends heavily on
    it.
    
    I still have another question, about XDR.
    I use XDR to transfer a structure through the socket. My structure is
    very complicated and of variable length. Inside there are several
    linked lists. A list can have 1,000 nodes or 1 node. Currently I
    allocate a big enough buffer to store the XDR string. This is very
    inefficient when the list has only a few values.
    Is there any strlen-like function for XDR so I can know the length of
    the binary string?
    _________________________________________________
    Hu Rui
    R&D, SMS Unit    (ASAP code A60205)
    Scandinavian Softline Technology Oy
    Tulkinkuja 3   02600 ESPOO   Finland
    tel. +358-9-5495 6202  fax. +358-9-512 4629
    home tel. +358-9-2789426
    Internet: [email protected]  http://www.softline.fi/
    _________________________________________________
3057.10moved from note 3115.1 (misplaced note)HYDRA::DORHAMERMon Feb 03 1997 16:4114
        #1          30-JAN-1997 13:14:53.92                             
    NEWMAIL
        From:   HYDRA::AXPDEVELOPER "[email protected]"
        To:     HYDRA::AXPDEVELOPER
        CC:     AXPDEVELOPER
        Subj:   RE: FWD: Thanks for TCP socket help!
    
        Hu Rui,
    
        Can you use the sizeof operator to get the size of your structure? If 
        not, I'll check with engineering to find out the proper way to do this.
    
        Karen Dorhamer
        Alpha Developer Support
3057.11resendHYDRA::DORHAMERMon Feb 03 1997 16:4518
        #1           3-FEB-1997 16:44:57.75                                 
    NEWMAIL
    From:   HYDRA::AXPDEVELOPER "[email protected]"
    To:     SMTP%"[email protected]"
    CC:     AXPDEVELOPER
    Subj:   RE: Thanks for TCP socket help!
    
    Hu Rui,
    
    Sorry if you have not yet received a response to your last e-mail regarding
    XDR and strlen.  I may have sent the response to the wrong e-mail address.
    
    Can you use the sizeof operator to get the size of your structure?
    If not, I'll check with engineering to find out the proper way to do this.
    
    Karen Dorhamer
    Alpha Developer Support
    
3057.12further clarificationHYDRA::DORHAMERFri Feb 07 1997 12:4534
    From:   SMTP%"[email protected]"    4-FEB-1997 02:33:02.01
    To:     "[email protected]" <[email protected]>
    CC:     Kari Kailamaki <[email protected]>
    Subj:   Re: Thanks for TCP socket help!
    
    > Hu Rui,
    >
    > Sorry if you have not yet received a response to your last e-mail regarding
    > XDR and strlen.  I may have sent the response to the wrong e-mail address.
    
    No, I had not received anything; it may have gone to the wrong address.
    
    > Can you use the sizeof operator to get the size of your structure?
    
    Yes, I can and I am using this solution now.
    
    > If not, I'll check with engineering to find out the proper way to do this.
    
    But I would like to know whether a simple solution exists.
    Currently I estimate how many bytes I need; for a string I allocate
    (strlen + 20), but this is not a precise solution. What I need is a
    function like this:
    
    int
    xdrlen(xdrstring, ... structure definition)
    
    
    I want to know if DEC has coded it. I have read the whole network
    programming manual, but found nothing.
    
    Regards.
    
3057.13posted in Digital_UNIX notesHYDRA::DORHAMERFri Feb 07 1997 13:043
    I have posted his questions in Digital_UNIX note 8759.
    
    Karen
3057.14response from Digital_unix note 8759HYDRA::DORHAMERFri Feb 14 1997 16:4634
        #1          14-FEB-1997 16:44:18.94                                 
    NEWMAIL   
    From:   HYDRA::AXPDEVELOPER "[email protected]"
    To:     NM%US6RMC::"[email protected]"
    CC:     AXPDEVELOPER
    Subj:   XDR info
    
    Hu Rui,
    
    Attached is a response that I received regarding your questions about
    calculating the length of XDR data.  I hope this helps you out.
    
    Karen Dorhamer
    Alpha Developer Support
    
    I don't know of libc routines that let you estimate how long XDR data will
    be.  It would be fairly easy to write a new XDR module that merely counts
    the length of data to be encoded, and using the public domain RPC code would
    provide enough hints.  (That has a lot of 32/64 bit issues, so it doesn't
    replace our libc code.)
    
    Some sizes (I may not be exactly right):
    
    xdr_bytes, xdr_string:  The number of bytes, rounded up to the next
                            multiple of 4, plus 4.
    xdr_char, xdr_short, xdr_int, xdr_long, xdr_float, xdr_bool, xdr_enum:
                            4 bytes.
    xdr_longlong, xdr_hyper, xdr_double:
                            8 bytes.
    
    Ah - here's an idea - try calling xdr_getpos before and after encoding
    something.  The difference is the number of bytes used.
    
    Me?  I'd just look at the messages sent via tcpdump and figure out the
    length and structure.
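    
    A counting routine along the lines suggested above might look like
    the following C sketch.  It does not call the XDR library at all; it
    only applies the size rules listed in this note.  The node layout and
    the helper names (xdrlen_*) are hypothetical, and the list size
    assumes the list is encoded the way xdr_pointer does it, with a
    4-byte "more data" flag before each node and one final flag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define XDRLEN_UNIT 4u   /* every XDR item is a multiple of 4 bytes */

/* Round a byte count up to the next multiple of 4. */
static size_t xdrlen_pad4(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

/* Encoded size of a string: 4-byte length plus padded data. */
static size_t xdrlen_string(const char *s)
{
    assert(s != NULL);
    return XDRLEN_UNIT + xdrlen_pad4(strlen(s));
}

/* Hypothetical list node: one int and one string per node. */
struct node {
    int          id;
    const char  *name;
    struct node *next;
};

/* Encoded size of a whole list, assuming xdr_pointer-style encoding. */
static size_t xdrlen_list(const struct node *n)
{
    size_t total = 0;

    for (; n != NULL; n = n->next) {
        total += XDRLEN_UNIT;              /* "another node follows" flag */
        total += XDRLEN_UNIT;              /* id, encoded as an int */
        total += xdrlen_string(n->name);   /* name */
    }
    return total + XDRLEN_UNIT;            /* terminating "no more" flag */
}
```

    The result can be used to size the buffer before encoding, and
    comparing it against the xdr_getpos difference mentioned above is an
    easy way to check that the assumed encoding matches what the library
    actually produces.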