T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
10014.1 | Three problems; one is due to threads for sure. | WTFN::SCALES | Despair is appropriate and inevitable. | Mon Jun 02 1997 15:21 | 46 |
| I believe that you are reporting three problems:
Problem #1: you are seeing a 9% degradation in performance between V3.2G and
V4.0B in the responsiveness of socket I/O when used in a
non-threaded program.
Problem #2: you are seeing an additional 30%+ degradation in performance on
V4.0B in the responsiveness of socket I/O when the program is
linked with the threads libraries.
Problem #3: you are seeing poor scaling characteristics when running multiple
instances of a multithreaded program which makes heavy use of
select().
Problem #1 is clearly unrelated to threads (perhaps someone who knows about the
I/O system will comment). The source of Problem #3 is unclear (someone should
make an attempt to diagnose it further).
Problem #2, contrary to your assertion, is entirely due to the new thread
scheduling model introduced in V4.0. Despite the fact that your code makes no
calls to the threading library, simply linking your program with the threads
library changes it from a "non-threaded program" into a "multithreaded program
which happens to use only one thread", which is completely different.
In our efforts to improve the integration between threaded programs and the
system, we sacrificed the performance of the degenerate case where a program
links in threads but doesn't use them. There are various overhead costs imposed
by your election to use threads. The presumption is that the benefits of using
threads will more than make up for the costs. However, in your test case, where
you are not actually using threads, you get none of the benefits, while you are
forced to pay all of the costs.
With special support from the Digital Unix kernel, we've moved thread scheduling
out into user mode (for the most part). This makes system calls more expensive
in terms of CPU usage, while relieving various reschedule-latency problems and
system scaling issues. It also makes the case of blocking (and unblocking) the
process (i.e., when there is no work to do) marginally more expensive; but,
working on the idea that there are a number of threads active, this would be an
atypical event. However, your test scenario manages to hit both of these costs
head on -- your processes are consistently blocking in system calls leaving the
process with nothing else to do, so you incur the added cost of the thread
scheduling and the added latency of the process block/unblock.
Since your real application is multithreaded, you might want to consider
constructing a more representative (i.e., multithreaded) benchmark.
Webb
|
10014.2 | no better with a threaded program | TAEC::URAGO | | Tue Jun 03 1997 09:48 | 396 |
| Hi Webb,
from your previous note :
1- "In our efforts to improve the integration between threaded programs and the
system, we sacrificed the performance of the degenerate case where a program
links in threads but doesn't use them."
>>> You did it well .... (joke :-)
2- "However, in your test case, where you are not actually using threads, you
get none of the benefits, while you are forced to pay all of the costs."
>>> I returned to my desk and have written the same kind of program, but using
threads. The idea is still the same: packets of data are exchanged from one
process to another and are acknowledged to force synchronization. The number
of threads may be statically configured (#define MAX_THREADS).
I have done some tests using 3,6,12 and 24 threads on each side.
The results obtained are quite similar to those obtained with my first
non-threaded program :-<
(programs, makefiles etc.. at the end of the note )
The following results are giving an idea of the number of packet/second
exchanged in the different configurations.
DUNIX V3.2G :
------------
3 threads 6 threads 12 threads 24 threads
6100 pkts 6120 pkts 6000 pkts 6010 pkts
You can see that the global number of packets exchanged between the two
processes is quite constant and near 6000/sec.
DUNIX V4.0B :
------------
3 threads 6 threads 12 threads 24 threads
3400 pkts 3600 pkts 3450 pkts 3200 pkts
On DUNIX V4.0B the number of packets exchanged per second is never higher than
3600, whatever the number of threads in the process. This still represents a
loss of performance of 40%.
So what are the conditions to get the benefits of the scheduling improvements ?
3- "This makes system calls more expensive in terms of CPU
usage, while relieving various reschedule-latency problems and system scaling
issues. It also makes the case of blocking (and unblocking) the
process (i.e., when there is no work to do) marginally more expensive; but,
working on the idea that there are a number of threads active, this would be an
atypical event."
>>>> If I understand correctly, performance has been optimized for the case
where a thread blocks and hands off to another thread of the same
process (e.g., several threads locking/unlocking a mutex), right?
>>>> In our application this is usually not the case. The threads are waiting
on I/O (sockets) and sometimes use mutexes when accessing global data. This
means that most of the time, when a thread is blocked
(on a read system call, for example), it doesn't hand off to another thread
of the same process but unblocks the receiver thread of the target process.
This example IS NOT an atypical event, but just real life in the
telecom world, and probably in some others.
So, what do we do next?
Jean-marie.
-----------------------------------------------------------------------
Makefile:
--------
du40: send_thr_du40 recv_thr_du40
du32: send_thr_du32 recv_thr_du32
send_thr_du32: send_thr.c
cc -o send_thr_du32 send_thr.c -lpthreads -lmach -lc_r
recv_thr_du32: recv_thr.c
cc -o recv_thr_du32 recv_thr.c -lpthreads -lmach -lc_r
send_thr_du40: send_thr.c
cc -pthread -o send_thr_du40 send_thr.c
recv_thr_du40: recv_thr.c
cc -pthread -o recv_thr_du40 recv_thr.c
-----------------------------------------------------------------------
send_thr.c :
------------
#include <pthread.h>
#include <stdio.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/un.h>
#include <sys/socket.h>
#define MAX_THREADS 3
#define SOCKET_NAME "my_sockets"
#define INBUFF_LEN 1024
#define BUFF_LEN 600
char message[BUFF_LEN];
void connect_and_emit()
{
int socket_id1;
struct sockaddr_un address;
int addr_len;
char buffer[INBUFF_LEN];
int status;
size_t nbbytes;
int msg_len;
int i;
printf(" Thread: The thread begin!\n");
address.sun_family = AF_UNIX;
strcpy(address.sun_path,SOCKET_NAME);
socket_id1 = socket(AF_UNIX,SOCK_STREAM,0);
if ( socket_id1 == -1)
{
printf("socket error %d\n",errno);
exit(-1);
}
status = connect(socket_id1,&address,sizeof(address));
if ( status == -1 )
{
printf("connect error %d\n",errno);
exit(-1);
}
msg_len = BUFF_LEN;
for (i=0 ;i<100000;i++)
{
nbbytes = write(socket_id1,&msg_len, sizeof(msg_len));
if ( !nbbytes )
{
printf(" write error : %d\n",errno);
break;
}
nbbytes = write(socket_id1,message,msg_len);
if ( !nbbytes )
{
printf(" write error : %d\n",errno);
break;
}
/* read acknowledge */
nbbytes = read(socket_id1,buffer,4);
if (nbbytes == 0)
{
printf(" read error : %d\n",errno);
break;
}
}
close(socket_id1);
printf(" Thread: The thread end!\n");
}
main(int argc, char **argv)
{
struct timespec sleep_time;
pthread_t thread_id[MAX_THREADS];
int terror=0;
int exitstatus;
void* result;
int i;
for (i=0; i< MAX_THREADS; i++)
{
printf("Main: Create the thread \n");
/* Create the thread
*/
#if _POSIX_C_SOURCE == 199506L
terror = pthread_create (&thread_id[i], NULL,
(void *(*)(void*)) connect_and_emit,
NULL);
#else
terror = pthread_create (&thread_id[i], pthread_attr_default,
(void *(*)(void*)) connect_and_emit,
NULL);
#endif
if (terror != 0)
{
printf("pthread_create() failed\n");
exit(-1);
}
}
/*
* wait for termination of threads
*/
for (i=0; i< MAX_THREADS; i++)
pthread_join(thread_id[i], &result);
exit(0);
}
------------------------------------------------------------------------------
recv_thr.c
----------
#include <pthread.h>
#include <stdio.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/un.h>
#include <sys/socket.h>
#define MAX_THREADS 24
#define SOCKET_NAME "my_sockets"
#define INBUFF_LEN 1024
#define BUFF_LEN 600
char message[BUFF_LEN];
struct param {
int indx;
int socket;
} ;
struct param param[MAX_THREADS];
int pkt_received[MAX_THREADS];
receive(struct param *param)
{
size_t nbbytes;
char buffer[INBUFF_LEN];
unsigned long nbpacket;
int msg_len;
printf("reading ...\n");
nbpacket=0;
while (1)
{
nbbytes = read(param->socket,&msg_len,sizeof(msg_len));
if (nbbytes == 0)
{
printf(" read error : %d\n",errno);
break;
}
nbbytes = read(param->socket,buffer,msg_len);
if (nbbytes == 0)
{
printf(" read error : %d\n",errno);
break;
}
/* send acknowledge */
nbbytes = write(param->socket,&msg_len, sizeof(msg_len));
if ( !nbbytes )
printf(" write error : %d\n",errno);
nbpacket++;
}
pkt_received[param->indx] = nbpacket;
close(param->socket);
}
main(int argc, char **argv)
{
int socket_id1,socket_id2;
struct sockaddr_un address;
struct sockaddr_un acc_address;
size_t addr_len;
int status;
struct timespec sleep_time;
pthread_t thread_id[MAX_THREADS];
int terror;
int exitstatus;
void* result;
int i;
long pckts_per_sec;
double ftime1, ftime2;
struct timespec time1, time2;
address.sun_family = AF_UNIX;
strcpy(address.sun_path,SOCKET_NAME);
socket_id1 = socket(AF_UNIX,SOCK_STREAM,0);
if ( socket_id1 == -1)
{
printf("socket error %d\n",errno);
exit(-1);
}
status = bind(socket_id1,&address,sizeof(address));
if ( status == -1)
{
printf("bind error %d\n",errno);
exit(-1);
}
status = listen(socket_id1,SOMAXCONN);
if ( status == -1)
{
printf("listen error %d\n",errno);
exit(-1);
}
printf(" listen ok\n");
for (i=0; i<MAX_THREADS;i++)
{
printf(" accepting conn\n");
socket_id2 = accept(socket_id1,&acc_address,&addr_len);
if ( socket_id2 == -1)
{
printf("accept error %d\n",errno);
exit(-1);
}
printf(" accept ok\n");
/* get starting time */
if (i==0)
getclock(TIMEOFDAY,&time1);
param[i].indx = i;
param[i].socket = socket_id2;
printf("Main: Create the thread \n");
/* Create the thread
*/
#if _POSIX_C_SOURCE == 199506L
terror = pthread_create (&thread_id[i], NULL,
(void *(*)(void*)) receive,
&param[i]);
#else
terror = pthread_create (&thread_id[i], pthread_attr_default,
(void *(*)(void*)) receive,
&param[i]);
#endif
if (terror != 0)
{
printf("pthread_create() failed\n");
exit(-1);
}
}
/*
* wait for termination of threads
*/
for (i=0;i<MAX_THREADS;i++)
{
pthread_join(thread_id[i], &result);
}
/* compute average */
getclock(TIMEOFDAY,&time2);
ftime1 = time1.tv_sec + time1.tv_nsec * 0.000000001;
ftime2 = time2.tv_sec + time2.tv_nsec * 0.000000001;
ftime1 = ftime2 - ftime1;
printf(" elapsed time : %f\n",ftime1);
pckts_per_sec = 0;
for (i=0;i<MAX_THREADS;i++)
{
pckts_per_sec += pkt_received[i];
}
printf(" number of packets received : %ld\n",pckts_per_sec);
printf(" packets per second : %f\n", (float)pckts_per_sec/ftime1);
close(socket_id1);
unlink(SOCKET_NAME);
exit(0);
}
|
10014.3 | Very fast I/O? | WTFN::SCALES | Despair is appropriate and inevitable. | Tue Jun 03 1997 12:40 | 52 |
| .2> You did it well .... (joke :-)
Unfortunately, sometimes, in order to make a step forward, you have to take a
step backward; in this case, it was a big step forward, and we apparently
haven't yet uncovered or recovered from our various steps backward... :-}
.2> I returned to my desk and have written the same kind of program, but using
.2> threads.
Your program looks reasonable to me. (My command of socket programming is
not strong, but it looks like you avoided the obvious pitfalls.) I'm a
little surprised and disappointed to find that it doesn't affect your
performance.
.2> If I understand correctly, performance has been optimized for the case
.2> where a thread blocks and hands off to another thread of the
.2> same process (e.g., several threads locking/unlocking a mutex), right?
The thrust of our model is that user mode synchronization, such as blocking
on a mutex, should be _very_ fast, while kernel mode synchronization, such as
blocking for I/O, should not decrease the level of concurrency in the process
and should be as efficient as possible, the expectation being that there
should be no increase in latency and that the extra CPU cost would be
recouped in concurrent execution and in decrease of scheduling latency for
other threads in the application.
.2> most of the time, when a thread is blocked (on a read system call, for
.2> example), it doesn't hand off to another thread of the same process
.2> but unblocks the receiver thread of the target process.
Is the target process typically on the same machine as the sending process?
That is, it was _very_ interesting that the I/O throughput was almost
unaffected (less than a few percent) by the number of threads you used --
this suggests that the I/O has somehow been optimized to the extent where it
is so fast that it is thread scheduling, and not the I/O system, which is the
bottleneck. And, any time that the overhead of thread scheduling itself is
the bottleneck, then using threads is unlikely to be a win for you (unless
performance is not an issue or unless you have some sort of contention
problem in your application).
Thus, if your expected deployment is always on a single machine, you might
want to investigate using shared memory to communicate rather than sockets.
Otherwise, if your typical deployment involves multiple machines, you might
want to try running your benchmark in a more representative environment.
In the meantime, please feel free to enter a QAR on the V4.0B performance.
Please include your pair of multithreaded test programs and the numbers that
you saw on both platforms. (If you decide to QAR either of the other
performance/scaling problems you saw, please enter them as separate QARs.)
Webb
|
10014.4 | | DCETHD::BUTENHOF | Dave Butenhof, DECthreads | Tue Jun 03 1997 14:35 | 44 |
| >.2> I returned to my desk and have writen the same kind of program but using
>.2> threads.
>
>Your program looks reasonable to me. (My command of socket programming is
>not strong, but it looks like you avoided the obvious pitfalls.) I'm a
>little surprised and disappointed to find that it doesn't affect your
>performance.
| I'm not surprised or disappointed. This is a "degenerate" case that,
unfortunately, has become much more common than we'd expected. Our assumption
(and these assumptions are very difficult to test) was that most threaded
code would do most synchronization in user mode, within the process -- that
is, primarily blocking on condition variables and mutexes. Furthermore, there
will usually be threads ready to perform work that haven't been able to get a
VP. The upcall protocol for kernel-blocking I/O allows the kernel to tell us
that a VP is free to handle one of those ready threads.
As Jean-Marie noted in .2, this is NOT typical of "telecom" applications. They
do most of their blocking in the kernel, primarily in I/O, and suffer greatly
from the overhead of upcalls. Furthermore, because they rarely have threads
that are READY to do work, we gain no useful concurrency from the upcalls.
Absolutely worst-case performance.
This sort of application won't ever benefit from 2-level scheduling, because
only one level is doing (nearly) all of the scheduling anyway. You should be
creating your threads with POSIX system contention scope. Doing so would drop
our user-mode scheduler (which isn't helping you) out of the loop. Your mutex
and condition variable synchronization will be more expensive, because they
will require kernel calls -- but you don't do many anyway, relative to the
number of I/O calls, so that shouldn't affect your performance much.
System contention scope is implemented for Digital UNIX 4.0D.
Oh, one more thing... your 3.2 build lines are wrong. You should be
using -threads, not "-lpthreads -lmach -lc_r". Although you have the right
libraries, they're also provided by -threads. And you're missing the critical
-D_REENTRANT compilation option.
I was about to try a comparison, but I see that your test programs don't
write any performance metric, nor did you say how you arrived at the numbers
you've posted. What command sequence do you actually use to run these test
programs and to generate the performance numbers?
/dave
|
10014.5 | | TAEC::URAGO | | Wed Jun 04 1997 05:12 | 33 |
| Webb, Dave,
Thanks for all your explanations; the impact of the 2-level scheduling in our
case is now clearer to me.
Happy to see that we are a "degenerate" case .... :-(.
Anyway, as performance AND support of DUNIX V4.0 are among the goals
of our TeMIP Vnext, I will try D-UNIX V4.0D and system contention scope.
Concerning my test programs, sorry for the lack of explanation.
To use it, on D-UNIX V4.0, just start recv_thr_du40 in a session and then
start send_thr_du40 in another session.
Each sending thread (in the send_thr_du40 process) emits 100000 packets and exits.
On the receiver side, when all the packets have been received (signalled by a read
error), the total number of received packets is divided by the elapsed time, which
gives the number of packets received per second. This is done by recv_thr_du40.
I know we could do better in terms of statistics, but this gives an idea of the
overall performance.
The number of threads in each process is set by #define MAX_THREADS (it must be
the same on each side).
Running the tests with 3 threads takes 1 min 30 sec on my 3000/700;
running with 6 takes 3 minutes, etc. ... be patient!
Do not hesitate to ask me more info off-line if you have any trouble using it.
Regards,
Jean-marie
|
10014.6 | Data and new code... | DCETHD::BUTENHOF | Dave Butenhof, DECthreads | Wed Jun 04 1997 09:04 | 440 |
| OK, I understand the problem I had running it. The two programs must use the
same number of threads for successful completion -- and in the source you
posted, one used 3 threads and one used 24 threads. So it didn't complete,
and didn't print the statistical information.
I added setup code to take the number of threads as an argument (-t<n>), and
to handle an option (-s) to create system contention scope threads.
On "Digital UNIX 4.0D" (actually, it's not: it's a 4.0B system with a special
kernel and my latest [debug] private sandbox thread library), with my
AlphaStation 600 5/266 workstation, I get:
Process Contention Scope (default):
3 threads 6 threads 12 threads 24 threads
10439 pkts 8488 pkts 7957 pkts 7164 pkts
System Contention Scope:
3 threads 6 threads 12 threads 24 threads
13431 pkts 12726 pkts 12270 pkts 12575 pkts
I don't know what (if anything) it means that I'm getting substantially
better numbers than yours even for PCS. Faster CPU? For comparison, an
identical (hardware) system running stock Digital UNIX 4.0 showed only 7076
packets/sec with 3 threads -- but that's still twice your numbers.
PCS on 4-CPU AlphaServer 2100A 5/300:
3 threads 6 threads 12 threads 24 threads
3293 pkts 4573 pkts 5083 pkts 4982 pkts
SCS on 4-CPU AlphaServer 2100A 5/300:
3 threads 6 threads 12 threads 24 threads
4823 pkts 14799 pkts 15621 pkts 16115 pkts
My code follows: (Note that it won't compile on 3.2 -- I didn't even pretend
to support both interfaces.) As well as adding the options and improving some
messages, I also cleaned up the termination protocol to avoid the annoying
spurious receive error messages (the sender sends a message length of 0 to
terminate).
/*
* send_thr.c
*/
#include <pthread.h>
#include <stdio.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/un.h>
#include <sys/socket.h>
#define MAX_THREADS 24
#define ARGS "t:s"
#define SOCKET_NAME "my_sockets"
#define INBUFF_LEN 1024
#define BUFF_LEN 600
char message[BUFF_LEN];
void connect_and_emit()
{
int socket_id1;
struct sockaddr_un address;
int addr_len;
char buffer[INBUFF_LEN];
int status;
size_t nbbytes;
int msg_len;
int i;
printf(" Thread %d begins!\n", pthread_getselfseq_np());
address.sun_family = AF_UNIX;
strcpy(address.sun_path,SOCKET_NAME);
socket_id1 = socket(AF_UNIX,SOCK_STREAM,0);
if ( socket_id1 == -1)
{
printf("socket error %d\n",errno);
exit(-1);
}
status = connect(socket_id1,&address,sizeof(address));
if ( status == -1 )
{
printf("connect error %d\n",errno);
exit(-1);
}
msg_len = BUFF_LEN;
for (i=0 ;i<100000;i++)
{
nbbytes = write(socket_id1,&msg_len, sizeof(msg_len));
if ( !nbbytes )
{
printf(" (len) write error : %d\n",errno);
break;
}
nbbytes = write(socket_id1,message,msg_len);
if ( !nbbytes )
{
printf(" (data) write error : %d\n",errno);
break;
}
/* read acknowledge */
nbbytes = read(socket_id1,buffer,4);
if (nbbytes == 0)
{
printf(" (ack) read error : %d\n",errno);
break;
}
}
msg_len = 0;
nbbytes = write(socket_id1,&msg_len, sizeof(msg_len));
if ( !nbbytes )
{
printf(" (done) write error : %d\n",errno);
}
close(socket_id1);
printf(" Thread %d done!\n", pthread_getselfseq_np());
}
main(int argc, char **argv)
{
struct timespec sleep_time;
pthread_t thread_id[MAX_THREADS];
int terror=0,threadcount=MAX_THREADS;
int exitstatus;
void* result;
int i, errflg, c, status;
pthread_attr_t attr;
status = pthread_attr_init (&attr);
if (status != 0) {
printf ("Attr init failed\n");
exit (-1);
}
optarg = NULL;
errflg = 0;
while (!errflg && ((c = getopt (argc, argv, ARGS)) != -1))
switch (c) {
case 't':
threadcount = atoi (optarg);
printf ("Using %d threads\n", threadcount);
break;
case 's':
printf ("Setting SCS\n");
status = pthread_attr_setscope (&attr, PTHREAD_SCOPE_SYSTEM);
if (status != 0) {
printf ("Error setting scope\n");
exit (-1);
}
break;
default:
errflg++;
}
if (errflg) {
printf ("%s: usage %s\n", argv[0], ARGS);
exit (-1);
}
if (threadcount > MAX_THREADS) {
printf ("Too many threads (%d): using %d\n", threadcount,
MAX_THREADS);
threadcount = MAX_THREADS;
}
for (i=0; i< threadcount; i++)
{
printf("Main: Create the thread \n");
/* Create the thread
*/
#if _POSIX_C_SOURCE == 199506L
terror = pthread_create (&thread_id[i], &attr,
(void *(*)(void*)) connect_and_emit,
NULL);
#else
terror = pthread_create (&thread_id[i], pthread_attr_default,
(void *(*)(void*)) connect_and_emit,
NULL);
#endif
if (terror != 0)
{
printf("pthread_create() failed\n");
exit(-1);
}
}
/*
* wait for termination of threads
*/
for (i=0; i< threadcount; i++)
pthread_join(thread_id[i], &result);
exit(0);
}
/*
* recv_thr.c
*/
#include <pthread.h>
#include <stdio.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/un.h>
#include <sys/socket.h>
#define MAX_THREADS 24
#define SOCKET_NAME "my_sockets"
#define INBUFF_LEN 1024
#define ARGS "t:s"
#define BUFF_LEN 600
char message[BUFF_LEN];
struct param {
int indx;
int socket;
} ;
struct param param[MAX_THREADS];
int pkt_received[MAX_THREADS];
receive(struct param *param)
{
size_t nbbytes;
char buffer[INBUFF_LEN];
unsigned long nbpacket;
int msg_len;
printf(" Thread %d begins!\n", pthread_getselfseq_np());
nbpacket=0;
while (1)
{
nbbytes = read(param->socket,&msg_len,sizeof(msg_len));
if (nbbytes == 0)
{
printf(" (len) read error : %d\n",errno);
break;
}
if (msg_len == 0) /* Done */
break;
nbbytes = read(param->socket,buffer,msg_len);
if (nbbytes == 0)
{
printf(" (buf) read error : %d\n",errno);
break;
}
/* send acknowledge */
nbbytes = write(param->socket,&msg_len, sizeof(msg_len));
if ( !nbbytes )
printf(" (ack) write error : %d\n",errno);
nbpacket++;
}
pkt_received[param->indx] = nbpacket;
close(param->socket);
printf(" Thread %d done!\n", pthread_getselfseq_np());
}
main(int argc, char **argv)
{
int socket_id1,socket_id2;
struct sockaddr_un address;
struct sockaddr_un acc_address;
size_t addr_len;
int status,threadcount=MAX_THREADS;
struct timespec sleep_time;
pthread_t thread_id[MAX_THREADS];
int terror;
int exitstatus;
void* result;
int i,errflg,c;
long pckts_per_sec;
double ftime1, ftime2;
struct timespec time1, time2;
pthread_attr_t attr;
status = pthread_attr_init (&attr);
if (status != 0) {
printf ("Attr init failed\n");
exit (-1);
}
optarg = NULL;
errflg = 0;
while (!errflg && ((c = getopt (argc, argv, ARGS)) != -1))
switch (c) {
case 't':
threadcount = atoi (optarg);
printf ("Using %d threads\n", threadcount);
break;
case 's':
printf ("Setting SCS\n");
status = pthread_attr_setscope (&attr, PTHREAD_SCOPE_SYSTEM);
if (status != 0) {
printf ("Error setting scope\n");
exit (-1);
}
break;
default:
errflg++;
}
if (errflg) {
printf ("%s: usage %s\n", argv[0], ARGS);
exit (-1);
}
if (threadcount > MAX_THREADS) {
printf ("Too many threads (%d): using %d\n", threadcount,
MAX_THREADS);
threadcount = MAX_THREADS;
}
address.sun_family = AF_UNIX;
strcpy(address.sun_path,SOCKET_NAME);
socket_id1 = socket(AF_UNIX,SOCK_STREAM,0);
if ( socket_id1 == -1)
{
printf("socket error %d\n",errno);
exit(-1);
}
status = bind(socket_id1,&address,sizeof(address));
if ( status == -1)
{
printf("bind error %d\n",errno);
exit(-1);
}
status = listen(socket_id1,SOMAXCONN);
if ( status == -1)
{
printf("listen error %d\n",errno);
exit(-1);
}
printf(" listen ok\n");
printf("Main: Create the threads\n");
for (i=0; i<threadcount;i++)
{
printf(" accepting conn\n");
socket_id2 = accept(socket_id1,&acc_address,&addr_len);
if ( socket_id2 == -1)
{
printf("accept error %d\n",errno);
exit(-1);
}
printf(" accept ok\n");
/* get starting time */
if (i==0)
getclock(TIMEOFDAY,&time1);
param[i].indx = i;
param[i].socket = socket_id2;
/* Create the thread
*/
#if _POSIX_C_SOURCE == 199506L
terror = pthread_create (&thread_id[i], &attr,
(void *(*)(void*)) receive,
&param[i]);
#else
terror = pthread_create (&thread_id[i], pthread_attr_default,
(void *(*)(void*)) receive,
&param[i]);
#endif
if (terror != 0)
{
printf("pthread_create() failed\n");
exit(-1);
}
}
/*
* wait for termination of threads
*/
for (i=0;i<threadcount;i++)
{
pthread_join(thread_id[i], &result);
}
/* compute average */
getclock(TIMEOFDAY,&time2);
ftime1 = time1.tv_sec + time1.tv_nsec * 0.000000001;
ftime2 = time2.tv_sec + time2.tv_nsec * 0.000000001;
ftime1 = ftime2 - ftime1;
printf(" elapsed time : %f\n",ftime1);
pckts_per_sec = 0;
for (i=0;i<threadcount;i++)
{
pckts_per_sec += pkt_received[i];
}
printf(" number of packets received : %ld\n",pckts_per_sec);
printf(" packets per second : %f\n", (float)pckts_per_sec/ftime1);
close(socket_id1);
unlink(SOCKET_NAME);
exit(0);
}
|
10014.7 | | TAEC::URAGO | | Wed Jun 04 1997 10:15 | 18 |
|
"I don't know what (if anything) it means that I'm getting substantially
better numbers than yours even for PCS. Faster CPU? For comparison, an
identical (hardware) system running stock Digital UNIX 4.0 showed only 7076
packets/sec with 3 threads -- but that's still twice your numbers."
>>> identical hardware ... to yours ?
If yes, your results in PCS with 3 threads on your "4.0D" show 10439 pkts;
does that mean you gain 30% with 4.0D, even in PCS, compared to 4.0B?
For info, I have tested the following hardware with 4.0B:
Dec 3000 - M700 : 3 threads : 3400 pkts
AlphaStation 255/300 : 3 threads : 3650 pkts
AlphaStation 500/400 : 3 threads : 8000 pkts
Jean-marie
|
10014.8 | | DCETHD::BUTENHOF | Dave Butenhof, DECthreads | Wed Jun 04 1997 10:33 | 22 |
| >>> identical hardware ... to yours ?
Yes, I mean two AlphaStation 600 5/266 systems (ALCOR). But remember, I'm not
talking "4.0B" to "4.0D". One was a "stock", unpatched 4.0 file server -- the
other (my workstation) is a hacked-up 4.0B system with a nightly-build 4.0D
kernel and a debug thread library from my sandbox. Nevertheless, to the
limited extent that a comparison can be considered valid, I gained about 48%
(7076 -> 10439) moving from the 4.0 system to the "4.0D" system, with PCS
threads. I'm not sure how my AlphaStation 600 5/266 and your AlphaStation
500/400 compare, and I assume that's the basis of your "30%", so I can't say
whether that's even as valid as my comparison.
Actually, the interesting thing about my results, in case anyone didn't
notice, is the SMP numbers. The AlphaServer 2100A numbers (with a faster chip)
are substantially worse than the AlphaStation 600 numbers -- ALWAYS for PCS, and
also for SCS until the processors are "saturated" (there's over 300%
improvement from 3 threads to 6 threads). The SMP 12 and 24 thread numbers go
UP from there, whereas the uniprocessor numbers go down as contention
increases. I don't pretend to understand this, and I'm not inclined, at this
time, to even try.
/dave
|
10014.9 | EV4 -> EV5 doubles performance at same clock | WIBBIN::NOYCE | Pulling weeds, pickin' stones | Wed Jun 04 1997 15:40 | 9 |
| If you must compare Alphas to oranges...
System CPU MHz approx SPECint95
DEC 3000/700 EV45 225? 4
Astn 255/300 EV45 300 5
Astn 600 5/266 EV5 266 8
Asvr2100A 5/300 EV5 300 9
Astn 500/400 EV56 400 12
|
10014.10 | | DCETHD::BUTENHOF | Dave Butenhof, DECthreads | Thu Jun 05 1997 09:00 | 19 |
| > DEC 3000/700 EV45 225? 4
> Astn 255/300 EV45 300 5
> Astn 600 5/266 EV5 266 8
> Asvr2100A 5/300 EV5 300 9
> Astn 500/400 EV56 400 12
So none of Jean-Marie's numbers are comparable to any of mine, even if the
versions were identical. The numbers in .7 are EV4, EV45, and EV56
respectively, while mine are EV5.
Thanks, Bill. Bevin Brett & I looked up the AlphaStation 255 series in
AltaVista the other day for a different problem, and we couldn't find
anything that identified the actual chip. (Annoying.) I like the "<model>
<chip>/<fudge-speed>" convention, despite the fact that the speeds are fudged
and it doesn't distinguish "4" from "45" or "5" from "56". I wish they'd just
clean that up and use it consistently instead of switching back and forth
between that and the "<model>/<fudge-speed>" style!
/dave
|
10014.11 | SPEC disclosures come in handy | PERFOM::HENNING | | Fri Jun 06 1997 10:14 | 5 |
| The workstation group managed to implement three DIFFERENT naming
conventions in a single year (sigh). But if you want to find out
what's really inside any specific box, you might want to bookmark
http://www.specbench.org/cgi-bin/osgresults?conf=cpu95
|