Re: Malloc bug in scheduler
- To: <zmailer@nic.funet.fi>
- Subject: Re: Malloc bug in scheduler
- From: Martin Wendel <martin.wendel@its.uu.se>
- Date: Tue, 24 Feb 1998 17:03:26 +0100
- Cc: <mea@nic.funet.fi>
- Illegal-Object: Syntax error in Message-Id: value found on uria.its.uu.se:Message-Id: <5348.888336206.1043154386.3751@> ^-illegal subdomain in domain, propably extra '.' at the end of the address
- Reply-To: Martin Wendel <martin.wendel@its.uu.se>
>> I tried to install zmailer-2.99.49p9 (with patch1) on AIX 4.2.1,
>> but there is a bug in scheduler that makes it wait for ever.
>
> I know of a problem on router, but this one is new to me.
>
>> Scheduler log says this:
>>
>> scheduler[9682]: malloc(4294965232): virtual memory exceeded, sleeping
>> scheduler[9682]: malloc(4294965232): virtual memory exceeded, sleeping
>> scheduler[9682]: malloc(4294965232): virtual memory exceeded, sleeping
>> scheduler[9682]: malloc(4294965232): virtual memory exceeded, sleeping
>>
>> It seems to me scheduler calls emalloc with a negative argument. The
>> other processes do not produce these errors.
I was wrong about the negative argument.
In scheduler/resources.c, in resources_query_nofiles(), I get the following
values: rc=0 and rl.rlim_cur=2147483647. This leads, in transport.c
stashprocess(), to i=2147483647 and then to malloc(4294965232).
Undefining HAVE_SETRLIMIT makes it work, of course.
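To illustrate how a count of 2147483647 can turn into the logged request of 4294965232 bytes, here is a minimal sketch of a 32-bit size calculation that wraps around. The helper name wrapped_alloc_size() and the per-entry size of 2064 bytes are assumptions made for this illustration only; 2064 is simply a value consistent with the logged number and is not taken from the ZMailer sources.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from ZMailer): compute count * elem_size the way
 * a 32-bit size_t would, i.e. keeping only the low 32 bits of the product.
 */
static uint32_t wrapped_alloc_size(uint32_t count, uint32_t elem_size)
{
    /* Multiply in 64 bits, then truncate to 32 bits (modulo 2^32). */
    uint64_t full = (uint64_t)count * (uint64_t)elem_size;
    return (uint32_t)(full & 0xFFFFFFFFu);
}

/*
 * Example with the values from the log, assuming 2064 bytes per entry:
 *   wrapped_alloc_size(2147483647u, 2064u) yields 4294965232,
 * which is 2^32 - 2064, so the request looks "negative" when read as a
 * signed 32-bit number even though the count itself is positive.
 */
```

With a sane count such as 2000 (the getdtablesize() value on this machine) the same calculation stays far below 2^32 and no wrap-around occurs.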
When doing a small test:
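#include <stdio.h>
#include <unistd.h>        /* getdtablesize() */
#include <sys/resource.h>  /* getrlimit(), RLIMIT_NOFILE */

int main(void)
{
    struct rlimit rl;
    int rc = getrlimit(RLIMIT_NOFILE, &rl);

    printf("getrlimit(RLIMIT_NOFILE,&rl)=%d\n", rc);
    /* rlim_t may be wider than int, so cast explicitly before printing */
    printf("rl.rlim_cur=%lu\n", (unsigned long) rl.rlim_cur);
    printf("getdtablesize()=%d\n", getdtablesize());
    return 0;
}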
I get:
getrlimit(RLIMIT_NOFILE,&rl)=0
rl.rlim_cur=2147483647
getdtablesize()=2000
It seems RLIMIT_NOFILE is not defined in AIX 4.1.5, so the problem never occurs
there.
In resource.h (AIX4.2.1):
#define RLIMIT_NOFILE 7 /* max # allocated fds--not enforced */
This looks like the maximum number of files in the system, not open files per
process, and for that it is reasonable that it is a signed 32-bit number.
Solaris 2.5 gives 64 for both rl.rlim_cur and getdtablesize(), HP-UX 10.20 gives
60 for both, and Digital Unix 4.0 gives 4096 for both, so they do not seem to be
referring to the same thing as AIX 4.2.1 (NOFILE - Number of Open FILEs / NO of FILEs).
As I see it, the scheduler should ignore RLIMIT_NOFILE on AIX and use
getdtablesize() instead.
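One way to sketch that idea without special-casing AIX is to trust RLIMIT_NOFILE only when it does not exceed getdtablesize(). This is a variation on the suggestion above, not the actual ZMailer fix; the function names clamp_nofiles() and query_nofiles() are hypothetical and chosen for this example only.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1  /* glibc: expose getdtablesize() in strict modes */
#endif
#include <sys/resource.h>  /* getrlimit(), RLIMIT_NOFILE */
#include <unistd.h>        /* getdtablesize() */

/*
 * Hypothetical helper: pick a usable per-process descriptor count.
 * Falls back to dtablesize when getrlimit() failed (rc != 0), when the
 * limit is zero, or when the limit is larger than the descriptor table
 * (as with 2147483647 on AIX 4.2.1).
 */
static long clamp_nofiles(int rc, unsigned long rlim_cur, long dtablesize)
{
    if (rc != 0 || rlim_cur == 0 || rlim_cur > (unsigned long)dtablesize)
        return dtablesize;
    return (long)rlim_cur;
}

/* Hypothetical wrapper that queries the running system. */
static long query_nofiles(void)
{
    struct rlimit rl = { 0, 0 };
    int rc = getrlimit(RLIMIT_NOFILE, &rl);

    return clamp_nofiles(rc, (unsigned long)rl.rlim_cur,
                         (long)getdtablesize());
}
```

With the AIX 4.2.1 values from the test above, clamp_nofiles(0, 2147483647, 2000) gives 2000, while on systems where both calls agree (64 on Solaris 2.5, for example) the result is unchanged.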
----
Mail: Martin Wendel, IT-Support, Uppsala university, S-751 08 Uppsala, Sweden
Phone: +46-18-4717780, Fax: +46-18-4717725