On Tue, 11 Sep 2007 11:30:38 +0200 Are you sure? Segfaults are syslogged only on 64-bit kernels. Maybe your slapd/hscan processes are doing bad things that make them dump core without notice on a 32-bit kernel. Eric -
A very wild guess: AFAIK SUSE distributions have recently been XENified, that is, they have libraries that treat thread-local storage differently from the default. If these programs (powersaved, slapd, hscan) are all multithreaded, could the cause of the problem be in that area?

If not, any clues on debugging/tracing? There's a /usr/src/linux/Documentation/oops-tracing.txt, but no "segfault-tracing". I also learned that the error code is only documented for the i386 arch (thanks to Emacs ediff):

 * error_code:
 *   bit 0 == 0 means no page found, 1 means protection fault
 *   bit 1 == 0 means read, 1 means write
 *   bit 2 == 0 means kernel, 1 means user-mode

So the problem (error 4) looks a bit like a read on a NULL-pointer dereference, right? And the "rip" is user space, correct?

Regards,
Ulrich
-
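For illustration, the three bits quoted above can be decoded mechanically. This is only a sketch: the function name decode_fault_error_code is made up here, and it assumes the i386 bit layout from the comment also applies to the value printed after "error" on x86_64.

```python
# Sketch only: decode the low three bits of a page-fault error code,
# using the bit meanings from the i386 comment quoted in the message.
# Assumption: the same layout applies to the x86_64 "error N" value.

def decode_fault_error_code(code: int) -> dict:
    """Return a human-readable view of bits 0..2 of the error code."""
    return {
        # bit 0: 0 = no page found, 1 = protection fault
        "cause": "protection fault" if code & 0x1 else "no page found",
        # bit 1: 0 = read, 1 = write
        "access": "write" if code & 0x2 else "read",
        # bit 2: 0 = kernel, 1 = user-mode
        "mode": "user" if code & 0x4 else "kernel",
    }


if __name__ == "__main__":
    # "error 4" from the thread: only bit 2 is set.
    print(decode_fault_error_code(4))
```

For error 4 this gives a read from user mode with no page found, which is consistent with the NULL-pointer reading suggested above.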
rip points to userspace. If you suspect a NULL dereference, look at rax. If it is 0, it is usually obvious what happened. If it is slightly above 0, someone tried to access something like foo->bar where foo == NULL. -
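The rule of thumb above can be written down as a small helper. This is a rough sketch, not a diagnostic tool: the name classify_pointer_value and the 4096-byte cutoff for "slightly above 0" are assumptions chosen for illustration (one typical page), not something stated in the thread.

```python
# Sketch only: apply the "0 or slightly above 0" heuristic to a
# register value such as rax taken from a segfault report.
# Assumption: "slightly above" means below one 4096-byte page.

SMALL_OFFSET_LIMIT = 4096  # hypothetical cutoff, not from the thread


def classify_pointer_value(value: int, limit: int = SMALL_OFFSET_LIMIT) -> str:
    """Guess what kind of NULL-related access a pointer value suggests."""
    if value == 0:
        return "direct NULL dereference"
    if 0 < value < limit:
        # foo->bar with foo == NULL faults at the offset of bar.
        return "member access through NULL at offset %d" % value
    return "not obviously NULL-related"


if __name__ == "__main__":
    for sample in (0x0, 0x18, 0x2AAAAAE32000):
        print(hex(sample), "->", classify_pointer_value(sample))
```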
That would be because it has fsck-all to do with the kernel. Get the coredump, then use gdb to deal with it. -
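Getting the core dump presupposes that the core file size limit of the process allows one to be written. A minimal sketch, assuming a Unix system with Python's standard resource module; the helper name enable_core_dumps is made up here. It raises the soft RLIMIT_CORE to the hard limit for the current process and anything it starts afterwards.

```python
# Sketch only: raise the soft core-file size limit to the hard limit
# so that a crashing child process is allowed to write a core file.
# Assumption: Unix-like system; the hard limit itself is left alone.
import resource


def enable_core_dumps() -> tuple:
    """Set soft RLIMIT_CORE to the hard limit; return the new limits."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = enable_core_dumps()
    # resource.RLIM_INFINITY means "unlimited".
    print("core limit (soft, hard):", soft, hard)
```

If the hard limit is 0, no core will be written regardless, and the limit has to be raised by whoever starts the daemon.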
Ok, but why is the message there at all? I think in Windows XP the offending code and the registers are shown on such occasions. I'd say either drop the message or improve it. It's also difficult to find the code after the program is gone, because of the mapping of shared libraries. I did manage to get a core dump of the application, however, and I modified some code. I'll report once I have results. Maybe it's "mea culpa" for my program, but powersaved and slapd are still to be examined. Regards, Ulrich -
On 11 Sep 2007 at 15:01, Eric Dumazet wrote:
I'm using the sendmail milter library, which does the socket communication. So any
bad things should be searched for there.
I tend to think that the same program, when compiled as a 32-bit executable,
does not cause these segfaults on a 64-bit kernel.
I also tried to use ksymoops to get a disassembly of the corresponding kernel
code, but the result did not look good to me.
Is there a deeper reason why the kernel does not provide more info (like a call
trace) on segfaults?
Would an strace of the program be helpful? Unfortunately it is multi-threaded,
as slapd most likely is too.
When I tried it for slapd, the (rest of the) strace was:
9931 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
9931 connect(3, {sa_family=AF_INET, sin_port=htons(427), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
9931 setsockopt(3, SOL_SOCKET, SO_RCVLOWAT, [18], 4) = 0
9931 setsockopt(3, SOL_SOCKET, SO_SNDLOWAT, [18], 4) = -1 ENOPROTOOPT (Protocol not available)
9931 mmap(NULL, 1434435584, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aaaaae32000
9931 --- SIGSEGV (Segmentation fault) @ 0 (0) ---
Regards,
Ulrich
-
On Tue, 11 Sep 2007 17:15:26 +0200 Definitely a user-mode problem, dereferencing a NULL pointer. Try attaching gdb to this process instead of stracing it; a "bt" command should then tell you some useful things. The strange thing here is that this program wants a huge block of memory (1434435584 bytes, roughly 1.3 GiB), so maybe some file is corrupted. You should probably check database integrity first. -
