On 3/12/07, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
pread/pwrite address a miniscule fraction of lseek+read(v)/write(v)
use cases -- a fraction that someone cared about strongly enough to
get into X/Open CAE Spec Issue 5 Version 2 (1997), from which it
propagated into UNIX98 and thence into POSIX.2 2001. The fact that no
one has bothered to implement preadv/pwritev in the decade since
pread/pwrite entered the Single UNIX standard reflects the rarity with
which they appear in general code. Life is too short to spend it
rewriting application code that uses readv/writev systematically,
especially when that code is going to ship inside a widget whose
kernel you control.
Oh, please. Like _your_ employer is the poster child for code
quality. The cheap shot is also irrelevant to the point that I was
making, which is that sometimes portability simply doesn't matter and
the right thing to do is to firm up the semantics of the filesystem
primitives from underneath.
Quality means the devices you ship now keep working in the field, and
the probable cost of later rework if the requirements change does not
exceed the opportunity cost of over-engineering up front. Economy
gets a look-in too, and says that it's pointless to delay shipment and
bloat the application coding for cases that can't happen. If POSIX
says that any and all writes (except small pipe/FIFO writes, whatever)
can return a short byte count -- but you know damn well you're writing
to a device driver that never, ever writes short, and if it did you'd
miss a timing budget recovering from it anyway -- to hell with POSIX.
And if you want to build a test jig for this code that uses pipes or
dummy files in place of the device driver, that test jig should never,
ever write short either.
Not on a properly engineered widget, it won't. And if it does, and
the application isn't coded to cope in some way totally different from
an infinite retry loop, then you might as well signal the exception
condition using whatever mechanism is appropriate to the API
(-EWHATEVER, SIGCRISIS, or block until some other process makes room).
And in any case files on disk are the least interesting kind of file
descriptor in an embedded scenario -- devices and pipes and pollfds
and netlink sockets are far more frequent read/write targets.
Guaranteed by the standard, sure. Guaranteed by the implementation,
as long as you write in the size blocks that the device is expecting?
Lots of devices -- ALSA's OSS PCM emulation, most AF_LOCAL and
AF_NETLINK sockets, almost any "character" device with a
record-structured format. A short write to any of these almost
certainly means the framing is screwed and you need to close and
reopen the device. Not all of these are exclusively O_APPEND
situations, and there's no reason on earth not to thread-safe the
f_pos handling so that an application and filesystem/driver can agree
on useful lseek() semantics.
Here we agree. Except that I've rarely seen embedded application code
that wouldn't explode in my face if I tried it. Databases yes, and
the better class of mail and web servers, and relatively mature
scripting languages and bytecode interpreters; but the vast majority
of working programmers in these latter days do not exercise this level
of discipline.
Even in the "enterprise" space, most of the QA and testing people I
have dealt with couldn't hook a syscall if their children were
starving and the fish were only biting on syscalls. The embedded
situation is even worse. ltrace didn't work on ARM for years and
hardly anyone _noticed_, let alone did anything about it. (I posted a
fix to ltrace-devel a month ago, but evidently the few fish left in
that river don't bite on hooked syscalls.) strace maintenance doesn't
seem too healthy either, judging by what I had to do it in order to
get it to recognize ALSA ioctls and the quasi-syscalls involved in ARM
TLS.
But on that note -- do you have any idea how one might get ltrace to
work on a multi-threaded program, or how one might enhance it to
instrument function calls from one shared library to another? Or
better yet, can you advise me on how to induce gdbserver to stream
traces of library/syscall entry/exits for all the threads in a
process? And then how to cram it down into the kernel so I don't take
the hit for an MMU context switch every time I hit a syscall or
breakpoint in the process under test? That would be a really useful
tool for failure analysis in embedded Linux, doubly so on multi-core
chips, especially if it could be made minimally intrusive on the
CPU(s) where the application is running.
Cheers,
- Michael
-