utrace is a new kernel-side API for kernel modules, intended to make it tractable to work on novel ways to trace and debug user-mode tasks. A previous utrace prototype was in all Fedora kernels since Fedora Core 6. Some substantial implementation and API details in the current code are different from those past versions. Please look freshly at these patches. The current utrace code here is now included in Fedora Rawhide kernels. People interested in this code are on the utrace-devel@redhat.com mailing list. (See https://www.redhat.com/mailman/listinfo/utrace-devel for list info.) There is a wiki for this work at http://sourceware.org/systemtap/wiki/utrace (with not much content, feel free to add some). Current copies of these patches are at kept at http://people.redhat.com/roland/utrace/ as well as in the GIT tree shown below. This code cannot be enabled without CONFIG_HAVE_ARCH_TRACEHOOK and the arch details it indicates. In Linus's tree as of v2.6.27-rc4, only powerpc and sparc64 have that support. The x86 support is available by merging in the tip/x86/tracehook branch. For working on other arch support, there are some more details at http://sourceware.org/systemtap/wiki/utrace/arch/HowTo and these are mentioned in the comments in arch/Kconfig too (in v2.6.27-rc4). The first patch adds the utrace kernel API (if CONFIG_UTRACE=y is set). There is no change at all without the config option, and with it there is no effect on anything at all until a kernel module using the utrace API is loaded. There is detailed documentation on the API in DocBook form. The second patch adds the CONFIG_UTRACE_PTRACE option. When set, this makes ptrace use the utrace API as much as is necessary so that using both ptrace and utrace to debug the same threads at the same time won't become confused. The ptrace changes are somewhat kludgey. They're intended to be the simplest, non-regressing thing that suffices to enable hacking on new utrace modules while also doing normal ptrace-based ...
This adds the utrace facility, a new modular interface in the kernel for implementing user thread tracing and debugging. This fits on top of the tracehook_* layer, so the new code is well-isolated. The new interface is in <linux/utrace.h> and the DocBook utrace book describes it. It allows for multiple separate tracing engines to work in parallel without interfering with each other. Higher-level tracing facilities can be implemented as loadable kernel modules using this layer. The new facility is made optional under CONFIG_UTRACE. When this is not enabled, no new code is added. It can only be enabled on machines that have all the prerequisites and select CONFIG_HAVE_ARCH_TRACEHOOK. In this initial version, utrace and ptrace do not play well together. If both utrace and ptrace are attached to the same thread, they can confuse each other about the stopping and resuming of that thread. The old ptrace code is unchanged and nothing using ptrace should be affected by this patch as long as utrace is not used at the same time. A later patch will make them cooperate properly. Signed-off-by: Roland McGrath <roland@redhat.com> --- Documentation/DocBook/Makefile | 2 +- Documentation/DocBook/utrace.tmpl | 566 +++++++++ fs/proc/array.c | 3 + include/linux/sched.h | 5 + include/linux/tracehook.h | 61 +- include/linux/utrace.h | 711 +++++++++++ init/Kconfig | 10 + kernel/Makefile | 1 + kernel/utrace.c | 2496 +++++++++++++++++++++++++++++++++++++ 9 files changed, 3853 insertions(+), 2 deletions(-) diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile index 1615350..92ca631 100644 --- a/Documentation/DocBook/Makefile +++ b/Documentation/DocBook/Makefile @@ -7,7 +7,7 @@ # list of DOCBOOKS. DOCBOOKS := wanbook.xml z8530book.xml mcabook.xml videobook.xml \ - kernel-hacking.xml kernel-locking.xml deviceiobook.xml ...
I'll says this again: tracehook_* is pointless abstraction because there will be no second generic tracing facility. The author of second Again, embed struct utrace directly into task_struct. task_struct lifetime rules are way more tested than struct utrace ones. Add simple spinlock guarding all accesses (OK, I haven't looked very closely if it's possible) Nobody needs hundred-line utrace_attach with CPU barriers. Nobody needs RCU. Nobody needs restart logic. Reminder: that struct utrace double-free was P_I_T_A to debug. I'll check last utrace oops we talked is still there and bogus patch was applied (sorry, haven't slept night at all). And run to confirm that attach/detach/exec program still crashes it. There is PREEMPT_RCU now so it will be even more not INIT_RCU_HEAD is not needed, call_rcu() will overwrite rcu head unconditionally. --
As promised, quickly reproducible via expt_ptratt.c: kernel/utrace.c:532 if (likely(!utrace->stopped)) BUG: unable to handle kernel paging request at ffff88017c51c958 IP: [<ffffffff8025e38b>] utrace_stop+0x9b/0x120 PGD 202063 PUD b067 PMD 17d8e5163 PTE 800000017c51c160 Oops: 0000 [1] PREEMPT SMP DEBUG_PAGEALLOC last sysfs file: /sys/kernel/uevent_seqnum CPU 0 Modules linked in: ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf_conntrack iptable_filter ip_tables xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 sr_mod cdrom Pid: 32731, comm: exe Tainted: G W 2.6.27-rc4-next-20080827-utrace #4 RIP: 0010:[<ffffffff8025e38b>] [<ffffffff8025e38b>] utrace_stop+0x9b/0x120 RSP: 0000:ffff88017c521c38 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88017c02b880 RCX: 0000000000000000 RDX: 000000000000ebea RSI: 00000000ffffffff RDI: 0000000000000001 RBP: ffff88017c521c58 R08: 0000000000000002 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000001 R12: ffff88017c51c8e8 R13: ffff88017c51c918 R14: ffff88017c51c8e8 R15: ffff88017c02b880 FS: 00007fe23dcf66f0(0000) GS:ffffffff8054b600(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffff88017c51c958 CR3: 000000017c077000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process exe (pid: 32731, threadinfo ffff88017c520000, task ffff88017c02b880) Stack: ffff88017c51c8e8 ffff88017c521ce8 ffff88017c51c8e8 ffff88017c02b880 ffff88017c521c88 ffffffff8025e4b0 ffff88017f92e360 ffff88017c51c8f8 ffff88017f92e360 ffff88017c51c8f0 ffff88017c521d38 ffffffff8025fb05 Call Trace: [<ffffffff8025e4b0>] finish_resume_report+0xa0/0xd0 [<ffffffff8025fb05>] utrace_get_signal+0x305/0x710 [<ffffffff802408cc>] get_signal_to_deliver+0x24c/0x310 [<ffffffff8020ab54>] do_notify_resume+0xd4/0x860 [<ffffffff80253fcd>] ? trace_hardirqs_off+0xd/0x10 [<ffffffff8042d812>] ? ...
Another one: kernel BUG at kernel/utrace.c:2275! invalid opcode: 0000 [1] PREEMPT SMP DEBUG_PAGEALLOC last sysfs file: /sys/kernel/slab/utrace_attached_engine/objects CPU 0 Modules linked in: xt_state iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack ip_tables xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 sr_mod cdrom Pid: 5118, comm: exe Tainted: G W 2.6.27-rc4-next-20080827-utrace #5 RIP: 0010:[<ffffffff802608ef>] [<ffffffff802608ef>] utrace_get_signal+0x6ff/0x710 RSP: 0018:ffff88017ace5c98 EFLAGS: 00010093 RAX: 0000000000000000 RBX: ffff88017aced910 RCX: ffff88017e2b23c0 RDX: ffffffffffffffff RSI: ffff88017ace5cf8 RDI: ffff88017acc5640 RBP: ffff88017ace5d38 R08: 0000000000000002 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: ffff88017aced908 R14: ffff88017aced900 R15: ffff88017acc5640 FS: 0000000000000000(0000) GS:ffffffff80551600(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007fff249c8bc9 CR3: 000000017ad2c000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process exe (pid: 5118, threadinfo ffff88017ace4000, task ffff88017acc5640) Stack: ffff88017ace5ef8 0000000000000002 0000000000000000 ffff88017ace5ef8 ffff88017ace5e78 ffff88017ace5f58 ffff88017e206288 0000000000000400 0000000000000401 ffff88017e206848 0000001000000000 0000000000010000 Call Trace: [<ffffffff802412fc>] get_signal_to_deliver+0x24c/0x310 [<ffffffff8020ab54>] do_notify_resume+0xd4/0x860 [<ffffffff8028ff08>] ? check_bytes_and_report+0x38/0xd0 [<ffffffff80209b7b>] ? sys_execve+0x5b/0x80 [<ffffffff802901c3>] ? check_object+0x223/0x250 [<ffffffff80290811>] ? init_object+0x51/0x90 [<ffffffff80256f5d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff80209b7b>] ? sys_execve+0x5b/0x80 [<ffffffff8020b9c7>] int_signal+0x12/0x17 Code: 00 00 48 89 41 10 48 8b 82 20 01 00 00 48 ...
And overwritten poison if run in parallel with while true; do killall -9 expl_ptratt killall -9 exe done ============================================================================= BUG utrace: Poison overwritten ----------------------------------------------------------------------------- INFO: 0xffff88017c31e7b0-0xffff88017c31e7f0. First byte 0x6c instead of 0x6b INFO: Allocated in utrace_attach_task+0x1f4/0x3d0 age=13 cpu=1 pid=5377 INFO: Freed in utrace_free+0x16/0x20 age=5 cpu=1 pid=5377 INFO: Slab 0xffffe2000532ae90 objects=21 used=2 fp=0xffff88017c31e780 flags=0x80000000000000c3 INFO: Object 0xffff88017c31e780 @offset=1920 fp=0xffff88017c31e540 Bytes b4 0xffff88017c31e770: fc 1f ff ff 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ü.ÿÿ....ZZZZZZZZ Object 0xffff88017c31e780: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88017c31e790: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88017c31e7a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88017c31e7b0: 6c 6c 6b 6b 6b 6b 6b 6b ff ff ff ff 6b 6b 6b 6b llkkkkkkÿÿÿÿkkkk Object 0xffff88017c31e7c0: ff ff ff ff ff ff ff ff 6b 6b 6b 6b 6b 6b 6b 6b ÿÿÿÿÿÿÿÿkkkkkkkk Object 0xffff88017c31e7d0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88017c31e7e0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88017c31e7f0: 6a 6b 6b 6b 6b 6b 6b a5 jkkkkkk¥ Redzone 0xffff88017c31e7f8: bb bb bb bb bb bb bb bb »»»»»»»» Padding 0xffff88017c31e838: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ Pid: 5382, comm: expl_ptratt Tainted: G W 2.6.27-rc4-next-20080827-utrace #5 Call Trace: [<ffffffff8028f989>] print_trailer+0xf9/0x160 [<ffffffff8028ff75>] check_bytes_and_report+0xa5/0xd0 [<ffffffff80290048>] check_object+0xa8/0x250 [<ffffffff80291173>] __slab_alloc+0x4f3/0x670 ...
Yes, I've also said this went the old monolithic utrace was first posted (probably over a year ago) and it was conveniently ignored.. --
The most consistent feedback I've seen to all new features is that they mustn't add overhead when they're not being used. So I never considered it an option to bloat task_struct by ~120 bytes. Of course much more than that is entailed when a task is actually being traced somehow. But the presumption is that most tasks most of the time aren't, and that's what not to bloat. The revamp of the API after the first prototype made some of the internals much simpler to implement, that had been very sticky in the old prototype code. But the allocation and freeing of struct utrace is an area I did not fully revisit. Buggy is buggy, and sure it needs to be tested and fixed. I can't tell what you mean here. Do you mean something different from struct utrace.lock? If there were no pointer and its allocation I see. Thanks! At some point, this was in the recommended example uses of RCU. This macro is still used in several places around the kernel. Shouldn't they all be removed? I wonder why it still exists in rcupdate.h. Thanks, Roland --
Engine part is irrelevant for deciding "struct utrace utrace;" vs Let's actually measure something. 1) clean kernel (my usual config, x86_64, NR_CPUS=2, PREEMPT=y, SLUB_DEBUG=y) 2) struct utrace *utrace; 3) struct utrace utrace; (without rcu_head, without check_dead) taken from symlink cache /object_size, bytes ---------------------------------------- 1) task_struct 1392/1384 2) task_struct 1408/1400 utrace 72/72 3) task_struct 1456/1456 I don't know why "object_size" differs from to where symlink points to but it's irrelevant. So, your approach gives +16(+72) bytes, mine gives +72 bytes. So all this complexity is for ~56 bytes in untraced case. In traced case, more memory will be used. I mean sticking to simple locking rules if you agree to "struct utrace utrace;": 1) utrace_flags go into struct utrace 2) utrace.lock guards _everything_ under it, no exceptions. If I understand correctly ->utrace_flags are under utrace->lock, BUT struct utrace is reattachable itself. Of course you protect against it, but this doesn't help in convincing someone that there are no problems. Again, this is just general observation and an example. Checking for bugs will be much, much simpler. --
1. UTRACE_ATTACH_MATCH_DATA is not used. 2. s/utrace_attached_engine/utrace_engine/g Engine which is not attached is either just allocated or scheduled for freeing 3. get_utrace_lock() is funny. From name one should get valid struct utrace or -E depending only on task, why engine matters is a mystery. 4. subsys_initcall Just use module_init() or comment why it's needed. 5. Dummy in examiner -- not used (of course!) --
> 1. UTRACE_ATTACH_MATCH_DATA is not used. I don't understand this comment. The patch adds a new API, not users of In the way I've talked about things since the beginning, and probably in some email and maybe some of the documentation text, I've used "a tracing engine" to mean the whole coordinated body of code. (There is no data structure in the utrace API that corresponds to this, it's just a way of talking about the "user of utrace" code.) So you might talk about having your "engine" attach to many tasks. This is what I had in mind when I first made it 'struct utrace_attached_engine' for the API object that's about one attachment. But I won't quibble about names. I am happy to change any of the names to This "mystery" is addressed in the comments and the documentation. (Comments above the function says exactly what it does.) Details about why we do that are in manual section called "Engine and task pointers" (in the file added by the patch, Documentation/DocBook/utrace.tmpl). If you don't like to read the XML or wait for 'make htmldocs', I've put a copy of the formatted tree at: Since I never found much documentation about these macros, and it's not a module, that looked like the one to use based on the other uses in the source. If module_init is the generic one to use in nonmodule code, (Of course, that's why it's called that!) I found I had to add that or the kerneldoc magic would freak out. The type name is used by users of the API, so I wanted to have it documented with description text, but its contents are all private to the utrace implementation. When I had no magic /* public: */ comment or no fields declared after it, the 'make I said this is a substantially different API from the first utrace prototype, and I was not lying. The implementation didn't follow from the same tree's history, because it took an entirely different merging approach. In the new effort I've been doing since 2.6.25, I have only been preparing new versions of patches for ...
Sounds like kerneldoc wants a fix for this. And btw, I don't think half-private structures are a good idea. Either make it completely private by just providing a forward declaration in the header, or put all members in the public definition with a comment describing they are not for users. We don't have the arguments of ABI problems or hiding visibility for maintaince reasons e.g. library APIs have. --
Engine grew a refcount. Was it added to fix some bug? Old utrace git doesn't have this, new utrace git has only two posted patches, so asking. --
Why does it depend on MODULES? Does this mean that when ptrace is switched to using the utrace engines, non-modular kernels will have to lack ptrace(2)? Is there anything in the code that actually breaks without CONFIG_MODULES? Regards, Petr Tesarik --
> Why does it depend on MODULES? No. If that were done, then it would be: depends on MODULE || PTRACE No. Thanks, Roland --
In which case you should just remove this bogus (non-)dependency. --
This adds the CONFIG_UTRACE_PTRACE option under CONFIG_UTRACE.
When set, parts of ptrace are replaced so it uses the utrace
facilities for noticing events, stopping and resuming threads.
This makes ptrace play nicely with other utrace-based things
tracing the same threads. It also makes all ptrace uses rely on
some of the utrace code working right, even when you are not
using any other utrace-based things. So it's experimental and
not real well proven yet. But it's recommended if you enable
CONFIG_UTRACE and want to try new utrace things.
Signed-off-by: Roland McGrath <roland@redhat.com>
---
include/linux/ptrace.h | 21 ++
include/linux/sched.h | 1 +
include/linux/tracehook.h | 4 +
init/Kconfig | 17 ++
kernel/ptrace.c | 604 ++++++++++++++++++++++++++++++++++++++++++++-
kernel/signal.c | 14 +-
kernel/utrace.c | 15 ++
7 files changed, 670 insertions(+), 6 deletions(-)
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index ea7416c..06eaace 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -121,6 +121,7 @@ static inline void ptrace_unlink(struct task_struct *child)
int generic_ptrace_peekdata(struct task_struct *tsk, long addr, long data);
int generic_ptrace_pokedata(struct task_struct *tsk, long addr, long data);
+#ifndef CONFIG_UTRACE_PTRACE
/**
* task_ptrace - return %PT_* flags that apply to a task
* @task: pointer to &task_struct in question
@@ -154,6 +155,26 @@ static inline int ptrace_event(int mask, int event, unsigned long message)
return 1;
}
+static inline void ptrace_utrace_exit(struct task_struct *task)
+{
+}
+
+#else /* CONFIG_UTRACE_PTRACE */
+
+static inline int task_ptrace(struct task_struct *task)
+{
+ return 0;
+}
+
+static inline int ptrace_event(int mask, int event, unsigned long message)
+{
+ return 0;
+}
+
+extern void ptrace_utrace_exit(struct task_struct *);
+
+#endif /* !CONFIG_UTRACE_PTRACE */
+
/**
* ...I don't like this patch in it's current form. Having two different ways to do ptrace is a rather awkward thing and not very good for testing coverage. This should not be an option but required. I'm not even sure keeping a non-utrace version at all is a good idea. I'd rather set a deadline for arch maintainers to convert everything to the generic ptrace bits + regsets + tracehook and if they don't manage to do it by then ptrace will be disabled for those who can't keep up. Of course this will need an announcement on linux-arch first, but it's much better than a never ending phase of APIs in migration. --
The main idea behind having CONFIG_UTRACE_PTRACE be an option separate from CONFIG_UTRACE is just for debugging during this development and transition period. With CONFIG_UTRACE=y, CONFIG_UTRACE_PTRACE=n, then ptrace works as it ever did. You'll only have trouble if you use utrace on the same tasks. So if you hit bad problems working on something using utrace, you can always switch to CONFIG_UTRACE_PTRACE=n to use gdb and strace et al reliably on your userland code, which might be damn helpful in figuring out how you're triggering the utrace bugs. By the We agree on how it should end up. I have concentrated on how not to break any arch that doesn't care about my work, or put another way, how not to let progress be held hostage to delay as long as some arch's don't care. An arch deadline is fine with me (I'm not the one who has to meet it). I've always assumed there would be one eventually. But any reasonable deadline would be some time hence, I don't know how long (time scales in feature-removal-schedule.txt seem to be pretty long). I wouldn't like to make merging any of utrace be contingent on waiting for that deadline, just to avoid having a phase of APIs in migration--one that can certainly end. There is another point, not about the arch issue. In the first utrace prototype, I wound up spending basically all my time on ptrace. An internal reorganization that implements ptrace is not very interesting. I think that working on different users of utrace will be far more effective at honing the utrace layer. My hope is that utrace can be merged and EXPERIMENTAL for some time while users develop, beat on, and get it solid. Meanwhile, people not working on that need reliable old ptrace code when they don't select the newfangled options. As the patch posting said, this is a minimal kludge to get by for now. After some of those other users you've demanded have banged it all out, then I'll get back to worrying about ptrace later. (In the meantime, if it goes too ...
No problems doing it in your tree, but it's not how features get into mainline. Transition periods like this are bad, because people will simply disable the option if it doesn't work, and it will never get debugged. Conditional implementation of core functionality At least one or two major release. So in your place I would try to get consensus on linux-arch _now_. I've also tried to poke multiple arch maintainers into updating their implementations for other reasons in I don't think moving ptrace data insto the engine state is a minor reoraniation. It's shoveling free some space that you need for utrace itself if we want to get rid of the nasty dynamic allocation of it, which I really really think we want. --
And some internal details still horrible and overdesigned just like at If config option for ptrace is fine, please name it CONFIG_PTRACE. For one, there will be no second tracing infrastracture. For two, nobody General comments: On the good side is per-task struct operation. This is good and should be required from any such tracing facility. Linked list of attached tracers? I don't know. One the bad side, where are those nice tracing modules? Where are they? I've heard rumours utrace is needed for frysk and frysk people were pretty damn silent on linux-kernel. On the homepage there is module which frozes task right before coredump. AFAICS, Al Viro mentioned that non-schedulable TASK_BROKEN should be sufficient for this without wasting all that time that went into ptrace(2) stabilization and fixing holes in it. This all similar to systemtap/markers story. Big changes under promises that now, now somebody will use our thing. General reminder: people who collected ptrace(2) exploits proggies, try them again. Now to code. --
Yes. Merging such a huge amount of code that doesn't actually give the kernel any new capabilities seems like a strange idea. What users of this infrastrucure can we see that it can be merged with in one go? --
Please point out specific areas, and I'm sure there will be a reasonable explanation why they turned out this way. As for whether "struct utrace" should be a member of vs. pointed-to from task_struct, it may come down to the perceived need to avoid penalizing every thread with a hundred-odd bytes extra, whether or not Among others, utrace is an enablement layer for systemtap user-space probing, through another subsequent part that implements a kprobes-like API for user-space tasks. All this code now exists in at least prototype form, so if you need to see the bigger picture, look that way. Other users are anticipated, but first we need to get past In what way do you think those promises are unfulfilled? Systemtap has interfaced to markers since the beginning, and there are a bunch of markers in the tree. - FChE --
As usual nothing of that stuff has any real in-kernel users so the same argument applies here. If it did have real uses it could be merged at the same time. But the current uprobes mess is not a reason to merge utrace. --
Uprobes is just one user of utrace. It is intended for use for simple tracing where we need a kernel+userspace look at the problem at hand. The intention is for use in simple cases (when condition x is met, what is the value of variable y, etc). For more advanced tracing though, there are other ideas being proposed (see ntrace discussions on utrace-devel, but I guess now lkml is the right place for that discussion too). However, there are components of uprobes such as breakpoint insertion/removal and single-stepping infrastructure that are potentially useful to other userspace debuggers. We are working on factoring those out to live independent of uprobes. You should be seeing those patches soon. Ananth --
I hope my original mail was clear that I'm not principially against utrace. I think having a generic trace even dispatcher is a good thing if it is a) not overly complicated (see Alexey
Thanks for that clarification. From what you've said in many of your postings, it has been easy to get the impression that you are against everything. I sympathize with caution about a new API that needs users to iron out its kinks. Here is the difficulty I've been having. I have spent a lot of time keeping the patch series up to date, rebasing, etc. This has sunk all the time that I'd like to have been spending working on higher-level pieces such as the ntrace idea. (This will be much better now that all the tracehook work is in 2.6.27. I know you don't buy the rationale I've explained for that work. But I am confident that having this in is a substantial and immediate boon to the ease and efficiency of getting more work done sooner in this area.) At the same time, many of the people who have asked me about the state of the effort say things to the effect that they would like to try writing some features based on such an API once it's been merged in. It was my hope that having it merged in and available as a non-default option for those who select CONFIG_EXPERIMENTAL would encourage more people to actually get going on experiments with new features. Several actual users beating on it is of course what's necessary to iron out the API and the low-level code. Since I finally no longer expect to spend much more time myself on arch code or rebasing hassles given what's already merged, I look forward to soon having time to work on some higher-level stuff that makes this all get interesting. Whatever I produce won't be all things to all people, and it might not be to your taste at all. It's certain that others who tried could come up with some other things faster. Having the schedule of merging utrace contingent on having some things using it that people consider ready to be merged won't delay me working on such a thing, but I get the impression it might deter others. The motivating principle of utrace is that it is the lowest level and least overhead at which ...
Yes, now you've got all the lowlevel bits in that should have been long time ago :) And you don;t have to rebase it all the time if you don't have time for it. Keep developing ntrace on one major release and Then the people should help with the upstream process, post their bits to lkml, help you with forward-porting, etc. --
Yes, that's your price for avoiding more races, more code, more races, more tricky code and ultimately more ways to fsckup. God, you're in the very tricky Unix subsystem only a few people know intimately, and what we see? utrace code is just as tricky. When you're confident that interaction with engines part is fine, all stupid bugs were fixed, go change struct utrace to pointer. Now it can very well be a lie to say less memory is used because slab allocator There are no chickens and no eggs. utrace is in RHEL4, RHEL5, FC6, FC7, FC8, FC9 kernels already. I can't believe RedHat allowed to totally rewrite ptrace based on some Total 3 in scheduler and in spufs (ppc-specific). Amazing improvement for ugly macros, more self-modified kernel, explicit reasons stated why they are stupid spelt in review and after entering tree, and general dislike of some maintainers to add more trace_mark() entries. So there were promises that markers will be useful, 10 months passed and they are still useless. How did this happen? Now utrace is in better position in this respect -- ptrace(2) will use it from start but this doesn't change "promises" part. --
That's an idea worth considering, especially given the oopses you Well, that was how red hat broke the deadlock, and why RH kernels will (I don't understand what you mean. If you mean "does systemtap just want function entry points a la ftrace", then the answer is no, it Some of those hits represent more via macros, but anyway, more on Regardless of the strength of technical objections/responses, there is an ingrained cultural aspect to this that perhaps we'll break through They are already useful to their users. - FChE --
Yes. And btw, the early utrace patches actually moved the ptrace state into dynamically allocated per engine state, which would almost cancel out the additional members required for utrace. I'm not sure why this went away in the meantime. --
Which is I think the single most biggest mistake of this whole endevour. Why not have it out on lkml where people can actually see? --
Hi Roland, I've been looking over the utrace code: git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-utrace.git git diff d3a47e82b6bc3724dd60f3ee4e84fe4479104382..utrace/master and while I'm nowhere near done, I'd like to provide some feedback and pose some questions. - what's up with these weak declarations? - struct utrace_attached_engine is a tad strange as we don't have a regular struct utrace_engine. - does it make sense to create this struct utrace_engine and replace the struct utrace_engine_ops and the void *data members of struct utrace_attached_engine with a pointer to it, and obtain the data by using container_of() on the engine itself? That is, let the user embed struct utrace_engine in a larger structure. - I encountered a lot of unannotated memory barriers. Please add a comment to each and every one describing the race and a pointer to its pair. There is no such thing as a trivial memory barrier. - it has these decidedly un-kernel-ish public/private comments - Why does it have two lists for attaching tasks? The description/comments explain how it works but not why we do it that way. - utrace_attach_task() was very hard to read, the code flow is unconventional at best. - utrace_stop() can seemingly return true even though it didn't get SIGKILL - contrary to its comments. - get_utrace_lock() made me look at ->engine_ops serialisation - I couldn't convince myself its race free. - I saw a lot of if (unlikely(a) || unlikely(b)) style thing, please write as if (unlikely(a || b)). - utrace_release_task() seems to be missing rcu_read_lock()/rcu_read_unlock() to ensure the utrace pointer stays valid. - utrace_control() seems to access ->exit_state in a racy manner. - some comments say 'race' but fail to provide specifics. - as was suggested by Christoph and Alexey, removing struct utrace *task_struct::utrace in favour of embedding it right into task_struct itself would remove quite a bit of complexity. I would ...
