Re: [PATCH 0/2] utrace

Previous thread: [PATCH][resubmit] TPM: update char dev BKL pushdown by Rajiv Andrade on Tuesday, August 26, 2008 - 1:56 pm. (5 messages)

Next thread: [PATCH] afs: fsclient.c sparse endian annotations of operation_ID by Harvey Harrison on Tuesday, August 26, 2008 - 3:05 pm. (4 messages)
From: Roland McGrath
Date: Tuesday, August 26, 2008 - 3:01 pm

utrace is a new kernel-side API for kernel modules, intended to make it
tractable to work on novel ways to trace and debug user-mode tasks.

A previous utrace prototype was in all Fedora kernels since Fedora Core 6.
Some substantial implementation and API details in the current code are
different from those past versions.  Please look freshly at these patches.

The current utrace code here is now included in Fedora Rawhide kernels.

People interested in this code are on the utrace-devel@redhat.com mailing list.
(See https://www.redhat.com/mailman/listinfo/utrace-devel for list info.)
There is a wiki for this work at http://sourceware.org/systemtap/wiki/utrace
(with not much content, feel free to add some).  Current copies of these
patches are at kept at http://people.redhat.com/roland/utrace/ as well as
in the GIT tree shown below.

This code cannot be enabled without CONFIG_HAVE_ARCH_TRACEHOOK and the arch
details it indicates.  In Linus's tree as of v2.6.27-rc4, only powerpc and
sparc64 have that support.  The x86 support is available by merging in the
tip/x86/tracehook branch.  For working on other arch support, there are some
more details at http://sourceware.org/systemtap/wiki/utrace/arch/HowTo and
these are mentioned in the comments in arch/Kconfig too (in v2.6.27-rc4).

The first patch adds the utrace kernel API (if CONFIG_UTRACE=y is set).
There is no change at all without the config option, and with it there is
no effect on anything at all until a kernel module using the utrace API is
loaded.  There is detailed documentation on the API in DocBook form.

The second patch adds the CONFIG_UTRACE_PTRACE option.  When set, this
makes ptrace use the utrace API as much as is necessary so that using both
ptrace and utrace to debug the same threads at the same time won't become
confused.  The ptrace changes are somewhat kludgey.  They're intended to be
the simplest, non-regressing thing that suffices to enable hacking on new
utrace modules while also doing normal ptrace-based ...
From: Roland McGrath
Date: Tuesday, August 26, 2008 - 3:01 pm

This adds the utrace facility, a new modular interface in the kernel for
implementing user thread tracing and debugging.  This fits on top of the
tracehook_* layer, so the new code is well-isolated.

The new interface is in <linux/utrace.h> and the DocBook utrace book
describes it.  It allows for multiple separate tracing engines to work in
parallel without interfering with each other.  Higher-level tracing
facilities can be implemented as loadable kernel modules using this layer.

The new facility is made optional under CONFIG_UTRACE.
When this is not enabled, no new code is added.
It can only be enabled on machines that have all the
prerequisites and select CONFIG_HAVE_ARCH_TRACEHOOK.

In this initial version, utrace and ptrace do not play well together.
If both utrace and ptrace are attached to the same thread, they can
confuse each other about the stopping and resuming of that thread.
The old ptrace code is unchanged and nothing using ptrace should be
affected by this patch as long as utrace is not used at the same time.
A later patch will make them cooperate properly.

Signed-off-by: Roland McGrath <roland@redhat.com>
---
 Documentation/DocBook/Makefile    |    2 +-
 Documentation/DocBook/utrace.tmpl |  566 +++++++++
 fs/proc/array.c                   |    3 +
 include/linux/sched.h             |    5 +
 include/linux/tracehook.h         |   61 +-
 include/linux/utrace.h            |  711 +++++++++++
 init/Kconfig                      |   10 +
 kernel/Makefile                   |    1 +
 kernel/utrace.c                   | 2496 +++++++++++++++++++++++++++++++++++++
 9 files changed, 3853 insertions(+), 2 deletions(-)

diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile
index 1615350..92ca631 100644
--- a/Documentation/DocBook/Makefile
+++ b/Documentation/DocBook/Makefile
@@ -7,7 +7,7 @@
 # list of DOCBOOKS.
 
 DOCBOOKS := wanbook.xml z8530book.xml mcabook.xml videobook.xml \
-	    kernel-hacking.xml kernel-locking.xml deviceiobook.xml ...
From: Alexey Dobriyan
Date: Tuesday, August 26, 2008 - 3:55 pm

I'll says this again: tracehook_* is pointless abstraction because
there will be no second generic tracing facility. The author of second

Again, embed struct utrace directly into task_struct. task_struct
lifetime rules are way more tested than struct utrace ones.

Add simple spinlock guarding all accesses (OK, I haven't looked very
closely if it's possible)

Nobody needs hundred-line utrace_attach with CPU barriers.

Nobody needs RCU.

Nobody needs restart logic.

Reminder: that struct utrace double-free was P_I_T_A to debug.

I'll check last utrace oops we talked is still there and bogus patch was applied
(sorry, haven't slept night at all). And run to confirm that attach/detach/exec
program still crashes it. There is PREEMPT_RCU now so it will be even more not


INIT_RCU_HEAD is not needed, call_rcu() will overwrite rcu head unconditionally.

--

From: Alexey Dobriyan
Date: Wednesday, August 27, 2008 - 2:32 pm

As promised, quickly reproducible via expt_ptratt.c:

kernel/utrace.c:532	if (likely(!utrace->stopped))

BUG: unable to handle kernel paging request at ffff88017c51c958
IP: [<ffffffff8025e38b>] utrace_stop+0x9b/0x120
PGD 202063 PUD b067 PMD 17d8e5163 PTE 800000017c51c160
Oops: 0000 [1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/kernel/uevent_seqnum
CPU 0 
Modules linked in: ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf_conntrack iptable_filter ip_tables xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 sr_mod cdrom
Pid: 32731, comm: exe Tainted: G        W 2.6.27-rc4-next-20080827-utrace #4
RIP: 0010:[<ffffffff8025e38b>]  [<ffffffff8025e38b>] utrace_stop+0x9b/0x120
RSP: 0000:ffff88017c521c38  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88017c02b880 RCX: 0000000000000000
RDX: 000000000000ebea RSI: 00000000ffffffff RDI: 0000000000000001
RBP: ffff88017c521c58 R08: 0000000000000002 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000001 R12: ffff88017c51c8e8
R13: ffff88017c51c918 R14: ffff88017c51c8e8 R15: ffff88017c02b880
FS:  00007fe23dcf66f0(0000) GS:ffffffff8054b600(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff88017c51c958 CR3: 000000017c077000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process exe (pid: 32731, threadinfo ffff88017c520000, task ffff88017c02b880)
Stack:  ffff88017c51c8e8 ffff88017c521ce8 ffff88017c51c8e8 ffff88017c02b880
 ffff88017c521c88 ffffffff8025e4b0 ffff88017f92e360 ffff88017c51c8f8
 ffff88017f92e360 ffff88017c51c8f0 ffff88017c521d38 ffffffff8025fb05
Call Trace:
 [<ffffffff8025e4b0>] finish_resume_report+0xa0/0xd0
 [<ffffffff8025fb05>] utrace_get_signal+0x305/0x710
 [<ffffffff802408cc>] get_signal_to_deliver+0x24c/0x310
 [<ffffffff8020ab54>] do_notify_resume+0xd4/0x860
 [<ffffffff80253fcd>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff8042d812>] ? ...
From: Alexey Dobriyan
Date: Wednesday, August 27, 2008 - 2:46 pm

Another one:

kernel BUG at kernel/utrace.c:2275!
invalid opcode: 0000 [1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/kernel/slab/utrace_attached_engine/objects
CPU 0 
Modules linked in: xt_state iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack ip_tables xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 sr_mod cdrom
Pid: 5118, comm: exe Tainted: G        W 2.6.27-rc4-next-20080827-utrace #5
RIP: 0010:[<ffffffff802608ef>]  [<ffffffff802608ef>] utrace_get_signal+0x6ff/0x710
RSP: 0018:ffff88017ace5c98  EFLAGS: 00010093
RAX: 0000000000000000 RBX: ffff88017aced910 RCX: ffff88017e2b23c0
RDX: ffffffffffffffff RSI: ffff88017ace5cf8 RDI: ffff88017acc5640
RBP: ffff88017ace5d38 R08: 0000000000000002 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: ffff88017aced908 R14: ffff88017aced900 R15: ffff88017acc5640
FS:  0000000000000000(0000) GS:ffffffff80551600(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fff249c8bc9 CR3: 000000017ad2c000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process exe (pid: 5118, threadinfo ffff88017ace4000, task ffff88017acc5640)
Stack:  ffff88017ace5ef8 0000000000000002 0000000000000000 ffff88017ace5ef8
 ffff88017ace5e78 ffff88017ace5f58 ffff88017e206288 0000000000000400
 0000000000000401 ffff88017e206848 0000001000000000 0000000000010000
Call Trace:
 [<ffffffff802412fc>] get_signal_to_deliver+0x24c/0x310
 [<ffffffff8020ab54>] do_notify_resume+0xd4/0x860
 [<ffffffff8028ff08>] ? check_bytes_and_report+0x38/0xd0
 [<ffffffff80209b7b>] ? sys_execve+0x5b/0x80
 [<ffffffff802901c3>] ? check_object+0x223/0x250
 [<ffffffff80290811>] ? init_object+0x51/0x90
 [<ffffffff80256f5d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff80209b7b>] ? sys_execve+0x5b/0x80
 [<ffffffff8020b9c7>] int_signal+0x12/0x17
Code: 00 00 48 89 41 10 48 8b 82 20 01 00 00 48 ...
From: Alexey Dobriyan
Date: Wednesday, August 27, 2008 - 3:00 pm

And overwritten poison if run in parallel with

	while true; do
		killall -9 expl_ptratt
		killall -9 exe
	done

=============================================================================
BUG utrace: Poison overwritten
-----------------------------------------------------------------------------

INFO: 0xffff88017c31e7b0-0xffff88017c31e7f0. First byte 0x6c instead of 0x6b
INFO: Allocated in utrace_attach_task+0x1f4/0x3d0 age=13 cpu=1 pid=5377
INFO: Freed in utrace_free+0x16/0x20 age=5 cpu=1 pid=5377
INFO: Slab 0xffffe2000532ae90 objects=21 used=2 fp=0xffff88017c31e780 flags=0x80000000000000c3
INFO: Object 0xffff88017c31e780 @offset=1920 fp=0xffff88017c31e540

Bytes b4 0xffff88017c31e770:  fc 1f ff ff 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ü.ÿÿ....ZZZZZZZZ
  Object 0xffff88017c31e780:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
  Object 0xffff88017c31e790:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
  Object 0xffff88017c31e7a0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
  Object 0xffff88017c31e7b0:  6c 6c 6b 6b 6b 6b 6b 6b ff ff ff ff 6b 6b 6b 6b llkkkkkkÿÿÿÿkkkk
  Object 0xffff88017c31e7c0:  ff ff ff ff ff ff ff ff 6b 6b 6b 6b 6b 6b 6b 6b ÿÿÿÿÿÿÿÿkkkkkkkk
  Object 0xffff88017c31e7d0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
  Object 0xffff88017c31e7e0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
  Object 0xffff88017c31e7f0:  6a 6b 6b 6b 6b 6b 6b a5                         jkkkkkk¥        
 Redzone 0xffff88017c31e7f8:  bb bb bb bb bb bb bb bb                         »»»»»»»»        
 Padding 0xffff88017c31e838:  5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ        
Pid: 5382, comm: expl_ptratt Tainted: G        W 2.6.27-rc4-next-20080827-utrace #5
Call Trace:
 [<ffffffff8028f989>] print_trailer+0xf9/0x160
 [<ffffffff8028ff75>] check_bytes_and_report+0xa5/0xd0
 [<ffffffff80290048>] check_object+0xa8/0x250
 [<ffffffff80291173>] __slab_alloc+0x4f3/0x670
 ...
From: Christoph Hellwig
Date: Saturday, August 30, 2008 - 6:45 am

Yes, I've also said this went the old monolithic utrace was first posted
(probably over a year ago) and it was conveniently ignored..

--

From: Roland McGrath
Date: Wednesday, September 3, 2008 - 5:11 am

The most consistent feedback I've seen to all new features is that they
mustn't add overhead when they're not being used.  So I never considered it
an option to bloat task_struct by ~120 bytes.  Of course much more than
that is entailed when a task is actually being traced somehow.  But the
presumption is that most tasks most of the time aren't, and that's what not
to bloat.

The revamp of the API after the first prototype made some of the internals
much simpler to implement, that had been very sticky in the old prototype
code.  But the allocation and freeing of struct utrace is an area I did not
fully revisit.  Buggy is buggy, and sure it needs to be tested and fixed.

I can't tell what you mean here.  Do you mean something different 
from struct utrace.lock?  If there were no pointer and its allocation

I see.  Thanks!  At some point, this was in the recommended example uses of
RCU.  This macro is still used in several places around the kernel.
Shouldn't they all be removed?  I wonder why it still exists in rcupdate.h.


Thanks,
Roland
--

From: Alexey Dobriyan
Date: Wednesday, September 3, 2008 - 10:01 am

Engine part is irrelevant for deciding "struct utrace utrace;" vs

Let's actually measure something.

1) clean kernel (my usual config, x86_64, NR_CPUS=2, PREEMPT=y, SLUB_DEBUG=y)
2) struct utrace *utrace;
3) struct utrace utrace; (without rcu_head, without check_dead)

			taken from symlink
	cache		/object_size, bytes
----------------------------------------
1)	task_struct	1392/1384

2)	task_struct	1408/1400
	utrace		72/72

3)	task_struct	1456/1456


I don't know why "object_size" differs from to where symlink points to
but it's irrelevant.

So, your approach gives +16(+72) bytes, mine gives +72 bytes.

So all this complexity is for ~56 bytes in untraced case. In traced
case, more memory will be used.


I mean sticking to simple locking rules if you agree to "struct utrace utrace;":
1) utrace_flags go into struct utrace
2) utrace.lock guards _everything_ under it, no exceptions.

If I understand correctly ->utrace_flags are under utrace->lock, BUT
struct utrace is reattachable itself. Of course you protect against it,
but this doesn't help in convincing someone that there are no problems.

Again, this is just general observation and an example. Checking for
bugs will be much, much simpler.

--

From: Alexey Dobriyan
Date: Wednesday, August 27, 2008 - 1:04 pm

1. UTRACE_ATTACH_MATCH_DATA is not used.

2. s/utrace_attached_engine/utrace_engine/g

   Engine which is not attached is either just allocated or scheduled
   for freeing

3. get_utrace_lock() is funny.

   From name one should get valid struct utrace or -E depending only on
   task, why engine matters is a mystery.

4. subsys_initcall

   Just use module_init() or comment why it's needed.

5. Dummy in examiner -- not used (of course!)

--

From: Roland McGrath
Date: Wednesday, September 3, 2008 - 5:11 am

> 1. UTRACE_ATTACH_MATCH_DATA is not used.

I don't understand this comment.  The patch adds a new API, not users of

In the way I've talked about things since the beginning, and probably in
some email and maybe some of the documentation text, I've used "a tracing
engine" to mean the whole coordinated body of code.  (There is no data
structure in the utrace API that corresponds to this, it's just a way of
talking about the "user of utrace" code.)  So you might talk about having
your "engine" attach to many tasks.  This is what I had in mind when I
first made it 'struct utrace_attached_engine' for the API object that's
about one attachment.

But I won't quibble about names.  I am happy to change any of the names to

This "mystery" is addressed in the comments and the documentation.
(Comments above the function says exactly what it does.)  Details about
why we do that are in manual section called "Engine and task pointers"
(in the file added by the patch, Documentation/DocBook/utrace.tmpl).  
If you don't like to read the XML or wait for 'make htmldocs', I've put
a copy of the formatted tree at:

Since I never found much documentation about these macros, and it's not
a module, that looked like the one to use based on the other uses in the
source.  If module_init is the generic one to use in nonmodule code,

(Of course, that's why it's called that!)  I found I had to add that or
the kerneldoc magic would freak out.  The type name is used by users of
the API, so I wanted to have it documented with description text, but
its contents are all private to the utrace implementation.  When I had
no magic /* public: */ comment or no fields declared after it, the 'make

I said this is a substantially different API from the first utrace
prototype, and I was not lying.  The implementation didn't follow from
the same tree's history, because it took an entirely different merging
approach.  In the new effort I've been doing since 2.6.25, I have only
been preparing new versions of patches for ...
From: Christoph Hellwig
Date: Wednesday, September 3, 2008 - 11:44 am

Sounds like kerneldoc wants a fix for this.  And btw, I don't think
half-private structures are a good idea.  Either make it completely
private by just providing a forward declaration in the header, or
put all members in the public definition with a comment describing
they are not for users.  We don't have the arguments of ABI problems
or hiding visibility for maintaince reasons e.g. library APIs have.

--

From: Alexey Dobriyan
Date: Saturday, August 30, 2008 - 8:05 am

Engine grew a refcount.

Was it added to fix some bug?

Old utrace git doesn't have this, new utrace git has only two
posted patches, so asking.

--

From: Petr Tesarik
Date: Wednesday, September 3, 2008 - 5:58 am

Why does it depend on MODULES? Does this mean that when ptrace is
switched to using the utrace engines, non-modular kernels will have to
lack ptrace(2)?

Is there anything in the code that actually breaks without
CONFIG_MODULES?

Regards,
Petr Tesarik


--

From: Roland McGrath
Date: Wednesday, September 3, 2008 - 11:08 am

> Why does it depend on MODULES? 


No.  If that were done, then it would be:
	depends on MODULE || PTRACE

No.


Thanks,
Roland
--

From: Christoph Hellwig
Date: Wednesday, September 3, 2008 - 11:46 am

In which case you should just remove this bogus (non-)dependency.
--

From: Petr Tesarik
Date: Thursday, September 4, 2008 - 2:03 am

+1

Petr Tesarik

From: Roland McGrath
Date: Tuesday, August 26, 2008 - 3:02 pm

This adds the CONFIG_UTRACE_PTRACE option under CONFIG_UTRACE.
When set, parts of ptrace are replaced so it uses the utrace
facilities for noticing events, stopping and resuming threads.

This makes ptrace play nicely with other utrace-based things
tracing the same threads.  It also makes all ptrace uses rely on
some of the utrace code working right, even when you are not
using any other utrace-based things.  So it's experimental and
not real well proven yet.  But it's recommended if you enable
CONFIG_UTRACE and want to try new utrace things.

Signed-off-by: Roland McGrath <roland@redhat.com>
---
 include/linux/ptrace.h    |   21 ++
 include/linux/sched.h     |    1 +
 include/linux/tracehook.h |    4 +
 init/Kconfig              |   17 ++
 kernel/ptrace.c           |  604 ++++++++++++++++++++++++++++++++++++++++++++-
 kernel/signal.c           |   14 +-
 kernel/utrace.c           |   15 ++
 7 files changed, 670 insertions(+), 6 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index ea7416c..06eaace 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -121,6 +121,7 @@ static inline void ptrace_unlink(struct task_struct *child)
 int generic_ptrace_peekdata(struct task_struct *tsk, long addr, long data);
 int generic_ptrace_pokedata(struct task_struct *tsk, long addr, long data);
 
+#ifndef CONFIG_UTRACE_PTRACE
 /**
  * task_ptrace - return %PT_* flags that apply to a task
  * @task:	pointer to &task_struct in question
@@ -154,6 +155,26 @@ static inline int ptrace_event(int mask, int event, unsigned long message)
 	return 1;
 }
 
+static inline void ptrace_utrace_exit(struct task_struct *task)
+{
+}
+
+#else  /* CONFIG_UTRACE_PTRACE */
+
+static inline int task_ptrace(struct task_struct *task)
+{
+	return 0;
+}
+
+static inline int ptrace_event(int mask, int event, unsigned long message)
+{
+	return 0;
+}
+
+extern void ptrace_utrace_exit(struct task_struct *);
+
+#endif	/* !CONFIG_UTRACE_PTRACE */
+
 /**
  * ...
From: Christoph Hellwig
Date: Saturday, August 30, 2008 - 6:38 am

I don't like this patch in it's current form.  Having two different ways
to do ptrace is a rather awkward thing and not very good for testing
coverage.  This should not be an option but required.  I'm not even sure
keeping a non-utrace version at all is a good idea. I'd rather set a
deadline for arch maintainers to convert everything to the generic
ptrace bits + regsets + tracehook and if they don't manage to do it by
then ptrace will be disabled for those who can't keep up.  Of course
this will need an announcement on linux-arch first, but it's much better
than a never ending phase of APIs in migration.
--

From: Roland McGrath
Date: Wednesday, September 3, 2008 - 5:10 am

The main idea behind having CONFIG_UTRACE_PTRACE be an option separate
from CONFIG_UTRACE is just for debugging during this development and
transition period.  With CONFIG_UTRACE=y, CONFIG_UTRACE_PTRACE=n, then
ptrace works as it ever did.  You'll only have trouble if you use utrace
on the same tasks.  So if you hit bad problems working on something
using utrace, you can always switch to CONFIG_UTRACE_PTRACE=n to use gdb
and strace et al reliably on your userland code, which might be damn
helpful in figuring out how you're triggering the utrace bugs.  By the

We agree on how it should end up.  I have concentrated on how not to
break any arch that doesn't care about my work, or put another way, how
not to let progress be held hostage to delay as long as some arch's
don't care.  An arch deadline is fine with me (I'm not the one who has
to meet it).  I've always assumed there would be one eventually.  But
any reasonable deadline would be some time hence, I don't know how long
(time scales in feature-removal-schedule.txt seem to be pretty long).  I
wouldn't like to make merging any of utrace be contingent on waiting for
that deadline, just to avoid having a phase of APIs in migration--one
that can certainly end.

There is another point, not about the arch issue.  In the first utrace
prototype, I wound up spending basically all my time on ptrace.  An
internal reorganization that implements ptrace is not very interesting.
I think that working on different users of utrace will be far more
effective at honing the utrace layer.  My hope is that utrace can be
merged and EXPERIMENTAL for some time while users develop, beat on, and
get it solid.  Meanwhile, people not working on that need reliable old
ptrace code when they don't select the newfangled options.

As the patch posting said, this is a minimal kludge to get by for now.
After some of those other users you've demanded have banged it all out,
then I'll get back to worrying about ptrace later.  (In the meantime, if
it goes too ...
From: Christoph Hellwig
Date: Wednesday, September 3, 2008 - 11:41 am

No problems doing it in your tree, but it's not how features
get into mainline.  Transition periods like this are bad, because
people will simply disable the option if it doesn't work, and it will
never get debugged.  Conditional implementation of core functionality

At least one or two major release.  So in your place I would try to get
consensus on linux-arch _now_.  I've also tried to poke multiple arch
maintainers into updating their implementations for other reasons in

I don't think moving ptrace data insto the engine state is a minor
reoraniation.  It's shoveling free some space that you need for utrace
itself if we want to get rid of the nasty dynamic allocation of it,
which I really really think we want.

--

From: Alexey Dobriyan
Date: Tuesday, August 26, 2008 - 3:34 pm

And some internal details still horrible and overdesigned just like at


If config option for ptrace is fine, please name it CONFIG_PTRACE.
For one, there will be no second tracing infrastracture. For two, nobody

General comments:

On the good side is per-task struct operation. This is good and should
be required from any such tracing facility.

Linked list of attached tracers? I don't know.

One the bad side, where are those nice tracing modules? Where are they?

I've heard rumours utrace is needed for frysk and frysk people were
pretty damn silent on linux-kernel.

On the homepage there is module which frozes task right before coredump.
AFAICS, Al Viro mentioned that non-schedulable TASK_BROKEN should be
sufficient for this without wasting all that time that went into
ptrace(2) stabilization and fixing holes in it.

This all similar to systemtap/markers story. Big changes under promises
that now, now somebody will use our thing.


General reminder:
people who collected ptrace(2) exploits proggies, try them again.

Now to code.

--

From: Christoph Hellwig
Date: Tuesday, August 26, 2008 - 3:39 pm

Yes.  Merging such a huge amount of code that doesn't actually give
the kernel any new capabilities seems like a strange idea.  What users
of this infrastrucure can we see that it can be merged with in one go?

--

From: Frank Ch. Eigler
Date: Tuesday, August 26, 2008 - 5:17 pm

Please point out specific areas, and I'm sure there will be a reasonable
explanation why they turned out this way.

As for whether "struct utrace" should be a member of vs. pointed-to
from task_struct, it may come down to the perceived need to avoid
penalizing every thread with a hundred-odd bytes extra, whether or not

Among others, utrace is an enablement layer for systemtap user-space
probing, through another subsequent part that implements a
kprobes-like API for user-space tasks.  All this code now exists in at
least prototype form, so if you need to see the bigger picture, look
that way.  Other users are anticipated, but first we need to get past

In what way do you think those promises are unfulfilled?  Systemtap
has interfaced to markers since the beginning, and there are a bunch
of markers in the tree.


- FChE
--

From: Christoph Hellwig
Date: Wednesday, August 27, 2008 - 6:54 am

As usual nothing of that stuff has any real in-kernel users so the same
argument applies here.  If it did have real uses it could be merged at
the same time.  But the current uprobes mess is not a reason to merge
utrace.

--

From: Ananth N Mavinakayanahalli
Date: Wednesday, August 27, 2008 - 9:40 am

Uprobes is just one user of utrace. It is intended for use for simple
tracing where we need a kernel+userspace look at the problem at hand.
The intention is for use in simple cases (when condition x is met, what
is the value of variable y, etc).

For more advanced tracing though, there are other ideas being proposed
(see ntrace discussions on utrace-devel, but I guess now lkml is the
right place for that discussion too).

However, there are components of uprobes such as breakpoint
insertion/removal and single-stepping infrastructure that are
potentially useful to other userspace debuggers. We are working on
factoring those out to live independent of uprobes. You should be seeing
those patches soon.

Ananth
--

From: Christoph Hellwig
Date: Saturday, August 30, 2008 - 6:40 am

I hope my original mail was clear that I'm not principially against
utrace.  I think having a generic trace even dispatcher is a good thing
if it is a) not overly complicated (see Alexey
From: Roland McGrath
Date: Wednesday, September 3, 2008 - 5:09 am

Thanks for that clarification.  From what you've said in many of your
postings, it has been easy to get the impression that you are against
everything.

I sympathize with caution about a new API that needs users to iron out
its kinks.  Here is the difficulty I've been having.

I have spent a lot of time keeping the patch series up to date,
rebasing, etc.  This has sunk all the time that I'd like to have been
spending working on higher-level pieces such as the ntrace idea.

(This will be much better now that all the tracehook work is in 2.6.27.
I know you don't buy the rationale I've explained for that work.  But I
am confident that having this in is a substantial and immediate boon to
the ease and efficiency of getting more work done sooner in this area.)

At the same time, many of the people who have asked me about the state
of the effort say things to the effect that they would like to try
writing some features based on such an API once it's been merged in.

It was my hope that having it merged in and available as a non-default
option for those who select CONFIG_EXPERIMENTAL would encourage more
people to actually get going on experiments with new features.  Several
actual users beating on it is of course what's necessary to iron out
the API and the low-level code.

Since I finally no longer expect to spend much more time myself on arch
code or rebasing hassles given what's already merged, I look forward to
soon having time to work on some higher-level stuff that makes this all
get interesting.  Whatever I produce won't be all things to all people,
and it might not be to your taste at all.  It's certain that others who
tried could come up with some other things faster.  Having the schedule
of merging utrace contingent on having some things using it that people
consider ready to be merged won't delay me working on such a thing, but
I get the impression it might deter others.

The motivating principle of utrace is that it is the lowest level and
least overhead at which ...
From: Christoph Hellwig
Date: Wednesday, September 3, 2008 - 11:37 am

Yes, now you've got all the lowlevel bits in that should have been
long time ago :)  And you don;t have to rebase it all the time if you
don't have time for it.  Keep developing ntrace on one major release and

Then the people should help with the upstream process, post their bits
to lkml, help you with forward-porting, etc.

--

From: Alexey Dobriyan
Date: Wednesday, August 27, 2008 - 8:34 am

Yes, that's your price for avoiding more races, more code, more races,
more tricky code and ultimately more ways to fsckup.

God, you're in the very tricky Unix subsystem only a few people know
intimately, and what we see? utrace code is just as tricky.

When you're confident that interaction with engines part is fine, all
stupid bugs were fixed, go change struct utrace to pointer. Now it can
very well be a lie to say less memory is used because slab allocator

There are no chickens and no eggs.

utrace is in RHEL4, RHEL5, FC6, FC7, FC8, FC9 kernels already.

I can't believe RedHat allowed to totally rewrite ptrace based on some


Total 3 in scheduler and in spufs (ppc-specific).

Amazing improvement for ugly macros, more self-modified kernel, explicit
reasons stated why they are stupid spelt in review and after entering tree,
and general dislike of some maintainers to add more trace_mark() entries.


So there were promises that markers will be useful, 10 months passed and
they are still useless. How did this happen?

Now utrace is in better position in this respect -- ptrace(2) will use
it from start but this doesn't change "promises" part.

--

From: Frank Ch. Eigler
Date: Friday, August 29, 2008 - 12:04 pm

That's an idea worth considering, especially given the oopses you

Well, that was how red hat broke the deadlock, and why RH kernels will

(I don't understand what you mean.  If you mean "does systemtap just
want function entry points a la ftrace", then the answer is no, it

Some of those hits represent more via macros, but anyway, more on

Regardless of the strength of technical objections/responses, there is
an ingrained cultural aspect to this that perhaps we'll break through

They are already useful to their users.


- FChE
--

From: Christoph Hellwig
Date: Saturday, August 30, 2008 - 6:43 am

Yes.  And btw, the early utrace patches actually moved the ptrace
state into dynamically allocated per engine state, which would almost
cancel out the additional members required for utrace.  I'm not sure
why this went away in the meantime.

--

From: Alexey Dobriyan
Date: Wednesday, August 27, 2008 - 11:50 am

Oh, I totally misread what this potion is about.

Of course, it shouldn't exist at all.

--

From: Peter Zijlstra
Date: Tuesday, August 26, 2008 - 7:03 pm

Which is I think the single most biggest mistake of this whole endevour.
Why not have it out on lkml where people can actually see?

--

From: Peter Zijlstra
Date: Monday, October 6, 2008 - 1:47 pm

Hi Roland,

I've been looking over the utrace code:

git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-utrace.git

git diff d3a47e82b6bc3724dd60f3ee4e84fe4479104382..utrace/master

and while I'm nowhere near done, I'd like to provide some feedback and
pose some questions.

 - what's up with these weak declarations?

 - struct utrace_attached_engine is a tad strange as we don't have a
regular struct utrace_engine.

 - does it make sense to create this struct utrace_engine and replace
the struct utrace_engine_ops and the void *data members of struct
utrace_attached_engine with a pointer to it, and obtain the data by
using container_of() on the engine itself? That is, let the user embed
struct utrace_engine in a larger structure.

 - I encountered a lot of unannotated memory barriers. Please add a
comment to each and every one describing the race and a pointer to its
pair. There is no such thing as a trivial memory barrier.

 - it has these decidedly un-kernel-ish public/private comments

 - Why does it have two lists for attaching tasks? The
description/comments explain how it works but not why we do it that way.

 - utrace_attach_task() was very hard to read, the code flow is
unconventional at best.

 - utrace_stop() can seemingly return true even though it didn't get
SIGKILL - contrary to its comments.

 - get_utrace_lock() made me look at ->engine_ops serialisation - I
couldn't convince myself its race free.

 - I saw a lot of if (unlikely(a) || unlikely(b)) style thing, please
write as if (unlikely(a || b)).

 - utrace_release_task() seems to be missing
rcu_read_lock()/rcu_read_unlock() to ensure the utrace pointer stays
valid.

 - utrace_control() seems to access ->exit_state in a racy manner.

 - some comments say 'race' but fail to provide specifics.

 - as was suggested by Christoph and Alexey, removing struct utrace
*task_struct::utrace in favour of embedding it right into task_struct
itself would remove quite a bit of complexity. I would ...
Previous thread: [PATCH][resubmit] TPM: update char dev BKL pushdown by Rajiv Andrade on Tuesday, August 26, 2008 - 1:56 pm. (5 messages)

Next thread: [PATCH] afs: fsclient.c sparse endian annotations of operation_ID by Harvey Harrison on Tuesday, August 26, 2008 - 3:05 pm. (4 messages)