RE: [PATCH]: tipc: Fix oops on send prior to entering networked mode

Previous thread: [net-next 2.6 PATCH] net/pcmcia: convert to use netdev_for_each_mc_addr by Jiri Pirko on Friday, February 19, 2010 - 11:48 am. (2 messages)

Next thread: Subject: [PATCH 6/6] bna: Brocade 10Gb Ethernet device driver by Rasesh Mody on Friday, February 19, 2010 - 2:52 pm. (1 message)
From: Neil Horman
Date: Friday, February 19, 2010 - 12:40 pm

Fix TIPC to disallow sending to remote addresses prior to entering NET_MODE

user programs can oops the kernel by sending datagrams via AF_TIPC prior to
entering networked mode.  The following backtrace has been observed:

ID: 13459  TASK: ffff810014640040  CPU: 0   COMMAND: "tipc-client"
#0 [ffff81002d9a5810] crash_kexec at ffffffff800ac5b9
#1 [ffff81002d9a58d0] __die at ffffffff80065127
#2 [ffff81002d9a5910] do_page_fault at ffffffff80066da7
#3 [ffff81002d9a5a00] error_exit at ffffffff8005dde9
[exception RIP: tipc_node_select_next_hop+90]
RIP: ffffffff8869d3c3  RSP: ffff81002d9a5ab8  RFLAGS: 00010202
RAX: 0000000000000001  RBX: 0000000000000001  RCX: 0000000000000001
RDX: 0000000000000000  RSI: 0000000000000001  RDI: 0000000001001001
RBP: 0000000001001001   R8: 0074736575716552   R9: 0000000000000000
R10: ffff81003fbd0680  R11: 00000000000000c8  R12: 0000000000000008
R13: 0000000000000001  R14: 0000000000000001  R15: ffff810015c6ca00
ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#4 [ffff81002d9a5ab0] tipc_node_select_next_hop at ffffffff8869d3b1
#5 [ffff81002d9a5ae0] tipc_link_send_sections_fast at ffffffff88698558
#6 [ffff81002d9a5be0] tipc_forward2port at ffffffff8869eb1d
#7 [ffff81002d9a5c10] tipc_send2port at ffffffff8869eb79
#8 [ffff81002d9a5c30] send_msg at ffffffff886a1d0b
#9 [ffff81002d9a5cb0] sock_sendmsg at ffffffff80055261
RIP: 0000003cbd8d49a3  RSP: 00007fffc84e0be8  RFLAGS: 00010206
RAX: 000000000000002c  RBX: ffffffff8005d116  RCX: 0000000000000000
RDX: 0000000000000008  RSI: 00007fffc84e0c00  RDI: 0000000000000003
RBP: 0000000000000000   R8: 00007fffc84e0c10   R9: 0000000000000010
R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000000
R13: 00007fffc84e0d10  R14: 0000000000000000  R15: 00007fffc84e0c30
ORIG_RAX: 000000000000002c  CS: 0033  SS: 002b

What happens is that, when the tipc module in inserted it enters a standalone
node mode in which communication to its own address is allowed <0.0.0> but not
to other addresses, since the appropriate ...
From: Stephens, Allan
Date: Monday, February 22, 2010 - 3:44 pm

Hi Neil:

Good work on spotting this bug, and in tracking down the cause.  I took
a look at your patch, but there are a couple of problems I can see with
the approach you've taken to fix things:

1) The check you've added to send_msg() in net/tipc/socket.c will help
prevent things from blowing up if the message sender is using an AF_MIPC
socket from user space, but it won't help prevent a similar oops if an
"application" uses TIPC's native API to send a message directly from
kernel space.

2) The other change you made to defer setting tipc_mode to TIPC_NET_MODE
will cause problems if TIPC has to bail out during tipc_net_start()
because of problems.  Specifically, the ensuing call to tipc_net_stop()
won't get a chance to clean up any partial initialization that got done
prior to the initialization problem, which could result in memory leaks.

Fortunately, I think there's a relatively easy solution.  Since TIPC
always needs to call tipc_node_select() in order to send an off-node
message, you should be able to add the necessary error checking there.
I'm thinking of something along the lines of:

static inline struct tipc_node *tipc_node_select(u32 addr, u32 selector)
{
	if (likely(in_own_cluster(addr)))
		return tipc_local_nodes ?
tipc_local_nodes[tipc_node(addr)] : NULL;
	return tipc_net->zones ? tipc_node_select_next_hop(addr,
selector) : NULL;
}

Please give this a try and let us know how things go.

Regards,
--

From: Neil Horman
Date: Monday, February 22, 2010 - 6:11 pm

Thanks.  Like I mentioned in my initial post, my approach only had minimal
That seems like it might be a problem in and of itself.  If the startup/shutdown
code relies on the state of the implementation, perhaps that worth cleaning up
I'm happy to give this a try, but I'm a bit concerned by this approach.  It
certainly seems like it will solve the problem at hand, but it doesn't seem to
address the underlying cause, which is that you can execute code paths which the
state of the implementation doesn't/shouldn't allow.  In other words, this solve
the immediate problem, but it still allows someone to try send data via tipc
before tipc is ready to send data.  It would be nice if we could deal with the
larger problem.

I'll let you know how the above patch goes.

Regards
--

From: Stephens, Allan
Date: Tuesday, February 23, 2010 - 8:02 am

I'm afraid I don't understand what you mean here.  It sounds like you're
saying that TIPC's code shouldn't rely on the state of its own

I don't think you've stated the issue correctly.  The problem you
encountered isn't that TIPC is allowing users to send data before TIPC
is ready, it's that TIPC is allowing users to send data *off-node*
before TIPC is ready.  TIPC was deliberately designed so that messages
could be sent within the node itself as soon as possible, without having
to wait for full connectivity to the rest of the network, and this it
does quite well.  Later, once connectivity to the network is
established, TIPC allows applications to send data to other nodes, and
this also appears to work properly providing the applications use TIPC's
location transparent service naming form of addressing.

What we don't appear to have anticipated is that someone would attempt
to send messages to another node without first receiving an indication
from TIPC that the node could be reached.  As far as I can tell, the
only way that the crash you encountered could be generated would be if
your application blindly tries to send to a named service on a specified
node without first waiting for notification that the node exists.
(Please correct me if I'm wrong about this.)  While this operation is
certainly legal, including a specific node address in a send operation
kind of defeats the whole location transparent addressing premise on
which TIPC is based and I'm wondering if it's really necessary to do
things this way.  Regardless, the fix I've suggested seems to me to be a
reasonable way of blocking premature sends off-node while still

Thanks,
Al 
--

From: Neil Horman
Date: Tuesday, February 23, 2010 - 9:09 am

Not at all, I'm saying quite the opposite, that TIPC should rely on its own
state, but in its current implementation:

1) It doesn't (i.e. theres no check in the send path for messages that the
internal mode is TIPC_NET_MODE if the destination address is not for the local z.c.n
tuple).

2) It couldn't rely on the internal state if it did check (i.e tipc_net_start
sets tipc_mode to TIPC_NET_MODE prior to initalizing the data structures
required to support sending messages off node).  So een if we did do a check for
being in NET mode prior to sending a message, it would be useless because theres
a window of time where the implementation says its ok to send, but its really
Well, yes.  Sorry for not being clear.  We only need to check that we're in net
My initial patch checked for this:
+       if ((tipc_mode != TIPC_NET_MODE) &&
+           (dest->addr.name.domain != tipc_own_addr))
+               return -EHOSTUNREACH;
+

If we're not in NET mode but the destination is not tipc_own_addr (initialized

As you said, its perfectly legal, and one of the mandates of the kernel is to
prevent unprivlidged user space applications from taking down the entire system
when they do something stupid.  I completely agree with you that user space is
doing bad things here, but bad things need to result in errors and broken
I agree that you patch fixes the exact problem that I reported here, but theres
more to it than that.  A quick grep of the tipc stack reveals the following
symbols:
tipc_bearers
media_list
tipc_local_nodes
bcbearer
bclink
tipc_net.zones

All of these symbols:

1) Are allocated dynamically in tipc_net_start, _after_ tipc_mode is set to
TIPC_NET_MODE

2) dereferenced without NULL pointer checks in either the send path or in the
netlink configuration path, both of which are reachable from user space.

So your patch fixes the last item on your list, but what about the others?  In
fact, I'll bet I can very quickly change the application to trip over a ...
From: Stephens, Allan
Date: Tuesday, February 23, 2010 - 9:21 am

The semantics of TIPC addressing don't allow a node address of the form
<0.0.N> where N != 0, so this kind of a send ateempt should be caught
and handled by TIPC.  However, you've already found one missing error
check, so it's certainly worth trying it out!

Regards,
Al
--

From: Neil Horman
Date: Wednesday, February 24, 2010 - 11:53 am

So, I've tested out your patch, and it fixes the problem that was reported (no
suprisingly, it was pretty clear that it would), it also managed to fix the
access to tipc_local_nodes (I'd previously missed the chunk of the patch that
added the ? to that access), so thats all great.  Then I did this:
./tipc-config -lsr 1.1.10:eth3-1.1.17:eth2

And got this:

BUG: unable to handle kernel NULL pointer dereference at 00000000000000c4
IP: [<ffffffffa030f6a8>] tipc_bearer_find_interface+0x1f/0x66 [tipc]
PGD 11be1a067 PUD 11c721067 PMD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
CPU 3 
Pid: 1284, comm: tipc-config Not tainted 2.6.33-rc8 #1 0YK962/PowerEdge SC1435
RIP: 0010:[<ffffffffa030f6a8>]  [<ffffffffa030f6a8>]
tipc_bearer_find_interface+0x1f/0x66 [tipc]
RSP: 0018:ffff88011c7b3a08  EFLAGS: 00010246
RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffff88011c7b3900
RDX: 0000000000000000 RSI: ffff88011c7b39c0 RDI: ffff88011c7b3a3c
RBP: ffff88011c7b3a28 R08: 0000000000000032 R09: 000000000000000a
R10: ffffffffa03258b8 R11: 0000000000000000 R12: 0000000000000000
R13: ffff88011c7b3a3c R14: 0000000000000040 R15: ffffffff829a89b0
FS:  00007fd2d1c56700(0000) GS:ffff880082400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000c4 CR3: 000000011c7aa000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process tipc-config (pid: 1284, threadinfo ffff88011c7b2000, task
ffff88011cba48a0)
Stack:
 ffff880023022260 ffff88011c7b3a38 ffff880023022260 ffff88011c7b3aa0
<0> ffff88011c7b3a88 ffffffffa031294c 336874650100100a ffff880023022200
<0> 0100101100000040 ffffff0032687465 ffff88011c7b3a88 00000000c78c4377
Call Trace:
 [<ffffffffa031294c>] link_find_link+0x40/0x9d [tipc]
 [<ffffffffa0312fce>] ? tipc_link_cmd_reset_stats+0x63/0xbb [tipc]
 [<ffffffffa0312fda>] ...
From: Stephens, Allan
Date: Wednesday, February 24, 2010 - 12:05 pm

Hi Neil:

Have you tried upgrading your system to use TIPC 1.7.6 (available at
http://tipc.sourceforge.net/tipc_linux.html)?  This is a significant
revised and enhanced version of TIPC that hasn't yet made its way into
mainsteam Linux, and seems to be the version-of-choice for most TIPC
users.  It also appears to avoid a number of the issues that currently
exist in TIPC 1.6, including the one caused by the random configuration
command you mentioned in your email below.

I didn't have a problem with you working on a small patch for TIPC 1.6
to get around a limited problem, but I'd hate to see you waste time on
fixing issues that have already been addressed in TIPC 1.7.

Regards,
--

From: Neil Horman
Date: Wednesday, February 24, 2010 - 2:15 pm

Yeah, yeah, it looks like 1.7.6 peppered the config paths with tipc_mode checks
all over the place. Fine, can you post a patch here of the change that added
that?  Can you also post a version of the patch that you posted for the
tipc_local_node patch and tipc_net that davem can apply (since that fixed the
remaining problems).

That just leaves the race condition on the mode setting (in which there is a
time between the setting of tipc_mode and the completion of tipc_net_start
during which you can pass all the mode checks without having all the data
initalized.  I'll send a patch for that shortly.

As for this being a waste of time, its really not.  Despite having a later
version that developers might like to use, most distributions fork the upstream
kernel directly, and assume fixes appear here.  Even if developers never use the
tipc module that ships with the upstream kernel, just having it built in case
anyone wants to use it opens those distributions up to critical bugs, like this
one, which allows unprivlidged local users to crash the system.  And for those
distributions, 'Just go get the latest source' really isn't a viable option in
many/most cases. If its not fixed in the public kernel, then the code isn't very
worthwhile to anyone.  And for those distributions, 'Just go get the latest
source' really isn't a viable option in many/most cases.


Looking at the 1.7.6 tarball on sourceforge, its dated 10/10/2008, so you've
basically got a 1.5 year lag between the development version and the commited
version that distributions are using (and counting).  I'm sorry, I'm not trying
to be crass about this, but its rather disconcerting to see that kind of
discrepancy between the code development point and whats in net-next.   It
implies that the only fix for a tipc problem is a wholesale upgrade of the tipc
directory in the kernel (since it would seem the sourceforge tipc cvs tree has
gone unused for a few years).  Based on that it seems unwise for any
distribution to default ...
From: David Miller
Date: Wednesday, February 24, 2010 - 6:38 pm

From: Neil Horman <nhorman@tuxdriver.com>

I'm extremely upset about this.

Especially given the fact that EVERY DAMN TIME there were TIPC
patches posted I applied EVERY DAMN ONE of them, and I did so
in an expediant manner.

So there is zero reason for the TIPC protocol development to not have
occured upstream.  I, in fact, facilitated things to work that way in
the smoothest way possible, if only the TIPC developers had obliged.
--

From: Stephens, Allan
Date: Thursday, February 25, 2010 - 7:24 am

I think it is important to understand how things got to be the way they
are with TIPC before any finger pointing starts.

When TIPC 1.6 was first integrated into Linux (back in 2.6.16) there was
some considerable flak generated because it was done as a bulk patch,
rather than being trickled in as a series of smaller, logically distinct
patches.  Since then, I have carefully ensured that all subsequent
patches were submitted in the conventional Linux manner to avoid giving
anyone cause for complaint.  (And there can be no argument about the
fact that David has been very good about applying the patches as soon as
they were made available.)

Later, the company that employs me found it necessary to enhance TIPC
with new capabilities which would be included in a couple of our
operating systems (one of which is Linux-based).  To meet our schedule,
it was necessary to make a large number of major changes to TIPC, and it
was felt that submitting these relatively untested changes to mainstream
Linux would be potentially destabilzing and therefore undesirable.
Thus, development was done downstream under the "TIPC 1.7" banner, with
the intent that the changes would be pushed back upstream once we knew
they wouldn't negatively impact existing functionality.  And so it went
at first -- about 65 patches from TIPC 1.7 were in fact incorporated
into Linux in 2008.  However, because TIPC 1.7 was not developed using
git (remember that Linux was only one of the operating systems
involved), there was no existing set of small, logically distinct
patches that could be applied to mainstream Linux, and each of the
patches that was submitted had to be carefully drafted from the finished
code base in a time-consuming manual process.

Then came an economic downturn, and our company was forced to divert
development resources away from TIPC and the remainder of the TIPC 1.7
code remained downstream-only (although freely available, and supported,
on SourceForge).  And while the status quo is ...
From: David Miller
Date: Thursday, February 25, 2010 - 8:06 am

From: "Stephens, Allan" <allan.stephens@windriver.com>

Are you out of your mind?

We're fixing the OOPS in the smallest and most expediant way
possible.

Integrating all of TIPC 1.7 to fix this OOPS is utter and pure
insanity.
--

From: Stephens, Allan
Date: Thursday, February 25, 2010 - 9:24 am

That's what I thought you'd say, but I still thought it was worth asking
the question.  I've got no problem with limiting the changes to the
matter currently at hand.

-- Al
--

From: David Miller
Date: Thursday, February 25, 2010 - 8:13 am

From: "Stephens, Allan" <allan.stephens@windriver.com>

Things were going just fine until Ericsson farmed the TIPC stuff out
to you guys, really.

In fact, an incredible amount of effort was made by the Ericsson folks
to get the TIPC stack upstream in the first place.

And now you guys made all of that basically for naught by taking your
work downstream.

A healthy upstream project turned into a "business decision."

Thanks!

The fact is, you would have had LESS work to do if you have integrated
your work upstream as a rule.  Us upstream folks would have been
handling any and all networking API changes transparently for you.

And people who run automated tools to validate code and look for bugs
would have been fixing bugs in TIPC for you.

The list goes on an on.

But it doesn't make any "business sense" for you to work on your code
upstream so you didn't do it.

And now you want to suggest that we dump huge unreviewable chunks of
code into the tree, again because it's less work for _YOU_.

Thanks!
--

From: Neil Horman
Date: Thursday, February 25, 2010 - 8:23 am

While I sympathize with your history, this really stick out of your explination
for me:
"""

To me I see that and read it as:
"We didn't have time to do things the way the Linux maintainers like to have
them done, so we went off and did it on our own, figuring we'd get back to doing
it the right way later"

Well, flash forward 1.5 years, and getting back to it never happened.  While I
understand that can happen, it puts users in the lurch if they expect upstream
to be the latest bits.  As such, have you considered just dropping TIPC from
upstream entirely?  I know that sounds a bit angry, I don't intend it to, I mean
it in all seriousness.  Companies have constraints that sometimes conflict with
upstream practices though, thats just a fact of life, and theres nothing we can
really do about that.  But if the review/accptance criteria of upstream
development doesn't fit into a companies budget or schedule, not doing kernel community
development might be the best solution.  Doing so tells the distros that they
have extra work to do if they want to incorporate/support tipc in their kernels,
and frees your company to develop on its own schedule, and according to its own
resources, while still being accessable to end users.  Of course, in so doing
you don't get the maintenence changes that come with an ever evolving ABI/etc,
but thats the price you would have to pay, and a decision you could make.

I'm not trying to tell you its the best solution, just an alternate option.
Honestly, what I'd like to see is all the remaining changes in TIPC 1.7 go in as
individual commits, so that we have a good change history, and a resonable
review on all the changes.  I know that takes longer but I think its the right
solution.  I'll even volunteer to help, if that lightens your load any.  If you
can provide me access to whatever scm you used (or even some modicum of
changelog in a text file, so that we could try to match up changes with
documentation), I'll happily start pulling stuff apart to ...
From: Stephens, Allan
Date: Thursday, February 25, 2010 - 1:33 pm

I can't argue with the basic thrust of your comments, Neil, nor with the
similar comments David has made in his previous emails.  The current
situation is certainly regrettable, and I think we're all in agreement
that the best solution is to get the remainder of TIPC 1.7 into Linux as
soon as possible (as individual patches, not a single mega-patch), then
to limit any future changes to the upstream side of things.

I've got no problem with Neil's offer to assist in getting the necessary
patches generated, as this will undoubtedly speed things along faster
than anything I could do on my own.  (I suggest that we get started on
this once we get the current oops situation resolved.)  This kind of
offer to share the burden is to be commended, and is an example that

Actually, we've got no plans to continue doing out-of-tree development
on TIPC -- we've already suffered enough pain by following that route,
and I certainly don't want to repeat the experience!  The upside of
having done things out-of-tree in the past is that it gives us hard data
we can use to squelch discussion the next time somebody suggests doing
it again.

Regards,
Al
 
--

From: Neil Horman
Date: Thursday, February 25, 2010 - 2:14 pm

Well, I'm glad to help, let me know what I can do, and what data you can put in
my hands to get it done.

FWIW, I'm looking over the missing TIPC_NET_MODE checks in the tarball, and I
think they're small enough that we can roll them in with the other changes to
the send path pretty easily.  I've also run accross a locking issue (just
theoretical, nothing I've demonstrated to myself yet).  But it appears to exist
in the 1.7 tarball as well as upstream.

I'll roll it all up and
post a omnibus patch to fix the opposes in the next day or two.

--

From: Neil Horman
Date: Wednesday, February 24, 2010 - 2:19 pm

As noted in my previous note, I'd agreed tht the preiviously posted patches for
the various tipc patches we encountered were sufficient for the oops at hand.
This patch closes the race condition thats left open after the application of
those patches.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>


 include/net/tipc/tipc.h |    3 ++-
 net/tipc/net.c          |    8 +++++---
 2 files changed, 7 insertions(+), 4 deletions(-)


diff --git a/include/net/tipc/tipc.h b/include/net/tipc/tipc.h
index 9566608..14076e9 100644
--- a/include/net/tipc/tipc.h
+++ b/include/net/tipc/tipc.h
@@ -54,7 +54,8 @@ u32 tipc_get_addr(void);
 
 #define TIPC_NOT_RUNNING  0
 #define TIPC_NODE_MODE    1
-#define TIPC_NET_MODE     2
+#define TIPC_TRANS_MODE	  2
+#define TIPC_NET_MODE     3
 
 typedef void (*tipc_mode_event)(void *usr_handle, int mode, u32 addr);
 
diff --git a/net/tipc/net.c b/net/tipc/net.c
index 7906608..daeafe7 100644
--- a/net/tipc/net.c
+++ b/net/tipc/net.c
@@ -278,7 +278,7 @@ int tipc_net_start(u32 addr)
 	tipc_cfg_stop();
 
 	tipc_own_addr = addr;
-	tipc_mode = TIPC_NET_MODE;
+	tipc_mode = TIPC_TRANS_MODE;
 	tipc_named_reinit();
 	tipc_port_reinit();
 
@@ -295,18 +295,20 @@ int tipc_net_start(u32 addr)
 	info("Started in network mode\n");
 	info("Own node address %s, network identity %u\n",
 	     addr_string_fill(addr_string, tipc_own_addr), tipc_net_id);
+	tipc_mode = TIPC_NET_MODE;
 	return 0;
 }
 
 void tipc_net_stop(void)
 {
-	if (tipc_mode != TIPC_NET_MODE)
+	if (tipc_mode < TIPC_TRANS_MODE)
 		return;
+	tipc_mode = TIPC_TRANS_MODE;
 	write_lock_bh(&tipc_net_lock);
 	tipc_bearer_stop();
-	tipc_mode = TIPC_NODE_MODE;
 	tipc_bclink_stop();
 	net_stop();
+	tipc_mode = TIPC_NODE_MODE;
 	write_unlock_bh(&tipc_net_lock);
 	info("Left network mode \n");
 }
--

From: David Miller
Date: Wednesday, February 24, 2010 - 6:34 pm

From: "Stephens, Allan" <allan.stephens@windriver.com>

Why is the TIPC kernel protocol code being developed out of tree
instead of directly upstream?

Regardless of that, if the code is in the kernel tree and it can be
OOPS'd, we must fix it.
--

From: Neil Horman
Date: Wednesday, February 24, 2010 - 6:42 pm

Agreed, with a combination of:
1) The patch which Allan posted to check tipc_net.zone and tipc_local_nodes for
being NULL in tipc_select_node

2) The patch that I asked Allan to extract from the sourceforge package which
adds checks for TIPC_NET_MODE in the appropriate places

3) The patch I posted to close the race in the setting of tipc_mode

we should fix all the oopses I've found so far.

Ideally it would seem what would be most appropriate would be to wholesale
import tipc 1.7.6 from sourceforge to the kernel to get all the fixes that have
gone in there.  That seems like its been waiting in the wings for over a year
now.

Neil

--

From: Neil Horman
Date: Tuesday, March 2, 2010 - 11:33 am

Ok, after some debate, heres version 2 of this patch.  Its a complete rewrite.
I started with the patches we'd proposed previously, then realized theres a much
easier and elegant way to handle this, which I've implemented below.

Fix TIPC to disallow sending to remote addresses prior to entering NET_MODE

user programs can oops the kernel by sending datagrams via AF_TIPC prior to
entering networked mode.  The following backtrace has been observed:

ID: 13459  TASK: ffff810014640040  CPU: 0   COMMAND: "tipc-client"
[exception RIP: tipc_node_select_next_hop+90]
RIP: ffffffff8869d3c3  RSP: ffff81002d9a5ab8  RFLAGS: 00010202
RAX: 0000000000000001  RBX: 0000000000000001  RCX: 0000000000000001
RDX: 0000000000000000  RSI: 0000000000000001  RDI: 0000000001001001
RBP: 0000000001001001   R8: 0074736575716552   R9: 0000000000000000
R10: ffff81003fbd0680  R11: 00000000000000c8  R12: 0000000000000008
R13: 0000000000000001  R14: 0000000000000001  R15: ffff810015c6ca00
ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#4 [ffff81002d9a5ab0] tipc_node_select_next_hop at ffffffff8869d3b1
#5 [ffff81002d9a5ae0] tipc_link_send_sections_fast at ffffffff88698558
#6 [ffff81002d9a5be0] tipc_forward2port at ffffffff8869eb1d
#7 [ffff81002d9a5c10] tipc_send2port at ffffffff8869eb79
#8 [ffff81002d9a5c30] send_msg at ffffffff886a1d0b
#9 [ffff81002d9a5cb0] sock_sendmsg at ffffffff80055261
RIP: 0000003cbd8d49a3  RSP: 00007fffc84e0be8  RFLAGS: 00010206
RAX: 000000000000002c  RBX: ffffffff8005d116  RCX: 0000000000000000
RDX: 0000000000000008  RSI: 00007fffc84e0c00  RDI: 0000000000000003
RBP: 0000000000000000   R8: 00007fffc84e0c10   R9: 0000000000000010
R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000000
R13: 00007fffc84e0d10  R14: 0000000000000000  R15: 00007fffc84e0c30
ORIG_RAX: 000000000000002c  CS: 0033  SS: 002b

What happens is that, when the tipc module in inserted it enters a standalone
node mode in which communication to its own address is allowed <0.0.0> but not
to other addresses, ...
From: Stephens, Allan
Date: Wednesday, March 3, 2010 - 9:51 am

Hi Neil:

Your patch looks pretty good to me, and is definitely a much cleaner
solution than what was discussed previously.

The only improvements I can suggest are the following:

- It looks like the check to see if tipc_bearers is NULL at the start of
tipc_bearer_stop() is unnecessary, since it's now a static array.

- The check to see if media_list is NULL at the start of
tipc_register_media() needs to be replaced with a check to ensure
tipc_mode is TIPC_NET_MODE.  (Unlike the case with the tipc_bearers
check, we still need some sort of test in place to cause the routine to
fail if someone uses TIPC's native API to try registering an add-on
media type before TIPC is ready to handle it.)

Other than these minor points I can't see anything else that would cause
problems.

Regards,
--

From: Neil Horman
Date: Wednesday, March 3, 2010 - 11:31 am

Ok, version 3 of the path, taking comments into account.

Notes:
1) Removed now-meaningless check from tipc_bearer_stop

2) Added tipc_mode check at tipc_register_media.  I techincally think this isn't
needed, since we're only writing an entry to the media_list here, and don't
depend on anything else, but it doesn't hurt.  In fact, I'm wondering if this
isn't an example of how the whole tipc_mode stuff might become unneeded.  If we
can make all the relevant data structures statically allocated, TIPC_NODE_MODE
might just be the trivial case of TIPC_NET_MODE in which only the local node is
reachable.  Thats another debate though.


Summary:

Fix TIPC to disallow sending to remote addresses prior to entering NET_MODE

user programs can oops the kernel by sending datagrams via AF_TIPC prior to
entering networked mode.  The following backtrace has been observed:

ID: 13459  TASK: ffff810014640040  CPU: 0   COMMAND: "tipc-client"
[exception RIP: tipc_node_select_next_hop+90]
RIP: ffffffff8869d3c3  RSP: ffff81002d9a5ab8  RFLAGS: 00010202
RAX: 0000000000000001  RBX: 0000000000000001  RCX: 0000000000000001
RDX: 0000000000000000  RSI: 0000000000000001  RDI: 0000000001001001
RBP: 0000000001001001   R8: 0074736575716552   R9: 0000000000000000
R10: ffff81003fbd0680  R11: 00000000000000c8  R12: 0000000000000008
R13: 0000000000000001  R14: 0000000000000001  R15: ffff810015c6ca00
ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
RIP: 0000003cbd8d49a3  RSP: 00007fffc84e0be8  RFLAGS: 00010206
RAX: 000000000000002c  RBX: ffffffff8005d116  RCX: 0000000000000000
RDX: 0000000000000008  RSI: 00007fffc84e0c00  RDI: 0000000000000003
RBP: 0000000000000000   R8: 00007fffc84e0c10   R9: 0000000000000010
R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000000
R13: 00007fffc84e0d10  R14: 0000000000000000  R15: 00007fffc84e0c30
ORIG_RAX: 000000000000002c  CS: 0033  SS: 002b

What happens is that, when the tipc module in inserted it enters a standalone
node mode in which communication to its ...
From: David Miller
Date: Thursday, March 4, 2010 - 1:40 am

From: Neil Horman <nhorman@tuxdriver.com>

Applied.
--

From: David Miller
Date: Monday, March 8, 2010 - 1:19 pm

Doesn't apply to net-2.6, Neil please respin.
--

From: Neil Horman
Date: Monday, March 8, 2010 - 1:49 pm

Will do.  I'll repost in a few hours
Neil

--

From: Neil Horman
Date: Monday, March 8, 2010 - 2:13 pm

Yes, sorry David, didn't mean to reply so quickly before.  I just looked at your
net-2.6 tree, and d0021b252eaf65ca07ed14f0d66425dd9ccab9a6 is in there just
fine.  Not sure if I'm missing something or not, but as far as I can see you've
already apply the patch properly.
Regards
Neil

--

From: David Miller
Date: Monday, March 8, 2010 - 2:26 pm

From: Neil Horman <nhorman@tuxdriver.com>

Perfect.
--

Previous thread: [net-next 2.6 PATCH] net/pcmcia: convert to use netdev_for_each_mc_addr by Jiri Pirko on Friday, February 19, 2010 - 11:48 am. (2 messages)

Next thread: Subject: [PATCH 6/6] bna: Brocade 10Gb Ethernet device driver by Rasesh Mody on Friday, February 19, 2010 - 2:52 pm. (1 message)