Re: [PATCH] IPROUTE: correct nla nested message generated by netem_parse_opt

Previous thread: [PATCH 2/2] smp_call_function: use rwlocks on queues rather than rcu by Jeremy Fitzhardinge on Thursday, August 21, 2008 - 5:29 pm. (33 messages)

Next thread: [PATCH 0/2] Traffic control cgroups subsystem by Ranjit Manomohan on Thursday, August 21, 2008 - 5:53 pm. (1 message)
From: Jeff Kirsher
Date: Thursday, August 21, 2008 - 5:50 pm

From: Alexander Duyck <alexander.h.duyck@intel.com>

netem_parse_opt was generating a malformed nested compat message.  This patch
corrects it so that the nested arguments are contained within a nested nla
header.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 tc/q_netem.c |    9 +++------
 1 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/tc/q_netem.c b/tc/q_netem.c
index d06932e..a3365c1 100644
--- a/tc/q_netem.c
+++ b/tc/q_netem.c
@@ -128,7 +128,7 @@ static int netem_parse_opt(struct qdisc_util *qu, int argc, char **argv,
 			   struct nlmsghdr *n)
 {
 	size_t dist_size = 0;
-	struct rtattr *tail;
+	struct rtattr *nest;
 	struct tc_netem_qopt opt;
 	struct tc_netem_corr cor;
 	struct tc_netem_reorder reorder;
@@ -257,8 +257,6 @@ static int netem_parse_opt(struct qdisc_util *qu, int argc, char **argv,
 		argc--; argv++;
 	}
 
-	tail = NLMSG_TAIL(n);
-
 	if (reorder.probability) {
 		if (opt.latency == 0) {
 			fprintf(stderr, "reordering not possible without specifying some delay\n");
@@ -277,8 +275,7 @@ static int netem_parse_opt(struct qdisc_util *qu, int argc, char **argv,
 		return -1;
 	}
 
-	if (addattr_l(n, 1024, TCA_OPTIONS, &opt, sizeof(opt)) < 0)
-		return -1;
+	nest = addattr_nest_compat(n, 1024, TCA_OPTIONS, &opt, sizeof(opt));
 
 	if (present[TCA_NETEM_CORR] &&
 	    addattr_l(n, 1024, TCA_NETEM_CORR, &cor, sizeof(cor)) < 0)
@@ -299,7 +296,7 @@ static int netem_parse_opt(struct qdisc_util *qu, int argc, char **argv,
 			return -1;
 		free(dist_data);
 	}
-	tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
+	addattr_nest_compat_end(n, nest);
 	return 0;
 }
 

--

From: Stephen Hemminger
Date: Thursday, August 21, 2008 - 6:19 pm

On Thu, 21 Aug 2008 17:50:11 -0700

The kernel ABI can not change? The nested attribute order should not change.
--

From: Alexander Duyck
Date: Thursday, August 21, 2008 - 7:37 pm

On Thu, Aug 21, 2008 at 6:19 PM, Stephen Hemminger

The problem is the current kernel ABI was changed in commit
b9a2f2e450b0f770bb4347ae8d48eb2dea701e24 "netlink: Fix
nla_parse_nested_compat() to call nla_parse() directly" to support a
format that I have only seen generated in netem_parse_opt.  The kernel
and prio_parse_opt qdisc both generate the other format which includes
the extra attribute header to contain the nested attributes.  This
patch fixes tc so that netem will use the correct nested attribute
order for kernels prior to this commit and once this commit has been
reverted.
--

From: David Miller
Date: Friday, August 22, 2008 - 3:44 am

From: "Alexander Duyck" <alexander.duyck@gmail.com>
Date: Thu, 21 Aug 2008 19:37:08 -0700

--

From: Thomas Graf
Date: Wednesday, August 27, 2008 - 7:41 am

How was it ever supposed to work? The code looked like the following
prior to commit b03f4672007e533c8dbf0965f995182586216bf1 which made
it used nla_parse_nested_compat()

-       /* Handle nested options after initial queue options.
-        * Should have put all options in nested format but too late now.
-        */
-       if (nla_len(opt) > sizeof(*qopt)) {
-               struct nlattr *tb[TCA_NETEM_MAX + 1];
-               if (nla_parse(tb, TCA_NETEM_MAX,
-                             nla_data(opt) + sizeof(*qopt),
-                             nla_len(opt) - sizeof(*qopt), NULL))

nla_parse_nested_compat() now does exactly what the above code does. So
in what way has the kernel ABI changed?

There is two ways of sending a fixed struct + attributes inside an
attribute:

a) (old and outdated method)
  attr foo
    fixed struct
    [nested attr 1]
    [nested attr 2]

This format can be parsed with nla_parse_nested_compat(attr foo)

b) (new method)
  attr foo
    fixed struct
    attr container
      [nested attr 1]
      [nested attr 2]

This format is parsed with nla_parse_nested(attr container)
--

From: Duyck, Alexander H
Date: Wednesday, August 27, 2008 - 9:30 am

I think you will find that format a only existed in netem prior to your changes in nla_parse_nested_compat, and the problem is option b, which was a nested compat attribute, can no longer be parsed at all.  Basically what was generated before was an attribute with a nested attribute following it as data.  Now what you have is an attribute followed by a series of other attributes.

Also you are incorrect about nla_parse_nested.  The layout for it is something more like:

C) (nested format)
        attr foo
          [nested attr 1]
          [nested attr 2]

This is the format parsed by nla_parse_nested.  Basically what happened is that when you fixed things for netem you broke them for anyone else who might use the same message format.  If you take a look at nla_nest_compat_start and nla_nest_compat_end you will see that the message generated matches format b, but now you are parsing format a.  If the community insists that I can't change the kernel ABI back I am perfectly fine with that, but I think somebody needs to explain to me why we are handling two different formats and fix things so that the format is consistent.

Thanks,

Alex
--

From: Thomas Graf
Date: Wednesday, August 27, 2008 - 1:47 pm

Please learn to use you enter key. Thank you.


For the last time, the message format *CANNOT* be changed, the ABI has to
be stable. I didn't change the ABI, I _restored the ABI to the original
state after it was broken by the initial nla_parse_nested_compat()
implementation which contained a bug.

It's actually trivial to figure this out in 5 minutes using
git-whatchanged, not sure why you didn't do it.

1092cb219774a82b1f16781aec7b8d4ec727c981
[NETLINK]: attr: add nested compat attribute type

Introduced nla_parse_nested_compat() for use in netem, unfortunately
it contained a bug which made it behave incorrectly and broke netem.

b9a2f2e450b0f770bb4347ae8d48eb2dea701e24
netlink: Fix nla_parse_nested_compat() to call nla_parse() directly

This fixed nla_parse_nested_compat() in the way it was supposed
to be from the beginning. It restored the original ABI.

Unfortunately, commit 1e90474c377e92db7262a8968a45c1dd980ca9e5 made
prio use nla_parse_nested_compat() which it shouldn't have as it 
does not use the same format. It worked due to the bug in
nla_parse_nested_compat() and then broke when nla_parse_nested_compat()
was fixed. Since prio was removed, this is no longer a problem.

Ever since, netem is the only user of nla_parse_nested_compat() so
I have no idea what you mean when you say I broke it for everybody
else.

To put it very very simple, users of the message format as found in
netem is supposed to use nla_parse_nested_compat(), everybody else
is supposed to be using nla_parse_nested().

I won't bother to comment on the rest, since it shoudl be answered with
this or simply didn't make any sense.
--

From: Stephen Hemminger
Date: Wednesday, August 27, 2008 - 2:10 pm

On Wed, 27 Aug 2008 22:47:50 +0200

Thomas is right. The ABI was established incrementally and does not use
pure nested format. The original format was just one structure, then
as additional features were added, additional attributes were added.

--

From: David Miller
Date: Wednesday, August 27, 2008 - 11:47 pm

From: Thomas Graf <tgraf@suug.ch>

Correct, but here is the sequence of events I discovered after
researching this fully:

1) Patrick adds nla_next_compat*() and NLA_NESTED_COMPAT, in this
   changeset:

	commit 1092cb219774a82b1f16781aec7b8d4ec727c981
	Author: Patrick McHardy <kaber@trash.net>
	Date:   Mon Jun 25 13:49:35 2007 -0700

	    [NETLINK]: attr: add nested compat attribute type

2) The multiqueue packet scheduler bits got added by Peter W. in
   this changeset:

	commit d62733c8e437fdb58325617c4b3331769ba82d70
	Author: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
	Date:   Thu Jun 28 21:04:31 2007 -0700

	    [SCHED]: Qdisc changes and sch_rr added for multiqueue

   The thing to note is that at this point this code is using
   rtattr_parse_nested_compat(), RTA_NEST_COMPAT*(), etc.

   This went into 2.6.23

3) iproute2 gets sch_rr support from Peter W. on August 14th, from
   this iproute2 commit:

	commit 292ce96bca64dee087fe00d38743f5e2d1895c5d
	Author: PJ Waskiewicz <peter.p.waskiewicz.jr@intel.com>
	Date:   Tue Aug 14 11:21:24 2007 -0700

	    iproute2: sch_rr support in tc

4) On January 22nd, 2008, Patrick converted all of the packet schedulers
   to nla_*(), nlattr, etc.

	commit 1e90474c377e92db7262a8968a45c1dd980ca9e5
	Author: Patrick McHardy <kaber@trash.net>
	Date:   Tue Jan 22 22:11:17 2008 -0800

	    [NET_SCHED]: Convert packet schedulers from rtnetlink to new netlink API

   At this point, even though PRIO and RR were converted as well, they
   were still working properly with iproute2 as-is.

   This went into 2.6.25

5) One day later Patrick converted netem to nla_parse_nested_compat:

	commit b03f4672007e533c8dbf0965f995182586216bf1
	Author: Patrick McHardy <kaber@trash.net>
	Date:   Wed Jan 23 20:32:21 2008 -0800

	    [NET_SCHED]: sch_netem: use nla_parse_nested_compat

   This broke netem.

   This also went into 2.6.25

6) And then on May 22nd, 2008 we get the changeset that has obtained
   a lot of ...
From: Thomas Graf
Date: Thursday, August 28, 2008 - 3:18 am

So we remove nla_parsed_nested_compat() just to add it again under
a different name since netem is still going to use that piece of
code. That certainly makes sense. Fixing prio and rr in stable is
a one liner.
--

From: David Miller
Date: Thursday, August 28, 2008 - 3:19 am

From: Thomas Graf <tgraf@suug.ch>

I love patches, send me one :)
--

From: Thomas Graf
Date: Thursday, August 28, 2008 - 6:03 am

nla_parse_nested_compat() was used to parse two different message
formats in the netem and prio qdisc, when it was "fixed" to work
with netem, it broke the multi queue support in the prio qdisc.
Since the prio qdisc code in question is already removed in the
development tree, this patch only fixes the regression in the
stable tree.

Based on original patch from Alexander H Duyck <alexander.h.duyck@intel.com>

Signed-off-by: Thomas Graf <tgraf@suug.ch>

Index: linux-2.6.26.y/net/sched/sch_prio.c
===================================================================
--- linux-2.6.26.y.orig/net/sched/sch_prio.c	2008-08-28 13:24:22.000000000 +0200
+++ linux-2.6.26.y/net/sched/sch_prio.c	2008-08-28 14:16:44.000000000 +0200
@@ -228,14 +228,20 @@
 {
 	struct prio_sched_data *q = qdisc_priv(sch);
 	struct tc_prio_qopt *qopt;
-	struct nlattr *tb[TCA_PRIO_MAX + 1];
+	struct nlattr *tb[TCA_PRIO_MAX + 1] = {0};
 	int err;
 	int i;
 
-	err = nla_parse_nested_compat(tb, TCA_PRIO_MAX, opt, NULL, qopt,
-				      sizeof(*qopt));
-	if (err < 0)
-		return err;
+	qopt = nla_data(opt);
+	if (nla_len(opt) < sizeof(*qopt))
+		return -1;
+
+	if (nla_len(opt) >= sizeof(*qopt) + sizeof(struct nlattr)) {
+		err = nla_parse_nested(tb, TCA_PRIO_MAX,
+				       (struct nlattr *) (qopt + 1), NULL);
+		if (err < 0)
+			return err;
+	}
 
 	q->bands = qopt->bands;
 	/* If we're multiqueue, make sure the number of incoming bands
--

From: David Miller
Date: Tuesday, September 2, 2008 - 5:31 pm

From: Thomas Graf <tgraf@suug.ch>

I'll queue this up for -stable, thanks Thomas.
--

Previous thread: [PATCH 2/2] smp_call_function: use rwlocks on queues rather than rcu by Jeremy Fitzhardinge on Thursday, August 21, 2008 - 5:29 pm. (33 messages)

Next thread: [PATCH 0/2] Traffic control cgroups subsystem by Ranjit Manomohan on Thursday, August 21, 2008 - 5:53 pm. (1 message)