Re: [RFC] bonding: fix workqueue re-arming races

Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
From: Jay Vosburgh
Date: Wednesday, September 1, 2010 - 10:14 am

Jiri Bohac <jbohac@suse.cz> wrote:


	Yep, the current code does appear to have a race in
bond_alb_monitor that should be fixed (to check curr_active_slave before
dereferencing it).

	I also thought a bit more, and in the current code, the mode
shouldn't change in the middle of one of the work functions, because a
mode change requires the bond to be closed, so the various work things
will be stopped (more or less; excepting the race under disucssion
here).

	I don't think this is true for the new wq_rtnl functions,
however, because its work items are not canceled until the workqueue
itself is freed in bond_destructor.  Does the wq_rtnl open new races,
because it's work items are not synchronized with other activities
(close in particular)?  It's possible for its work functions (which may
do things like set the active slave, carrier, etc) to be invoked after
the bond has closed, and possibly reopened, or been otherwise adjusted.

	I'm not sure this is better than one of the alternatives I
believe we discussed the last time around: having the rtnl-acquiring
work functions do a conditional acquire of rtnl, and if that fails,
reschedule themselves.

	So, e.g., bond_mii_monitor becomes something like:

void bond_mii_monitor(struct work_struct *work)
{
	struct bonding *bond = container_of(work, struct bonding,
					    mii_work.work);

	read_lock(&bond->lock);

	if (bond->slave_cnt == 0)
		goto re_arm;

	if (bond->send_grat_arp) {
		read_lock(&bond->curr_slave_lock);
		bond_send_gratuitous_arp(bond);
		read_unlock(&bond->curr_slave_lock);
	}

	if (bond->send_unsol_na) {
		read_lock(&bond->curr_slave_lock);
		bond_send_unsolicited_na(bond);
		read_unlock(&bond->curr_slave_lock);
	}

	if (bond_miimon_inspect(bond)) {
		read_unlock(&bond->lock);
		/*
		 * Awe-inspiring comment explaining why we do this
		 */
		if (rtnl_trylock()) {
			read_lock(&bond->lock);

			bond_miimon_commit(bond);

			read_unlock(&bond->lock);
			rtnl_unlock();	/* might sleep, hold no other locks */
			read_lock(&bond->lock);
		} else {
			read_lock(&bond->lock);
			queue_work(bond->wq, &bond->mii_work);
			read_unlock(&bond->lock);
			return;
		}
	}

re_arm:
	if (bond->params.miimon)
		queue_delayed_work(bond->wq, &bond->mii_work,
				   msecs_to_jiffies(bond->params.miimon));
out:
	read_unlock(&bond->lock);
}

	Which, if I'm not missing something, does not need the
kill_timers (because I believe we can use cancel_delayed_work_sync now
that there's no deadlock against rtnl).

	It might need a "fast reschedule" flag of some sort so that the
gratutious ARPs and NAs aren't bunched up when the trylock fails, but
that's a secondary concern.

	Thoughts?


	Should we end up using this, it is also holding too many locks
over the dev_set_promiscuity call.



	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
[RFC] bonding: fix workqueue re-arming races, Jiri Bohac, (Tue Aug 31, 10:07 am)
Re: [RFC] bonding: fix workqueue re-arming races, Jay Vosburgh, (Tue Aug 31, 1:54 pm)
Re: [RFC] bonding: fix workqueue re-arming races, Jiri Bohac, (Wed Sep 1, 6:16 am)
Re: [RFC] bonding: fix workqueue re-arming races, Jay Vosburgh, (Wed Sep 1, 10:14 am)
Re: [RFC] bonding: fix workqueue re-arming races, Jiri Bohac, (Wed Sep 1, 11:31 am)
Re: [RFC] bonding: fix workqueue re-arming races, Jay Vosburgh, (Wed Sep 1, 1:00 pm)
Re: [RFC] bonding: fix workqueue re-arming races, Jiri Bohac, (Wed Sep 1, 1:56 pm)
Re: [RFC] bonding: fix workqueue re-arming races, Jay Vosburgh, (Wed Sep 1, 5:54 pm)
Re: [RFC] bonding: fix workqueue re-arming races, Jiri Bohac, (Thu Sep 2, 10:08 am)
Re: [RFC] bonding: fix workqueue re-arming races, Jay Vosburgh, (Wed Sep 8, 5:06 pm)
Re: [RFC] bonding: fix workqueue re-arming races, Jay Vosburgh, (Thu Sep 16, 3:44 pm)
Re: [RFC] bonding: fix workqueue re-arming races, Narendra K, (Fri Sep 24, 4:23 am)
Re: [RFC] bonding: fix workqueue re-arming races, Jiri Bohac, (Fri Oct 1, 11:22 am)
Re: [RFC] bonding: fix workqueue re-arming races, Narendra_K, (Tue Oct 5, 8:03 am)
Re: [RFC] bonding: fix workqueue re-arming races, Narendra_K, (Wed Oct 6, 12:36 am)