> On 3/18/08, Eric Dumazet <dada1@cosmosbay.com> wrote:
>
>> You are right, Peter: fs/file.c contains some leftovers from the previous
>> implementation of the defer queue, which used a timer.
>>
>> So we can probably provide a patch that:
>>
>> - Uses spin_lock() and spin_unlock() instead of spin_lock_bh() and
>> spin_unlock_bh() in free_fdtable_work(), since we no longer use a
>> softirq (timer) to reschedule the workqueue.
>>
>> (This timer was deleted by the following patch:
>>
>> http://readlist.com/lists/vger.kernel.org/linux-kernel/50/251040.html )
>>
>> But you cannot avoid using spin_lock()/spin_unlock(), because
>> schedule_work() makes no guarantee that the work will be done by this CPU.
>>
>
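
To illustrate the locking point quoted above, here is a minimal userspace
sketch in C using pthreads rather than kernel primitives. Every name in it
(struct defer_list, defer_push, defer_drain, drain_worker) is a hypothetical
stand-in and not the actual fs/file.c code: a pthread mutex plays the role of
the spinlock, and a worker thread plays the role of the workqueue. It only
shows why a list filled by one context and drained by another needs a lock
when the two may run on different CPUs.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an entry waiting to be freed. */
struct fdtable_like {
    struct fdtable_like *next;
    int id;
};

/* Hypothetical stand-in for the defer list: a lock plus a list head. */
struct defer_list {
    pthread_mutex_t lock;
    struct fdtable_like *next;
};

/* Producer side: push one entry onto the list under the lock. */
static void defer_push(struct defer_list *d, struct fdtable_like *fdt)
{
    assert(d != NULL && fdt != NULL);
    pthread_mutex_lock(&d->lock);
    fdt->next = d->next;
    d->next = fdt;
    pthread_mutex_unlock(&d->lock);
}

/* Worker side: detach the whole list under the lock, then walk it unlocked. */
static int defer_drain(struct defer_list *d)
{
    struct fdtable_like *fdt;
    int drained = 0;

    pthread_mutex_lock(&d->lock);
    fdt = d->next;
    d->next = NULL;
    pthread_mutex_unlock(&d->lock);

    while (fdt) {
        struct fdtable_like *next = fdt->next;
        drained++;          /* real code would free the entry here */
        fdt = next;
    }
    return drained;
}

/* Thread entry point: plays the role of the deferred work, which may
 * run on a different CPU than the thread that called defer_push(). */
static void *drain_worker(void *arg)
{
    struct defer_list *d = arg;
    return (void *)(intptr_t)defer_drain(d);
}
```

One thread would call defer_push() while another runs drain_worker() through
pthread_create(); the mutex keeps the detach step from racing with a push.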
> Ah, you have hit the nail on the head. Combined with Johannes Weiner's
> explanation, I have pieced together the full scenario:
>
> First, the following is possible:
>
> fddef = &get_cpu_var(fdtable_defer_list);
> spin_lock(&fddef->lock);
> fdt->next = fddef->next;
> fddef->next = fdt;              /* <== executing on CPU A */
> /* vmallocs are handled from the workqueue context */
> schedule_work(&fddef->wq);
> spin_unlock(&fddef->lock);      /* <== executing on CPU B */
> put_cpu_var(fdtable_defer_list);
>
> If execution can switch CPUs after the schedule_work() call, then
> LOGICALLY you definitely need the spin_lock(), and the per_cpu data is
> really not necessary.
>
> But without the per_cpu structure, the following "dedicated
> chunk" can only execute on one processor, with the possibility of
> switching to another processor after schedule_work():
>