Re: [origin tree boot failure] Re: [GIT PULL] core block bits for 2.6.37-rc1

Previous thread: 2.6.37-git1 by Anca Emanuel on Friday, October 22, 2010 - 12:35 am. (4 messages)

Next thread: [PATCH] i386: restore parentheses around one pushl_cfi argument by Jan Beulich on Friday, October 22, 2010 - 12:22 am. (2 messages)
From: Jens Axboe
Date: Friday, October 22, 2010 - 12:57 am

Hi Linus,

This first pull request is the core bits, meaning general
block layer changes or core support. Should be clean this time,
only 'weird bit' is the seemingly duplicate entry from Malahal.
This is caused by the first patch being buggy (and later
reverted), second patch used the same single line description.

Nothing really exciting in here. A good collection of fixes, some of
which are marked for stable as well.

The biggest addition this time around is the block IO throttling support
from Vivek.

Please pull.


  git://git.kernel.dk/linux-2.6-block.git for-2.6.37/core

Christof Schmitt (1):
      zfcp: Report scatter gather limit for DIX protection information

Corrado Zoccolo (1):
      cfq: improve fsync performance for small files

Geert Uytterhoeven (1):
      block: Turn bvec_k{un,}map_irq() into static inline functions

Jens Axboe (3):
      core: match_dev_by_uuid() should not be marked __init
      do_mounts: only enable PARTUUID for CONFIG_BLOCK
      block: revert bad fix for memory hotplug causing bounces

Malahal Naineni (2):
      block: set the bounce_pfn to the actual DMA limit rather than to max memory
      block: set the bounce_pfn to the actual DMA limit rather than to max memory

Mark Lord (2):
      block: Prevent hang_check firing during long I/O
      Fix compile error in blk-exec.c for !CONFIG_DETECT_HUNG_TASK

Martin K. Petersen (5):
      Consolidate min_not_zero
      block/scsi: Provide a limit on the number of integrity segments
      block: Ensure physical block size is unsigned int
      block: Fix double free in blk_integrity_unregister
      block: Make the integrity mapped property a bio flag

Namhyung Kim (2):
      block: fix an address space warning in blk-map.c
      sg: fix a warning in blk_rq_aligned() call

San Mehat (1):
      block: block_dump: Add number of sectors to debug output

Signed-off-by: Jan Kara (1):
      block: Fix race during disk initialization

Vivek Goyal (16):
      blk-cgroup: Kill ...
From: Ingo Molnar
Date: Saturday, October 23, 2010 - 8:29 am

Hi,


The upstream block bits pulled in this merge window (or maybe the workqueue bits)
are possibly the cause of a boot crash on today's -tip, using a trivial x86 bootup test
(64-bit allyesconfig):

[  116.064281] calling  hd_init+0x0/0x302 @ 1
[  116.068529] hd: no drives specified - use hd=cyl,head,sectors on kernel command line
[  116.076334] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
[  116.080274] last sysfs file: 
[  116.080274] CPU 0 
[  116.080274] Modules linked in:
[  116.080274] 
[  116.080274] Pid: 1, comm: swapper Tainted: G        W   2.6.36-tip-03555-g825d9ec-dirty #51843 A8N-E/System Product Name
[  116.080274] RIP: 0010:[<ffffffff81064380>]  [<ffffffff81064380>] __ticket_spin_trylock+0x4/0x21
[  116.080274] RSP: 0018:ffff88003c417c10  EFLAGS: 00010082
[  116.080274] RAX: ffff88003c418000 RBX: 6b6b6b6b6b6b6b6a RCX: 0000000000000000
[  116.080274] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 6b6b6b6b6b6b6b6a
[  116.080274] RBP: ffff88003c417c10 R08: 0000000000000002 R09: 0000000000000001
[  116.080274] R10: 0000000000000286 R11: ffff880032498738 R12: 6b6b6b6b6b6b6b82
[  116.080274] R13: 0000000000000286 R14: 6b6b6b6b6b6b6b6b R15: 0000000000000001
[  116.080274] FS:  0000000000000000(0000) GS:ffff88003e200000(0000) knlGS:0000000000000000
[  116.080274] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  116.080274] CR2: 0000000000000000 CR3: 0000000004071000 CR4: 00000000000006f0
[  116.080274] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  116.080274] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  116.080274] Process swapper (pid: 1, threadinfo ffff88003c416000, task ffff88003c418000)
[  116.080274] Stack:
[  116.080274]  ffff88003c417c30 ffffffff8168c6ee 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6a
[  116.080274] <0> ffff88003c417c70 ffffffff82d37a20 ffffffff810a1b65 ffff88003c418000
[  116.080274] <0> ffffffff82d3836b 6b6b6b6b6b6b6b6a ffff8800330fcc20 ffff88003c417cb8
[  116.080274] Call Trace:
[  ...
From: Linus Torvalds
Date: Saturday, October 23, 2010 - 8:42 am

And we obviously have that "6b" pattern for a use-after free with slab
poisoning. Jens, have you tried with slab debugging?

                    Linus
--
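
The "6b" pattern mentioned above can be checked mechanically. The following is a minimal userspace sketch, not kernel code: it assumes the slab free-poison byte is 0x6b (the kernel's POISON_FREE) and tests whether a 64-bit register value from the oops looks like a poisoned pointer plus or minus a small offset. The toy_* names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: byte written over freed slab objects when poisoning is on. */
#define TOY_POISON_FREE 0x6b

/* Count how many of the eight bytes in a register value equal the poison byte. */
int toy_poison_bytes(uint64_t value)
{
	int n = 0;

	for (int i = 0; i < 8; i++) {
		if (((value >> (8 * i)) & 0xff) == TOY_POISON_FREE)
			n++;
	}
	return n;
}

/*
 * Heuristic: if the upper seven bytes are all poison, the value is most
 * likely a pointer loaded from freed memory, possibly with a small offset
 * added or subtracted in the low byte.
 */
bool toy_looks_poisoned(uint64_t value)
{
	return (value >> 8) == UINT64_C(0x6b6b6b6b6b6b6b);
}
```

Applied to the register dump in the report, RBX/RDI (6b6b6b6b6b6b6b6a), R12 (6b6b6b6b6b6b6b82) and R14 (6b6b6b6b6b6b6b6b) all match the heuristic, while an ordinary kernel address such as RAX (ffff88003c418000) does not.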

From: Ingo Molnar
Date: Saturday, October 23, 2010 - 8:52 am

Btw., another data point, the crash goes away when the ancient XT-HD driver is 
turned off:

 # CONFIG_BLK_DEV_HD is not set

I'm not sure whether the bug is limited to this driver alone though.

	Ingo
--

From: Jens Axboe
Date: Saturday, October 23, 2010 - 9:51 am

Looks like a fairly straightforward case of uninitialized memory and
blk_sync_queue() -> throtl_shutdown_timer() ->
cancel_delayed_work_sync().

Will get that fixed up.

-- 
Jens Axboe

--

From: Jens Axboe
Date: Saturday, October 23, 2010 - 10:17 am

It frees q->td in blk_cleanup_queue(), but doesn't clear q->td. When the
final put happens, blk_sync_queue() is called and then ends up doing the
cancel_delayed_work_sync() on freed memory.

Two possible fixes:

- Clear ->td when the queue is going dead. May require other ->td == NULL
  checks in the code, so I opted for:

- Move the free to when the queue is really going away, post doing the
  blk_sync_queue() call.

The below should fix it.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

diff --git a/block/blk-core.c b/block/blk-core.c
index 4514146..51efd83 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -462,8 +462,6 @@ void blk_cleanup_queue(struct request_queue *q)
 	if (q->elevator)
 		elevator_exit(q->elevator);
 
-	blk_throtl_exit(q);
-
 	blk_put_queue(q);
 }
 EXPORT_SYMBOL(blk_cleanup_queue);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index da8a8a4..013457f 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -471,6 +471,8 @@ static void blk_release_queue(struct kobject *kobj)
 
 	blk_sync_queue(q);
 
+	blk_throtl_exit(q);
+
 	if (rl->rq_pool)
 		mempool_destroy(rl->rq_pool);
 


-- 
Jens Axboe

--
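
The ordering problem described in the message above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions, not the kernel implementation: the toy_* types and functions are hypothetical, and a `freed` flag stands in for kfree() so that the stale access can be counted instead of actually being performed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_throtl {
	bool freed;		/* stands in for the memory having been kfree()d */
};

struct toy_queue {
	struct toy_throtl *td;	/* models q->td */
	int stale_accesses;	/* times freed throttle data was touched */
};

/* Models the exit path: frees the throttle data but does NOT clear q->td. */
static void toy_throtl_exit(struct toy_queue *q)
{
	assert(q->td != NULL);
	q->td->freed = true;
}

/* Models the sync path: it looks at q->td to cancel pending work. */
static void toy_sync_queue(struct toy_queue *q)
{
	if (q->td && q->td->freed)
		q->stale_accesses++;
}

/* Original ordering: exit during cleanup, sync at the final put. */
int toy_teardown_buggy(void)
{
	struct toy_throtl td = { .freed = false };
	struct toy_queue q = { .td = &td, .stale_accesses = 0 };

	toy_throtl_exit(&q);	/* cleanup stage */
	toy_sync_queue(&q);	/* release stage touches freed data */
	return q.stale_accesses;
}

/* Patched ordering: sync first, then free, both at release time. */
int toy_teardown_fixed(void)
{
	struct toy_throtl td = { .freed = false };
	struct toy_queue q = { .td = &td, .stale_accesses = 0 };

	toy_sync_queue(&q);
	toy_throtl_exit(&q);
	return q.stale_accesses;
}
```

In this model toy_teardown_buggy() records one stale access and toy_teardown_fixed() records none, which mirrors why moving the free to after the sync call avoids the crash without adding NULL checks elsewhere.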

From: Ingo Molnar
Date: Saturday, October 23, 2010 - 11:21 am

This did the trick, thanks Jens!

Tested-by: Ingo Molnar <mingo@elte.hu>

	Ingo
--

From: Jens Axboe
Date: Saturday, October 23, 2010 - 11:43 am

Great, thanks for testing/reporting! I added your reported/tested-by.

Linus, please pull this single fix, better get this out the door since
I'll be travelling very shortly.


  git://git.kernel.dk/linux-2.6-block.git for-2.6.37/core

Jens Axboe (1):
      block: fix use-after-free bug in blk throttle code

 block/blk-core.c  |    2 --
 block/blk-sysfs.c |    2 ++
 2 files changed, 2 insertions(+), 2 deletions(-)

-- 
Jens Axboe

--

From: Maxim Levitsky
Date: Saturday, October 23, 2010 - 1:33 pm

I have a very similar bug here.
It must have been caused by this patch series.
I pulled that tree, but it didn't change anything.

The system oopses/panics on removal of any hotpluggable device
(reproduced with xD, MemoryStick, and USB mass storage).

Here is the backtrace for the MemoryStick card:

<6>[   24.138665] r592: IRQ: card removed
<1>[   24.228293] BUG: unable to handle kernel NULL pointer dereference at 00000000000001f8
<1>[   24.228966] IP: [<00000000000001f8>] 0x1f8
<4>[   24.230739] PGD 0 
<0>[   24.231182] Oops: 0010 [#1] PREEMPT SMP 
<0>[   24.231182] last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda3/alignment_offset
<4>[   24.231182] CPU 1 
<4>[   24.231182] Modules linked in: dm_crypt firewire_net usb_storage usb_libusual cpufreq_powersave cpufreq_conservative cpufreq_userspace uvcvideo videodev v4l2_compat_ioctl32 acpi_cpufreq iwl3945 iwlcore snd_hda_codec_realtek mac80211 mperf r852 iTCO_wdt coretemp uhci_hcd sm_common ir_lirc_codec mspro_block snd_hda_intel ms_block ehci_hcd sdhci_pci lirc_dev joydev sbp2 nand snd_hda_codec cfg80211 firewire_ohci sdhci ir_sony_decoder ieee1394 nand_ids usbcore r592 ir_jvc_decoder snd_hwdep mmc_core nand_ecc ir_rc6_decoder ene_ir snd_pcm tg3 ir_rc5_decoder firewire_core mtd battery memstick ac ir_nec_decoder psmouse snd_page_alloc libphy sunrpc ir_core sg evdev serio_raw dm_mirror dm_region_hash dm_log dm_mod nouveau ttm drm_kms_helper drm i2c_algo_bit thermal video
<4>[   32.881606] 
<4>[   32.881606] Pid: 543, comm: kworker/u:4 Not tainted 2.6.36+ #191 Nettiling/Aspire 5720     
<4>[   32.881606] RIP: 0010:[<00000000000001f8>]  [<00000000000001f8>] 0x1f8
<4>[   32.881606] RSP: 0018:ffff880037a03ab8  EFLAGS: 00010086
<4>[   32.881606] RAX: ffff88007c0ebc00 RBX: ffff880037af9470 RCX: 0000000000000000
<4>[   32.881606] RDX: 0000000000000019 RSI: 0000000000000001 RDI: ffff880037af9470
<4>[   32.881606] RBP: ffff880037a03ad0 R08: 0000000000000000 R09: 0000000000000001
<4>[   32.881606] R10: ...
From: Vivek Goyal
Date: Saturday, October 23, 2010 - 11:15 pm

Looking at the backtrace and commit messages, it might be coming from
the following commit.

commit 7681bfeeccff5efa9eb29bf09249a3c400b15327
Author: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Date:   Tue Oct 19 09:05:00 2010 +0200

    block: fix accounting bug on cross partition merges


Looks like we have freed the request queue in mspro_block_remove() and
then we are calling mspro_block_disk_release(), which ends up accessing
the request queue in disk_replace_part_tbl(). So this is a use-after-free case.
 
CCing Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>.

Thanks
Vivek
--

From: Vivek Goyal
Date: Saturday, October 23, 2010 - 10:48 pm

Thanks for the fix, Jens. I had tested pulling a USB key out of a running
system to exercise the hot-remove / blk_cleanup_queue() path, and I'm
not sure why I didn't catch it.

I have one small concern here. blk_throtl_exit() takes the request queue
spin lock and relies on the fact that q->queue_lock is still around.

IIUC, in blk_release_queue() there is no guarantee that the driver has not
freed the memory associated with the spin lock (if it is a driver-provided
spin lock).

Checking for q->td in throtl_shutdown_timer_wq() might be a fix,
but it has the potential to be racy, as throtl_shutdown_timer_wq() does
not take the spin lock. I guess it can't take the spin lock to check for
q->td either, as it is called in the blk_release_queue() -> blk_sync_queue()
path and it is not guaranteed that the spin lock is still around.

So maybe we need to come up with a method to make sure the driver does not
release the queue lock until all users of the queue are gone, so that one can
safely assume q->queue_lock is valid in blk_release_queue().

Or maybe make q->td RCU protected. It is already spin lock protected, and
it will become somewhat messy to access it under the RCU lock in
throtl_shutdown_timer_wq() and under q->queue_lock in the rest of the places.
Of course, freeing of q->td would then happen after waiting through call_rcu().

Thanks
Vivek
--
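
The lock-lifetime concern raised above can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: it assumes a queue whose lock pointer refers either to a lock embedded in the queue or to one supplied by the driver, and it uses a `valid` flag in place of really freeing the driver's memory. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_lock {
	bool valid;		/* false once the owner has torn it down */
};

struct toy_queue {
	struct toy_lock internal_lock;	/* lives exactly as long as the queue */
	struct toy_lock *queue_lock;	/* embedded lock or the driver's lock */
	int stale_lock_uses;		/* times a dead lock was taken */
};

/* Use the driver's lock when one is given, else the embedded one. */
void toy_queue_init(struct toy_queue *q, struct toy_lock *driver_lock)
{
	q->internal_lock.valid = true;
	q->queue_lock = driver_lock ? driver_lock : &q->internal_lock;
	q->stale_lock_uses = 0;
}

/* Stand-in for an exit path that needs to take q->queue_lock. */
void toy_throtl_exit(struct toy_queue *q)
{
	assert(q->queue_lock != NULL);
	if (!q->queue_lock->valid)
		q->stale_lock_uses++;
}

/* Driver-provided lock that is gone before the final release runs. */
int toy_release_with_driver_lock(void)
{
	struct toy_lock driver_lock = { .valid = true };
	struct toy_queue q;

	toy_queue_init(&q, &driver_lock);
	driver_lock.valid = false;	/* driver tears down its own data */
	toy_throtl_exit(&q);		/* release-time exit takes a dead lock */
	return q.stale_lock_uses;
}

/* Embedded lock: still valid whenever the queue itself is. */
int toy_release_with_internal_lock(void)
{
	struct toy_queue q;

	toy_queue_init(&q, NULL);
	toy_throtl_exit(&q);
	return q.stale_lock_uses;
}
```

In this model only the driver-provided case records a stale lock use, which is the situation the message suggests guarding against, either by pinning the lock's lifetime to the queue or by protecting q->td some other way.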
