Temporarily at

  http://userweb.kernel.org/~akpm/2.6.20-rc6-mm3/

Will appear later at

  ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc6/2.6.20-rc6-mm...

- Restored git-block.patch: mainly the block unplugging rework. The problematic CFQ updates have been taken out.

- Restored the fsaio patches as a consequence.

- A huge ACPI update.

- A decent number of x86 patches have been temporarily dropped due to their clash against the ACPI update.

- A few problems reported against 2.6.20-rc6-mm2 have been fixed.

Boilerplate:

- See the `hot-fixes' directory for any important updates to this patchset.

- To fetch an -mm tree using git, use (for example)

  git-fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git tag v2.6.16-rc2-mm1
  git-checkout -b local-v2.6.16-rc2-mm1 v2.6.16-rc2-mm1

- -mm kernel commit activity can be reviewed by subscribing to the mm-commits mailing list.

  echo "subscribe mm-commits" | mail majordomo@vger.kernel.org

- If you hit a bug in -mm and it is not obvious which patch caused it, it is most valuable if you can perform a bisection search to identify which patch introduced the bug. Instructions for this process are at

  http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt

  But beware that this process takes some time (around ten rebuilds and reboots), so consider reporting the bug first and if we cannot immediately identify the faulty patch, then perform the bisection search.

- When reporting bugs, please try to Cc: the relevant maintainer and mailing list on any email.

- When reporting bugs in this kernel via email, please also rewrite the email Subject: in some manner to reflect the nature of the bug. Some developers filter by Subject: when looking for messages to read.

- Semi-daily snapshots of the -mm lineup are uploaded to ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/ and are announced on the mm-commits lis...
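The bisection search mentioned in the boilerplate can be sketched as below. This is an illustration only, not the procedure from bisecting-mm-trees.txt: the patch names, the series length of 1000, and the `is_bad` callback (standing in for "apply the first n patches, rebuild, reboot, test") are all hypothetical. It shows why around ten rebuilds are enough for a series on the order of a thousand patches.

```python
import math


def bisect_series(patches, is_bad):
    """Return (index of the first bad patch, number of test builds).

    is_bad(n) is a hypothetical callback: apply the first n patches,
    rebuild, reboot, and report whether the bug shows up.
    Assumes is_bad(0) is False and is_bad(len(patches)) is True.
    """
    lo, hi = 0, len(patches)  # bug absent with lo patches, present with hi
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(mid):
            hi = mid
        else:
            lo = mid
    return hi - 1, tests


# Hypothetical series of 1000 patches in which patch index 617 is the culprit:
# the bug is present once more than 617 patches have been applied.
series = ["patch-%04d" % i for i in range(1000)]
culprit, builds = bisect_series(series, lambda n: n > 617)
print(series[culprit], "found after", builds, "builds;",
      "upper bound", math.ceil(math.log2(len(series))))
```

Each build halves the range of candidate patches, so the number of builds grows with the logarithm of the series length rather than with the length itself.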
This patch makes the needlessly global gfs2_writepages() static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
--- linux-2.6.20-rc6-mm3/fs/gfs2/ops_address.c.old 2007-02-06 08:30:19.000000000 +0100
+++ linux-2.6.20-rc6-mm3/fs/gfs2/ops_address.c 2007-02-06 08:30:32.000000000 +0100
@@ -170,7 +170,8 @@
* and write whole extents at once. This is a big reduction in the
* number of I/O requests we send and the bmap calls we make in this case.
*/
-int gfs2_writepages(struct address_space *mapping, struct writeback_control *wbc)
+static int gfs2_writepages(struct address_space *mapping,
+ struct writeback_control *wbc)
{
struct inode *inode = mapping->host;
struct gfs2_inode *ip = GFS2_I(inode);
Hi,
Now applied to the GFS2 -nmw git tree. Thanks,
Steve.
This patch contains the following cleanups:
- proper prototypes for global code in aacraid.h
- aac_rx_start_adapter() can now become static
Signed-off-by: Adrian Bunk <bunk@stusta.de>
---
drivers/scsi/aacraid/aacraid.h | 3 +++
drivers/scsi/aacraid/linit.c | 2 --
drivers/scsi/aacraid/nark.c | 3 ---
drivers/scsi/aacraid/rkt.c | 3 ---
drivers/scsi/aacraid/rx.c | 2 +-
5 files changed, 4 insertions(+), 9 deletions(-)
--- linux-2.6.20-rc6-mm3/drivers/scsi/aacraid/aacraid.h.old 2007-02-06 08:22:50.000000000 +0100
+++ linux-2.6.20-rc6-mm3/drivers/scsi/aacraid/aacraid.h 2007-02-06 08:27:17.000000000 +0100
@@ -1840,8 +1840,11 @@
int aac_get_adapter_info(struct aac_dev* dev);
int aac_send_shutdown(struct aac_dev *dev);
int aac_probe_container(struct aac_dev *dev, int cid);
+int _aac_rx_init(struct aac_dev *dev);
+int aac_rx_select_comm(struct aac_dev *dev, int comm);
extern int numacb;
extern int acbsize;
extern char aac_driver_version[];
extern int startup_timeout;
extern int aif_timeout;
+extern int expose_physicals;
--- linux-2.6.20-rc6-mm3/drivers/scsi/aacraid/rx.c.old 2007-02-06 08:21:40.000000000 +0100
+++ linux-2.6.20-rc6-mm3/drivers/scsi/aacraid/rx.c 2007-02-06 08:21:50.000000000 +0100
@@ -294,7 +294,7 @@
* Start up processing on an i960 based AAC adapter
*/
-void aac_rx_start_adapter(struct aac_dev *dev)
+static void aac_rx_start_adapter(struct aac_dev *dev)
{
struct aac_init *init;
--- linux-2.6.20-rc6-mm3/drivers/scsi/aacraid/linit.c.old 2007-02-06 08:23:20.000000000 +0100
+++ linux-2.6.20-rc6-mm3/drivers/scsi/aacraid/linit.c 2007-02-06 08:23:26.000000000 +0100
@@ -82,8 +82,6 @@
static int aac_cfg_major = -1;
char aac_driver_version[] = AAC_DRIVER_FULL_VERSION;
-extern int expose_physicals;
-
/*
* Because of the way Linux names scsi devices, the order in this table has
* become important. Check for on-board Raid first, add-in cards second.
--- linux-2.6.20-rc6-mm3/drivers/scsi/aacraid/n...

This patch contains the following possible cleanups:
- move extern declarations to atl1.h
- make needlessly global code static
Signed-off-by: Adrian Bunk <bunk@stusta.de>
---
BTW: Can we get a MAINTAINERS entry for this driver?
drivers/net/atl1/atl1.h | 6 ++++--
drivers/net/atl1/atl1_ethtool.c | 3 ---
drivers/net/atl1/atl1_hw.c | 6 ++----
drivers/net/atl1/atl1_main.c | 8 +++-----
drivers/net/atl1/atl1_param.c | 4 +---
5 files changed, 10 insertions(+), 17 deletions(-)
--- linux-2.6.20-rc6-mm3/drivers/net/atl1/atl1.h.old 2007-02-06 07:55:58.000000000 +0100
+++ linux-2.6.20-rc6-mm3/drivers/net/atl1/atl1.h 2007-02-06 08:19:50.000000000 +0100
@@ -34,8 +34,10 @@
s32 atl1_up(struct atl1_adapter *adapter);
void atl1_down(struct atl1_adapter *adapter);
int atl1_reset(struct atl1_adapter *adapter);
-s32 atl1_setup_ring_resources(struct atl1_adapter *adapter);
-void atl1_free_ring_resources(struct atl1_adapter *adapter);
+
+extern char atl1_driver_name[];
+extern char atl1_driver_version[];
+extern const struct ethtool_ops atl1_ethtool_ops;
struct atl1_adapter;
--- linux-2.6.20-rc6-mm3/drivers/net/atl1/atl1_hw.c.old 2007-02-06 07:52:20.000000000 +0100
+++ linux-2.6.20-rc6-mm3/drivers/net/atl1/atl1_hw.c 2007-02-06 07:56:22.000000000 +0100
@@ -31,8 +31,6 @@
#include "atl1.h"
-extern char atl1_driver_name[];
-
/**
* Reset the transmit and receive units; mask and clear all interrupts.
* hw - Struct containing variables accessed by shared code
@@ -209,7 +207,7 @@
* get_permanent_address
* return 0 if get valid mac address,
**/
-int atl1_get_permanent_address(struct atl1_hw *hw)
+static int atl1_get_permanent_address(struct atl1_hw *hw)
{
u32 addr[2];
u32 i, control;
@@ -602,7 +600,7 @@
return ret_val;
}
-struct atl1_spi_flash_dev flash_table[] = {
+static struct atl1_spi_flash_dev flash_table[] = {
/* MFR_NAME WRSR READ PRGM WREN WRDI RDSR RDID SECTOR_ERASE CHIP_ERASE */
...

On Tue, 6 Feb 2007 23:12:29 +0100

Adrian,

The atl1 driver currently follows this development pathway:

developer -> netdev#atl1 -> netdev#ALL -> -mm

Your patch is just a little bit out ahead of us. Some of your suggested changes are already in the pipeline; we're just waiting for Jeff to

Already submitted to netdev#atl1.

netdev#atl1 already has this change.

The rest of these I'll bundle up and submit to netdev#atl1, too.

Will

Do what you consider the right thing - I don't care how it gets into the
various trees.
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
Technical note: merging #atl1 into #ALL happens each time netdev-2.6.git is flushed out from my local machine.
Jeff

Noted. Thanks.

acpi_os_readable() is no longer used.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
---
drivers/acpi/osl.c | 2 --
include/acpi/acpiosxf.h | 3 +--
2 files changed, 1 insertion(+), 4 deletions(-)
--- linux-2.6.20-rc6-mm3/include/acpi/acpiosxf.h.old 2007-02-06 06:57:15.000000000 +0100
+++ linux-2.6.20-rc6-mm3/include/acpi/acpiosxf.h 2007-02-06 06:57:53.000000000 +0100
@@ -240,9 +240,8 @@
acpi_os_validate_address(u8 space_id,
acpi_physical_address address, acpi_size length);
-u8 acpi_os_readable(void *pointer, acpi_size length);
-
#ifdef ACPI_FUTURE_USAGE
+u8 acpi_os_readable(void *pointer, acpi_size length);
u8 acpi_os_writable(void *pointer, acpi_size length);
#endif
--- linux-2.6.20-rc6-mm3/drivers/acpi/osl.c.old 2007-02-06 07:18:33.000000000 +0100
+++ linux-2.6.20-rc6-mm3/drivers/acpi/osl.c 2007-02-06 07:18:54.000000000 +0100
@@ -888,7 +888,6 @@
return 0;
}
-#endif /* ACPI_FUTURE_USAGE */
/* Assumes no unreadable holes inbetween */
u8 acpi_os_readable(void *ptr, acpi_size len)
@@ -901,7 +900,6 @@
return 1;
}
-#ifdef ACPI_FUTURE_USAGE
u8 acpi_os_writable(void *ptr, acpi_size len)
{
/* could do dummy write (racy) or a kernel page table lookup.
It appears there is a problem with the /proc/interrupts entry for "timer": it doesn't increment anymore. This problem exists in the -rt tree also. I haven't done a bisect, but I'm assuming this is HRT related.

Also, my NMI watchdog isn't functioning, which is also the case in the -rt tree and in -mm. That is likely HRT related too. I don't have HRT or dynamic tick turned on.

This started happening in -mm2, and it worked in -mm1.
Daniel

And why should it increment? Is there a rule that it has to?
tglx

I don't know. I would imagine some users might look at it and wonder why their timer isn't ticking (I know it actually is ticking, but they don't), when it has in every other kernel.

We could just remove the timer entry.
Daniel

No, we can't. The timer interrupt is set up and it does not go away, as we keep the PIT as a backup for the broken lapics.
tglx

I'm not trying to create anything. However, as I said before, the /proc/interrupts "timer" entry doesn't work the same as it has in other kernels.

Ok, how about adding the interrupts to the list which are driving the timer?
Daniel

Yes, it is different. Why are you insisting that something is a problem

Simply because it works, and it does not make any sense to have a per-CPU timer (lapic) and the PIT firing at the same periodic interval. The PIT does nothing else than jiffies64++. The clockevents code just optimizes that away and lets one CPU do the jiffies64++ in its periodic per-CPU interrupt.

Uurg. /proc/interrupts has nothing to do with timers. It's interrupt statistics. See the LOC entry for the lapic ones.
tglx

In this case "different" goes into userspace. So different could mean a userspace regression, which is something that we don't want. I have no idea if any apps use /proc/interrupts, but it's possible since it's been around for a long time. The reason that I'm bringing it up at all is because people have asked me

You're saying we can't remove it, though. If /proc/interrupts is not related to timers, why does the entry exist at all?

You're saying the LOC entry is the new "timer" entry, but we still have the old "timer" entry. It's getting confusing. It might be nicer to list all the registered clock event sources in /proc/interrupts, with more descriptive names.

Why is it that HRT doesn't use the "timer" as a valid timer?
Daniel

Because there are two clock sources in the machine and it's using the other one, so the interrupt isn't firing?

Are you saying that the /proc statistics aren't accurate, or that you previously misunderstood what it was actually measuring and you'd now like it

I didn't think Thomas even touched the /proc/interrupts reporting code. It's still accurate. The patch changed the usage of timers, /proc/interrupts is accurately showing the change, and you're surprised that what it was measuring wasn't what you thought it was measuring all along.

This ain't jiffies. This is how often the PIT fired. They are not the same thing.
Rob
--
"Perfection is reached, not when there is no longer anything to add, but when there is no longer anything to take away." - Antoine de Saint-Exupery

I understand exactly what is happening. The statistics are unclear, and tend to confuse people.
Daniel

Ah, you can't answer this question. Right:

A) Because there are multiple timer interrupt sources in the system, and we're now using a newer (better) one.

B) This measures interrupts. It doesn't measure jiffies. Interrupts != jiffies.

This is a conceptual issue.
Rob

It _IS_ statistics info about the number of interrupts and has no fixed meaning at all.

It does not cause any user space regression, as the interface is still the same. It produces different numbers, like the clock_getres() syscall returns different values on highres and !highres

So it's a problem of user perception and not of a user space regression.

Ok. Each irqaction struct which is used to request/setup an interrupt contains a name field. This is the one which shows up in /proc/interrupts. The one which is used to set up irq0 has .name =

No. No. No. clockevents has nothing to do with /proc/interrupts.

Because the local apic timer is better.
tglx

At least we agree on this point. I'm not trying to confuse anything; this issue needs to be discussed.
Daniel

Well, if you enable dynticks you should expect the number of timer irqs

it's quite easy to explain: because of the new dynticks feature. Both

they are already listed in /proc/interrupts, depending on how they use interrupts. For a more complete list of in-use clockevent drivers see /proc/timer_info. But it would be wrong to touch /proc/interrupts to create some special-case for clockevents.
Ingo

I don't have that enabled, though. This is with HRT/dynamic tick both off.
Daniel

your kernel utilizes the hardware in a more optimal way: the new clockevents code now utilizes the local APIC timer irq (represented by the LOC field) for periodic interrupts. The local APIC timer irq has a cost of ~2 usecs per IRQ, while the PIT irq is ~10 usecs per irq. With HZ=1000 this means savings of ~8000 usecs per second - i.e. 8 msecs per second, which is 0.8% more raw CPU power available - which isn't that bad.

we could make this clearer by renaming 'LOC' (which stands for 'LOCal timer interrupts' and was added [and misnamed] by yours truly many moons ago) to 'apic-timer' and 'timer' to 'PIT-timer' but /that/ would be more of a userspace visible change than the change in the counter rates.
Ingo
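The savings estimate in the message above can be reproduced with a few lines of arithmetic. This is only a worked example using the approximate per-interrupt costs quoted there (~10 usecs for the PIT, ~2 usecs for the local APIC timer, HZ=1000); the function name is made up for illustration.

```python
def timer_irq_savings(hz, pit_cost_us, lapic_cost_us):
    """Return (usecs saved per second, saved fraction of one CPU-second)."""
    saved_us = hz * (pit_cost_us - lapic_cost_us)
    return saved_us, saved_us / 1_000_000


# Figures as quoted in the thread; they are rough estimates, not measurements.
saved_us, fraction = timer_irq_savings(hz=1000, pit_cost_us=10, lapic_cost_us=2)
print(f"{saved_us} usecs/s = {saved_us / 1000:g} msecs/s = {fraction:.1%} of a CPU")
```

The saving scales linearly with HZ, so a lower tick rate would give a proportionally smaller gain.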
If we change the current "timer" entry to be listed as "lapic-timer" and not "IO-APIC-edge" (or one of the other names) and replace it with the count from LOC, that would make sense because that field already changes depending on whether you have an io-apic or not.

I think the regression (if you can call it that) is not scripts crashing, but more that people won't know what's going on with their system.
Daniel

No. We are not fiddling with the IRQ subsystem statistics. The IRQ subsystem is unrelated to timers. And we do switch away from the PIT if we have a local apic timer, so the output of /proc/interrupts is just a mirror of the real system and not some made up thing, which will make it harder to

I did not hear a complaint from anyone except you. I doubt that there will be big confusion as long as the kernel does work as expected.
tglx

This is going to be a slow motion explosion.
Daniel

doing that would not fake the old behavior (which is your suggestion), LOC is per CPU, while the PIT timer irq that was there is global.

But, as per the previous mails, the new behavior is just fine, because /proc/interrupts just reflects reality. And the way the kernel utilizes the hardware has just changed - for the better. The same happens when say a network driver implements NAPI: the IRQ count goes way, way down. Or if a driver starts supporting MSI - the IRQ line even moves to another one. Do we try to fix those counts up to match the 'previous behavior'? Of course not. What you are suggesting makes no sense, is against current kernel practices - as we pointed it

(that is something else: it's different because a different irq-chip is behind it.)
Ingo

I'm not saying we should "fake" anything .. I'm saying list what's really happening .. In a human readable way . You're saying we should keep it unreadable, and let the users be that much

Why is that not the case with lapic?
Daniel

We do that. IRQ0 is not happening. So simply it does not increment the

It is readable, as it reflects the reality which is going on in the system and not some artificial view which you think is how the interrupt count should be presented. /proc/interrupts _IS_ statistics about the

Local APIC is not really part of the interrupt subsystem as it uses a separate entry vector for historic reasons and therefore is not handled by setup/request/free/... _irq() functions.
tglx

"replace the timer entry with lapic-timer and put the LOC count there"
is faking something that does not reflect reality. The 'timer' count is
we list precisely what is happening: the number of IRQ#0 interrupts and
the number of local APIC timer interrupts. Precisely where their
traditional place is.
i think you might be confused by the generic name that says 'timer'. You
should notice the other bits that are there too:
           CPU0       CPU1
  0:        495          0    IO-APIC-edge  timer
the '0' means IRQ#0. That makes it clear that this is the PIT timer.
Clearer now?
Ingo
I'm not trying to suggest we "fake" anything. You're just misunderstanding me. I am suggesting we change LOC to something readable. If you think we're "faking" something by dropping the current "timer" request_irq() then we certainly don't need to do that.

The io-apic timer could potentially be a clock event device; that is its function, isn't it? It generates interrupts (note I said interrupts) periodically. The NMI is another example of that: it generates non-maskable interrupts based off a clock. All are clock based interrupt generating devices. All could be clock event sources, with all the other clock event sources in the system. It makes sense (to me at least) that we should list all those interrupt generating devices in /proc/interrupts with statistics of their usage.

I'm making a suggestion here, you can call it "fake"'ing something or

Empirically, I know that users do not/will not understand what's happened. So take that however you want, but _I_ think we should do something so people better understand what has happened.
Daniel

as i pointed it out in the previous mail, the problem is that what you

changing the current 'timer' entry (which is line 2 of /proc/interrupts) to be 'listed as lapic-timer' and to 'replace it with the count from LOC' is faking a count in a line where nothing like that should be.

the kernel simply displays reality: IRQ#0 isn't increasing because it's not used, and LOC (local apic timers) is increasing.
Ingo

What about the statistics for the other interrupts in the system? It clearly doesn't list all interrupts in the system.
Daniel

it is very much relevant: faking a count is something we /dont/ want to do with /proc/interrupts, for (very) basic compatibility, simplicity and policy reasons. And that is precisely what your suggestion was to

what is your point?
Ingo

As I said, you are misunderstanding me, which is why this is not relevant any more.

Isn't the listing inconsistent? /proc/interrupts only shows some special interrupts, and not others. For example it shows NMI, which is not related to request_irq(). It shows some clock driver devices (timer, NMI, LOC) and not others (clock event devices).
Daniel

it's not inconsistent. /proc/interrupts lists registered interrupts plus some special hardcoded platform interrupts that are not explicitly registered - with the goal of providing a list of all active interrupt sources. /proc/interrupts has been doing that for more than 10 years. Clock event devices themselves are not 'interrupt lines', why should they be listed in /proc/interrupts?
Ingo

It shows _ALL_ used interrupts in the system. There is no point to let

PIT is a clock event device and uses IRQ0, where the interrupt count is displayed:

  0: 3022812 0 IO-APIC-edge timer

Local APIC timer is a clock event device too and the interrupt count is displayed as well:

LOC: 177795 1755941

There are no other clock event devices in a PC system at the moment and /proc/interrupts does not care whether the interrupt was set up for a clock event device or something else. It displays the name which is given in the irqaction struct and does not care what it means. I did not change the name in the IRQ#0 setup, so it still displays "timer" (which can either be PIT or HPET), but this is something the interrupt layer does not know and does not care about.

The special interrupts which are not handled by the generic IRQ layer (LOC, NMI) are displayed to have the complete statistics available. We did not change anything on that.

The changed behavior you are observing (IRQ#0 is not incrementing) is reflecting the reality of the system. IRQ#0 is not firing, so it does not increment.
tglx

So you're saying the "timer" entry in /proc/interrupts can be either the HPET timer or the PIT timer? Mine says "IO-APIC-edge"; which does that map to? It's going through the io-apic but it's still the PIT?
Daniel

IO-APIC: Input/Output Advanced Programmable Interrupt Controller. This device does not generate interrupts by itself. Devices which generate interrupts are connected to it.

 23: 82 0 IO-APIC-fasteoi ohci1394, HDA Intel

This is IRQ#23 coming in via IO-APIC (fasteoi type). The interrupt is shared by two devices, which identified themselves as "ohci1394" and "HDA Intel" via request_irq(). The interrupt originates from one of those devices. So it _IS_ going through the IO-APIC, but generated either by the Firewire device or the Audio device.

  0: 186222 0 IO-APIC-edge timer

This is IRQ#0 coming in via IO-APIC (edge type). The interrupt is not shared. The device identified itself as "timer" via setup_irq(). The interrupt originates from this device. The interrupt is either caused by PIT or HPET via a hardware switch mechanism, which is activated when you use HPET. There is no way to share IRQ#0 here. It's either or as defined by hardware magic.
tglx
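The row layout described above (IRQ label, one count per CPU, the interrupt chip type, then the names registered via request_irq()/setup_irq()) can be split apart with a small script. This is a rough sketch that assumes exactly the layout quoted in this thread; other kernel versions print additional columns or descriptive text on the special rows, which this parser does not try to handle. The function name is made up for illustration.

```python
def parse_interrupt_line(line):
    """Split one /proc/interrupts row into label, per-CPU counts, chip, devices.

    Assumes the layout quoted in the thread: "<label>: <count>... [chip names]".
    Special rows such as LOC carry only counts.
    """
    label, _, rest = line.partition(":")
    fields = rest.split()
    counts = []
    while fields and fields[0].isdigit():
        counts.append(int(fields.pop(0)))
    chip = fields.pop(0) if fields else None
    # Shared interrupts list their registered names separated by commas.
    devices = [d.strip() for d in " ".join(fields).split(",")] if fields else []
    return {"irq": label.strip(), "counts": counts, "chip": chip, "devices": devices}


# Rows copied from the messages above.
shared = parse_interrupt_line(" 23: 82 0 IO-APIC-fasteoi ohci1394, HDA Intel")
timer = parse_interrupt_line("  0: 186222 0 IO-APIC-edge timer")
loc = parse_interrupt_line("LOC: 177795 1755941")
for row in (shared, timer, loc):
    print(row["irq"], sum(row["counts"]), row["chip"], row["devices"])
```

A script that sums the per-CPU counts this way keeps working when the timer tick moves from IRQ#0 to LOC; only the numbers change, not the format, which is the compatibility point made in the thread.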
actually, i quoted what you said:

| If we change the current "timer" entry to be listed as "lapic-timer"
| and not "IO-APIC-edge" (or one of the other names) and replace it with
| the count from LOC

this is a pretty clear sentence, i dont think i misunderstood anything about it. If i did, please point it out specifically.
Ingo

Geez, man, I've corrected this statement already. Why don't you quote the corrections? You're not listening because you're ignoring everything I said after this, accepting only my first statement and rejecting everything else. Like you want this to descend into a melee.

Last and final correction. I'm saying drop the timer entry, which means drop the call to request_irq() for irq0. Add lines for lapic-timer which take the place of LOC.
Daniel

i'm sorry, but where did you "correct this statement already"? You havent replied to your mail to correct it explicitly, and there's no later statement of yours that says anything near to "let me correct this via X" or "i was wrong here, i meant Y".

the only subsequent reference of yours seems to be:

| I'm not saying we should "fake" anything .. I'm saying list what's
| really happening .. In a human readable way .

what you write here does not read as a 'correction', this disputes my characterisation, suggesting that your original point is still intact. How should i have known that you meant this to be a 'correction' of your original point, and that this (whatever it means precisely) replaces it?

if you concede a point or correct a statement then /please/ make it clear. There's nothing bad about being wrong or being stupid

it's not a request_irq() but a setup_irq().

dropping the IRQ#0 line would be fatally wrong: /proc/interrupts lists all active interrupt lines. There can be (and often is) a count in IRQ#0. Why should it be hidden?

furthermore, as i pointed it out earlier: what you suggest is bad for compatibility: removing/changing the non-count portions of the LOC or the IRQ#0 entry /will/ break scripts.
Ingo

I guess I will respond ....

I know that you see corrections as responses to your own email, but it's not universal. Everyone has their own methodology, AFAIK it's free

I don't take a literal approach to email, which you seem to be taking. I think you're seeing this thread as an argument for or against something and you have taken a position which you diligently stick to. My position is not fixed. However, you're arguing as if my position was fixed.

My perspective of this thread was not to argue for a specific change, but to throw out changes and see if anything stuck. My statements were supposed to be loose to begin with, so loose as to only spark the start of an idea, not to promote something specific.

If you read the start of the thread you'll notice that I gave Thomas two totally opposite ideas.

"We could just remove the timer entry."

or

"[..]how about adding the interrupts to the list which are driving the timer?"

When I started the thread I had a similar position as Thomas, but I was concerned that I was missing something or the code was missing something. This was the reason for starting the thread.

So I'll gladly concede all points. To me it wasn't about the argument, or even my own ideas.
Daniel

Right, that's a real good suggestion. Here's the patch especially for
you. Apply it and figure out yourself, why your computer won't boot
anymore.
tglx
Index: linux-2.6.20/arch/i386/mach-default/setup.c
===================================================================
--- linux-2.6.20.orig/arch/i386/mach-default/setup.c
+++ linux-2.6.20/arch/i386/mach-default/setup.c
@@ -95,8 +95,10 @@ static struct irqaction irq0 = {
**/
void __init time_init_hook(void)
{
+#ifdef CONFIG_THIS_IS_NOT_DWALKERS_COMPUTER
irq0.mask = cpumask_of_cpu(0);
setup_irq(0, &irq0);
+#endif
}
#ifdef CONFIG_MCA
-Ingo -
I was running likely profiling and I noticed that when I turn on the "tickless" option I get the following line,

+unlikely | 21111208|  9367278 need_resched()@:include/linux/sched.h@1597

This means that the condition on this line was true 21111208 times and false 9367278 times. This existed on bootup, and stayed after about 7 hours of runtime (mostly idle). Since need_resched is a "static inline" there are multiple instances of need_resched() in the kernel. If I turn off the "tickless" feature this wrong unlikely disappears, and the output looks like this,

 unlikely |        0|      169 need_resched()@:include/linux/sched.h@1597
 unlikely |        0|       94 need_resched()@:include/linux/sched.h@1597
 unlikely |        0|       63 need_resched()@:include/linux/sched.h@1597
 unlikely |        0|       19 need_resched()@:include/linux/sched.h@1597
 unlikely |        0|      379 need_resched()@:include/linux/sched.h@1597
 unlikely |        1|   202596 need_resched()@:include/linux/sched.h@1597
 unlikely |        7|   205929 need_resched()@:include/linux/sched.h@1597
 unlikely |     6461|   271690 need_resched()@:include/linux/sched.h@1597

Only a little after boot. I suppose this could be a natural side effect of the tickless feature but I thought I would report it anyway.
Daniel
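To compare rows like the ones in the report above, the two counters can be turned into a ratio. The sketch below takes the column meaning from the report itself (first count = condition true, second = false) and assumes the leading "+" is how the profiler flags a suspicious annotation; neither is verified against the profiling patch, and the function names are made up.

```python
def parse_profile_line(line):
    """Parse one row of the likely-profiling output quoted in the report.

    Returns (kind, true_count, false_count, location, flagged). The column
    meaning follows the report's own reading and is taken on trust.
    """
    kind, true_count, rest = [part.strip() for part in line.split("|", 2)]
    false_count, location = rest.split(None, 1)
    flagged = kind.startswith("+")  # assumption: "+" marks a flagged row
    return kind.lstrip("+"), int(true_count), int(false_count), location, flagged


def true_ratio(true_count, false_count):
    """Fraction of evaluations in which the annotated condition was true."""
    total = true_count + false_count
    return true_count / total if total else 0.0


tickless = parse_profile_line(
    "+unlikely | 21111208| 9367278 need_resched()@:include/linux/sched.h@1597")
periodic = parse_profile_line(
    "unlikely | 6461| 271690 need_resched()@:include/linux/sched.h@1597")
for kind, t, f, where, flagged in (tickless, periodic):
    print(f"{kind} {where}: true in {true_ratio(t, f):.1%} of calls, flagged={flagged}")
```

For an unlikely() annotation a high true ratio means the hint is wrong most of the time, which is what the tickless row shows.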
Hello!

changes in git-acpi.patch in 2.6.20-rc6-mm3 (and maybe before) broke the Summit sub-arch (IBM x440) compile :(

thanks,
C.

CC arch/i386/kernel/cpu/intel.o
CC arch/i386/kernel/early_printk.o
arch/i386/kernel/srat.c: In function 'parse_cpu_affinity_structure':
arch/i386/kernel/srat.c:68: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:72: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:72: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:74: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:74: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:77: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:77: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c: In function 'parse_memory_affinity_structure':
arch/i386/kernel/srat.c:93: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:97: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:97: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:100: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:101: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:102: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:103: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:108: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:134: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:135: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:136: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c: In function 'acpi20_parse_srat':
arch/i386/kernel/srat.c:188: error: 'ACPI_SRAT_PROCESSOR_AFFINITY' undeclared (first use in this function)
arch/i386/kernel/srat.c:188: error: (Each undeclared identifier is reported only once
arch/i386/kernel/srat.c:18...

Sorry, here is the patch... ACPI has switched to acpi_find_rsdp(), so srat.c might want to do that too, please check. Thanks,
got it. running a compile and boot test. I should have the results in 'my' morning (UTC+1).

Thanks!
C.

hmm, i got another issue while compiling:

CHK include/linux/compile.h
UPD include/linux/compile.h
CC init/version.o
LD init/built-in.o
LD .tmp_vmlinux1
arch/i386/kernel/built-in.o: In function `get_memcfg_from_srat':
/home/legoater/linux/2.6.20-rc6-mm3/arch/i386/kernel/srat.c:279: undefined reference to `acpi_find_root_pointer'

I'll catch up in the morning.

thanks,
C.

Hi, I updated patch to use acpi_find_rsdp(), as all other code does. Could you please try it? Thanks,
Hello!

so it probably means that drivers/acpi/tables/tbxfroot.c is

sure, I'll cancel the current boot test in which I was using acpi_find_root_pointer() in tbxfroot.c and restart one with your

Thanks!
C.

How long does it take to boot this thing?

Regards,
Alex.

well, not that long, but i don't have direct access to this machine, only through a test batch manager ...
C.

dmesg looks fine. However, there is a:

ACPI Warning (tbfadt-0415): Optional field "Gpe1Block" has zero address or length: 0000000000000000/4 [20070126]

but I don't know how to interpret this. Any idea?

thanks,
C.

Linux version 2.6.20-rc6-mm3-lxc2-autokern1 (root@fpos1) (gcc version 4.0.3 (Ubuntu 4.0.3-1ubuntu5)) #1 SMP Fri Feb 2 20:38:46 UTC 2007
BIOS-provided physical RAM map:
sanitize start
sanitize end
copy_e820_map() start: 0000000000000000 size: 000000000009dc00 end: 000000000009dc00 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 000000000009dc00 size: 0000000000002400 end: 00000000000a0000 type: 2
copy_e820_map() start: 00000000000e0000 size: 0000000000020000 end: 0000000000100000 type: 2
copy_e820_map() start: 0000000000100000 size: 00000000dfea25c0 end: 00000000dffa25c0 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 00000000dffa25c0 size: 0000000000009c80 end: 00000000dffac240 type: 3
copy_e820_map() start: 00000000dffac240 size: 0000000000053dc0 end: 00000000e0000000 type: 2
copy_e820_map() start: 00000000fec00000 size: 0000000001400000 end: 0000000100000000 type: 2
copy_e820_map() start: 0000000100000000 size: 0000000120000000 end: 0000000220000000 type: 1
copy_e820_map() type is E820_RAM
 BIOS-e820: 0000000000000000 - 000000000009dc00 (usable)
 BIOS-e820: 000000000009dc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000dffa25c0 (usable)
 BIOS-e820: 00000000dffa25c0 - 00000000dffac240 (ACPI data)
 BIOS-e820: 00000000dffac240 - 00000000e0000000 (reserved)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000220000000 (usable)
Node: 0, start_pfn: 0, end_pfn: 157
Node: 0, start_pfn: 256, end_pfn: 917410
Node: 0, start_pfn: 1048576, end_pfn: 2228224
get_memcfg_from_srat: assigning address to rsdp
RSD PTR v0 [IBM ]
Begin SRAT table scan....
CPU 0x00 in proximity domain 0x00
CPU 0x02 in proximity d...

This warning should probably be disabled, so as not to confuse users. The spec says that some registers are optional, and ACPICA used to keep silent when it encountered one, but now it produces this meaningless warning. Ignore it.

Regards,

MD hung again as before, so I compiled a kernel
without it. Next, XFS started hanging during bootup.
Here are some traces of hung processes, but I do not have a clue as to what is
wrong here:
Call Trace:
[<a00000010074c1b0>] schedule+0x1bf0/0x1ec0
sp=e00000301560fac0 bsp=e000003015608fc8
[<a0000001003ba350>] xfs_buf_iorequest+0x130/0x820
sp=e00000301560fbd0 bsp=e000003015608f58
[<a0000001003c5b00>] xfs_bdstrat_cb+0x60/0x100
sp=e00000301560fc00 bsp=e000003015608f38
[<a0000001003b2ba0>] xfs_bwrite+0xe0/0x1e0
sp=e00000301560fc00 bsp=e000003015608f00
[<a0000001003a3980>] xfs_syncsub+0x2c0/0x520
sp=e00000301560fc00 bsp=e000003015608eb0
[<a0000001003a3d30>] xfs_sync+0x70/0xa0
sp=e00000301560fc00 bsp=e000003015608e88
[<a0000001003cb400>] vfs_sync+0xa0/0xc0
sp=e00000301560fc00 bsp=e000003015608e58
[<a0000001003c8910>] xfs_fs_write_super+0x70/0xa0
sp=e00000301560fc00 bsp=e000003015608e38
[<a00000010016d490>] sync_supers+0x150/0x260
sp=e00000301560fc00 bsp=e000003015608e08
[<a000000100115820>] wb_kupdate+0x60/0x280
sp=e00000301560fc00 bsp=e000003015608dc8
[<a000000100116570>] pdflush+0x330/0x4e0
sp=e00000301560fc50 bsp=e000003015608d90
[<a0000001000d2ac0>] kthread+0x220/0x2a0
sp=e00000301560fd50 bsp=e000003015608d48
[<a000000100010a50>] kernel_thread_helper+0xd0/0x100
sp=e00000301560fe30 bsp=e000003015608d20
[<a000000100009140>] start_kernel_thread+0x20/0x40
sp=e00000301560fe30 bsp=e000003015608d20
...

On Wed, 31 Jan 2007 16:14:10 -0800 (PST)

ow. Please don't make me drop git-block-and-lots-of-other-things again. -
Yes, 2.6.20-rc6-mm2 was okay. Sorry. -
On Wed, 31 Jan 2007 16:27:16 -0800 (PST)

OK, thanks. Actually, we might not have lost an IO: it could be that we're simply missing an unplug. Are you able to unblock things by forcing some other IO against that queue? Say, do a read from /dev/sda? -
Could be - I can't be certain but I think we've got one thread waiting for a buffer to be unpinned before it is written, and the other thread waiting for log I/O to complete. The first thread won't unplug the device, and the log I/O is async so it won't either.

What are the new unplugging rules introduced by the git-block patch? How do they differ from the existing rules?

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

-
Pretty simple: you read the largely-useless changelog then call the bravely uncommented blk_plug_current() when you're about to submit some IO and you call the audaciously uncommented blk_unplug_current() when you've finished and you're ready to let it rip.

But usually none of that is necessary, because io_schedule() does all the work for you.

err, this might help.

--- a/fs/xfs/linux-2.6/xfs_buf.c~git-block-xfs-fix
+++ a/fs/xfs/linux-2.6/xfs_buf.c
@@ -979,7 +979,7 @@ xfs_buf_wait_unpin(
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		if (atomic_read(&bp->b_pin_count) == 0)
 			break;
-		schedule();
+		io_schedule();
 	}
 	remove_wait_queue(&bp->b_waiters, &wait);
 	set_current_state(TASK_RUNNING);
_

-
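For readers following along, the plug/unplug rules described above can be pictured with a small user-space toy model. This is only a sketch: the struct and every toy_* function are hypothetical stand-ins invented for illustration, not the git-block code, and the assumption that io_schedule() flushes held IO before sleeping is taken from the remark that it "does all the work for you".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model. None of these names exist in the kernel. */
struct toy_task {
	bool plugged;    /* task is currently batching its IO */
	int  held;       /* requests sitting behind the plug */
	int  dispatched; /* requests actually handed to the device */
};

/* Push everything held behind the plug to the device. */
static void toy_flush(struct toy_task *t)
{
	assert(t->held >= 0);
	t->dispatched += t->held;
	t->held = 0;
}

/* Stand-in for blk_plug_current(): start batching submissions. */
static void toy_plug(struct toy_task *t)
{
	t->plugged = true;
}

/* Stand-in for blk_unplug_current(): stop batching and let it rip. */
static void toy_unplug(struct toy_task *t)
{
	t->plugged = false;
	toy_flush(t);
}

/* Submitting while plugged only queues; otherwise it goes straight out. */
static void toy_submit(struct toy_task *t)
{
	if (t->plugged)
		t->held++;
	else
		t->dispatched++;
}

/* Stand-in for schedule(): sleeps without touching held IO. */
static void toy_schedule(struct toy_task *t)
{
	(void)t;
}

/* Stand-in for io_schedule(): ASSUMED to flush held IO before sleeping. */
static void toy_io_schedule(struct toy_task *t)
{
	toy_flush(t);
}
```

In this model a task that plugs, submits two requests, and then calls toy_schedule() goes to sleep with both requests still held, which is the shape of the hang being chased here; calling toy_io_schedule() instead sends them out first.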
Well, okay, boot progresses further (maybe only on this boot), but the system is
still hung.
Traces (this was a backtrace of all processes on the system. I removed
the irrelevant ones):
Delaying for 5 seconds...
All OS INIT slaves have reached rendezvous
Processes interrupted by INIT - 0 (cpu 0 task 0xa000000100b24000) 0 (cpu 1
task 0xe00000b003bd8000) 0 (cpu 2 task 0xe000023c38248000) 0 (cpu 3 task
0xe00000b003d00000) 0 (cpu 4 task 0xe000023c38258000) 0 (cpu 5 task
0xe00000b003d10000) 0 (cpu 6 task 0xe000023c38268000) 0 (cpu 7 task
0xe00000b003d20000) 0 (cpu 8 task 0xe000023c382e8000) 0 (cpu 9 task
0xe00000b003d30000) 0 (cpu 10 task 0xe000023c38380000) 0 (cpu 11 task
0xe00000b003d40000)
Backtrace of pid 1 (init)
Call Trace:
[<a000000100799bb0>] schedule+0x1bf0/0x1ec0
sp=e00000307bd178c0 bsp=e00000307bd11828
[<a000000100797b20>] __down+0x240/0x280
sp=e00000307bd179d0 bsp=e00000307bd117e8
[<a0000001003b8f10>] xfs_buf_iowait+0x50/0x80
sp=e00000307bd17a00 bsp=e00000307bd117c8
[<a0000001003bb120>] xfs_buf_iostart+0x1a0/0x1c0
sp=e00000307bd17a00 bsp=e00000307bd117a0
[<a0000001003bc140>] xfs_buf_read_flags+0xe0/0x160
sp=e00000307bd17a00 bsp=e00000307bd11768
[<a00000010039e490>] xfs_trans_read_buf+0x50/0x6a0
sp=e00000307bd17a00 bsp=e00000307bd11710
[<a0000001003420a0>] xfs_btree_read_bufl+0xe0/0x120
sp=e00000307bd17a00 bsp=e00000307bd116d0
[<a000000100334de0>] xfs_bmap_read_extents+0x2c0/0x7e0
sp=e00000307bd17a10 bsp=e00000307bd11658
[<a000000100376820>] xfs_iread_extents+0x160/0x1c0
sp=e00000307bd17a20 bsp=e00000307bd11618
[<a000000100330b90>] xfs_bmapi+0x430/0x33a0
sp=e00000307bd17a20 bsp=...

That down() probably wants a replug to precede it. Probably something
like:

	if (atomic_read(&bp->b_io_remaining))
		blk_replug_current_nested();

for xfs_buf_wait_unpin() and xfs_buf_lock(). Does this fix it?
diff --git a/fs/xfs/linux-2.6/xfs_buf.c b/fs/xfs/linux-2.6/xfs_buf.c
index f2bdf8b..1ef226e 100644
--- a/fs/xfs/linux-2.6/xfs_buf.c
+++ b/fs/xfs/linux-2.6/xfs_buf.c
@@ -909,6 +909,8 @@ xfs_buf_lock(
xfs_buf_t *bp)
{
XB_TRACE(bp, "lock", 0);
+ if (atomic_read(&bp->b_io_remaining))
+ blk_replug_current_nested();
down(&bp->b_sema);
XB_SET_OWNER(bp);
XB_TRACE(bp, "locked", 0);
@@ -979,7 +981,7 @@ xfs_buf_wait_unpin(
set_current_state(TASK_UNINTERRUPTIBLE);
if (atomic_read(&bp->b_pin_count) == 0)
break;
- schedule();
+ io_schedule();
}
remove_wait_queue(&bp->b_waiters, &wait);
set_current_state(TASK_RUNNING);
@@ -1291,6 +1293,8 @@ xfs_buf_iowait(
xfs_buf_t *bp)
{
XB_TRACE(bp, "iowait", 0);
+ if (atomic_read(&bp->b_io_remaining))
+ blk_replug_current_nested();
down(&bp->b_iodonesema);
XB_TRACE(bp, "iowaited", (long)bp->b_error);
return bp->b_error;
@@ -1682,6 +1686,7 @@ xfsbufd(
xfs_buf_t *bp, *n;
struct list_head *dwq = &target->bt_delwrite_queue;
spinlock_t *dwlk = &target->bt_delwrite_lock;
+ int count;
current->flags |= PF_MEMALLOC;
@@ -1697,6 +1702,7 @@ xfsbufd(
schedule_timeout_interruptible(
xfs_buf_timer_centisecs * msecs_to_jiffies(10));
+ count = 0;
age = xfs_buf_age_centisecs * msecs_to_jiffies(10);
spin_lock(dwlk);
list_for_each_entry_safe(bp, n, dwq, b_list) {
@@ -1716,6 +1722,7 @@ xfsbufd(
_XBF_RUN_QUEUES);
bp->b_flags |= XBF_WRITE;
list_move_tail(&bp->b_list, &tmp);
+ count++;
}
}
spin_unlock(dwlk);
@@ -1730,6 +1737,8 @@ xfsbufd(
if (as_list_len > 0)
purge_addresses();
+ if (count)
+ blk_replug_c...

Jens, this patch looks like you originally removed the explicit unplug calls that XFS used to prevent metadata I/O hangs and now you are putting them back. Correct?

Reading on from Andrew's earlier comments, shouldn't XFS have worked unchanged? I'm just trying to understand why you removed the explicit unplugs in the first place.....

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

-
It should. The problem is that if someone has plugged higher up in the hierarchy, then you do need the explicit replug to drain that before going to sleep and waiting for IO to complete. Not very happy about that situation, I'd prefer if that happened automagically.

I'll likely change the code to fix that, so we don't have to sprinkle blk_replug_current_nested() around and always call io_schedule() instead of schedule(). It's just going to cause too many problems.

--
Jens Axboe

-
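The nesting problem described above can be illustrated with another user-space toy model. Everything here is a hypothetical sketch: the nplug_* names are invented, and the semantics (an unplug only dispatches when the outermost plug is released, while a replug drains held IO without changing the nesting depth) are assumptions made for illustration, since the thread does not spell out what blk_replug_current_nested() does internally.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of nested plugging. Names and semantics are assumptions. */
struct nplug {
	int depth;      /* how many callers up the stack hold a plug */
	int held;       /* requests queued behind the plug */
	int dispatched; /* requests handed to the device */
};

static void nplug_drain(struct nplug *p)
{
	p->dispatched += p->held;
	p->held = 0;
}

/* A caller somewhere up the hierarchy takes a plug. */
static void nplug_enter(struct nplug *p)
{
	p->depth++;
}

/* Releasing a plug only dispatches when it was the outermost one. */
static void nplug_exit(struct nplug *p)
{
	assert(p->depth > 0);
	if (--p->depth == 0)
		nplug_drain(p);
}

static void nplug_submit(struct nplug *p)
{
	if (p->depth > 0)
		p->held++;
	else
		p->dispatched++;
}

/* ASSUMED behaviour of a replug: drain now, keep the nesting depth. */
static void nplug_replug(struct nplug *p)
{
	nplug_drain(p);
}

/* Sleeping on IO completion while requests are still held cannot finish. */
static bool nplug_sleep_would_hang(const struct nplug *p)
{
	return p->held > 0;
}
```

Under these assumptions, an inner caller that plugs, submits, and unplugs still leaves its request held when an outer plug exists, so waiting on that request would hang until something drains the outer plug. That is the case the explicit replug before down() is meant to cover.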
No, it still hangs consistently. This time at an earlier spot.
All OS INIT slaves have reached rendezvous
Processes interrupted by INIT - 0 (cpu 0 task 0xa000000100b24000) 0 (cpu 1
task 0xe00000b003bd8000) 0 (cpu 2 task 0xe000023c38248000) 0 (cpu 3 task
0xe00000b003d00000) 0 (cpu 4 task 0xe000023c38258000) 0 (cpu 5 task
0xe00000b003d10000) 0 (cpu 6 task 0xe000023c38268000) 0 (cpu 7 task
0xe00000b003d20000) 0 (cpu 8 task 0xe000023c382e8000) 0 (cpu 9 task
0xe00000b003d30000) 0 (cpu 10 task 0xe000023c38380000) 0 (cpu 11 task
0xe00000b003d40000)
Backtrace of pid 223 (pdflush)
Call Trace:
[<a000000100799ed0>] schedule+0x1bf0/0x1ec0
sp=e0000030156879d0 bsp=e0000030156811a8
[<a0000001000db490>] synchronize_qrcu+0x170/0x1e0
sp=e000003015687ae0 bsp=e000003015681170
[<a0000001003f6bc0>] __make_request+0x160/0x880
sp=e000003015687b10 bsp=e000003015681130
[<a0000001003f17c0>] generic_make_request+0x4a0/0x520
sp=e000003015687b30 bsp=e0000030156810f8
[<a0000001003f78f0>] submit_bio+0x2f0/0x320
sp=e000003015687b50 bsp=e0000030156810b0
[<a0000001003bab20>] xfs_buf_iorequest+0x740/0x820
sp=e000003015687b70 bsp=e000003015681040
[<a0000001003869d0>] xlog_bdstrat_cb+0x50/0xe0
sp=e000003015687ba0 bsp=e000003015681020
[<a000000100384150>] xlog_state_release_iclog+0x770/0xcc0
sp=e000003015687ba0 bsp=e000003015680fc0
[<a000000100384860>] xlog_state_sync_all+0x1c0/0x460
sp=e000003015687ba0 bsp=e000003015680f60
[<a000000100385010>] _xfs_log_force+0xd0/0x5c0
sp=e000003015687bd0 bsp=e000003015680f00
[<a0000001003a3720>] xfs_syncsub+0x40/0x520
sp=e000003015687c00 bsp=e0000...

That looks like barriers, could you try with those disabled? Sorry for making you go through this, I can't debug and fix it myself before Monday.

--
Jens Axboe

-
