Re: Linux v2.6.27-rc1: fails to compile

Previous thread: interrupt overhead on ARM architecture by Alessio Sangalli on Monday, July 28, 2008 - 7:23 pm. (4 messages)

Next thread: Regression on ia64 with cpu masks: optimize and clean up cpumask_of_cpu() by Simon Horman on Monday, July 28, 2008 - 8:45 pm. (4 messages)
From: Linus Torvalds
Date: Monday, July 28, 2008 - 8:23 pm

It's two weeks (and one day), and the merge window is over.

Finally. I don't know why, but this one really did feel pretty dang busy. 
And the size of the -rc1 patch bears that out - at 12MB, it's about 50% 
bigger than 26-rc1 (but not that much bigger than 24/25-rc1, so it's not 
like it's anything unheard of).

The pure size of the -rc's _is_ making me a bit nervous, though. Sure, it 
means that we are good at merging it all, but I have to say that I 
sometimes wonder if we don't merge too much in one go, and even our 
current (fairly short) release cycle is actually too big.

Anyway, that's a discussion for some other event.

Much of -rc1 was in linux-next, but certainly not everything. We'll see 
how that whole thing ends up evolving - it certainly didn't solve all 
problems, and there was some bickering about things that weren't there 
(and some things that mostly were ;), but maybe it helped.

There's a ton of new stuff in there, but at least personally the 
interesting things are the BKL pushdown and perhaps the introduction of 
the lockless get_user_pages_fast(). The build system also got updated to 
allow moving the architecture include files ("include/asm-xyz") into the 
architecture subdirectories ("arch/xyz/include/asm"), and sparc seems to 
have taken advantage of that already.

But those changes are just small details in the end. As usual, the bulk of 
changes are all to device drivers (roughly half, as usual), with the arch 
directory amounting to about half of the remainder. Dirstat:

   3.2% arch/arm/
   9.2% arch/ppc/
  24.6% arch/
   5.2% drivers/char/drm/
   6.3% drivers/char/
   4.5% drivers/gpu/drm/
   4.5% drivers/gpu/
   4.6% drivers/media/video/
   5.5% drivers/media/
   3.0% drivers/net/wireless/
  10.7% drivers/net/
   6.4% drivers/usb/misc/
   4.7% drivers/usb/serial/
  12.9% drivers/usb/
  51.2% drivers/
   4.4% firmware/
   3.7% fs/
   9.2% include/

where the bulk of that fs/ update is the merge of the UBI filesystem, to 
pick ...
From: Nick Piggin
Date: Monday, July 28, 2008 - 9:01 pm

And lockless pagecache, woohoo!
--

From: Alistair John Strachan
Date: Tuesday, July 29, 2008 - 2:49 am

Hi,

Just tried switching to 2.6.27-rc1 on my desktop, with a supported zd1211rw
device, and my wireless AP does not "authenticate". With 2.6.26 (the only
previous working version tested) I get the following in dmesg:

[   17.481900] firmware: requesting zd1211/zd1211b_ub
[   17.536820] firmware: requesting zd1211/zd1211b_uphr
[   17.601837] zd1211rw 1-2:1.0: firmware version 4725
[   17.602837] zd1211rw 1-2:1.0: zd1211b chip 050d:705c v4810 high 00-17-3f AL2230_RF pa0 g--NS
[   18.613540] wlan0: Initial auth_alg=0
[   18.613540] wlan0: authenticate with AP 00:17:3f:a4:d6:9d
[   18.622538] wlan0: RX authentication from 00:17:3f:a4:d6:9d (alg=0 transaction=2 status=0)
[   18.622538] wlan0: authenticated
[   18.622538] wlan0: associate with AP 00:17:3f:a4:d6:9d
[   18.622538] wlan0: RX AssocResp from 00:17:3f:a4:d6:9d (capab=0x461 status=0 aid=2)
[   18.622538] wlan0: associated
[   18.622538] wlan0: switched to short barker preamble (BSSID=00:17:3f:a4:d6:9d)

Which is correct. One perhaps interesting detail is that the AP is unencrypted,
here is what iwlist wlan0 scanning sees:

wlan0     Scan completed :
          Cell 01 - Address: 00:17:3F:A4:D6:9D
                    ESSID:"strachan"
                    Mode:Master
                    Channel:6
                    Frequency:2.437 GHz (Channel 6)
                    Quality=100/100  Signal level=44/100
                    Encryption key:off
                    Bit Rates:1 Mb/s; 2 Mb/s; 5.5 Mb/s; 11 Mb/s; 22 Mb/s
                              6 Mb/s; 9 Mb/s; 12 Mb/s; 18 Mb/s; 24 Mb/s
                              36 Mb/s; 48 Mb/s; 54 Mb/s
                    Extra:tsf=00000036b0f6e1b6

However, on 2.6.27-rc1 I see the following instead:

[   12.120189] firmware: requesting zd1211/zd1211b_ub
[   12.166388] firmware: requesting zd1211/zd1211b_uphr
[   12.218877] zd1211rw 4-2:1.0: firmware version 4725
[   12.258877] zd1211rw 4-2:1.0: zd1211b chip 050d:705c v4810 high 00-17-3f AL2230_RF pa0 g--NS
[   13.097289] wlan0: ...
From: Johannes Berg
Date: Tuesday, July 29, 2008 - 3:09 am

This is about the 100 millionth time this is reported. Please try the
patch I just posted.

johannes
From: Alistair John Strachan
Date: Tuesday, July 29, 2008 - 4:25 am

If it doesn't strain you too much more, could you actually tell me where this 
is? Your last 5 posts to LKML don't seem to contain such a patch, and I'm not 
subscribed to linux-wireless.

-- 
Cheers,
Alistair.
--

From: Johannes Berg
Date: Tuesday, July 29, 2008 - 4:26 am

Well, the latter has archives.

johannes
From: Hugh Dickins
Date: Tuesday, July 29, 2008 - 4:37 am

Wow, your patches seem to be a lot more helpful than your emails.
I presume it's this one below, which at first sight seems to be
working for me on iwl3945 - thank you for that.

Hugh


This patch fixes mac80211 to not use the skb->cb over the queue step
from virtual interfaces to the master. The patch also, for now,
disables aggregation because that would still require requeuing,
will fix that in a separate patch. There are two other places (software
requeue and powersaving stations) where requeue can happen, but that is
not currently used by any drivers/not possible to use respectively.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
---
This fixes wireless. At least it works on my WPA network, I haven't
actually tested a broken kernel.

 drivers/net/wireless/ath5k/base.c           |    2 -
 drivers/net/wireless/b43/xmit.c             |    2 -
 drivers/net/wireless/b43legacy/xmit.c       |    2 -
 drivers/net/wireless/iwlwifi/iwl-tx.c       |    2 -
 drivers/net/wireless/iwlwifi/iwl3945-base.c |    2 -
 drivers/net/wireless/rt2x00/rt2x00mac.c     |    2 -
 include/linux/skbuff.h                      |    5 ++
 include/net/mac80211.h                      |    6 ---
 net/core/skbuff.c                           |    3 +
 net/mac80211/main.c                         |    8 ----
 net/mac80211/mlme.c                         |    8 +---
 net/mac80211/tx.c                           |   47 ++++++++++++----------------
 net/mac80211/wme.c                          |    3 +
 13 files changed, 40 insertions(+), 52 deletions(-)

--- everything.orig/include/net/mac80211.h	2008-07-29 09:08:16.000000000 +0200
+++ everything/include/net/mac80211.h	2008-07-29 11:07:41.000000000 +0200
@@ -206,8 +206,6 @@ struct ieee80211_bss_conf {
  * These flags are used with the @flags member of &ieee80211_tx_info.
  *
  * @IEEE80211_TX_CTL_REQ_TX_STATUS: request TX status callback for this frame.
- * @IEEE80211_TX_CTL_DO_NOT_ENCRYPT: send this frame without encryption;
- *	e.g., for EAPOL ...
From: Kalle Valo
Date: Tuesday, July 29, 2008 - 4:46 am

From: Alistair John Strachan
Date: Tuesday, July 29, 2008 - 4:55 am

Thanks for the patch, it fixes the issue for me with my zd1211rw. I hope 
the "100 million" rabid users that reported this can piss off happily with 
their working wireless. ;-)

(BTW thanks Hugh/Holger for the patch posting/link.)

-- 
Cheers,
Alistair.
--

From: Theodore Tso
Date: Tuesday, July 29, 2008 - 5:04 am

Yeah, it's really too bad -rc1 got released just before you were able
to post the fix to this, since if there were 100 million people who
were trying out kernels starting with -git7 that use wireless, there
will probably be 200 million people trying out -rc1.  :-)

Thanks for finding and fixing it, though.  I stopped trying out
kernels after -git6 since I was travelling at OSCON, and not having
wireless was a show-stopper for me....

						- Ted
--

From: Johannes Berg
Date: Tuesday, July 29, 2008 - 5:09 am

If everybody's going to decide now to hit on _me_, I'll point out that
davem's MQ TX changes broke it, I only heard about the problem once that
was out because nobody had found it earlier, and I was also travelling
at OLS.

Maybe the lesson we could learn from this is to not release an rc1 while
a bunch of important people are at various conferences.

johannes
From: Johannes Berg
Date: Tuesday, July 29, 2008 - 5:15 am

Of course that's not strictly true, it had been broken forever, it just
happened to never show up before. And I mean forever, the original
devicescape code that got in was already broken.

johannes
From: Theodore Tso
Date: Tuesday, July 29, 2008 - 8:18 am

Sorry, no, I wasn't trying to blame you.  I understand that this was a
hard problem to fix, and it wasn't at all obvious that changes in one
part of the networking stack would break the wireless stack due to bad
assumptions it had made, that had been hiding for quite some time.

The timing is just very unfortunate, since if -rc1 had been delayed by
just one more day so it could have incorporated the fix, we would probably
have reduced the large number of regression reports; a lot of people who
test -rc1 don't necessarily follow netdev or linux-wireless.

I'm of course also nervously building -rc1 and about to test it, since
I haven't had a chance to test anything since -git6, and I'm wondering
if some other regression may have been introduced since then.

Since it probably doesn't get said enough to everyone who works on
fixing bugs/regressions, thanks very much for your efforts; I (and
many other people) very much appreciate it!!

						- Ted
--

From: John W. Linville
Date: Tuesday, July 29, 2008 - 10:52 am

FWIW, I think the MQ stuff didn't spend much (or any) time in -next...

-- 
John W. Linville
linville@tuxdriver.com
--

From: David Miller
Date: Tuesday, July 29, 2008 - 9:48 pm

From: "John W. Linville" <linville@tuxdriver.com>

It did in a few formats, but then Patrick McHardy pointed out something
that required my rewriting large swaths of it in the days leading up to
the merge window.

And the problem this showed up was a bug that existed in mac80211 long
before I made any TX multiqueue changes :-)
--

From: Alistair John Strachan
Date: Tuesday, July 29, 2008 - 6:57 am

Hi,

(Sorry for the CC frenzy. If you don't have or want anything to do with the
tracing framework in 2.6.27 or the microcode driver, you can stop reading
now.)

Noticing pq's mmiotrace was merged I tried to get a trace of the proprietary
NVIDIA blob. Normally I wouldn't waste your time posting a tainted oops,
however in this case it doesn't look related to the proprietary garbage and I
think there's a real bug somewhere.

As I understand it, the mmiotrace tracing framework requires only one logical
CPU to be active, automatically offlining the other CPUs. When mmiotrace is
disabled, it automatically re-enables the CPUs it offlined. If I offline the
spare CPUs myself, prior to enabling mmiotrace, I do not see the issue I'm
about to describe. That's why tracing people have been CCed, even though that
could be a red herring.

The full dmesg and kernel config are available from
http://devzero.co.uk/~alistair/2.6.27-rc1-mc-oops/

nvidia: module license 'NVIDIA' taints kernel.
Symbol init_mm is marked as UNUSED, however this module is using it.
This symbol will go away in the future.
Please evalute if this is the right api to use and if it really is, submit a report the linux kernel mailinglist together with submitting your code for inclusion.
nvidia 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
nvidia 0000:01:00.0: setting latency timer to 64
NVRM: loading NVIDIA UNIX x86_64 Kernel Module  173.14.09  Wed Jun  4 23:40:50 PDT 2008
in mmio_trace_init
mmiotrace: Disabling non-boot CPUs...
kvm: disabling virtualization on CPU1
CPU 1 is now offline
SMP alternatives: switching to UP code
CPU0 attaching NULL sched-domain.
CPU1 attaching NULL sched-domain.
CPU0 attaching NULL sched-domain.
mmiotrace: CPU1 is down.
mmiotrace: enabled.
Symbol init_mm is marked as UNUSED, however this module is using it.
This symbol will go away in the future.
Please evalute if this is the right api to use and if it really is, submit a report the linux kernel mailinglist together with ...
From: Pekka Paalanen
Date: Tuesday, July 29, 2008 - 9:22 am

On Tue, 29 Jul 2008 14:57:58 +0100

I have a wild hunch...
Could you try the following:
1. with all cpus enabled, load the nvidia proprietary driver
2. start and quit X
3. disable a cpu by hand
4. unload the proprietary driver
5. enable the cpu by hand
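
Steps 3 and 5 can be done through the CPU hotplug files in sysfs. A
minimal sketch, assuming the usual /sys/devices/system/cpu/cpuN/online
interface and root privileges; the helper name set_cpu_online and the
SYSFS_CPU_ROOT override are hypothetical, added only so the snippet can
be tried against a scratch directory first:

```shell
#!/bin/sh
# Hypothetical helper: write 0 (offline) or 1 (online) to a CPU's hotplug file.
# SYSFS_CPU_ROOT is an illustrative override; it defaults to the real sysfs path.
set_cpu_online() {
	_cpu=$1
	_state=$2
	_root=${SYSFS_CPU_ROOT:-/sys/devices/system/cpu}
	case $_state in
	0|1) ;;
	*) echo "state must be 0 or 1" >&2; return 2 ;;
	esac
	printf '%s\n' "$_state" > "$_root/cpu$_cpu/online"
}
```

With that in place, "set_cpu_online 1 0" would cover step 3 and
"set_cpu_online 1 1" step 5 on a two-CPU machine.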

I have a vague recollection of the nvidia blob doing something bad
with notifiers, so if that crashes, and it does not crash when you
leave step 4 out, it's an nvidia problem. I assume you unloaded the
blob before disabling mmiotrace, right?

You may need to alter the sequence of things, but my guess is that
the blob may leave a notifier registered even when it is unloaded,
so it crashes when the notifier chain is traversed. I'm not sure the

I'm not sure people are willing to look into this without a clean report,
so this would be cool. There's even a test module for mmiotrace in the
kernel, but I doubt it would make difference to use it or not, when trying
to reproduce the crash without the blob.

Thanks.

-- 
Pekka Paalanen
http://www.iki.fi/pq/
--

From: Alistair John Strachan
Date: Tuesday, July 29, 2008 - 9:50 am

Of course, and I should have attempted to reproduce without the driver.
Fortunately that was easy: it is not an NVIDIA driver bug.

Steps to reproduce: have CONFIG_MICROCODE=y and a suitable Intel
processor, then do:

echo mmiotrace >/debug/tracing/current_tracer
echo none >/debug/tracing/current_tracer

And you get this (snipped) oops:

in mmio_trace_init
mmiotrace: Disabling non-boot CPUs...
kvm: disabling virtualization on CPU1
CPU 1 is now offline
SMP alternatives: switching to UP code
CPU0 attaching NULL sched-domain.
CPU1 attaching NULL sched-domain.
CPU0 attaching NULL sched-domain.
mmiotrace: CPU1 is down.
mmiotrace: enabled.
in mmio_trace_reset
mmiotrace: Re-enabling CPUs...
SMP alternatives: switching to SMP code
Booting processor 1/1 ip 6000
Initializing CPU#1
Calibrating delay using timer specific routine.. <6>7204.76 BogoMIPS (lpj=3602381)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
x86 PAT enabled: cpu 1, old 0x7040600070406, new 0x7010600070106
CPU1: Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz stepping 06
checking TSC synchronization [CPU#0 -> CPU#1]: passed.
kvm: enabling virtualization on CPU1
CPU0 attaching NULL sched-domain.
Switched to high resolution mode on CPU 1
CPU0 attaching sched-domain:
 domain 0: span 0-1 level MC
  groups: 0 1
CPU1 attaching sched-domain:
 domain 0: span 0-1 level MC
  groups: 1 0
------------[ cut here ]------------
Kernel BUG at ffffffff8021a31d [verbose debug info unavailable]
invalid opcode: 0000 [1] PREEMPT SMP
CPU 0
Modules linked in: rfcomm l2cap kvm_intel kvm ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack ip_tables x_tables bridge stp llc acpi_cpufreq freq_table coretemp hwmon 
snd_pcm_oss snd_mixer_oss firewire_sbp2 hci_usb bluetooth arc4 ecb crypto_blkcipher cryptomgr crypto_algapi usbhid zd1211rw mac80211 crypto cfg80211 snd_emu10k1 snd_rawmidi 
snd_ac97_codec ac97_bus sg snd_seq_device snd_hda_intel snd_pcm ...
From: Dmitry Adamushko
Date: Wednesday, July 30, 2008 - 2:07 am

Yes, it's kind of a known issue. Take a look at this explanation:
http://lkml.org/lkml/2008/7/24/260

There were a few related discussions in other threads (mainly, Max
Krasnyansky and I were asking for additional info on possible
requirements from the 'microcode' driver...) heh, I think, we'd be
better off just fixing it one way or another.



-- 
Best regards,
Dmitry Adamushko
--

From: Dmitry Adamushko
Date: Wednesday, July 30, 2008 - 3:35 am

does the patch below fix it for you?
[ not really what we wanted ]

(non-white-space-damaged version is enclosed)
--- kernel/cpu.c-old    2008-07-30 12:31:15.000000000 +0200
+++ kernel/cpu.c        2008-07-30 12:32:02.000000000 +0200
@@ -349,6 +349,8 @@ static int __cpuinit _cpu_up(unsigned in
                goto out_notify;
        BUG_ON(!cpu_online(cpu));

+       cpu_set(cpu, cpu_active_map);
+
        /* Now call notifier in preparation. */
        raw_notifier_call_chain(&cpu_chain, CPU_ONLINE | mod, hcpu);

@@ -383,9 +385,6 @@ int __cpuinit cpu_up(unsigned int cpu)

        err = _cpu_up(cpu, 0);

-       if (cpu_online(cpu))
-               cpu_set(cpu, cpu_active_map);
-
 out:
        cpu_maps_update_done();
        return err;


-- 
Best regards,
Dmitry Adamushko
From: Peter Oruba
Date: Wednesday, July 30, 2008 - 6:28 am

Dmitry,

works for me...

Thanks,
Peter

--

From: Alistair John Strachan
Date: Thursday, July 31, 2008 - 5:49 am

Well, if this patch is all that can be done about the issue, it gets my tested 
seal of approval. The CPUs online/offline properly without upsetting the mc 
driver. Thanks.

-- 
Cheers,
Alistair.
--

From: Ingo Molnar
Date: Thursday, July 31, 2008 - 9:56 am

could you please send this patch with a changelog, explanation, etc.?

	Ingo
--

From: Dmitry Adamushko
Date: Thursday, July 31, 2008 - 12:52 pm

Now having thought a bit more on that issue, I tend to think that this
patch is not all that nice (so I agree with Max here).

The root problem is the way set_cpus_allowed_ptr() is used in
microcode's cpu-hotplug handler. With cpu_active_map in place
set_cpus_allowed_ptr() can't migrate a task to the soon-to-be-online
cpu from within a CPU_ONLINE handler (more details here:
http://lkml.org/lkml/2008/7/24/260)

Basically, this patch marks a 'cpu' available for other tasks to be
migrated to it before sending CPU_ONLINE notification to
subscribers... [ now, there can be CPU_ONLINE handlers that have
something to do with enabling migration/load-balancing. e.g.
migration_call() , although it has the highest prio and is supposed
to run first in a chain ]

In another thread, I've asked whether doing 'microcode update' in
start_secondary() (or even at the beginning of idle_cpu()) would be
better:

pros:
- it's done as early as possible (no other tasks have started running
on a cpu yet);
- no actions in cpu-hotplug;

cons:
- the microcode sub-system becomes visible outside of microcode.c _but_
it's arch-specific part anyway + with object-oriented re-work (which
is in -tip), I think it'd be that bad.

Alternatives:

- delayed 'microcode' update -> scheduled to 'workqueue'  (cons: it's
not as early as possible);
- Max suggested a combination of IPI + some work (request_firmware())
from cpu-hotplug handler itself. But I think it's quite a complex
scheme (and maybe prone to other problems).



-- 
Best regards,
Dmitry Adamushko
--

From: Dmitry Adamushko
Date: Thursday, July 31, 2008 - 12:55 pm

it was supposed to be "it'd be _not_ that bad"


-- 
Best regards,
Dmitry Adamushko
--

From: Jesse Barnes
Date: Tuesday, July 29, 2008 - 9:27 am

I think linux-next has been a *huge* help.  It's been great at catching merge 
conflicts and build bugs (though not so much when you don't use it[1]!), and 
Stephen is really easy to work with.  So I, for one, would love to see it 
continue.

Jesse

[1] http://marc.info/?t=121699085400001&r=1&w=2
--

From: Linus Torvalds
Date: Tuesday, July 29, 2008 - 9:59 am

I don't think anybody wants it to go away. The question in my mind is more 
along the lines of how/whether it should be changed. There was some 
bickering about patches that weren't there, and some about how _partial_ 
series were there but then the finishing touches broke things.

I don't personally really think that it's reasonable to expect everything 
to be in -next (but hey, I'm willing to be convinced otherwise). And don't 
get me wrong - it certainly wouldn't bother _me_ to have everything go 
through next, since it just makes it likelier that I have less to worry 
about.

BUT. I do think 'next' as it is has a few issues that either need to be 
fixed (unlikely - it's not the point of next) or just need to be aired as 
issues and understood:

 - I don't think it does 'quality control', and I think that's pretty 
   fundamental.

   Now, admittedly I don't look much at the patches of people I trust 
   either (that's what the whole point of that 'trust' is, after all - to 
   make me not be the part that limits development speed), but that's 
   still different from 'largely automated merging'.

   So I _do_ check the things that aren't obvious "maintainer works on his 
   own subsystem" or are so core that I really feel like I need to know 
   what's up. I seldom actually say "that's so broken that I refuse to 
   pull it", but I tend to do that a couple of times per release.

   That may not sound like much, but it's enough to make me worry about 
   'next'. I worry that 'it has been in next' has become a code-word for 
   "pull this, because it's good", and I'm not at all convinced that 
   'next' sees any real critical checking.

 - I don't think the 'next' thing works as well for the occasional 
   developer that just has a few patches pending as it works for subsystem 
   maintainers that are used to it.

   IOW, I think 'next' needs enough infrastructure setup from the 
   developer side that I don't think it's reasonable for _everything_ to 
   go ...
From: Roland Dreier
Date: Tuesday, July 29, 2008 - 10:31 am

 >    That may not sound like much, but it's enough to make me worry about
 >    'next'. I worry that 'it has been in next' has become a code-word for
 >    "pull this, because it's good", and I'm not at all convinced that
 >    'next' sees any real critical checking.

I've been mentioning that my trees have been in next as code for, "I
don't think this should break the build or clash too badly with anything
else."  And next has been useful to me on several occasions for catching
that sort of problem before things hit mainline.

 - R.
--

From: Andrew Morton
Date: Wednesday, July 30, 2008 - 2:03 am

Those people's patches are in -mm, which now holds maybe 100 or more
"trees", many of which are small or empty.

My project within the next couple of weeks is to get most of that

True.  But

a) some of the problematic changes which we've seen simply _should_
   have been in linux-next.  Some of them were even coming from
   developers whose trees are already in linux-next.

b) A lot of the bugs which hit your tree would have been quickly
   found in linux-next too.


But it's all shuffling deckchairs, really.  Are we actually merging
better code as a result of all of this?  Are we being more careful and
reviewing better and testing better?


Oh sure.  But it depends on the _reason_ why it wasn't in linux-next. 
If the reason is a good one then fine.  But if the reason is "I was too
slack", or "I only wrote it five minutes ago" then the system is good,
and the developer isn't.

--

From: Rafael J. Wysocki
Date: Thursday, July 31, 2008 - 3:22 pm

Well, if the number of the regressions list entries can be regarded as a
pointer, then yes, we are. :-)

There are 28 entries in there right now, compared to 53 entries initially in
the list during the 2.6.26 cycle (see
http://bugzilla.kernel.org/show_bug.cgi?id=11167 for reference).

Thanks,
Rafael
--

From: Rafael J. Wysocki
Date: Tuesday, July 29, 2008 - 1:49 pm

That one happens to break things for me badly:

rafael@chimera:~/src/linux-2.6> make O=../build/mainline/chimera -j5
  GEN     /home/rafael/src/build/mainline/chimera/Makefile
  CHK     include/linux/version.h
  CHK     include/linux/utsrelease.h
  Using /home/rafael/src/linux-2.6 as source for kernel
  CALL    /home/rafael/src/linux-2.6/scripts/checksyscalls.sh
  CHK     include/linux/compile.h
  Building modules, stage 2.
Kernel: arch/x86/boot/bzImage is ready  (#208)
  MODPOST 564 modules
  IHEX2FW firmware/emi26/loader.fw
Failed to open destination file: Permission deniedihex2fw: Convert ihex files into binary representation for use by Linux kernel
usage: ihex2fw [<options>] <src.HEX> <dst.fw>
       -w: wide records (16-bit length)
       -s: sort records by address
  IHEX2FW firmware/emi26/bitstream.fw
  IHEX2FW firmware/emi26/firmware.fw
Failed to open destination file: Permission deniedihex2fw: Convert ihex files into binary representation for use by Linux kernel
usage: ihex2fw [<options>] <src.HEX> <dst.fw>
       -w: wide records (16-bit length)
       -s: sort records by address
make[2]: *** [firmware/emi26/loader.fw] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: *** [firmware/emi26/bitstream.fw] Error 1
Failed to open destination file: Permission deniedihex2fw: Convert ihex files into binary representation for use by Linux kernel
usage: ihex2fw [<options>] <src.HEX> <dst.fw>
       -w: wide records (16-bit length)
       -s: sort records by address
make[2]: *** [firmware/emi26/firmware.fw] Error 1
make[1]: *** [modules] Error 2
make: *** [sub-make] Error 2

Thanks,
Rafael
--

From: Rafael J. Wysocki
Date: Tuesday, July 29, 2008 - 2:01 pm

Actually, this happened due to some firmware files being created as root during
installations of pre-rc -git kernels from the O= directory.  So, not a real
problem, but somewhat confusing.
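
Leftovers of this kind can be listed before the next rebuild. A minimal
sketch, assuming a find(1) with the POSIX -user primary; the function
name list_foreign_files is hypothetical and the build path is just the
one from the log above:

```shell
#!/bin/sh
# Hypothetical helper: print every file under a directory that is NOT owned
# by the given user id (default: the user running the script).
list_foreign_files() {
	_dir=$1
	_owner=${2:-$(id -u)}
	find "$_dir" ! -user "$_owner" -print
}
```

Running "list_foreign_files ../build/mainline/chimera" as the normal
build user would then show anything a root "make modules_install" left
behind, which can be chown'ed back or removed as root.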

Thanks,
Rafael
--

From: Linus Torvalds
Date: Tuesday, July 29, 2008 - 2:01 pm

Yeah, I've had that happen myself. It used to be that "make 
modules_install" (as root, obviously) would try to build a new version of 
the kernel, and as a result subsequent build attempts would fail horribly 
because of various random files being now owned-by-root.

I don't know if there is a whole lot we can do about it in general..

			Linus
--

From: David Woodhouse
Date: Tuesday, July 29, 2008 - 3:26 pm

Not in general. I _think_ I fixed the specific problem which led to
Rafael's situation, but I'm still waiting for confirmation of that.

-- 
dwmw2

--

From: Sam Ravnborg
Date: Tuesday, July 29, 2008 - 2:37 pm

Most architectures are easy to convert. But those that use
symlinks to select between different platforms etc need a bit more
care if we shall get rid of all symlinks.

Paul already fixed up sh and sent you a pull request.
I have something ready for arm (not yet posted).
And I sent Haavard a small script that can fix avr32 when arm is done.
cris looks similar and I can take care of it too.

The rest that do not use additional symlinks are in general much simpler.

Kyle already fixed up parisc (simple).
x86 is simple - only a small patch needed to arch/x86/Makefile.
I have not dared looking at um.

But will you accept this stuff now or will we have to wait until
next merge window?
Now is a good time as development just started for next kernel.
And testing is simple - does it build?

It will come in via arch maintainers but I will assist.

	Sam
--

From: Linus Torvalds
Date: Tuesday, July 29, 2008 - 2:42 pm

If the patches are really small and the resulting build is well tested 
(ignoring the actual _move_ operation), I'm ok with taking them.

In fact, in many ways I'd _prefer_ to do it now, rather than have it 
pending and then do it during the next merge window when there are a lot 

Well, simple and simple. I'd love to see x86 done, but you yourself said 
you haven't even dared look at UM. Which is the thing that is most likely 
to have odd build things with direct symlinks etc.

(But I haven't looked either. Maybe I'm wrong, and it's all trivial).

		Linus
--

From: Sam Ravnborg
Date: Tuesday, July 29, 2008 - 2:59 pm

For arm the actual diff is:
 Makefile                 |   20 +++++++-------------
 boot/compressed/Makefile |    3 ---
 tools/Makefile           |    1 +
 3 files changed, 8 insertions(+), 16 deletions(-)

But on top of this there are ~600 files that needed a
replacement of:
#include <asm/arch/foo.h>
to
#include <arch/foo.h>
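
A replacement like this can be scripted per file. A minimal sketch, not
the script actually used for arm: the helper name rewrite_arch_includes
is hypothetical, and it assumes mktemp(1) is available and goes through
a temporary file so that only POSIX sed features are needed:

```shell
#!/bin/sh
# Hypothetical helper: rewrite '#include <asm/arch/foo.h>' into
# '#include <arch/foo.h>' in a single file, leaving other includes alone.
rewrite_arch_includes() {
	_src=$1
	_out=$(mktemp) || return 1
	if sed 's|^\([[:space:]]*#[[:space:]]*include[[:space:]]*\)<asm/arch/|\1<arch/|' \
		"$_src" > "$_out"; then
		# Copy the result back over the original, keeping its permissions.
		cat "$_out" > "$_src"
		_rc=$?
	else
		_rc=1
	fi
	rm -f "$_out"
	return "$_rc"
}
```

It could be driven over a tree with something like
"git grep -l 'asm/arch/' | while read f; do rewrite_arch_includes "$f"; done",
again only as an illustration.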

So maybe not such a minimal patch - because I wanted to drop all
um is never trivial :-(
But I will give it a try - but I have to sleep first.

	Sam
--

From: Linus Torvalds
Date: Tuesday, July 29, 2008 - 3:03 pm

Why? There's nothing wrong with symlinks.

The problem with 'include/asm' isn't the symlink per se, it's that it 
split up the architecture parts in two separate areas - arch and include.

			Linus
--

From: Sam Ravnborg
Date: Tuesday, July 29, 2008 - 3:30 pm

I agree that we do this to combine all arch files. But we then have the
possibility to get rid of some build stuff that is fragile and needs
special care for O=.. builds etc.
So when we anyway do the renames I saw the opportunity to go one step
further.
Let's see what the x86 + um stuff looks like.

For x86 alone the patch looks like this:

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index f5631da..c7493e7 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -110,16 +110,16 @@ KBUILD_CFLAGS += $(call cc-option,-mno-sse -mno-mmx -mno-sse2 -mno-3dnow,)
 mcore-y  := arch/x86/mach-default/
 
 # Voyager subarch support
-mflags-$(CONFIG_X86_VOYAGER)	:= -Iinclude/asm-x86/mach-voyager
+mflags-$(CONFIG_X86_VOYAGER)	:= -Iarch/x86/include/mach-voyager
 mcore-$(CONFIG_X86_VOYAGER)	:= arch/x86/mach-voyager/
 
 # generic subarchitecture
-mflags-$(CONFIG_X86_GENERICARCH):= -Iinclude/asm-x86/mach-generic
+mflags-$(CONFIG_X86_GENERICARCH):= -Iarch/x86/include/mach-generic
 fcore-$(CONFIG_X86_GENERICARCH)	+= arch/x86/mach-generic/
 mcore-$(CONFIG_X86_GENERICARCH)	:= arch/x86/mach-default/
 
 # default subarch .h files
-mflags-y += -Iinclude/asm-x86/mach-default
+mflags-y += -Iarch/x86/include/mach-default
 
 # 64 bit does not support subarch support - clear sub arch variables
 fcore-$(CONFIG_X86_64)  :=

And the script to move the files looks like this:
set -e
for D in include/asm-x86/mach-*; do
	echo $D
	DD=$(echo $D | cut -d '-' -f 3)
        mkdir -p arch/x86/include/mach-$DD/arch
        git mv include/asm-x86/mach-$DD/* arch/x86/include/mach-$DD
        rmdir include/asm-x86/mach-$DD
done

mkdir -p arch/x86/include/asm
git mv include/asm-x86/* arch/x86/include/asm
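
The loop gets the sub-architecture name by splitting the path on '-'
and taking the third field: "include/asm-x86/mach-voyager" splits into
"include/asm", "x86/mach" and "voyager". A small sketch of that step
next to a parameter-expansion variant; both function names are
hypothetical, and the hyphenated directory name in the note below is
made up to show where the two differ:

```shell
#!/bin/sh
# Same extraction as in the loop above: third '-'-separated field.
mach_name_cut() {
	echo "$1" | cut -d '-' -f 3
}

# Alternative: strip everything up to and including the last "/mach-".
mach_name_expand() {
	printf '%s\n' "${1##*/mach-}"
}
```

For the names in the tree both give the same answer; for a made-up
"include/asm-x86/mach-foo-bar" the cut version would return only "foo"
while the expansion keeps "foo-bar".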

But I have not yet looked at um.

	Sam
--

From: Grant Coady
Date: Tuesday, July 29, 2008 - 3:03 pm

Couple machines failed to compile with same error in different place:

  CC      arch/x86/kernel/acpi/cstate.o
arch/x86/kernel/acpi/cstate.c: In function `acpi_processor_ffh_cstate_probe':
arch/x86/kernel/acpi/cstate.c:94: error: invalid lvalue in unary `&'
make[2]: *** [arch/x86/kernel/acpi/cstate.o] Error 1
make[1]: *** [arch/x86/kernel/acpi] Error 2
make: *** [arch/x86/kernel] Error 2
grant@peetoo:~/linux/linux-2.6.27-rc1a$

Linux peetoo 2.6.25.13a #15 Tue Jul 29 07:41:48 EST 2008 i686 pentium3 i386 GNU/Linux

Gnu C                  3.4.6
Gnu make               3.81
binutils               2.15.92.0.2
util-linux             2.12r
mount                  2.12r
module-init-tools      3.2.2
e2fsprogs              1.38
reiserfsprogs          3.6.19
quota-tools            3.13.
PPP                    2.4.4
Linux C Library        2.3.6
Dynamic linker (ldd)   2.3.6
Linux C++ Library      6.0.3
Procps                 3.2.7
Net-tools              1.60
Kbd                    1.12
oprofile               0.9.1
Sh-utils               5.97
udev                   097
Modules Loaded         adm9240 hwmon_vid nfsd exportfs tulip e100
- - -

  CC      arch/x86/kernel/ldt.o
arch/x86/kernel/ldt.c: In function `alloc_ldt':
arch/x86/kernel/ldt.c:67: error: invalid lvalue in unary `&'
make[1]: *** [arch/x86/kernel/ldt.o] Error 1
make: *** [arch/x86/kernel] Error 2
grant@black:~/linux/linux-2.6.27-rc1a$

Linux black 2.6.26a #1 SMP Fri Jul 25 08:49:49 EST 2008 i686 pentium4 i386 GNU/Linux

Gnu C                  3.4.6
Gnu make               3.81
binutils               2.15.92.0.2
util-linux             2.12r
mount                  2.12r
module-init-tools      3.2.2
e2fsprogs              1.38
jfsutils               1.1.11
reiserfsprogs          3.6.19
xfsprogs               2.8.10
pcmciautils            014
pcmcia-cs              3.2.8
quota-tools            3.13.
PPP                    2.4.4
Linux C Library        2.3.6
Dynamic linker (ldd)   2.3.6
Linux C++ Library      ...
From: Frederik Deweerdt
Date: Tuesday, July 29, 2008 - 3:40 pm

Hello Grant,
This issue has been reported, this is due to your old compiler.
Fix here:
http://lkml.org/lkml/2008/7/29/48

Regards,
Frederik
--

From: Grant Coady
Date: Tuesday, July 29, 2008 - 4:46 pm

Thanks, that does the trick :)


--
