Hi everyone, hi Linus,

congratulations on this great new kernel release :)

I have another "regression" to report for 2.6.25: the "coretemp" kernel module reads out much higher temperatures than it did in the 2.6.24* series. Where temperatures used to be around 40-47°C, they now constantly jump around 55-70°C (even when idle!).

Several other users/testers have reported this issue too, both on zen-sources [heavily patched] and on the latest gentoo-sources [slightly patched]:
http://forums.gentoo.org/viewtopic-t-684812-highlight-.html

If I understood git right, it (also) happens in conjunction with a 3-week-old acpi snapshot (http://repo.or.cz/w/linux-2.6/zen-sources.git?a=heads) integrated in zen-sources.

I can "reproduce" this on an Intel Core 2 Duo 6600 (Conroe) with a P5W DH Deluxe mainboard (by Asus) since at least 2.6.25-rc8 (or possibly also rc7).

The temperatures on mobile Intel Core (2) Duo processors [e.g. a T7500] seem to be read out correctly.

I can't tell exactly, but this seems to be an entirely "cosmetic" issue (hopefully it's not the opposite, with the values now being read out correctly, since such high temps would be very worrying ;) ).

Keep up the great work & please don't forget your growing number of linux-desktop users :)

Regards
Mat
--
On Fri, Apr 18, 2008 at 5:43 PM, Linus Torvalds

Sorry, unfortunately I don't have the time to do it right now (exam time). I hope that someone from the Gentoo community will be willing to do so; I'll post a link to this entry in the forums and hope that someone can be found to do it.

Regards
Mat
--
[Linus Torvalds - Fri, Apr 18, 2008 at 08:43:35AM -0700]
|
|
| On Fri, 18 Apr 2008, Matthew wrote:
| >
| > I can "reproduce" this on an Intel Core 2 Duo 6600 (Conroe) with a P5W
| > DH Deluxe mainboard (by Asus) since at least 2.6.25-rc8 (or possibly
| > also rc7)
|
| Can you bisect it?
|
| 		Linus

it seems drivers/acpi/thermal.c has a small nit

881: /* sys I/F for generic thermal sysfs support */
882: #define KELVIN_TO_MILLICELSIUS(t) (t * 100 - 273200)

but it should multiply by 1000 meguess

		- Cyrill -
--
[Cyrill Gorcunov - Fri, Apr 18, 2008 at 11:38:02PM +0400]
| [Linus Torvalds - Fri, Apr 18, 2008 at 08:43:35AM -0700]
| |
| |
| | On Fri, 18 Apr 2008, Matthew wrote:
| | >
| | > I can "reproduce" this on an Intel Core 2 Duo 6600 (Conroe) with a P5W
| | > DH Deluxe mainboard (by Asus) since at least 2.6.25-rc8 (or possibly
| | > also rc7)
| |
| | Can you bisect it?
| |
| | 		Linus
|
| it seems drivers/acpi/thermal.c has a small nit
|
| 881: /* sys I/F for generic thermal sysfs support */
| 882: #define KELVIN_TO_MILLICELSIUS(t) (t * 100 - 273200)
|
| but it should multiply by 1000 meguess
|

oh, i'm wrong, sorry

		- Cyrill -
--
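A quick check of the arithmetic behind that retraction, as a minimal Python sketch. It assumes (this is not stated in the thread) that the macro is fed a temperature in tenths of a Kelvin, which is the unit ACPI defines for thermal zone readings. The function names are made up for illustration.

```python
def kelvin_to_millicelsius(t: int) -> int:
    """Python mirror of KELVIN_TO_MILLICELSIUS(t) as quoted from thermal.c.

    Assumption: t is in tenths of a Kelvin (deci-Kelvin).
    """
    return t * 100 - 273200


def with_factor_1000(t: int) -> int:
    """Hypothetical variant using the factor 1000 that was first suggested."""
    return t * 1000 - 273200


reading = 3032  # 303.2 K in tenths of a Kelvin, i.e. about 30 °C
print(kelvin_to_millicelsius(reading))  # 30000 millidegrees C
print(with_factor_1000(reading))        # 2758800, far outside any sane range
```

Under that assumption the factor 100 is the right one: one tenth of a Kelvin is 100 millidegrees.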
Hi Len,

sure, the apps I am using (or was using) to read out the processor's temperature via coretemp are:

lm_sensors: versions 2.10.4, 2.10.6 and (currently) 3.0.1. All report the same (higher) temperatures (compared to the 2.6.24 series).

CONFIG_THERMAL is currently enabled in the kernel:

grep CONFIG_THERMAL /usr/src/linux/.config
CONFIG_THERMAL=y

I'll recompile the kernel now & let you know the effect in a few hours (I'm currently working on the box).

"Dairinin" on forums.gentoo.org pointed out that the desktop CPU simply might not be identified correctly:
http://forums.gentoo.org/viewtopic-p-5065902.html#5065902

"Thats because new kernel (lm_sensors, etc?, etc?)incorrectly thinks our cpu's Tj is 100C, whereas for all core 2 duo's it is 85C (100C is for C0 stepping and above, AFAIK). Substract 15 and you'll get real temps."

I don't know if that's the case, but it sounds plausible to me.

Thanks everyone for your answers & help so far.

Regards
Mat
--
ok, tested the vanilla kernel this morning and it shows exactly the same high temperatures (with CONFIG_THERMAL=y)

I've got a question: when I try to disable thermal, the option just sits there & won't change:

<*> Hardware Monitoring support  --->
-*- Generic Thermal sysfs driver  --->

It seemingly depends on other things:

Selected by: ACPI_THERMAL && !X86_VOYAGER && ACPI && ACPI_PROCESSOR

Is it safe to disable acpi_processor and acpi, or CONFIG_THERMAL in general? Or will it burn down my box? ;)

I'm asking because the help text says:

CONFIG_ACPI_THERMAL:

This driver adds support for ACPI thermal zones. Most mobile and
some desktop systems support ACPI thermal zones. It is HIGHLY
recommended that this option be enabled, as your processor(s)
may be damaged without it.

thanks
Mat
Don't worry about it -- that is sort of an exaggeration.
In fact, it is your disk drive that will fry first:-)

# CONFIG_ACPI_THERMAL is not set
# CONFIG_THERMAL is not set

Should be just fine, particularly for experimentation.

In the case of a desktop system, ACPI_THERMAL is generally there just for processor throttling -- which would typically only be needed if you removed your heatsink or had some other serious cooling issue. And even if it is not there, a 2nd defense, the processor hardware thermal throttling, would kick in automatically at a slightly higher temperature...

-Len
--
Amen on that. And to add confusion here, my makeit script started dying when I do a remake after changing a config option, cuz it can't find a System.map file to delete, even if I copy it in from /boot. I had to delete the make -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) Check it out, send me comments, and dance joyously in the streets, -- Linus Torvalds announcing 2.0.27 --
And then, after killing the make clean statement, it still dies when making a

--
Cheers, Gene
--
The coretemp kernel module reports 25°C on my PC when idle, and 34°C after running some computations (lmbench2). This looks normal. The test was performed with a vanilla 2.6.25 kernel and a Core 2 Duo E6750 CPU (Asus motherboard). The 2.6.22 and 2.6.24 kernels, however, report an incorrect temperature on the same system (10°C). So either there is an issue with the patches that have been applied to your kernel, or the coretemp module behaves differently for the 6600 and E6750 CPUs.

Can you repeat the test with a vanilla 2.6.25 kernel?

(Added Rudolf Marek in CC, the coretemp author.)

Bart.
--
I see a change on my rrd graphs on Mar 5th at 11:30 AM (from an average of 25 to an average of 40 degrees centigrade). According to the logs, this is when I booted 2.6.25-rc3-mm1 instead of 2.6.25-rc2-mm1. [I have no idea whether the values were correct before or are correct now.]

I might bisect it, if needed.
--
Hi all,

I will keep the rest of the mail intact for the lm-sensors list.

The temperature is stored in hardware relative to the maximum temperature. Mobile processors have some undocumented bits that calibrate the maximum to 85 or 100C. Intel claims that this does not work for desktops, so I changed the driver [1], and the scale for desktop CPUs is now from 0 to 100.

In your case, the MAX temperature changed from 85 to 100, but the relative reading is the same: 40C below the max [2]. Instead of around 40 you should now see around 60, because it was 85 - 40 and now it is 100 - 40.

The second problem is that the scale is not linear, so it is more accurate close to the MAX.

I think that is what happened. If you change the driver so that it uses 85C instead of 100C, everything should go back to "normal". I'm sorry, I did not invent relative-only temperature measurements; I would therefore recommend watching how much is left to the max temperature.

Hope that explains it.

Thanks,
Rudolf

[1] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=118a887188...
[2] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentatio...
--
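The arithmetic in that explanation can be sketched in a few lines of Python. This is only an illustration of the "relative to maximum" idea, not the driver's code; the function name is hypothetical and the 40-degree offset is the example figure from the mail.

```python
def absolute_temp_c(tjmax_c: int, degrees_below_max: int) -> int:
    """Convert a relative reading into an absolute temperature.

    tjmax_c is the maximum temperature the driver assumes (85 or 100 in
    this thread); degrees_below_max is the relative value from the CPU.
    """
    return tjmax_c - degrees_below_max


below_max = 40  # the example offset used in the mail above

old_reading = absolute_temp_c(85, below_max)   # driver assumes TjMax = 85
new_reading = absolute_temp_c(100, below_max)  # driver assumes TjMax = 100

print(old_reading, new_reading)   # 45 60
print(new_reading - old_reading)  # 15, the shift users are seeing
```

The hardware value is the same in both cases; only the assumed maximum differs, which is why the distance to the maximum is the more trustworthy number.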
Hello Mat, I'm not familiar with "coretemp", can you point me to the exact version of the application you are running so I can see how it is getting at the underlying information? Also, do you see any change with and without kernel built with CONFIG_THERMAL=y? thanks, -Len --
I think there is some confusion here: "coretemp" is a kernel module, and all applications reading it will probably use the lm_sensors libraries.
(I don't think the hwmon modules are related to ACPI)

$ modinfo coretemp
filename:       /lib/modules/2.6.25-ARCH/kernel/drivers/hwmon/coretemp.ko
license:        GPL
description:    Intel Core temperature monitor
author:         Rudolf Marek <r.marek@assembler.cz>
depends:
vermagic:       2.6.25-ARCH SMP preempt mod_unload

That said, I have two Core 2 CPUs (one mobile, one desktop) and the values coretemp reports have not changed compared to earlier kernel versions (around 60°C when idle on the mobile, much less on the desktop).

> Also, do you see any change with and without kernel built with
> CONFIG_THERMAL=y?

The values I see from ACPI thermal are also the same as before (this is funny: they are always about 15°C cooler than the coretemp values).

So I don't see a regression here, maybe the reporter should try a vanilla kernel.
thermal isn't working on this board (if you mean /proc/acpi/thermal_zone ...)

I also tried a vanilla kernel & it showed the same higher temperature ;(

here is the last mail (on lkml it was corrupted) - I don't know if you were able to read it

ok, tested the vanilla kernel this morning and it shows exactly the same high temperatures (with CONFIG_THERMAL=y)

I've got a question: when I try to disable thermal, the option just sits there & won't change:

<*> Hardware Monitoring support  --->
-*- Generic Thermal sysfs driver  --->

It seemingly depends on other things:

Selected by: ACPI_THERMAL && !X86_VOYAGER && ACPI && ACPI_PROCESSOR

Is it safe to disable acpi_processor and acpi, or CONFIG_THERMAL in general? Or will it burn down my box? ;)

I'm asking because the help text says:

CONFIG_ACPI_THERMAL:

This driver adds support for ACPI thermal zones. Most mobile and
some desktop systems support ACPI thermal zones. It is HIGHLY
recommended that this option be enabled, as your processor(s)
may be damaged without it.

thanks
Mat
I just updated from 2.6.24 to 2.6.25 (I usually follow the whole development cycle, but I was very busy, so I skipped the 2.6.25 cycle).

I confirm this.

I *know* that the temperatures reported now are wrong. The reason is that the BIOS reported the same temperatures as coretemp did in 2.6.24. Moreover, some time ago I ran a CPU tool (I don't remember its name) on Windows which, like coretemp, reads the sensor data directly from each core, and I noticed that the temperature the BIOS reports is exactly the average temperature of both cores. (I had to run this on Windows: Intel hasn't released drivers for their QST for temperature monitoring from the BIOS. Very sad.)

And the driver did say in the kernel log that TJMAX is 85C.

Let's at least add a kernel option to override tjmax?

Best regards,
Maxim Levitsky
--
I too can confirm that it reports incorrect temperatures. I have a Q9450, and this is my "sensors" output:

it8718-isa-0290
Adapter: ISA adapter
<snip>
temp1:       +44°C  (low  = +127°C, high = +127°C)  sensor = thermistor
temp2:       +22°C  (low  = +127°C, high = +60°C)  sensor = diode
temp3:        -2°C  (low  = +127°C, high = +127°C)  sensor = thermistor
vid:        +0.000 V

coretemp-isa-0000
Adapter: ISA adapter
Core 0:      +44°C  (high = +100°C)

coretemp-isa-0001
Adapter: ISA adapter
Core 1:      +44°C  (high = +100°C)

coretemp-isa-0002
Adapter: ISA adapter
Core 2:      +43°C  (high = +100°C)

coretemp-isa-0003
Adapter: ISA adapter
Core 3:      +41°C  (high = +100°C)

temp2 is the CPU temperature (matches the BIOS), temp1 is the northbridge (I think; the BIOS says "system temp"). I have watercooling, and well :P when I touch the "tube", it's at normal room temperature, and believe me, I would notice if it was 45..

This is with my CPU at idle. At full load on all 4 cores, temp2 says 35 and coretemp says ~60, and THIS I would surely be able to notice over room temp :)
--
Could it be that, due to the latest thermal changes, ACPI is reading temperatures more often, or has started to read sensors that interfere with libsensors?

There was a patch set from Jean and myself that detects such interference, which was not accepted by Linus. This is what I got from Jean recently; he should be able to point you to the latest patches if you want to give them a try:

----------------------------------------------
That's OK, I'll include it in my i2c tree instead. While Linus doesn't like it, he didn't actually provide replacement code, only a vague idea which Thomas and myself know by experience, won't work well anyway. As the checks done by this driver are valuable, having it in -mm and linux-next is still good.
--
Jean Delvare
----------------------------------------------

If you get a conflict, you should see something like this when loading the hwmon/sensor driver:

i2c /dev entries driver
f71805f: Found F71805F/FG chip at 0x290, revision 19
ACPI: I/O resource f71805f [0x290-0x297] conflicts with ACPI region IP__ [0x295-0x296]
ACPI: Device needs an ACPI driver

Thomas
--
Hi all,

Adding Rudolf Marek to the thread, as he wrote the coretemp driver and is maintaining it. He was really the first person to contact about your problem...

And how do you know? The newly reported temperatures could be correct and the previous ones were incorrect (that's actually the case.)

The thing is, the temperature is stored as a relative value in the CPU. Relative to what, depends on the CPU model, can be 85°C or 100°C. Up to kernel 2.6.24 we had a set of rules to find out, in 2.6.25 we have a presumably better heuristic. So some people have seen their CPU

The coretemp driver reports the CPU _core_ temperature. That's not something you can touch, believe me (unless you are an electron.)

Also note that the CPU temperature reported by the IT8718F may or may not match the reality. To make sure, you'd need to know the type of thermal diode expected by the IT8718F, the type of thermal diode in your CPU, compute the correction factor if there is one. And you'd need to know where the thermal diode is exactly. It is most certainly built into the CPU, but some motherboard makers are doing weird things. 22°C seems very low to me, even for water-cooling.

Note that non-linearity of thermal diodes makes measurements inaccurate as they get away from the expected operating point. I guess that thermal diodes used in CPUs are calibrated for best results around the expected

In my experience, the BIOS is more likely to get the information from the on-board hardware monitoring chip than from the CPU MSRs as the

If that windows tool was not written by Intel, then chances are that the author had as much difficulties as we did to get the correct TJmax values for the different CPU models, so it's hardly meaningful. And even a tool written by Intel themselves, I wouldn't necessarily trust

Which driver, which kernel? As I wrote above, the coretemp heuristic changed in kernel 2.6.25, so the fact that a previous kernel was reporting a different tjmax value is irrelevant.

...
Hi all,

Yes exactly. I decided to move to a 0-100C scale, and move the limit too. Of course some users with too low jumped to better scale some like you seems to

It is not a bug, the max limit changed too; it is just a matter of how to scale it. The temperature is non-physical, so comparing it to a physical temperature does not make any sense. I'm sorry, I did not invent this relative temp stuff - complain @intel. They have some calibration of TjMAX for mobiles, but this bit does not work for desktops/servers. I tried really hard to get at LEAST some documentation so the driver looks like it looks. And not

Well again, I tried hard at Intel and I really could not get any info on some calibration bit. The temperature is non-physical, on an arbitrary scale. I changed

Well again, Intel swears there is no way to get the TjMAX for desktops/servers. It sucks, but this is not my fault.

Thanks,
Rudolf
--
Hi Rudolf, hi @ all,

so we were just too concerned all along, and even though the temperatures seem too high there's nothing to worry about?

I'd feel more at ease if I had the old temperatures ;) but as lm_sensors's output states, things aren't bad until we get to temperatures of 85°C (?) [in this particular case], ...

@lkml, Linus: sorry for all the noise ;)

Len, here's the output of sensors (lm_sensors):

with acpi thermal-support compiled in:

w83627ehf-isa-0290
Adapter: ISA adapter
VCore:       +1.12 V  (min =  +0.00 V, max =  +1.74 V)
in1:        +12.36 V  (min = +13.46 V, max = +13.46 V)  ALARM
AVCC:        +3.34 V  (min =  +4.08 V, max =  +2.03 V)  ALARM
3VCC:        +3.34 V  (min =  +3.92 V, max =  +3.95 V)  ALARM
in4:         +1.70 V  (min =  +1.53 V, max =  +2.04 V)
in5:         +1.59 V  (min =  +2.04 V, max =  +1.02 V)  ALARM
in6:         +5.12 V  (min =  +6.53 V, max =  +6.32 V)  ALARM
VSB:         +3.26 V  (min =  +3.06 V, max =  +4.08 V)
VBAT:        +3.20 V  (min =  +4.02 V, max =  +4.02 V)  ALARM
in9:         +1.61 V  (min =  +2.04 V, max =  +2.04 V)  ALARM
Case Fan:      0 RPM  (min =    0 RPM, div = 128)
CPU Fan:     969 RPM  (min =    0 RPM, div = 8)
Aux Fan:    3970 RPM  (min =    0 RPM, div = 2)
fan4:          0 RPM  (min =   83 RPM, div = 128)  ALARM
fan5:       1308 RPM  (min =    0 RPM, div = 8)
Sys Temp:    +39.0°C  (high = +123.0°C, hyst = -65.0°C)  sensor = thermistor
CPU Temp:    +29.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = diode
AUX Temp:   +121.5°C  (high = +80.0°C, hyst = +75.0°C)  ALARM  sensor = thermistor
cpu0_vid:   +1.350 V

coretemp-isa-0000
Adapter: ISA adapter
Core 0:      +60.0°C  (high = +84.0°C, crit = +100.0°C)

coretemp-isa-0001
Adapter: ISA adapter
Core 1:      +57.0°C  (high = +84.0°C, crit = +100.0°C)

----------------------------------------------

and ...
Note that you can easily get them back by tweaking your sensors.conf
file:
chip "coretemp-*"
compute temp1 @-15, @+15
If I remember correctly, at 84°C your CPU will start to throttle, at
100°C it will shut down. You still have 24°C before the former happens,
so it should be OK.
--
Jean Delvare
--
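To illustrate what that compute line does: in sensors.conf, the first expression converts the value coming from the driver into the value shown to the user, and the second converts a user-supplied value back for the driver, with @ standing for the value. A rough Python equivalent follows; the function names are hypothetical, and the sample value is Core 0 from Mat's sensors output.

```python
OFFSET_C = 15.0  # difference between the two assumed TjMax values (100 - 85)


def from_driver(raw_c: float) -> float:
    """First expression of the compute line: @-15."""
    return raw_c - OFFSET_C


def to_driver(shown_c: float) -> float:
    """Second expression of the compute line: @+15."""
    return shown_c + OFFSET_C


core0 = 60.0
print(from_driver(core0))             # 45.0
print(to_driver(from_driver(core0)))  # 60.0, the two expressions are inverses
```

This only changes what user space displays; the value the kernel driver exports stays the same.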
So the big deal here is that there is no "°C" anywhere in this

Better drop the °C from there. It starts throttling at 84 ITUs and shuts down at 100 ITUs (Intel Thermal Units :p).

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh
--
The high limit is when all the fans should go to max. Throttling is at crit.

Rudolf
--
Ah, sorry then, I remembered incorrectly. -- Jean Delvare --
degrees Intel? :-) -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
So, I'm confused..

The reason for this is that the internal sensor operates on some sort of weird scale, and thus when you interpolate it into "your" scale, it doesn't quite come out as the actual degrees Celsius the CPU temperature really is?

so if I understand this correctly, the coretemp output does NOT
--
It's really only an offset, rather than scaling. The temperature reported by the Core and Core2 CPUs is a relative temperature. It tells how far you are from the maximum temperature the CPU can survive. The value is expressed in (relative) degrees C.

Rudolf did his best to find out the (absolute) temperature each CPU model can survive (known as TJmax) so that the coretemp driver can provide an absolute temperature to user-space, as all other hardware monitoring drivers do. Our hope was to limit the confusion, but it seems we failed ;)

Maybe it would be better if the driver was reporting the relative temperature value directly when we don't know the TJmax value for sure - but then all user-space tools would need to learn how

It should, but there's no guarantee on desktop/server CPUs. It can be offset by 15°C if the driver's heuristic to determine TJmax for your CPU is incorrect. I guess the offset could even be different - after all the documentation we got from Intel was incomplete so we don't really know.

--
Jean Delvare
--
Ah, please ignore my email about ITUs (Intel thermal units), then. The

Actually, just libsensors would, and the local admin can adjust it at will using the config file. Nobody in userspace should be reading hwmon sysfs directly without the use of libsensors. If they are, it is their bug, and it is unsupported AFAIK.

--
Henrique Holschuh
--
Correct. 1ITU=1°C as a difference between temperatures but not as

Hmm, good point. I had not considered that we could hide this detail inside libsensors. I'll need to think about it.

That being said, in practice that would probably not be too different

Except for the features which are not supported by libsensors (e.g.

Unsupported by me, certainly, and stupid as well for sure, especially given the amount of work that has gone into the new libsensors API to avoid this.

--
Jean Delvare
--
Hmm, that's an interesting ABI design. No, I do not think that's a good idea. -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
That's how it is. The kernel drivers are to attempt to do their best to give proper readings. But if they don't, the library can apply arbitrary user-configurable adjustments.

I'd rather different attribute names like "raw_temp#" were used when the adjustment through libsensors is *required*, though.

--
Henrique Holschuh
--
Yes, that would be nice. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
If absolute temperature is not known, could we move it to some obviously invalid range, so that people would not treat it as absolute temperature? Like, set tjmax == 0K, and report relative temperatures below 0K? Or move it to 100-200C range, which is obviously invalid? Uff, does it differ between cpus with same stepping/etc? -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
