Re: [RFC] New kernel-message logging API

Previous thread: [PATCH] Add extended crashkernel syntax to kernel-parameters.txt by Bernhard Walle on Saturday, September 22, 2007 - 11:25 am. (1 message)

Next thread: [PATCH] Update version information by Bernhard Walle on Saturday, September 22, 2007 - 12:55 pm. (1 message)
From: Vegard Nossum
Date: Saturday, September 22, 2007 - 12:27 pm

After recent discussions on LKML and a general dissatisfaction at the
current printk() kernel-message logging interface, I've decided to
write down some of the ideas for a better system.


Requirements
============

 * Backwards compatibility with printk(), syslog(), etc. There is no
way the whole kernel can be converted to a new interface in one go.
printk() is used all over the kernel, in many different ways,
including calls from assembly, multi-line prints spread over several
calls, etc.

 * Extensibility. Features like timestamping or file/line recording
[1] should be both selectable at compile-time and (if compiled in) at
run-time.


API
===

#define kprint(fmt, ...)

The main part of the kprint interface should be the kprint() function.
The string must be a single line of information (ie. it must not
contain any newlines). Calling kprint("Hello world") is equivalent to
calling printk("Hello world\n"). kprint() may be implemented as a
macro, and must not be called from assembly.


To support the different log-levels, there exists one kprint_*()
function for each log-level, for example kprint_info(). The string
must be a single line of information. Calling kprint_emerg("Oops.") is
equivalent to calling printk(KERN_EMERG "Oops.\n"). These functions
may also be implemented as macros, and must also not be called from
assembly. The different log-levels (and their functions) are:

enum kprint_loglevel {
	KPRINT_EMERG,	/* kprint_emerg() */
	KPRINT_ALERT,	/* kprint_alert() */
	KPRINT_CRIT,	/* kprint_crit() */
	KPRINT_ERROR,	/* kprint_error() and/or kprint_err() */
	KPRINT_WARNING,	/* kprint_warning() and/or kprint_warn() */
	KPRINT_NOTICE,	/* kprint_notice() */
	KPRINT_INFO,	/* kprint_info() */
	KPRINT_DEBUG,	/* kprint_debug() */
};

In order to print several related lines as one chunk, the emitter
should first allocate an object of the type struct kprint_buffer. This
buffer is initialized with the function kprint_buffer_init() which
takes as arguments a ...
From: Joe Perches
Date: Saturday, September 22, 2007 - 5:13 pm

Good ideas.  Perhaps a prefix of klog or kp_ instead?

Severity doesn't really equate to desire to log.
I'd prefer arbitrary control of log levels.

cheers, Joe

-

From: Kyle Moffett
Date: Saturday, September 22, 2007 - 5:40 pm

Umm, actually... Severity is *exactly* "desirability of logging".  If  
you find any place that's otherwise then please provide patches to  
fix it (or at least notify the maintainer with a useful bug report  
and reasoning.

Cheers,
Kyle Moffett

-

From: Joe Perches
Date: Saturday, September 22, 2007 - 5:47 pm

Disagree.

What's info to someone is an alert to someone else.
The problem is the valuation of the reasoning.

It's all opinion.

cheers, Joe

-

From: Kyle Moffett
Date: Saturday, September 22, 2007 - 8:43 pm

For starters, I think "its all opinion" is bogus; the value of each  
log level is pretty standardly defined for the kernel:
EMERG:   The system is completely failing right NOW.  Immediate admin  
action is definitely required
ALERT:   The system is about to fail or halfway-failing.  Immediate  
admin action may be required.
ERR:     Something went badly wrong.  We are able to continue without  
endangering the overall system.
NOTICE:  Something unusual happened but it's either not a device/code  
failure or it's an expected failure.
INFO:    Something useful happened, like a device probe or a major  
system-wide event
DEBUG:   Something happened that only a developer or somebody looking  
for bugs would care about.

All of those are basically defined as: "This is how much the admin  
probably wants to see this message".  If you find a message which  
doesn't seem to match the log level it's logged at, please submit a  
patch.

Even assuming its not a bogus argument, what you actually want is  
*NOT* "Show me INFO and DEBUG, but not EMERG, ALERT, ERR, or NOTICE",  
you actually want multiple metrics by which to measure things.  We  
already have multiple ways of filtering log messages:
   1)  Log-level (greater than $LEVEL).
   2)  Userspace textual post-processing filters and classifiers (EG:  
logcheck)
   3)  CONFIG_${SUBSYS}_DEBUG and CONFIG_${DRIVER}_DEBUG

Perhaps what you want is something like the syslog "facility" parameter?

The only real reason to add more kconfig options for logging is to  
decrease the size of the kernel binary; most anything else can be  
better done in userspace in something like logcheck.  If you're going  
to add new kconfig options then you aren't going to configure them  
generically since they're inherently subsystem/driver specific.  In  
that case you might add a per-driver "CONFIG_${DRIVER}_NOLOG" for the  
drivers you care about or a "CONFIG_MAX_LOG_LEVEL" to globally limit  
by severity but any other qualifiers inherently don't ...
From: Vegard Nossum
Date: Sunday, September 23, 2007 - 1:39 am

I don't know. Compare the following two lines:

printk(KERN_INFO "Message.\n");
kprint_info("Message.");

By dropping the lengthy macro (it's not like it's going to change
while we're running anyway, so why not make it a part of the function
name?) and the final newline, we actually end up with a net decrease
in line length.

I thought it would be nice to have something that looks familiar,
since that would ease an eventual transition. klog is a valid
alternative, but isn't kp a bit cryptic?


Vegard
-

From: Jan Engelhardt
Date: Sunday, September 23, 2007 - 1:45 am

I'd rather fix up code to reduce its indent rather than
trying microoptimizations at the function name level!
-

From: Vegard Nossum
Date: Sunday, September 23, 2007 - 2:20 am

Well, that's a different discussion, really. Is fixing the body of
[v]printk() enough? I don't think it's possible if you want to do
everything that I described in my original post.

It's not even an optimization from my point of view; I think Joe
Perches was arguing that code should be kept short. I was arguing that
it's already shorter than with the current situation.


Vegard
-

From: Jan Engelhardt
Date: Sunday, September 23, 2007 - 2:38 am

I mean the bodies of functions calling printk. (where possible)
Like this:

diff --git a/fs/quota_v2.c b/fs/quota_v2.c
index c519a58..49e04e7 100644
--- a/fs/quota_v2.c
+++ b/fs/quota_v2.c
@@ -86,12 +86,12 @@ static int v2_write_file_info(struct super_block *sb, int type)
 	dinfo.dqi_free_entry = cpu_to_le32(info->u.v2_i.dqi_free_entry);
 	size = sb->s_op->quota_write(sb, type, (char *)&dinfo,
 	       sizeof(struct v2_disk_dqinfo), V2_DQINFOOFF);
-	if (size != sizeof(struct v2_disk_dqinfo)) {
-		printk(KERN_WARNING "Can't write info structure on device %s.\n",
-			sb->s_id);
-		return -1;
-	}
-	return 0;
+	if (size == sizeof(struct v2_disk_dqinfo))
+		return 0;
+
+	printk(KERN_WARNING "Can't write info structure on device %s.\n",
+		sb->s_id);
+	return -1;
 }
 
 static void disk2memdqb(struct mem_dqblk *m, struct v2_disk_dqblk *d)


I gained 8 columns _without_ fiddling with the names.
-

From: Joe Perches
Date: Sunday, September 23, 2007 - 12:23 pm

I think that's a different discussion.

You could fix some of them whenever you like.

$ egrep -r -l --include=*.c "^[[:cntrl:]]{6,}printk" *

cheers, Joe

-

From: Joe Perches
Date: Sunday, September 23, 2007 - 12:25 pm

The problem isn't printk(KERN_<level> to kprint_<level>(.

The problems are the no level printk(foo)s to kprint_<level>(foo)


Probably no more than pr_<level> is today.

cheers, Joe

-

From: Miguel Ojeda
Date: Saturday, September 22, 2007 - 6:05 pm

Nice. I would suggest having some kind of standard way to show the
information on the screen/dmesg. I mean, instead of having plain lines
being written to the log, having something very short like:

SSL: Message sent.

Being:

SS - Subsystem ("EA"  for early code, "MM" for memory managment, "PU"
for processor stuff, "HD" for hard disks, "PP" for parallel port, "NT"
for networking, "VI" for video stuff, "FB" for framebuffers, "SN" for
sound stuff, "KE" for keyboard, "MO" for mouse, ... I think you get
the idea, just generic things).
L - Log level (0 for emerg, ..., 7 for debug)

And maybe some other character for other information. This would be
great to read pretty easily dmesg and for grepping, like:

  $ dmesg | grep ^FB

for getting just information about framebuffers, or

  $ dmesg | grep ^..[0123]

to get all the problems of the whole kernel/system.

So, for example, userspace scripts will be able to separate into
different log files the kernel stuff:

  #!/bin/sh
  dmesg | grep ^..[0123] > klog.errors
  dmesg | grep ^NT > klog.networking
  dmesg | grep ^HD > klog.harddisks
  dmesg | grep ^FB > klog.framebuffers

Maybe its weird at first, but I think it will speed up the reading of
plain dmesg's outputs for everyone at the cost of 3-5 more characters
at every line in dmesg.

Also, it may help to make printk()'s messages to be more uniform,
instead of having every .c file having differents ways to express
similar things.

Getting more complex, lets add another character:

SSLR: Message sent.

being R the reason of the message (D for information about a
probed-and-detected hardware [like a PCI card], R for a new
succesfully registered device [like a framebuffer], S for new settings
in a device [like taking up a ethernet link, or that messages about
IRQs], C for copyright/about/info lines, ...).

Now we have at some dmesg (for example):

  eth0: Broadcom 4400 10/100BaseT Ethernet 00:1f:a2:0c:4a:72
  ieee80211_crypt: registered algorithm 'TKIP'
  ...
From: Vegard Nossum
Date: Monday, September 24, 2007 - 10:27 pm

Thanks for the idea. Is this something you want to (manually) insert
in each printk, or would it be passed like a parameter? For the
subsystem part, it might be possible to #define a subsystem name at
the beginning of each file and have printing functions automatically
use this. But otherwise, I think the usefulness of this is perhaps a
little low compared to the effort needed (ie. for this to be useful,
each and every printk of the kernel would need to be changed). Also
notice, even in your examples, that most subsystems/drivers already
prefix their messages to identify the source. Perhaps a better effort
spent would be to go through the ones that are missing this "source"
prefix and fix them?

Vegard
-

From: Miguel Ojeda
Date: Tuesday, September 25, 2007 - 10:24 am

No parameters if possible: As you said, one or two #define's in the
beggining of the file can tell printk what to print as subsystem and
driver/...; and the reason (printk_xxx) could tell the loglevel. For
example: "detected", "registered" or "copyright" messages will have
always some low level priority; and "errors" or "alerts" high
priorities. This can help whenever selecting what loglevel priorities
to log, as all drivers will log most common messages in the same
loglevel.

So, with your change to printk_xxx, we can add real meaning to the
messages, and forget about selecting a priority for most common
messages. And that does not break your idea of writing printk_yyy,
being yyy all the priorities, as we will be able to use that functions
for uncommon kind of messages.

To clarify what I'm saying, here is an example:

  #define PRINTK_SUBSYSTEM "video"
  #define PRINTK_DRIVER "some123ag"
  ...
  printk_copyright("Pretty Systems Inc. (C) <some@body.com>");
  ...
  printk_detected("Some Pro 123ag VGA");
  ...
  printk_registered("Framebuffer /dev/fb1");
  ...
  if (really_weird_stuff_happened)
      printk_emerg("Video card appears to be disconnected!");

As you see, each message will write to the correct priority. Even,
things like "printk_copyright" could have a macro like:

#define PRINTK_COPYRIGHT(who, year, email) printk_copyright(who " (C)
" year " <" email ">");


Right... But in the long time, it can help a lot. For now, every
maintainer should add the #define line in each file and write the
reason while changing the function name (that is something we will

I would say 70% do it. Also, they just write the name of the
subsystem/driver, a ':' and then a non-standard string. I mean, most
of the messages are about a new detected device, an error encountered,
a registered/mounted device/link, the copyright, the authors, ... Some
of them say "registered XXX", others "XXX registered", other just say
"new XXX" or split the message in two lines, etc.

My point ...
From: Rob Landley
Date: Sunday, September 23, 2007 - 1:05 pm

That doesn't require changing the API.  Allowing the compiler to eliminate 



You know, I'm pretty happy with a first pass that doesn't address this issue 
at all.  Why bundle three unrelated problems into a single all-or-nothing 
pass?

Rob
-- 
"One of my most productive days was throwing away 1000 lines of code."
  - Ken Thompson.
-

From: Michael Holzheu
Date: Monday, September 24, 2007 - 2:22 am

Hi Vegard,


Good luck :-)


Together with the idea of not allowing multiple lines in the kprint_xxx
functions, that would go with our approach having message numbers to
identify a message. Multiple lines are combined explicitly to one
message. I think it is a good idea to be able to identify, which lines
of a message belong together.



Would be nice to have some code here. How do you want to implement that?
You have to allocate / preallocate memory for the argv array. Something
like:

kprint_err(const char* fmt, ...)
{
	va_list   ap;
	struct kprint_message *msg;

	msg = &message_arry[current];
	
	va_start(ap, fmt);
	msg->argv = kmalloc(sizeof(long) * argc, GFP_KERNEL);
	...
	for (i = 0; i < argc, i++) {
		msg->argv[i] = va_arg(ap, long);
	}

If you do it like that, you can't support "%s", since then you would
store only the pointer and not the whole string. I think, that we can't
live without %s.

Michael

-

From: Joe Perches
Date: Monday, September 24, 2007 - 8:19 am

long long would not work either.

Perhaps it's better to scan the format string or indirect the
printk formatting logic.

Another possibility:

An added pass between gcc preprocessor and compiler could compact
or compress the format string without modifying the conversion
specifications so __attribute__ ((format (printf)) would still work.

printk could expand the compacted/compressed format.


-

From: Vegard Nossum
Date: Monday, September 24, 2007 - 9:43 am

Storing the format-string separately allows us to hash THAT instead of
the formatted (ie. console output) message. Since this will never
change from message to message, it can be looked up in a table or
whatever and allow user-space to do translations without for example
regular expressions.

I'm pretty sure this *would* be possible to do with existing printk(),

The idea was to not use snprintf() for the stored message, but have a
printf-like function that instead writes the output of each format
specifier to its own string (ie. each argument produces a new char*
object). Also needed would be an additional function that reads the
format string but uses the stored arguments.

This turned out to be slightly harder than I assumed at first, though
it's of course not impossible. The downside of this approach is of
course the need to maintain normal vsnprintf() and new functions side
by side, with little possibility of reusing the format parsing of
vsnprintf().

I will follow up with some code to demonstrate as soon as I can.


Vegard
-

From: Joe Perches
Date: Monday, September 24, 2007 - 10:00 am

That hash will change with each linux version given the

Looking forward to it.

-

From: Vegard Nossum
Date: Monday, September 24, 2007 - 1:37 pm

But we can keep the old ones too. That shouldn't be much of a problem.
I mean, it probably wouldn't rely on a hash alone. The format string

Okay, so I have one huge file that does more or less what I want. The
main problem with this, as I see it, is that it largely duplicates
vsnprintf() from lib/vsprintf.c (minus the functions I coped verbatim
from the same file). This is bad because the job of maintaining that
is now doubled (or worse). Hopefully it won't change much in the
future either.

For now, the main() function is most important. This demonstrates that
what I want is in fact possible. printf() is now split in two --
args_printf_prepare() and args_snprintf(). The former takes your
arguments as you would pass them to any *printff() function, but only
converts your arguments (and stores them in a struct args). The latter
takes the format string and the stored struct args and puts the whole
thing in a buffer (just like snprintf()).

I know Gmail isn't too friendly with code formatting, but this is just
an example program anyway, so it shouldn't really matter. Also, a
disclaimer: Almost completely untested.


Vegard


#define _GNU_SOURCE
#include <ctype.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* This was shamelessly copied form include/asm-generic/div64.h */
# define do_div(n,base) ({                                      \
        uint32_t __base = (base);                               \
        uint32_t __rem;                                         \
        __rem = ((uint64_t)(n)) % __base;                       \
        (n) = ((uint64_t)(n)) / __base;                         \
        __rem;                                                  \
 })

/* This was shamelessly copied from lib/vsprintf.c */
static char* put_dec_trunc(char *buf, unsigned q)
{
	unsigned d3, d2, d1, d0;
	d1 = (q>>4) & 0xf;
	d2 = (q>>8) & 0xf;
	d3 = (q>>12);

	d0 = 6*(d3 + d2 + d1) + (q & 0xf);
	q = (d0 * 0xcd) ...
From: Rob Landley
Date: Monday, September 24, 2007 - 6:18 pm

I point out that the thread started with a comment about how to _reduce_ 
bloat.

Just sayin'.

Rob
-- 
"One of my most productive days was throwing away 1000 lines of code."
  - Ken Thompson.
-

From: Rob Landley
Date: Monday, September 24, 2007 - 4:51 pm

I actively want to avoid giving message numbers.  My interest is in 
selectively removing messages from the kernel to shrink the binary size (and 
NOT make it up in either a larger userspace utility to translate them, or 
else magic proprietary numbers you can only diagnose if you pay my support 

This does not address my problem.  Spitting out a proprietary hash code 
instead of a human readable message is not a solution for my use case.

Rob
-- 
"One of my most productive days was throwing away 1000 lines of code."
  - Ken Thompson.
-

From: Joe Perches
Date: Monday, September 24, 2007 - 5:10 pm

What is your problem Rob?

I think message numbers are difficult to produce from format strings.
I think kernel version/file/func/line is enough to identify messages
for normal use but not too useful for embedded.

I think losslessly compressing, not hashing, the format string
would be useful.  The format string would be expanded by printk.
The kernel size would be smaller and more easily fit in minimal RAM.
Losslessly compressing the format strings probably won't reduce
flash image size.

cheers, Joe

-

From: Rob Landley
Date: Monday, September 24, 2007 - 6:46 pm

The single largest space savings in the existing -tiny patches comes from 
removing printk() calls and strings.  I would like to be able to selectively 
do this based on severity level, which is information most existing printk() 
calls already have.  I proposed a minimal change to how printk() works, 
allowing the compiler to remove unused code that wouldn't be below the 
displayed level of printk() anyway in the deployed product so wouldn't 
actually lose any output.

The kernel image is usually already compressed in flash and decompressed to 
dram during boot.  (Not always, sometimes it's run directly out of flash, but 
there's often a speed penalty for doing this, you have to set it up 
specially, and dram is cheaper than flash anyway.)  This means recompressing 
it doesn't help save flash.

If you want to save dram, have printk and associated strings be a function in 
a module that's demand loaded and unloaded again after each call.  Then you 
can foist compression off on userspace, and we're already used to modules 
having to match a given kernel version exactly.  Why come up with new 
infrastructure?

Rob
-- 
"One of my most productive days was throwing away 1000 lines of code."
  - Ken Thompson.
-

From: Michael Holzheu
Date: Tuesday, September 25, 2007 - 1:57 am

It does not give us message numbers. It allows us to tag printks, which
have more than one line, with exactly one number.

Michael

-

From: Jesse Barnes
Date: Monday, September 24, 2007 - 11:26 am

I wonder if all these levels are still needed (though I really like the 
error/err & warning/warn aliases, those always get me :).

It seems like fewer levels would make things easier on both kernel 
developers and administrators; looking at current counts may help 
figure out which ones could be combined (warning, very naive grep -r 
data):

KERN_EMERG: 371
KERN_ALERT: 236
KERN_CRIT: 602
KERN_ERR: 11961
KERN_WARNING: 6463
KERN_NOTICE: 1142
KERN_INFO: 8491
KERN_DEBUG: 6125

So KERN_ERR is the most common by a pretty large margin, though it seems 
to me that KERN_NOTICE, KERN_INFO and KERN_DEBUG are mostly redundant 
and probably make up a majority of the "SIMD FPU exception support was 
enabled" (as if I care) type messages.  Likewise, ERR, ALERT, CRIT and 
EMERG serve very similar purposes (i.e. something unrecoverable 
occurred), maybe they could be condensed into one or two levels rather 
than four?  So that would drop us to three levels:

KERN_ERR /* something really bad happened, machine is dead or near so */
KERN_WARNING /* you really ought to know about this */
KERN_INFO /* no one but the kernel developer likely cares about this */

But maybe I'm just living in a dream world where then number of printks 
the kernel spits out suddenly drops by 99% and only actually important 
messages make it to my log...

Jesse
-

Previous thread: [PATCH] Add extended crashkernel syntax to kernel-parameters.txt by Bernhard Walle on Saturday, September 22, 2007 - 11:25 am. (1 message)

Next thread: [PATCH] Update version information by Bernhard Walle on Saturday, September 22, 2007 - 12:55 pm. (1 message)