Miklos, Al, Ulrich,
Could you please review the following patch. This is a revised version of my earlier (http://article.gmane.org/gmane.linux.man/129/ ) patch to fix non-conformances in the utimensat() implementation. The patch is against 2.6.26-rc2.
Miklos wrote a patch that is already in 2.6.26-rc2 to fix the security issues that he saw from my earlier mail. Miklos's patch also introduced a few spec non-conformances, but it gave me some pointers on how to improve this version of my patch.
The following paragraphs summarize the rules that this patch implements.
Historical permissions rules for the target file (utime(), utimes()):
[a] The EACCES error (only) occurs if times is NULL: the times argument is a null pointer, the effective user ID of the process does not match the owner of the file, and write access is denied.
[b] The EPERM error (only) occurs if times is not NULL (i.e., both times are being changed): the times argument is not a null pointer, the calling process's effective user ID does not match the owner of the file, and the calling process does not have appropriate privileges.
(As noted in a thread on the security@ list, the current spec for utimensat() needlessly mentions "write access" for the EPERM error. I've raised a bug against the spec, and it's recognized as something that needs to be fixed.)
My summary of the rules from the draft spec for utimensat() in the upcoming POSIX.1 revision:
[c] No error needs to be generated if times == {UTIME_OMIT,UTIME_OMIT}.
[d] The times == {UTIME_NOW,UTIME_NOW} case should be treated like times == NULL.
[e] The times == {UTIME_NOW,UTIME_OMIT} and times == {UTIME_OMIT,UTIME_NOW} cases should be treated like times == {m,n}.
Further historical Linux rules, for files with "ext2" extended file attributes (see chattr(1)):
[f] Append-only attribute set: If times == NULL, and we own the ...
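Rules [a] to [e] amount to a small decision table. Below is a minimal userspace sketch of that table, not the kernel implementation. `times_permission()` and its boolean inputs (`is_owner`, `can_write`, `privileged`) are hypothetical names standing in for the real credential checks. Rule [f] (the ext2 attributes) is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <errno.h>
#include <time.h>
#include <sys/stat.h>   /* UTIME_NOW / UTIME_OMIT on current glibc */

#ifndef UTIME_NOW
#define UTIME_NOW  ((1l << 30) - 1l)
#define UTIME_OMIT ((1l << 30) - 2l)
#endif

/* Hypothetical model of rules [a]-[e]; not kernel code.
   Returns 0 if the call is permitted, else a negative errno. */
static int
times_permission(const struct timespec *times, bool is_owner,
                 bool can_write, bool privileged)
{
    if (times != NULL) {
        /* [c]: nothing changes, so no check is needed */
        if (times[0].tv_nsec == UTIME_OMIT && times[1].tv_nsec == UTIME_OMIT)
            return 0;
        /* [d]: {UTIME_NOW,UTIME_NOW} behaves exactly like times == NULL */
        if (times[0].tv_nsec == UTIME_NOW && times[1].tv_nsec == UTIME_NOW)
            times = NULL;
    }

    /* [a]: owner, write access or privilege is sufficient */
    if (times == NULL)
        return (is_owner || can_write || privileged) ? 0 : -EACCES;

    /* [b], [e]: explicit values (including NOW/OMIT mixes) need
       ownership or privilege; write access alone is not enough */
    return (is_owner || privileged) ? 0 : -EPERM;
}
```

For example, a non-owner with write access gets 0 for {UTIME_NOW,UTIME_NOW} (rule [d]). The same caller gets -EPERM for {UTIME_NOW,UTIME_OMIT} (rule [e]).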
For completeness here's the test program again:
/* t_utimensat.c
Copyright (C) 2008, Michael Kerrisk <mtk.manpages@gmail.com>
Licensed under the GPLv2 or later.
A command-line interface for testing the utimensat() system
call.
17 Mar 2008 Initial creation
*/
#define _GNU_SOURCE
#define _ATFILE_SOURCE
#include <stdio.h>
#include <time.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
                        } while (0)
#ifndef __NR_utimensat
#define __NR_utimensat 320      /* i386 syscall number (x86-64 uses 280) */
#endif
#ifndef UTIME_NOW
# define UTIME_NOW      ((1l << 30) - 1l)
# define UTIME_OMIT     ((1l << 30) - 2l)
#endif
/* Raw system-call wrapper, for use where the C library provides
   no utimensat() wrapper */
static inline int
utimensat(int dirfd, const char *pathname,
          const struct timespec times[2], int flags)
{
    return syscall(__NR_utimensat, dirfd, pathname, times, flags);
}

static void
usageError(char *progName)
{
    fprintf(stderr, "Usage: %s pathname [atime-sec "
            "atime-nsec mtime-sec mtime-nsec]\n\n", progName);
    fprintf(stderr, "Permitted options are:\n");
    fprintf(stderr, "    [-d path] "
            "open a directory file descriptor"
            " (instead of using AT_FDCWD)\n");
    fprintf(stderr, "    -w        Open directory file "
            "descriptor with O_RDWR (instead of O_RDONLY)\n");
    fprintf(stderr, "    -n        Use AT_SYMLINK_NOFOLLOW\n");
    fprintf(stderr, "\n");
    fprintf(stderr, "pathname can be \"NULL\" to use NULL "
            "argument in call\n");
    fprintf(stderr, "\n");
    fprintf(stderr, "Either nsec field can be\n");
    fprintf(stderr, "    'n' for UTIME_NOW\n");
    fprintf(stderr, "    'o' for UTIME_OMIT\n");
    fprintf(stderr, "\n");
    fprintf(stderr, "If the time fields are omitted, "
            "then a NULL 'times' argument is used\n");
    fprintf(stderr, "\n");
    exit(EXIT_FAILURE);
}
int
main(int ...
The patch looks functionally correct. But there are several things I
How about explicitly turning UTIME_NOW/UTIME_NOW into times = NULL at
the beginning of the function? That would both simplify things and
also make it absolutely sure that the two cases are handled the same
I don't like adding _more_ owner checks to this function. It would be
better if we were removing them: some weird filesystems want to do
their own permission checking and so the owner checks should really be
moved into inode_change_ok().
One way to do that could be to add a pseudo attribute flag,
e.g. ATTR_TIMES_UPDATE, that tells the permission checking code to
check the owner, even when neither ATTR_[AM]TIME_SET flag is set.
Even the check for the owner in the !times case could be removed, by
adding ATTR_TIMES_UPDATE only if we don't have write permission on the
file. That's a cleanup I'd really be happy with.
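Here is a rough userspace model of the pseudo-flag idea. It assumes the owner check lives in one place and fires whenever ATTR_ATIME_SET, ATTR_MTIME_SET or the proposed ATTR_TIMES_UPDATE is present. The bit values below are illustrative only; they are not the kernel's. `check_time_change()` and `times_null_attrs()` are hypothetical helpers.

```c
#include <stdbool.h>
#include <errno.h>

/* Illustrative bit values only; ATTR_TIMES_UPDATE is the proposed
   (hypothetical) pseudo-flag. */
#define ATTR_ATIME        (1u << 0)
#define ATTR_MTIME        (1u << 1)
#define ATTR_CTIME        (1u << 2)
#define ATTR_ATIME_SET    (1u << 3)
#define ATTR_MTIME_SET    (1u << 4)
#define ATTR_TIMES_UPDATE (1u << 5)

/* Simplified model of the time check in inode_change_ok();
   is_owner_or_cap stands in for the kernel helper of that name. */
static int
check_time_change(unsigned int ia_valid, bool is_owner_or_cap)
{
    if (ia_valid & (ATTR_MTIME_SET | ATTR_ATIME_SET | ATTR_TIMES_UPDATE)) {
        if (!is_owner_or_cap)
            return -EPERM;
    }
    return 0;
}

/* Caller side for times == NULL: request the owner check only when
   the caller lacks write permission on the file. */
static unsigned int
times_null_attrs(bool have_write_permission)
{
    unsigned int ia_valid = ATTR_CTIME | ATTR_MTIME | ATTR_ATIME;

    if (!have_write_permission)
        ia_valid |= ATTR_TIMES_UPDATE;
    return ia_valid;
}
```

In this model, a non-owner without write permission gets -EPERM from the common check. Rule [a]'s EACCES therefore still has to be produced by the caller, which is the awkwardness raised in the reply below.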
All this may also be done with the ATTR_FORCE flag, but that would
mean:
- modifying lots of call sites
- making it impossible to selectively check the permission if
multiple attributes are being modified (don't know if any callers
want that though).
Miklos
--
Miklos,
Thanks for the comments.
Okay, I won't go that route...
Regarding your suggestions above, are you meaning something like the
patch below?
The patch is a little less pretty than I'd like because of the need to
return EACCES or EPERM depending on whether (times == NULL). In
particular, these lines in utimes.c:
+ if (!times && error == -EPERM)
+ error = -EACCES;
seem a little fragile (but maybe I worry too much).
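A small sketch makes the concern concrete. `remap_times_null_error()` mirrors the two quoted lines. `combine_write_error()` mirrors the `write_error` approach that appears in Miklos's later draft in this thread. Both are hypothetical standalone functions, and `setattr_error` stands for whatever the common setattr path returned. The point is that the remap cannot tell an -EPERM produced by the owner check from an -EPERM produced for some other reason.

```c
#include <stdbool.h>
#include <errno.h>

/* The remap quoted above: any -EPERM becomes -EACCES when times == NULL,
   whatever produced it. */
static int
remap_times_null_error(bool times_is_null, int setattr_error)
{
    if (times_is_null && setattr_error == -EPERM)
        return -EACCES;
    return setattr_error;
}

/* Alternative: record the write-access failure up front (nonzero only
   for times == NULL without write access) and report it only if the
   setattr path also failed. */
static int
combine_write_error(int write_error, int setattr_error)
{
    if (write_error && setattr_error)
        return write_error;
    return setattr_error;
}
```

Both give -EACCES for a non-owner without write access. They differ when the caller does have write access but setattr still fails with -EPERM for another reason: the remap rewrites that to -EACCES, while the `write_error` variant passes -EPERM through.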
Cheers,
Michael
========================
diff -ru linux-2.6.26-rc2/fs/attr.c linux-2.6.26-rc2-utimensat-fix/fs/attr.c
--- linux-2.6.26-rc2/fs/attr.c 2008-01-24 23:58:37.000000000 +0100
+++ linux-2.6.26-rc2-utimensat-fix/fs/attr.c 2008-05-16 21:56:51.000000000 +0200
@@ -51,7 +51,8 @@
}
/* Check for setting the inode time. */
- if (ia_valid & (ATTR_MTIME_SET | ATTR_ATIME_SET)) {
+ if (ia_valid & (ATTR_MTIME_SET | ATTR_ATIME_SET |
+ ATTR_TIMES_UPDATE)) {
if (!is_owner_or_cap(inode))
goto error;
}
diff -ru linux-2.6.26-rc2/fs/utimes.c linux-2.6.26-rc2-utimensat-fix/fs/utimes.c
--- linux-2.6.26-rc2/fs/utimes.c 2008-05-15 10:33:20.000000000 +0200
+++ linux-2.6.26-rc2-utimensat-fix/fs/utimes.c 2008-05-16 22:14:31.000000000 +0200
@@ -40,14 +40,9 @@
#endif
-static bool nsec_special(long nsec)
-{
- return nsec == UTIME_OMIT || nsec == UTIME_NOW;
-}
-
static bool nsec_valid(long nsec)
{
- if (nsec_special(nsec))
+ if (nsec == UTIME_OMIT || nsec == UTIME_NOW)
return true;
return nsec >= 0 && nsec <= 999999999;
@@ -102,11 +97,15 @@
if (error)
goto dput_and_out;
+ if (times && times[0].tv_nsec == UTIME_NOW &&
+ times[1].tv_nsec == UTIME_NOW)
+ times = NULL;
+
/* Don't worry, the checks are done in inode_change_ok() */
newattrs.ia_valid = ATTR_CTIME | ATTR_MTIME | ATTR_ATIME;
if (times) {
error = -EPERM;
- if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
+ if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
goto mnt_drop_write_and_out;
if (times[0].tv_nsec == ...
It's not only fragile, it's ugly as sin. I'd rather do it this way:
- initialize error to zero
- if no write access then set error, and the ATTR_TIMES_UPDATE(*) flag
- set error2 from result of notify_change()
- if error is zero then return error2, otherwise return error
(*) I've been mulling over the name and perhaps ATTR_OWNER_CHECK would be better, or something that implies that it's not really about updating the times, but about checking the owner.
Also could you do the patch against the git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git vfs-cleanups tree, which does big structural cleanups to do_utimes?
Thanks,
--
Something like this (haven't thought it through, totally untested,
etc...)
Miklos
---
fs/utimes.c | 48 ++++++++++++++++++++++--------------------------
1 file changed, 22 insertions(+), 26 deletions(-)
Index: linux-2.6/fs/utimes.c
===================================================================
--- linux-2.6.orig/fs/utimes.c 2008-05-17 08:50:01.000000000 +0200
+++ linux-2.6/fs/utimes.c 2008-05-19 12:08:18.000000000 +0200
@@ -53,7 +53,8 @@ static bool nsec_valid(long nsec)
return nsec >= 0 && nsec <= 999999999;
}
-static int utimes_common(struct path *path, struct timespec *times)
+static int utimes_common(struct path *path, struct timespec *times,
+ int write_error)
{
int error;
struct iattr newattrs;
@@ -76,11 +77,18 @@ static int utimes_common(struct path *pa
newattrs.ia_mtime.tv_nsec = times[1].tv_nsec;
newattrs.ia_valid |= ATTR_MTIME_SET;
}
+ newattrs.ia_valid |= ATTR_OWNER_CHECK;
+ } else if (write_error) {
+ newattrs.ia_valid |= ATTR_OWNER_CHECK;
}
+
mutex_lock(&path->dentry->d_inode->i_mutex);
error = path_setattr(path, &newattrs);
mutex_unlock(&path->dentry->d_inode->i_mutex);
+ if (write_error && error)
+ error = write_error;
+
return error;
}
@@ -97,21 +105,16 @@ static bool utimes_need_permission(struc
static int do_futimes(int fd, struct timespec *times)
{
int error;
+ int write_error = 0;
struct file *file = fget(fd);
if (!file)
return -EBADF;
- if (utimes_need_permission(times)) {
- struct inode *inode = file->f_path.dentry->d_inode;
+ if (!times && !(file->f_mode & FMODE_WRITE))
+ write_error = -EACCES;
- error = -EACCES;
- if (!is_owner_or_cap(inode) && !(file->f_mode & FMODE_WRITE))
- goto out_fput;
- }
- error = utimes_common(&file->f_path, times);
-
- out_fput:
+ error = utimes_common(&file->f_path, times, write_error);
fput(file);
return error;
@@ -121,6 +124,7 @@ static int do_utimes_name(int dfd, char
struct timespec *times, int ...
Miklos,
You keep moving the goalposts here... First, it was: provide an obviously correct fix (for the non-conformances); then: can you clean up the owner checks; then: can you rewrite against my git tree...
My time at the moment is fairly limited, and I have a problem: currently, I'm not a git user. That'll change soon, but I don't have the time to change it now. Can I just download a snapshot tarball of your git changes somewhere? Alternatively, when do you expect your changes to make it into an rc?
Cheers,
Michael
--
Sorry, that was just an idea, but since it's not as simple as it
should be, I guess we should leave that till later. My main objection
was against introducing more is_owner_or_cap() checks. Just doing the
times == NULL case with ATTR_OWNER_CHECK should be fine.
That reminds me, one more comment about the patch: if you are
reindenting the ATTR_* definitions anyway, why not also change them to
Here's the relevant part (dfe9b50d..aeb1fe4b) of that tree as a single
patch. I hope it compiles.
Thanks,
Miklos
diff --git a/fs/attr.c b/fs/attr.c
index 966b73e..e8bd11b 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -108,6 +108,12 @@ int notify_change(struct dentry * dentry, struct iattr * attr)
struct timespec now;
unsigned int ia_valid = attr->ia_valid;
+ if (ia_valid & (ATTR_MODE | ATTR_UID | ATTR_GID |
+ ATTR_ATIME_SET | ATTR_MTIME_SET)) {
+ if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
+ return -EPERM;
+ }
+
now = current_fs_time(inode->i_sb);
attr->ia_ctime = now;
diff --git a/fs/exec.c b/fs/exec.c
index 1f8a24a..b68682a 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1763,7 +1763,7 @@ int do_coredump(long signr, int exit_code, struct pt_regs * regs)
goto close_fail;
if (!file->f_op->write)
goto close_fail;
- if (!ispipe && do_truncate(file->f_path.dentry, 0, 0, file) != 0)
+ if (!ispipe && file_truncate(file, 0, 0) != 0)
goto close_fail;
retval = binfmt->core_dump(signr, regs, file, core_limit);
diff --git a/fs/open.c b/fs/open.c
index a145008..bb604a6 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -195,6 +195,13 @@ out:
return error;
}
+/*
+ * do_truncate - truncate (or extend) an inode
+ * @dentry: the dentry to truncate
+ * @length: the new length
+ * @time_attrs: file times to be updated (e.g. ATTR_MTIME|ATTR_CTIME)
+ * @filp: an open file or NULL (see file_truncate() as well)
+ */
int do_truncate(struct dentry *dentry, loff_t length, unsigned int time_attrs,
struct file *filp)
{
@@ -221,6 +228,17 @@ int ...
Hi Miklos,
Here's a further version of the patch, against 2.6.26rc2, with the
2008-05-19 git changes you sent me applied. This patch is based on
the draft patch you sent me. I've tested this version of the patch,
and it works as expected for all cases except the one mentioned below. But
note the following points:
1) I didn't make use of the code in notify_change() that checks
IS_IMMUTABLE() and IS_APPEND() (i.e., I did not add
ATTR_OWNER_CHECK to the mask in the controlling if statement).
Doing this can't easily be made to work for the
do_futimes() case without reworking the arguments passed to
notify_change(). Actually, I'm inclined to doubt whether it
is a good idea to try to roll that check into notify_change() --
at least for utimensat() it seems simpler to not do so.
2) I've found yet another divergence from the spec -- but this
was in the original implementation, rather than being
something that has been introduced. In do_futimes() there is
if (!times && !(file->f_mode & FMODE_WRITE))
write_error = -EACCES;
However, the check here should not be against f_mode (the file access
mode), but against the actual permissions of the file referred to by
the underlying descriptor. This means that for the do_futimes() +
times==NULL case, a set-user-ID-root program could open a file
descriptor O_RDWR/O_WRONLY for which the real UID does not have write
access, and then, even after reverting the effective UID, the real
user could still update the file's timestamps.
I'm not sure of the correct way to get the required nameidata (to do a
vfs_permission() call) from the file descriptor. Can you give me a
tip there?
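The distinction in point 2 can be illustrated from userspace. This sketch is an analogy, not the kernel fix; per the reply further down, the kernel fix would call permission() on the inode. `fd_opened_for_write()` asks what the current do_futimes() check asks: was the descriptor opened for writing? `creds_allow_write()` asks whether the caller's effective credentials grant write access, using faccessat() with AT_EACCESS. The helper names and the /tmp scratch-file path are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Analogue of the current check: was this descriptor opened for writing? */
static bool
fd_opened_for_write(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return false;
    return (flags & O_ACCMODE) == O_WRONLY || (flags & O_ACCMODE) == O_RDWR;
}

/* Analogue of the spec wording: may the process, with its effective
   credentials, write to the file right now? */
static bool
creds_allow_write(const char *path)
{
    return faccessat(AT_FDCWD, path, W_OK, AT_EACCESS) == 0;
}

/* Open a scratch file read-write (mkstemp() uses O_RDWR), then remove
   its write permission bits and ask both questions. */
static int
demo_mode_vs_creds(bool *fd_says, bool *creds_say)
{
    char path[] = "/tmp/t_fdmodeXXXXXX";
    int fd = mkstemp(path);

    if (fd == -1)
        return -1;
    if (fchmod(fd, 0444) == -1) {
        close(fd);
        unlink(path);
        return -1;
    }
    *fd_says = fd_opened_for_write(fd);
    *creds_say = creds_allow_write(path);
    close(fd);
    unlink(path);
    return 0;
}
```

For a non-root caller the two answers diverge once the mode bits are dropped: the descriptor still reports O_RDWR while the credential check fails. Root passes both checks, because write access is granted by privilege.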
Cheers,
Michael
--- linux-2.6.26-rc2-miklos.git-080519/fs/utimes.c 2008-05-19 17:40:37.000000000 +0200
+++ linux-2.6.26-rc2-miklos.git-080519-utimensat-fix-v3/fs/utimes.c 2008-05-30 16:29:00.000000000 +0200
@@ -53,14 +53,19 @@
return nsec >= 0 && nsec <= 999999999;
}
-static int utimes_common(struct path *path, struct timespec *times)
+static ...
Ugh... Could we just omit this part (the "if !times and write error then check owner" part)? I know it was my idea, but
a) my ideas are often stupid
b) one patch should ideally do just one thing
After we've fixed the original issue, we can still think about this other
Sure, but so could a write(2), so that doesn't seem such a big problem. I think we should leave it this way, since changing it would affect not just utimensat() and futimesat() but utime() and utimes() as well, which are well-established, old interfaces. Changing them could in theory break userspace, which we try to avoid if possible.
Thanks,
Miklos
--
Miklos,
Okay, by now quite a bit of my time has been wasted, and my patience is starting to get a little thin. I already fixed most of the issues with utimensat() in my previous version of the patch several days back, and that patch (probably still) applies against current mainline. The one issue that wasn't fixed by my earlier patch was the one below, which I've only just
It is a problem, because every portable user program that uses these interfaces, and relies on this corner of behavior, will have to special-case Linux. Can you tell me one good reason why we should do that? (And preserving bugs because fixing them would break the ABI is not a good reason.)
The relevant interfaces here are:
utimensat()
futimesat()
utime()
utimes()
futimens() -- because it is implemented in glibc via utimensat() with path==NULL.
* utime() and utimes() can't be affected by this point: they don't use file descriptors.
* futimesat() doesn't matter: it is a non-standard interface that was prematurely added to the kernel, and then promptly replaced with utimensat(). No other OS will add this interface, and no-one will ever use it on Linux. Its manual page will very soon say as much.
* utimensat() and futimens() matter, because they are currently broken on this point (as well as a number of others).
This is a bug. It is one of *several* bugs in the original implementation of the utimensat()/futimens() interface. All of them should be fixed. I have by now provided fixes for most of them. (Not point 2 above, but with a little help that should be quickly fixed as well.)
At this point, I think you need to explain why you think those fixes shouldn't be applied.
Cheers,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html
--
I understand your frustration, but actually I did say this the last time as well:
"Sorry, that was just an idea, but since it's not as simple as it should be, I guess we should leave that till later. My main objection was against introducing more is_owner_or_cap() checks. Just doing the times == NULL case with ATTR_OWNER_CHECK should be fine."
OK, you're right. I overlooked this point. So as long as we believe that nobody is using the futime*() interfaces in the way you described, which is highly probable, then we can fix that as well. Which is actually a nice thing, because it means the permission checking for the times == NULL case can move from both do_utimes_name() and do_futimes() into utimes_common().
So let's make two patches, and let's forget about the write_error thing for now:
1)
- turn UTIME_NOW/UTIME_NOW into times = NULL
- for times != NULL set ATTR_OWNER_CHECK
2)
- move times == NULL permission check into utimes_common
For 2) you can just use permission() instead of vfs_permission(), which is exactly the same in this case (and consolidated into a common function later in the vfs-cleanups tree).
OK?
Thanks,
Miklos
--
Miklos, don't delude people into thinking the vfs-cleanups tree is ever going to get merged in its current state. Al's NAKed the path_* stuff, Christoph's NAKed it too. Ignoring them and putting up a VFS tree of your own is not going to help matters. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." --
Actually, the cleanup patches which I'm asking Michael to base his patches on were not NAKed. And Michael is not even working against the vfs-cleanups tree, but rather just a couple of these utimes cleanup patches, none of which are controversial like the path_* stuff. But even if he was working against the vfs-cleanups tree it wouldn't matter, since porting
Do please tell me how else I should work? I post patches for review, gradually, not as a 100 patch series, when we manage to submit apparmor. That's far easier for people to handle, and sure enough I get comments are testing for this stuff.
As for Al's NAK: the only alternative suggestion he has been able to come up with is to move the security hooks into callers. And that has been NAKed (not surprisingly) by the selinux folks.
And there's nothing actually _wrong_ with those patches: they don't add complexity (actually they remove complexity), they don't slow down the kernel, and they even fix some bugs. Now what the hell more do we need?
Thanks,
Miklos
--
Miklos,
You omitted to answer my question, the last sentence below:
On Fri, May 30, 2008 at 8:24 PM, Michael Kerrisk
Michael
--
Oh, come on. Where in my last mail did I say that these fixes shouldn't be applied? I was only objecting to putting a fix and an independent cleanup (which by now I realized was not such a good idea) into a single patch. Miklos --
I didn't say you did. However, while saying "I understand your frustration", you simply ignored this question, which I would say was fairly important in the context, and asked me to write another version of the patch against your tree. My takeaway from this is that you attach a higher priority to seeing that your tree isn't broken than to seeing that these bugs get fixed.
Cheers,
Michael
--
On Fri, 30 May 2008 20:24:22 +0200
Well yeah. This has been dragging on for ages. I was going to just apply this patch and stomp off again, but this:
"against 2.6.26rc2, with the 2008-05-19 git changes you sent me applied"
stymied me.
Please, if poss, just send a standalone, fully-changelogged, not-referential-to-some-random-email-discussion patch and let's get going.
(Where's Ulrich, btw?)
--
Andrew,
On Fri, May 30, 2008 at 10:17 PM, Andrew Morton
Yes, I'll send it to you by Tuesday at the latest. Sorry, I probably can't manage sooner. Am travelling part of Monday, and trying to spend
That's what I'd like to know....
Cheers,
Michael
--
Could you point me at the right way of doing this? Cheers, Michael --
You don't need nameidata for this at all. Just call permission() with a NULL nameidata. Ugly API? Yes, will be cleaned up if we manage to find some common ground with the VFS maintainers. Miklos --
As soon as I'm done with sysctls... FWIW, I very much doubt that you are right wrt required permissions, though. AFAICS, intent here is "if you can write to file, you can touch the timestamps anyway" and having descriptor opened for write gives that, current permissions be damned. --
The standard is pretty clear on this point:
[[
Only a process with the effective user ID equal to the user ID of the file, or with write access to the file, or with appropriate privileges may use futimens( ) or utimensat( ) with a null pointer as the times argument or with both tv_nsec fields set to the special value UTIME_NOW.
]]
The crucial words here are "a process ... with write access to the file". In other words, the permissions are determined by the process's credentials, not by the access mode of the file descriptor. I was not 100% sure on that to start with, so I did check it out with one of the folk at The Open Group, to make sure of my understanding.
--
Michael Kerrisk
--
Is there anything else where the file descriptor's access mode allows doing things on Linux, but the standard requires a permissions check each time? -- Jamie --
Jamie,
I can't think of examples offhand -- but I'm also not quite sure what your question is about. Could you say a little more?
Cheers,
Michael
--
"Is anything else equally stupid?", I suspect... AFAICS, behaviour in question is inherited from futimes(2) in one of the *BSD - nothing to do about that now (at least 10 years too late). It's rather inconsistent with a lot of things, starting with "why utimes(2) has weaker requirements with NULL argument", but we are far too late to fix that. --
PS: as far as I can reconstruct what had happened there, they've got these checks buried directly in ufs_setattr() and its ilk, which worked for utimes(2), but had bitten them when they tried to do descriptor-based analog... --
To be fair, having a writable file descriptor only lets you change the mtime to "now", and having a readable file descriptor only lets you change the atime to "now". Changing the times _in general_ can be seen as over-reaching those capabilities and arguably justifies stricter checks. E.g. by setting times in the past, you can break some caching systems, Make, etc. Setting times to "now" will not break those things. (A bit analogous to O_APPEND vs. O_WRONLY. Someone hands you an O_APPEND descriptor and they can continue to assume you won't clobber earlier records in their file.) -- Jamie --
Which is why all questions about writability apply only to the NULL case anyway... --
Oh! So if I have the file descriptor, I can just as well change the mtime by reading a byte and writing it back. Or even do a zero-length write, on some OSes. -- Jamie --
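Jamie's observation is easy to sketch. With a descriptor open for both reading and writing, rewriting one byte in place updates the mtime to the current time without changing the contents. `touch_mtime_via_fd()` and `demo_touch()` are hypothetical names. The demo assumes a writable /tmp and uses futimens() only to move the mtime back to the epoch first, so that the update is visible. The zero-length-write variant is not shown, since the thread notes it works only on some OSes.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Bump a file's mtime to "now" through an O_RDWR descriptor by
   rewriting its first byte in place.  Returns 0 on success, -1 on
   error or for an empty file. */
static int
touch_mtime_via_fd(int fd)
{
    char c;

    if (pread(fd, &c, 1, 0) != 1)
        return -1;
    if (pwrite(fd, &c, 1, 0) != 1)
        return -1;
    return 0;
}

/* Demo: create a one-byte file, set its mtime to the epoch (atime is
   left alone), touch it via the descriptor, and return the new mtime
   in seconds, or -1 on failure. */
static time_t
demo_touch(void)
{
    char path[] = "/tmp/t_touchXXXXXX";
    struct timespec epoch[2] = {{0, UTIME_OMIT}, {0, 0}};
    struct stat sb;
    time_t result = -1;
    int fd = mkstemp(path);            /* opened O_RDWR */

    if (fd == -1)
        return -1;
    if (write(fd, "x", 1) == 1 && futimens(fd, epoch) == 0 &&
        touch_mtime_via_fd(fd) == 0 && fstat(fd, &sb) == 0)
        result = sb.st_mtime;
    close(fd);
    unlink(path);
    return result;
}
```

The descriptor must be readable as well as writable for this trick; an O_WRONLY descriptor would need a different approach.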
Can't you just do that independently (for now just put a d_find_alias() in there and be done with it)? If you fix every piece of horrid code that you come across, it'll never be done... Miklos --
There's not much left to do, actually... FWIW, the solution goes like this:
* introduce structure on the classes of sysctls (currently - root and per-network-namespace). Namely "X is parent of Y", with "if task T sees Y, it also sees X" as the defining property.
* when adding a sysctl table, find a "parent" one. Which is to say, find the deepest node on its stem that already is present in one of the tables from our class or its ancestor classes. That table will be our parent and that node in it - the attachment point.
* delay freeing the table headers; have them refcounted and, instead of unconditionally freeing the sucker on unregistration, just drop the refcount. Now we can keep a pair (reference to header, pointer to ctl table entry) as long as we hold a refcount on the header. It won't affect unregistration in any way. And at any point we can try to acquire an "active" (use) reference to the header. If that succeeds, we know that
  + unregistration hadn't been started
  + unregistration won't be finished until we unuse the sucker
  + the table entry is alive and will stay alive until then.
So we can hold references to those puppies from inodes under /proc/sys without blocking unregistration, etc. What's more, we can associate such a pair with each node in the sysctl tree. For non-directories that's obvious. For directories, take the tree such that the directory belongs to tree \setminus parent of tree.
That's pretty much it. The filesystem side is simple - we keep a pointer to the class of the tree responsible for a node (see directly above) in the dentry. And ->d_compare() checks that the class of a candidate match should be visible to the task doing the lookup. ->lookup() tries finding an entry with the requested name in the sysctl table (found by directory inode) and, in case of a miss, it goes through the list of tables attached at that node, searching in those that ought to be visible to us. As the result, we have direct access to the sysctl table entry right from the inode, maintain these references across lookups without going ...
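The refcounting part of this design (the third bullet) can be modelled in a few lines. This is a single-threaded sketch with hypothetical names (`struct table_header`, `header_use()`, and so on), not the actual sysctl code. A real implementation needs locking, and would wait in unregistration until `used` drops to zero rather than just reporting it.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, single-threaded model; no locking is shown. */
struct table_header {
    int count;            /* plain references: keep the header allocated */
    int used;             /* "active" (use) references */
    bool unregistering;   /* unregistration has started */
};

static struct table_header *
header_register(void)
{
    struct table_header *h = calloc(1, sizeof(*h));

    if (h)
        h->count = 1;     /* reference owned by the registration itself */
    return h;
}

/* E.g. an inode keeping (header, table entry) grabs a plain reference. */
static void
header_grab(struct table_header *h)
{
    h->count++;
}

/* Dropping the last plain reference is what finally frees the header. */
static void
header_put(struct table_header *h)
{
    if (--h->count == 0)
        free(h);
}

/* Try to acquire an active reference; refused once unregistration began. */
static bool
header_use(struct table_header *h)
{
    if (h->unregistering)
        return false;
    h->used++;
    return true;
}

static void
header_unuse(struct table_header *h)
{
    h->used--;
}

/* Start unregistration; returns true if no active users remain, i.e.
   it could complete now.  The registration's own reference is dropped
   separately with header_put() once it completes. */
static bool
header_unregister(struct table_header *h)
{
    h->unregistering = true;
    return h->used == 0;
}
```

Plain references (`count`) never block unregistration, while active references (`used`) do. This matches the split between holding a (header, entry) pair and actually using the entry.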
