Re: [PATCHSET 3/4] sysfs: divorce sysfs from kobject and driver model

Previous thread: [PATCHSET 2/4] sysfs: allow suicide by Tejun Heo on Thursday, September 20, 2007 - 12:26 am. (31 messages)

Next thread: none
From: Tejun Heo
Date: Thursday, September 20, 2007 - 1:05 am

Subject: [PATCHSET 3/4] sysfs: divorce sysfs from kobject and driver model

Hello, all.

This is the third patchset of four sysfs update patchset series[1] and
to be applied on top of the second patchset[2].

Currently, sysfs interface is based on kobj.  This made more sense
before because lifetime of sysfs nodes were tracked using kobj
reference counts.  However, this is no longer true.  sysfs nodes are
represented with a sysfs_dirent and external reference is severed
immediately on node removal.  The internal implementation reflects
that too and mostly handles sysfs_dirents.

This patchset divorces sysfs from kobject and driver model by
implementing sysfs_dirent based interface.  This has the following
advantages.

* sysfs becomes a separate module and driver model becomes a user of
  sysfs.  Those two are not entangled anymore.  Things are easier to
  understand and test this way.

* Non-driver model users of sysfs (modules, blkdev, etc...) don't have
  to jump through hoops to use sysfs.  kobj based interface requires
  attribute wrapping and is awkward to use directly.  Also, the user
  is required to create a dummy kobj which doesn't serve much purpose
  than being a token for sysfs reference.  New sysfs-dirent based
  interface is straight forward proc-fs like interface and should be
  easier and more intuitive for those users.

* As kobj didn't really represent what actually populate sysfs,
  interface was a bit messy - there was no way to reference a leaf
  node other than using its textual name and directories which aren't
  associated with a kobj needed separate interface, which in turn,
  made adding new features difficult.  New interface is leaner and
  more flexible.

kobject based interface is reimplemented as wrapper functions on top
of the new sysfs_dirent based interface.  Long term plan is to update
kobject based users one-by-one and deprecate kobject based interface.

This change doesn't intend to replace driver-model based interface or
encourage ...
From: Tejun Heo
Date: Thursday, September 20, 2007 - 1:05 am

Currently name is implicitly copied for directory and symlink nodes.
Make the behavior flexible by making it a flag.  If SYSFS_COPY_NAME
bit is specified in @mode when calling sysfs_new_dirent(), the name is
copied.

SYSFS_COPY_NAME is defined as S_IFMT so that it can be specified with
@mode bits.

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 fs/sysfs/dir.c        |   21 +++++++++++++++------
 fs/sysfs/symlink.c    |    3 ++-
 fs/sysfs/sysfs.h      |    2 +-
 include/linux/sysfs.h |   10 ++++++++++
 4 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 02918d3..584f17c 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -14,6 +14,11 @@
 #include <linux/mutex.h>
 #include "sysfs.h"
 
+/* verify all mode flags are inside S_IFMT */
+#if (SYSFS_MODE_FLAGS & ~S_IFMT)
+#error SYSFS mode flags out of S_IFMT
+#endif
+
 DEFINE_MUTEX(sysfs_mutex);
 DEFINE_MUTEX(sysfs_rename_mutex);
 spinlock_t sysfs_assoc_lock = SPIN_LOCK_UNLOCKED;
@@ -275,7 +280,7 @@ void release_sysfs_dirent(struct sysfs_dirent * sd)
 
 	if (sysfs_type(sd) == SYSFS_KOBJ_LINK)
 		sysfs_put(sd->s_symlink.target_sd);
-	if (sysfs_type(sd) & SYSFS_COPY_NAME)
+	if (sd->s_flags & SYSFS_FLAG_NAME_COPIED)
 		kfree(sd->s_name);
 	kfree(sd->s_iattr);
 	sysfs_free_ino(sd->s_ino);
@@ -301,11 +306,12 @@ static struct dentry_operations sysfs_dentry_ops = {
 /**
  *	sysfs_new_dirent - allocate and initialize sysfs_dirent
  *	@name: name for the new sysfs_dirent
- *	@mode: mask of bits from S_IRWXUGO
+ *	@mode: mask of bits from S_IRWXUGO | SYSFS_COPY_NAME
  *	@type: one of SYSFS_{DIR|FILE|BIN|LINK}
  *
  *	Allocate and initialize a sysfs_dirent with the specified
- *	parameters.
+ *	parameters.  If SYSFS_COPY_NAME is specified in @mode, @name
+ *	is duplicated.
  *
  *	LOCKING:
  *	Kernel thread context (may sleep).
@@ -316,14 +322,17 @@ static struct dentry_operations sysfs_dentry_ops = {
  */
 struct sysfs_dirent *sysfs_new_dirent(const char *name, umode_t ...
From: Tejun Heo
Date: Thursday, September 20, 2007 - 1:05 am

Currently sysfs_new_dirent() takes @mode and @type.  @mode contains
S_IF* mask which is also duplicately indicated by @type.  This patch
sysfs_new_dirent() determine S_IF* mask and normalize @mode such that
only bits from S_IALLUGO is taken as @mode.

This is to allow later use of unused @mode bits as special flags.

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 fs/sysfs/dir.c     |   36 +++++++++++++++++++++++++++++++++++-
 fs/sysfs/file.c    |    2 +-
 fs/sysfs/symlink.c |    2 +-
 3 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index ba631eb..02918d3 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -298,17 +298,51 @@ static struct dentry_operations sysfs_dentry_ops = {
 	.d_iput		= sysfs_d_iput,
 };
 
+/**
+ *	sysfs_new_dirent - allocate and initialize sysfs_dirent
+ *	@name: name for the new sysfs_dirent
+ *	@mode: mask of bits from S_IRWXUGO
+ *	@type: one of SYSFS_{DIR|FILE|BIN|LINK}
+ *
+ *	Allocate and initialize a sysfs_dirent with the specified
+ *	parameters.
+ *
+ *	LOCKING:
+ *	Kernel thread context (may sleep).
+ *
+ *	RETURNS:
+ *	Pointer to the new sysfs_dirent on success, NULL on allocation
+ *	failure.
+ */
 struct sysfs_dirent *sysfs_new_dirent(const char *name, umode_t mode, int type)
 {
 	char *dup_name = NULL;
 	struct sysfs_dirent *sd;
 
+	/* need to copy name? */
 	if (type & SYSFS_COPY_NAME) {
 		name = dup_name = kstrdup(name, GFP_KERNEL);
 		if (!name)
 			return NULL;
 	}
 
+	/* normalize mode */
+	mode &= S_IALLUGO;
+
+	switch (type) {
+	case SYSFS_DIR:
+		mode |= S_IFDIR;
+		break;
+	case SYSFS_KOBJ_ATTR:
+	case SYSFS_KOBJ_BIN_ATTR:
+		mode |= S_IFREG;
+		break;
+	case SYSFS_KOBJ_LINK:
+		mode |= S_IFLNK;
+		break;
+	}
+
+	/* allocate and initialize */
 	sd = kmem_cache_zalloc(sysfs_dir_cachep, GFP_KERNEL);
 	if (!sd)
 		goto err_out1;
@@ -607,7 +641,7 @@ struct sysfs_dirent *sysfs_get_dirent(struct sysfs_dirent *parent_sd,
 static int create_dir(struct ...
From: Tejun Heo
Date: Thursday, September 20, 2007 - 1:05 am

In the upcoming new sysfs interface, sysfs_root will be exported.  To
ease usage and make dummy declaration easier, make sysfs_root a
pointer.

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 fs/sysfs/dir.c     |    4 ++--
 fs/sysfs/mount.c   |    8 +++++---
 fs/sysfs/symlink.c |    2 +-
 fs/sysfs/sysfs.h   |    2 +-
 4 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index fe8270c..ba631eb 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -651,7 +651,7 @@ int sysfs_create_dir(struct kobject * kobj)
 	if (kobj->parent)
 		parent_sd = kobj->parent->sd;
 	else
-		parent_sd = &sysfs_root;
+		parent_sd = sysfs_root;
 
 	error = create_dir(kobj, parent_sd, kobject_name(kobj), &sd);
 	if (!error)
@@ -832,7 +832,7 @@ int sysfs_move_dir(struct kobject *kobj, struct kobject *new_parent_kobj)
 
 	mutex_lock(&sysfs_rename_mutex);
 	BUG_ON(!sd->s_parent);
-	new_parent_sd = new_parent_kobj->sd ? new_parent_kobj->sd : &sysfs_root;
+	new_parent_sd = new_parent_kobj->sd ? new_parent_kobj->sd : sysfs_root;
 
 	error = 0;
 	if (sd->s_parent == new_parent_sd)
diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index 465902c..d00d4b9 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -23,7 +23,7 @@ static const struct super_operations sysfs_ops = {
 	.drop_inode	= generic_delete_inode,
 };
 
-struct sysfs_dirent sysfs_root = {
+static struct sysfs_dirent sysfs_root_storage = {
 	.s_name		= "",
 	.s_count	= ATOMIC_INIT(1),
 	.s_flags	= SYSFS_DIR,
@@ -31,6 +31,8 @@ struct sysfs_dirent sysfs_root = {
 	.s_ino		= 1,
 };
 
+struct sysfs_dirent * const sysfs_root = &sysfs_root_storage;
+
 static int sysfs_fill_super(struct super_block *sb, void *data, int silent)
 {
 	struct inode *inode;
@@ -44,7 +46,7 @@ static int sysfs_fill_super(struct super_block *sb, void *data, int silent)
 	sysfs_sb = sb;
 
 	/* get root inode, initialize and unlock it */
-	inode = sysfs_get_inode(&sysfs_root);
+	inode = sysfs_get_inode(sysfs_root);
 ...
From: Tejun Heo
Date: Thursday, September 20, 2007 - 1:05 am

Restruct addrm helpers such that it can remove nodes under different
parents at one go.

sysfs_addrm_start() now doesn't take @sd.  It now only initializes
@acxt and lock sysfs_mutex.  Parent inode lookup now lives in
sysfs_addrm_get_parent_inode() and sysfs_add/remove_one() call them
directly.  The function unlocks i_mutex of the previous inode and
grabs and locks the inode for the specified @parent_sd.  This allows
sysfs_add/remove_one() to be called for any node.

This will be used to recursively remove sysfs nodes.

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 fs/sysfs/dir.c     |  144 +++++++++++++++++++++++++++++++++------------------
 fs/sysfs/file.c    |    4 +-
 fs/sysfs/inode.c   |    2 +-
 fs/sysfs/symlink.c |    4 +-
 fs/sysfs/sysfs.h   |   10 ++--
 5 files changed, 103 insertions(+), 61 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 7a6be9a..b1cf090 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -375,6 +375,27 @@ struct sysfs_dirent *sysfs_new_dirent(const char *name, umode_t mode, int type)
 	return NULL;
 }
 
+/**
+ *	sysfs_addrm_start - prepare for sysfs_dirent add/remove
+ *	@acxt: pointer to sysfs_addrm_cxt to be used
+ *
+ *	This function is called when the caller is about to add or
+ *	remove sysfs_dirents.  This function initializes @acxt and
+ *	acquires sysfs_mutex.  @acxt is used to keep and pass context
+ *	to other addrm functions.
+ *
+ *	LOCKING:
+ *	Kernel thread context (may sleep).  sysfs_mutex is locked on
+ *	return.
+ */
+void sysfs_addrm_start(struct sysfs_addrm_cxt *acxt)
+{
+	memset(acxt, 0, sizeof(*acxt));
+	acxt->removed_tail = &acxt->removed;
+
+	mutex_lock(&sysfs_mutex);
+}
+
 static int sysfs_ilookup_test(struct inode *inode, void *arg)
 {
 	struct sysfs_dirent *sd = arg;
@@ -382,39 +403,53 @@ static int sysfs_ilookup_test(struct inode *inode, void *arg)
 }
 
 /**
- *	sysfs_addrm_start - prepare for sysfs_dirent add/remove
- *	@acxt: pointer to sysfs_addrm_cxt to be used
- ...
From: Tejun Heo
Date: Thursday, September 20, 2007 - 1:05 am

Sysfs is about to get a new interface which is independent from the
driver model and kobject.  Create include sysfs-kobject.h and move all
kobject based interface into it.  Also, create fs/sysfs/kobject.c
which is currently empty but will host compatibility interface
functions based on the new interface.

sysfs-kobject.h is automatically included from sysfs.h for
compatibility for now.

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 fs/sysfs/Makefile             |    2 +-
 fs/sysfs/kobject.c            |   15 +++
 include/linux/sysfs-kobject.h |  200 +++++++++++++++++++++++++++++++++++++++++
 include/linux/sysfs.h         |  197 +++-------------------------------------
 4 files changed, 230 insertions(+), 184 deletions(-)
 create mode 100644 fs/sysfs/kobject.c
 create mode 100644 include/linux/sysfs-kobject.h

diff --git a/fs/sysfs/Makefile b/fs/sysfs/Makefile
index 7a1ceb9..f58bce9 100644
--- a/fs/sysfs/Makefile
+++ b/fs/sysfs/Makefile
@@ -3,4 +3,4 @@
 #
 
 obj-y		:= inode.o file.o dir.o symlink.o mount.o bin.o \
-		   group.o
+		   group.o kobject.o
diff --git a/fs/sysfs/kobject.c b/fs/sysfs/kobject.c
new file mode 100644
index 0000000..5ebd755
--- /dev/null
+++ b/fs/sysfs/kobject.c
@@ -0,0 +1,15 @@
+/*
+ * sysfs/kobject.c - compatibility sysfs interface based on kobject
+ *
+ * Copyright (c) 2001,2002 Patrick Mochel
+ * Copyright (c) 2004 Silicon Graphics, Inc.
+ *
+ * This is compatibility interface which wraps the primary interface
+ * defined in linux/sysfs.h to remain compatible with the original
+ * kobject based interface.
+ *
+ * Please see Documentation/filesystems/sysfs.txt for more information.
+ */
+
+#include <linux/sysfs.h>
+#include <linux/sysfs-kobject.h>
diff --git a/include/linux/sysfs-kobject.h b/include/linux/sysfs-kobject.h
new file mode 100644
index 0000000..4c821c2
--- /dev/null
+++ b/include/linux/sysfs-kobject.h
@@ -0,0 +1,200 @@
+/*
+ * sysfs-kobject.h - compatibility sysfs interface based on kobject
+ *
+ * Copyright (c) ...
From: Tejun Heo
Date: Thursday, September 20, 2007 - 1:05 am

Convert sd->s_dir.kobj to sd->s_dir.data and make it void *, and
implement sd based directory creation function sysfs_add_dir().  Using
this function the caller can create directory node with arbitrary user
data.  Also, name copying is not implicit.  Name is copied iff
SYSFS_COPY_NAME should be specified in @mode.

* sysfs_root is exported.  To be used as @parent when creating a node
  below the sysfs root.

* Kobject-based sysfs_create_dir() is reimplemented in terms of
  sysfs_add_dir() and moved to fs/sysfs/kobject.c.

* Users of sysfs_create_dir() in sysfs are converted to use
  sysfs_add_dir().

* attr and bin_attr still can cope only with kobject sd->s_dir.data,
  so sysfs_add_dir() can't be used with arbitrary pointer yet.  This
  will be changed by following patches.

This patch doesn't introduce any behavior change to the original API.

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 fs/sysfs/bin.c        |    6 +-
 fs/sysfs/dir.c        |  108 +++++++++++++++++++++++++++----------------------
 fs/sysfs/file.c       |    6 +-
 fs/sysfs/group.c      |    8 ++--
 fs/sysfs/kobject.c    |   26 ++++++++++++
 fs/sysfs/mount.c      |    2 +
 fs/sysfs/sysfs.h      |    8 +--
 include/linux/sysfs.h |   16 +++++++
 8 files changed, 117 insertions(+), 63 deletions(-)

diff --git a/fs/sysfs/bin.c b/fs/sysfs/bin.c
index 1c12cf0..91e573f 100644
--- a/fs/sysfs/bin.c
+++ b/fs/sysfs/bin.c
@@ -31,7 +31,7 @@ fill_read(struct dentry *dentry, char *buffer, loff_t off, size_t count)
 {
 	struct sysfs_dirent *attr_sd = dentry->d_fsdata;
 	struct bin_attribute *attr = attr_sd->s_bin_attr.bin_attr;
-	struct kobject *kobj = attr_sd->s_parent->s_dir.kobj;
+	struct kobject *kobj = attr_sd->s_parent->s_dir.data;
 	int rc;
 
 	/* need attr_sd for attr, its parent for kobj */
@@ -88,7 +88,7 @@ flush_write(struct dentry *dentry, char *buffer, loff_t offset, size_t count)
 {
 	struct sysfs_dirent *attr_sd = dentry->d_fsdata;
 	struct bin_attribute *attr = ...
From: Tejun Heo
Date: Thursday, September 20, 2007 - 1:05 am

Currently, regular sysfs files are called (kobj) attrs or files.  This
is a bit confusing and sd-based interface won't be bound to attribute.
Use 'file' consistently and drop all kobj and attr from symbol names.

This patch doesn't introduce any logic change.

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 fs/sysfs/dir.c   |    4 +-
 fs/sysfs/file.c  |  140 +++++++++++++++++++++++++++---------------------------
 fs/sysfs/group.c |    2 +-
 fs/sysfs/inode.c |    2 +-
 fs/sysfs/sysfs.h |    6 +-
 5 files changed, 77 insertions(+), 77 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index b15ade7..170749d 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -341,7 +341,7 @@ struct sysfs_dirent *sysfs_new_dirent(const char *name, umode_t mode, int type)
 	case SYSFS_DIR:
 		mode |= S_IFDIR;
 		break;
-	case SYSFS_KOBJ_ATTR:
+	case SYSFS_FILE:
 	case SYSFS_KOBJ_BIN_ATTR:
 		mode |= S_IFREG;
 		break;
@@ -630,7 +630,7 @@ void sysfs_addrm_finish(struct sysfs_addrm_cxt *acxt)
 		sd->s_sibling = NULL;
 
 		sysfs_drop_dentry(sd);
-		if (sysfs_type(sd) == SYSFS_KOBJ_ATTR)
+		if (sysfs_type(sd) == SYSFS_FILE)
 			sysfs_file_check_suicide(sd);
 		sysfs_deactivate(sd);
 		sysfs_put(sd);
diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index 3b74912..ae4d7cb 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -55,7 +55,7 @@ static struct sysfs_ops subsys_sysfs_ops = {
  * files.
  *
  * filp->private_data points to sysfs_buffer and
- * sysfs_dirent->s_attr.open points to sysfs_open_dirent.  s_attr.open
+ * sysfs_dirent->s_file.open points to sysfs_open_dirent.  s_file.open
  * is protected by sysfs_open_dirent_lock.
  */
 static spinlock_t sysfs_open_dirent_lock = SPIN_LOCK_UNLOCKED;
@@ -103,7 +103,7 @@ void sysfs_file_check_suicide(struct sysfs_dirent *sd)
 
 	spin_lock(&sysfs_open_dirent_lock);
 
-	od = sd->s_attr.open;
+	od = sd->s_file.open;
 	if (od) {
 		list_for_each_entry(buffer, &od->buffers, list) {
 			if (buffer->accessor != current)
@@ -144,8 ...
From: Tejun Heo
Date: Thursday, September 20, 2007 - 1:05 am

Currently, sysfs symlinks are either called symlinks or links.  Rename
everything to link and drop attr and _sd postfix as new interface
won't be bound to kobj.

This patch doesn't introduce any logic change.

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 fs/sysfs/dir.c     |    6 +++---
 fs/sysfs/inode.c   |    4 ++--
 fs/sysfs/symlink.c |    9 ++++-----
 fs/sysfs/sysfs.h   |   16 ++++++++--------
 4 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 263d346..a20beff 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -277,8 +277,8 @@ void release_sysfs_dirent(struct sysfs_dirent * sd)
 	 */
 	parent_sd = sd->s_parent;
 
-	if (sysfs_type(sd) == SYSFS_KOBJ_LINK)
-		sysfs_put(sd->s_symlink.target_sd);
+	if (sysfs_type(sd) == SYSFS_LINK)
+		sysfs_put(sd->s_link.target);
 	if (sd->s_flags & SYSFS_FLAG_NAME_COPIED)
 		kfree(sd->s_name);
 	kfree(sd->s_iattr);
@@ -345,7 +345,7 @@ struct sysfs_dirent *sysfs_new_dirent(const char *name, umode_t mode, int type)
 	case SYSFS_BIN:
 		mode |= S_IFREG;
 		break;
-	case SYSFS_KOBJ_LINK:
+	case SYSFS_LINK:
 		mode |= S_IFLNK;
 		break;
 	}
diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index d303e62..8df357e 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -168,8 +168,8 @@ static void sysfs_init_inode(struct sysfs_dirent *sd, struct inode *inode)
 		inode->i_size = sd->s_bin.size;
 		inode->i_fop = &sysfs_bin_file_operations;
 		break;
-	case SYSFS_KOBJ_LINK:
-		inode->i_op = &sysfs_symlink_inode_operations;
+	case SYSFS_LINK:
+		inode->i_op = &sysfs_link_inode_operations;
 		break;
 	default:
 		BUG();
diff --git a/fs/sysfs/symlink.c b/fs/sysfs/symlink.c
index 9c15a32..2c3e4f7 100644
--- a/fs/sysfs/symlink.c
+++ b/fs/sysfs/symlink.c
@@ -82,12 +82,11 @@ int sysfs_create_link(struct kobject * kobj, struct kobject * target, const char
 		goto out_put;
 
 	error = -ENOMEM;
-	sd = sysfs_new_dirent(name, S_IRWXUGO | SYSFS_COPY_NAME,
-			      ...
From: Tejun Heo
Date: Thursday, September 20, 2007 - 1:05 am

This patch implements new sysfs_remove(), which takes @sd as target
argument and can be used on any type of node.  Also, it recurses into
sub directories.

* sysfs_remove_file(), sysfs_remove_file_from_group(),
  sysfs_remove_bin_file() are reimplemented using sysfs_remove() in
  fs/sysfs/kobject.c.

* To keep backward compatibility sysfs_remove_dir() is reimplemented
  using internal function __sysfs_remove() with @recurse argument set
  to zero so that it doesn't descend into subdirectories.

This patch doesn't introduce any behavior change to the original API.

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 fs/sysfs/bin.c        |   13 -----
 fs/sysfs/dir.c        |  125 ++++++++++++++++++++++++++++++++++---------------
 fs/sysfs/file.c       |   35 --------------
 fs/sysfs/group.c      |    4 +-
 fs/sysfs/kobject.c    |   92 ++++++++++++++++++++++++++++++++++++
 fs/sysfs/symlink.c    |   13 -----
 fs/sysfs/sysfs.h      |    3 +-
 include/linux/sysfs.h |    6 ++
 8 files changed, 189 insertions(+), 102 deletions(-)

diff --git a/fs/sysfs/bin.c b/fs/sysfs/bin.c
index 247ea19..1c12cf0 100644
--- a/fs/sysfs/bin.c
+++ b/fs/sysfs/bin.c
@@ -237,17 +237,4 @@ int sysfs_create_bin_file(struct kobject * kobj, struct bin_attribute * attr)
 	return sysfs_add_file(kobj->sd, &attr->attr, SYSFS_KOBJ_BIN_ATTR);
 }
 
-
-/**
- *	sysfs_remove_bin_file - remove binary file for object.
- *	@kobj:	object.
- *	@attr:	attribute descriptor.
- */
-
-void sysfs_remove_bin_file(struct kobject * kobj, struct bin_attribute * attr)
-{
-	sysfs_hash_and_remove(kobj->sd, attr->attr.name);
-}
-
 EXPORT_SYMBOL_GPL(sysfs_create_bin_file);
-EXPORT_SYMBOL_GPL(sysfs_remove_bin_file);
diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index b1cf090..7cb3f1e 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -816,64 +816,113 @@ const struct inode_operations sysfs_dir_inode_operations = {
 	.setattr	= sysfs_setattr,
 };
 
-static void remove_dir(struct sysfs_dirent *sd)
+/**
+ ...
From: Tejun Heo
Date: Thursday, September 20, 2007 - 1:05 am

Currently, binary sysfs files are called (kobj) bin attrs.  sd-based
interface won't be bound to bin_attribute.  Use 'bin' as name and drop
all kobj and attr from symbol names.

This patch doesn't introduce any logic change.

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 fs/sysfs/bin.c   |   54 +++++++++++++++++++++++++++---------------------------
 fs/sysfs/dir.c   |    2 +-
 fs/sysfs/inode.c |    6 +++---
 fs/sysfs/sysfs.h |    8 ++++----
 4 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/fs/sysfs/bin.c b/fs/sysfs/bin.c
index 20c2e78..dad891c 100644
--- a/fs/sysfs/bin.c
+++ b/fs/sysfs/bin.c
@@ -29,20 +29,20 @@ struct bin_buffer {
 static int
 fill_read(struct dentry *dentry, char *buffer, loff_t off, size_t count)
 {
-	struct sysfs_dirent *attr_sd = dentry->d_fsdata;
-	struct bin_attribute *attr = attr_sd->s_bin_attr.bin_attr;
-	struct kobject *kobj = attr_sd->s_parent->s_dir.data;
+	struct sysfs_dirent *sd = dentry->d_fsdata;
+	struct bin_attribute *attr = sd->s_bin.bin_attr;
+	struct kobject *kobj = sd->s_parent->s_dir.data;
 	int rc;
 
-	/* need attr_sd for attr, its parent for kobj */
-	if (!sysfs_get_active_two(attr_sd))
+	/* need sd for attr, its parent for kobj */
+	if (!sysfs_get_active_two(sd))
 		return -ENODEV;
 
 	rc = -EIO;
 	if (attr->read)
 		rc = attr->read(kobj, attr, buffer, off, count);
 
-	sysfs_put_active_two(attr_sd);
+	sysfs_put_active_two(sd);
 
 	return rc;
 }
@@ -86,20 +86,20 @@ read(struct file *file, char __user *userbuf, size_t bytes, loff_t *off)
 static int
 flush_write(struct dentry *dentry, char *buffer, loff_t offset, size_t count)
 {
-	struct sysfs_dirent *attr_sd = dentry->d_fsdata;
-	struct bin_attribute *attr = attr_sd->s_bin_attr.bin_attr;
-	struct kobject *kobj = attr_sd->s_parent->s_dir.data;
+	struct sysfs_dirent *sd = dentry->d_fsdata;
+	struct bin_attribute *attr = sd->s_bin.bin_attr;
+	struct kobject *kobj = sd->s_parent->s_dir.data;
 	int rc;
 
-	/* need attr_sd for attr, its parent for ...
From: Tejun Heo
Date: Thursday, September 20, 2007 - 1:05 am

Implement sysfs_find_child() which finds a child of a sysfs_dirent by
name.  This function does not grab reference of the found child.  The
caller is supposed to have reference if the child exists.  This
function is useful for callers which own the sysfs_dirent to be looked
up but don't wanna keep a pointer to it.

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 fs/sysfs/dir.c        |   33 +++++++++++++++++++++++++++++++++
 include/linux/sysfs.h |    9 +++++++++
 2 files changed, 42 insertions(+), 0 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 584f17c..7a6be9a 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -647,6 +647,39 @@ struct sysfs_dirent *sysfs_get_dirent(struct sysfs_dirent *parent_sd,
 	return sd;
 }
 
+/**
+ *	sysfs_find_child - find sysfs_dirent with the given name
+ *	@parent: sysfs_dirent to search under
+ *	@name: name to look for
+ *
+ *	This is exported version of sysfs_find_dirent().  This
+ *	function doesn't grab reference to the found dirent.  The
+ *	caller must already have a reference to it.  This function is
+ *	useful for callers which own the sysfs_dirent to be looked up
+ *	but don't wanna keep a pointer to it.
+ *
+ *	LOCKING:
+ *	Kernel thread context (may sleep).
+ *
+ *	RETURNS:
+ *	Pointer to the looked up sysfs_dirent on success, NULL if
+ *	there's no such entry, ERR_PTR(-EBADF) is @parent is ERR_PTR()
+ *	value.
+ *
+ */
+struct sysfs_dirent *sysfs_find_child(struct sysfs_dirent *parent,
+				      const char *name)
+{
+	struct sysfs_dirent *sd;
+
+	mutex_lock(&sysfs_mutex);
+	sd = sysfs_find_dirent(parent, name);
+	mutex_unlock(&sysfs_mutex);
+
+	return sd;
+}
+EXPORT_SYMBOL_GPL(sysfs_find_child);
+
 static int create_dir(struct kobject *kobj, struct sysfs_dirent *parent_sd,
 		      const char *name, struct sysfs_dirent **p_sd)
 {
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index 5646e56..f030dc6 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -32,10 ...
From: Tejun Heo
Date: Thursday, September 20, 2007 - 1:05 am

The function name sysfs_add_file() will be used for sd-based file
interface.  Rename the internal function to __sysfs_add_file().  Note
that the internal function will be removed once the new interface is
in place, so double underscore prefix should do it for the time being.

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 fs/sysfs/bin.c   |    2 +-
 fs/sysfs/file.c  |    6 +++---
 fs/sysfs/group.c |    2 +-
 fs/sysfs/sysfs.h |    4 ++--
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/sysfs/bin.c b/fs/sysfs/bin.c
index 91e573f..20c2e78 100644
--- a/fs/sysfs/bin.c
+++ b/fs/sysfs/bin.c
@@ -234,7 +234,7 @@ int sysfs_create_bin_file(struct kobject * kobj, struct bin_attribute * attr)
 {
 	BUG_ON(!kobj || !kobj->sd || !attr);
 
-	return sysfs_add_file(kobj->sd, &attr->attr, SYSFS_KOBJ_BIN_ATTR);
+	return __sysfs_add_file(kobj->sd, &attr->attr, SYSFS_KOBJ_BIN_ATTR);
 }
 
 EXPORT_SYMBOL_GPL(sysfs_create_bin_file);
diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index 8154fbb..3b74912 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -568,7 +568,7 @@ const struct file_operations sysfs_file_operations = {
 };
 
 
-int sysfs_add_file(struct sysfs_dirent *dir_sd, const struct attribute *attr,
+int __sysfs_add_file(struct sysfs_dirent *dir_sd, const struct attribute *attr,
 		   int type)
 {
 	umode_t mode = attr->mode & S_IALLUGO;
@@ -602,7 +602,7 @@ int sysfs_create_file(struct kobject * kobj, const struct attribute * attr)
 {
 	BUG_ON(!kobj || !kobj->sd || !attr);
 
-	return sysfs_add_file(kobj->sd, attr, SYSFS_KOBJ_ATTR);
+	return __sysfs_add_file(kobj->sd, attr, SYSFS_KOBJ_ATTR);
 
 }
 
@@ -623,7 +623,7 @@ int sysfs_add_file_to_group(struct kobject *kobj,
 	if (!dir_sd)
 		return -ENOENT;
 
-	error = sysfs_add_file(dir_sd, attr, SYSFS_KOBJ_ATTR);
+	error = __sysfs_add_file(dir_sd, attr, SYSFS_KOBJ_ATTR);
 	sysfs_put(dir_sd);
 
 	return error;
diff --git a/fs/sysfs/group.c b/fs/sysfs/group.c
index e4993fd..5c5aab0 100644
--- ...
From: Tejun Heo
Date: Thursday, September 20, 2007 - 1:05 am

Implement __kobject_set_name() which takes pre-allocated @new_name and
renames the kobject without failing.

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 include/linux/kobject.h |    1 +
 lib/kobject.c           |   13 +++++++++++++
 2 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/include/linux/kobject.h b/include/linux/kobject.h
index a8a84fc..f7a7734 100644
--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -68,6 +68,7 @@ struct kobject {
 	struct sysfs_dirent	* sd;
 };
 
+extern void __kobject_set_name(struct kobject *kobj, const char *new_name);
 extern int kobject_set_name(struct kobject *, const char *, ...)
 	__attribute__((format(printf,2,3)));
 
diff --git a/lib/kobject.c b/lib/kobject.c
index a280c62..1623125 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -230,6 +230,19 @@ int kobject_register(struct kobject * kobj)
 	return error;
 }
 
+/**
+ *	__kobject_set_name - Set the name of an object to preallocated string
+ *	@kobj:		object.
+ *	@new_name:	pointer to pre-allocated string for the new name
+ *
+ *	Set the name of @kobj to @new_name.  @new_name is used
+ *	directly and will be freed using kfree() on @kobj release.
+ */
+void __kobject_set_name(struct kobject *kobj, const char *new_name)
+{
+	kfree(kobj->k_name);
+	kobj->k_name = new_name;
+}
 
 /**
  *	kobject_set_name - Set the name of an object
-- 
1.5.0.3


-

From: Tejun Heo
Date: Thursday, September 20, 2007 - 1:05 am

sysfs_add_link() takes parent sd, name, mode and the target sd and
creates a link accordingly.  Currently, sysfs links can only point to
directories but this limitiation is artificial to avoid inflating the
sysfs_dirent structure by one pointer with future changes and can be
easily removed.

Kobject based sysfs_create_link() is reimplemented using
sysfs_add_link().

This patch doesn't introduce any behavior change to the original API.

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 fs/sysfs/kobject.c    |   47 +++++++++++++++++++++++++++
 fs/sysfs/symlink.c    |   83 +++++++++++++++++-------------------------------
 include/linux/sysfs.h |   10 ++++++
 3 files changed, 87 insertions(+), 53 deletions(-)

diff --git a/fs/sysfs/kobject.c b/fs/sysfs/kobject.c
index a34c54e..7400575 100644
--- a/fs/sysfs/kobject.c
+++ b/fs/sysfs/kobject.c
@@ -355,6 +355,53 @@ void sysfs_remove_bin_file(struct kobject * kobj, struct bin_attribute * attr)
 }
 EXPORT_SYMBOL_GPL(sysfs_remove_bin_file);
 
+/*
+ * kobject link interface
+ */
+
+/**
+ *	sysfs_create_link - create symlink between two objects.
+ *	@kobj:	object whose directory we're creating the link in.
+ *	@target:	object we're pointing to.
+ *	@name:		name of the symlink.
+ */
+int sysfs_create_link(struct kobject *kobj, struct kobject *target,
+		      const char *name)
+{
+	struct sysfs_dirent *parent_sd, *target_sd, *sd;
+
+	BUG_ON(!name);
+
+	if (!kobj)
+		parent_sd = sysfs_root;
+	else
+		parent_sd = kobj->sd;
+
+	if (!parent_sd)
+		return -EFAULT;
+
+	/* target->sd can go away beneath us but is protected with
+	 * sysfs_assoc_lock.  Fetch target_sd from it.
+	 */
+	spin_lock(&sysfs_assoc_lock);
+	target_sd = NULL;
+	if (target->sd)
+		target_sd = sysfs_get(target->sd);
+	spin_unlock(&sysfs_assoc_lock);
+
+	if (!target_sd)
+		return -ENOENT;
+
+	sd = sysfs_add_link(parent_sd, name, SYSFS_COPY_NAME, target_sd);
+
+	sysfs_put(target_sd);
+
+	if (IS_ERR(sd))
+		return PTR_ERR(sd);
+	return ...
From: Tejun Heo
Date: Thursday, September 20, 2007 - 1:05 am

Reimplement group interface in terms of sd-based interface in
fs/sysfs/kobject.c.

This patch doesn't introduce any behavior change to the original API.

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 fs/sysfs/Makefile  |    3 +-
 fs/sysfs/group.c   |   87 ----------------------------------------------------
 fs/sysfs/kobject.c |   72 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 73 insertions(+), 89 deletions(-)
 delete mode 100644 fs/sysfs/group.c

diff --git a/fs/sysfs/Makefile b/fs/sysfs/Makefile
index f58bce9..b25ed0d 100644
--- a/fs/sysfs/Makefile
+++ b/fs/sysfs/Makefile
@@ -2,5 +2,4 @@
 # Makefile for the sysfs virtual filesystem
 #
 
-obj-y		:= inode.o file.o dir.o symlink.o mount.o bin.o \
-		   group.o kobject.o
+obj-y		:= inode.o file.o dir.o symlink.o mount.o bin.o kobject.o
diff --git a/fs/sysfs/group.c b/fs/sysfs/group.c
deleted file mode 100644
index bb94f6a..0000000
--- a/fs/sysfs/group.c
+++ /dev/null
@@ -1,87 +0,0 @@
-/*
- * fs/sysfs/group.c - Operations for adding/removing multiple files at once.
- *
- * Copyright (c) 2003 Patrick Mochel
- * Copyright (c) 2003 Open Source Development Lab
- *
- * This file is released undert the GPL v2. 
- *
- */
-
-#include <linux/kobject.h>
-#include <linux/module.h>
-#include <linux/dcache.h>
-#include <linux/namei.h>
-#include <linux/err.h>
-#include "sysfs.h"
-
-
-static void remove_files(struct sysfs_dirent *dir_sd,
-			 const struct attribute_group *grp)
-{
-	struct attribute *const* attr;
-
-	for (attr = grp->attrs; *attr; attr++)
-		sysfs_hash_and_remove(dir_sd, (*attr)->name);
-}
-
-static int create_files(struct sysfs_dirent *dir_sd,
-			const struct attribute_group *grp)
-{
-	struct attribute *const* attr;
-	int error = 0;
-
-	for (attr = grp->attrs; *attr && !error; attr++)
-		error = __sysfs_add_file(dir_sd, *attr, SYSFS_FILE);
-	if (error)
-		remove_files(dir_sd, grp);
-	return error;
-}
-
-
-int sysfs_create_group(struct kobject * kobj,
-		       const ...
From: Tejun Heo
Date: Thursday, September 20, 2007 - 1:05 am

sysfs_rename() takes target @sd, @new_parent and @new_name and rename
@sd to @new_name and move it under @new_parent.  @new_parent and/or
@new_name can be NULL if the specific operation is not needed.

To handle both move and rename && prepare for multiple renames in one
shot for easier symlink handling and shadow dirents, sysfs_rename() is
implemented to be able to handle arbitrary number of rename/move
operations.

During sysfs_prep_rename(), it acquires all the resources it will need
during the operations including dentries and copied names.  After all
are acquired, all needed i_mutexes are locked.  Deadlock is avoided by
using trylock.  If any lock acquisition fails, it releases all
i_mutexes and retries after 1ms.  Because i_mutexes are used very
lightly in sysfs, almost like spinlocks just to satisfy VFS locking
rules, I don't think there will be any starvation issues.

This makes rename a heavy operation but sysfs_rename() may fail and
it's shady-side-of-the-moon cold path where programming convenience
dominates performance by all measures.

sysfs_rename() can be called on any type of sysfs_dirent and always
copies @new_name.

Kobject based sysfs_rename_dir() and sysfs_move_dir() are
reimplemented using sysfs_remove().

This patch doesn't introduce any behavior change to the original API.

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 fs/sysfs/dir.c        |  433 +++++++++++++++++++++++++++++++++----------------
 fs/sysfs/kobject.c    |   31 ++++
 include/linux/sysfs.h |    9 +
 3 files changed, 337 insertions(+), 136 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 986718c..d0eb9bf 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -11,6 +11,7 @@
 #include <linux/idr.h>
 #include <linux/completion.h>
 #include <linux/mutex.h>
+#include <linux/delay.h>
 #include "sysfs.h"
 
 /* verify all mode flags are inside S_IFMT */
@@ -948,142 +949,6 @@ void sysfs_remove(struct sysfs_dirent *sd)
 }
 EXPORT_SYMBOL_GPL(sysfs_remove);
 
-int ...
From: Tejun Heo
Date: Thursday, September 20, 2007 - 1:05 am

Kill now unused sysfs_hash_and_remove().

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 fs/sysfs/inode.c |   22 ----------------------
 fs/sysfs/sysfs.h |    1 -
 2 files changed, 0 insertions(+), 23 deletions(-)

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index 8df357e..0aeea4b 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -202,25 +202,3 @@ struct inode * sysfs_get_inode(struct sysfs_dirent *sd)
 
 	return inode;
 }
-
-int sysfs_hash_and_remove(struct sysfs_dirent *dir_sd, const char *name)
-{
-	struct sysfs_addrm_cxt acxt;
-	struct sysfs_dirent *sd;
-
-	if (!dir_sd)
-		return -ENOENT;
-
-	sysfs_addrm_start(&acxt);
-
-	sd = sysfs_find_dirent(dir_sd, name);
-	if (sd)
-		sysfs_remove_one(&acxt, sd);
-
-	sysfs_addrm_finish(&acxt);
-
-	if (sd)
-		return 0;
-	else
-		return -ENOENT;
-}
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index 62239e3..69e1451 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -137,7 +137,6 @@ static inline void sysfs_put(struct sysfs_dirent *sd)
  */
 struct inode *sysfs_get_inode(struct sysfs_dirent *sd);
 int sysfs_setattr(struct dentry *dentry, struct iattr *iattr);
-int sysfs_hash_and_remove(struct sysfs_dirent *dir_sd, const char *name);
 
 /*
  * file.c
-- 
1.5.0.3


-

From: Tejun Heo
Date: Thursday, September 20, 2007 - 1:05 am

Kill now unused __sysfs_add_file().

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 fs/sysfs/file.c  |   23 -----------------------
 fs/sysfs/sysfs.h |    2 --
 2 files changed, 0 insertions(+), 25 deletions(-)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index 4c0e29f..1d93940 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -556,26 +556,3 @@ struct sysfs_dirent *sysfs_add_file(struct sysfs_dirent *parent,
 	return sysfs_insert_one(parent, sd);
 }
 EXPORT_SYMBOL(sysfs_add_file);
-
-int __sysfs_add_file(struct sysfs_dirent *dir_sd, const struct attribute *attr,
-		   int type)
-{
-	umode_t mode = attr->mode & S_IALLUGO;
-	struct sysfs_addrm_cxt acxt;
-	struct sysfs_dirent *sd;
-	int rc;
-
-	sd = sysfs_new_dirent(attr->name, mode, type);
-	if (!sd)
-		return -ENOMEM;
-	sd->s_file.data = (void *)attr;
-
-	sysfs_addrm_start(&acxt);
-	rc = sysfs_add_one(&acxt, dir_sd, sd);
-	sysfs_addrm_finish(&acxt);
-
-	if (rc)
-		sysfs_put(sd);
-
-	return rc;
-}
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index 16ecd6a..62239e3 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -145,8 +145,6 @@ int sysfs_hash_and_remove(struct sysfs_dirent *dir_sd, const char *name);
 extern const struct file_operations sysfs_file_operations;
 
 void sysfs_file_check_suicide(struct sysfs_dirent *sd);
-int __sysfs_add_file(struct sysfs_dirent *dir_sd,
-		     const struct attribute *attr, int type);
 
 /*
  * bin.c
-- 
1.5.0.3


-

From: Tejun Heo
Date: Thursday, September 20, 2007 - 1:05 am

Rename sysfs_rename_mutex to sysfs_op_mutex and protect operations
which modify tree with it.  ie.

  sysfs_op_mutex : above i_mutexes in the lock hierarchy and
		   guarantees exclusion against all tree
		   modifications.

  sysfs_mutex    : under i_mutexes in the lock hierarchy and protects
		   vfs tree walking from actual tree modification.

So, when one wants to modify tree structure, it should first grab
sysfs_op_mutex mutex, at which point tree structure is guaranteed to
not change beneath it, and then sysfs_mutex when it actually modifies
the tree.

This widened op mutex will be used to make using symlinks easier and
the extended mutex coverage won't add any noticeable contention (only
one extra locking and unlocking around sysfs_mutex in add/remove
paths), not that it would matter even if it actually does.

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 fs/sysfs/dir.c   |   38 +++++++++++++++++++++++++-------------
 fs/sysfs/sysfs.h |    2 +-
 2 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index a20beff..986718c 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -18,8 +18,18 @@
 #error SYSFS mode flags out of S_IFMT
 #endif
 
+/* sysfs_op_mutex is above i_mutexes in the lock hierarchy and
+ * guarantees exclusion against operations which might change tree
+ * structure (add, remove and rename).  sysfs_mutex provide exclusion
+ * between tree modifying operations and vfs tree walking and is below
+ * i_mutexes in the lock hierarchy.
+ *
+ * If a thread is holding sysfs_op_mutex, no one else will change the
+ * tree structure beneath it.  When the thread actually wants to
+ * change the tree structure, it needs to grab sysfs_mutex too.
+ */
+DEFINE_MUTEX(sysfs_op_mutex);
 DEFINE_MUTEX(sysfs_mutex);
-DEFINE_MUTEX(sysfs_rename_mutex);
 spinlock_t sysfs_assoc_lock = SPIN_LOCK_UNLOCKED;
 
 static spinlock_t sysfs_ino_lock = SPIN_LOCK_UNLOCKED;
@@ -87,7 +97,7 @@ static void sysfs_unlink_sibling(struct ...
From: Tejun Heo
Date: Thursday, September 20, 2007 - 1:05 am

sysfs_assoc_lock which protects kobj <-> sd association is now only
used in fs/sysfs/kobject.c.  Move it there and make it static.

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 fs/sysfs/dir.c     |    1 -
 fs/sysfs/kobject.c |    2 ++
 fs/sysfs/sysfs.h   |    1 -
 3 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index d0eb9bf..a74ca4a 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -31,7 +31,6 @@
  */
 DEFINE_MUTEX(sysfs_op_mutex);
 DEFINE_MUTEX(sysfs_mutex);
-spinlock_t sysfs_assoc_lock = SPIN_LOCK_UNLOCKED;
 
 static spinlock_t sysfs_ino_lock = SPIN_LOCK_UNLOCKED;
 static DEFINE_IDA(sysfs_ino_ida);
diff --git a/fs/sysfs/kobject.c b/fs/sysfs/kobject.c
index 55b884b..16e10de 100644
--- a/fs/sysfs/kobject.c
+++ b/fs/sysfs/kobject.c
@@ -19,6 +19,8 @@
 
 #define to_sattr(a) container_of(a,struct subsys_attribute, attr)
 
+static spinlock_t sysfs_assoc_lock = SPIN_LOCK_UNLOCKED;
+
 static int sysfs_remove_child(struct sysfs_dirent *parent, const char *name)
 {
 	struct sysfs_dirent *sd;
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index 69e1451..732b292 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -89,7 +89,6 @@ extern struct kmem_cache *sysfs_dir_cachep;
  */
 extern struct mutex sysfs_op_mutex;
 extern struct mutex sysfs_mutex;
-extern spinlock_t sysfs_assoc_lock;
 
 extern const struct file_operations sysfs_dir_operations;
 extern const struct inode_operations sysfs_dir_inode_operations;
-- 
1.5.0.3


-

From: Tejun Heo
Date: Thursday, September 20, 2007 - 1:05 am

sysfs_add_file() creates a text sysfs file which is equivalent to
attribute file of the original API.  Each sysfs file has its own ops
containing show and store, and can have arbitrary private data.  Both
show and store methods are given two private data arguments - one for
its own, the other for its parent's.  This is primarily to support the
original kobject based API where private data was only available for
directory nodes but can be used for other purposes too.  As with
sysfs_add_dir(), SYSFS_COPY_NAME flag can also be used.

sysfs_notify_file() is equivalent to sysfs_notify() of the original
API and can be called only on file dirents.

sysfs_chmod() is extended version of sysfs_chmod_file().  It lives in
fs/sysfs/dir.c and can be called on any type of node.

The new interface makes sysfs files very straight forward compared to
the original overly and uglily objectified API.  A sysfs file lives
under a sysfs directory, has two opertaions - show and store which
take buffer and its own and parent's private data.  No hidden method
selection or enforcing the same methods for all attributes under a
directory; thus, no extra wrapping or forced de-multiplexing in
show/store methods is necessary.

Kobject based interface works by setting parent_data to kobj and data
to attr.  All kobject-based file related API functions are
reimplemented using the above functions in fs/sysfs/kobject.c.

This patch doesn't introduce any behavior change to the original API
except that sysfs_pos is now determined on each read/write operation
instead of on each open.  This shouldn't cause any noticeable
difference.

This patch increase the size of sysfs_dirent by one pointer.

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 fs/sysfs/dir.c        |   47 +++++++
 fs/sysfs/file.c       |  335 +++++++++++++++++--------------------------------
 fs/sysfs/kobject.c    |  192 ++++++++++++++++++++++++++++
 fs/sysfs/sysfs.h      |    3 +-
 include/linux/sysfs.h |   28 ++++
 5 files changed, 385 ...
From: Tejun Heo
Date: Thursday, September 20, 2007 - 1:05 am

sysfs_add_bin() creates a binary sysfs file which is equivalent to
binary attribute file of the original API.  As with file, bin ops take
private data of both itself and parent to support kobject based API
and kobject based interface works by setting parent_data to kobj and
data to battr.

Other than interface change, internal implementation is mostly
identical.

This patch doesn't introduce any behavior change to the original API.

Signed-off-by: Tejun Heo <htejun@gmail.com>
---
 fs/sysfs/bin.c        |   77 +++++++++++++++++++++++--------------------
 fs/sysfs/inode.c      |    5 +--
 fs/sysfs/kobject.c    |   87 +++++++++++++++++++++++++++++++++++++++++++++++++
 fs/sysfs/sysfs.h      |    4 ++-
 include/linux/sysfs.h |   18 ++++++++++
 5 files changed, 150 insertions(+), 41 deletions(-)

diff --git a/fs/sysfs/bin.c b/fs/sysfs/bin.c
index dad891c..d6cc7b3 100644
--- a/fs/sysfs/bin.c
+++ b/fs/sysfs/bin.c
@@ -11,7 +11,6 @@
 #include <linux/errno.h>
 #include <linux/fs.h>
 #include <linux/kernel.h>
-#include <linux/kobject.h>
 #include <linux/module.h>
 #include <linux/slab.h>
 #include <linux/mutex.h>
@@ -26,21 +25,21 @@ struct bin_buffer {
 	int		mmapped;
 };
 
-static int
-fill_read(struct dentry *dentry, char *buffer, loff_t off, size_t count)
+static int fill_read(struct dentry *dentry, char *buffer, loff_t off,
+		     size_t count)
 {
 	struct sysfs_dirent *sd = dentry->d_fsdata;
-	struct bin_attribute *attr = sd->s_bin.bin_attr;
-	struct kobject *kobj = sd->s_parent->s_dir.data;
+	const struct sysfs_bin_ops *bops = sd->s_bin.ops;
 	int rc;
 
-	/* need sd for attr, its parent for kobj */
+	/* need sd for bops and data, its parent for parent_data */
 	if (!sysfs_get_active_two(sd))
 		return -ENODEV;
 
-	rc = -EIO;
-	if (attr->read)
-		rc = attr->read(kobj, attr, buffer, off, count);
+	rc = -EACCES;
+	if (bops->read)
+		rc = bops->read(buffer, off, count, sd->s_bin.data,
+				sd->s_parent->s_dir.data);
 
 	sysfs_put_active_two(sd);
 
@@ -83,21 ...
From: Greg KH
Date: Tuesday, September 25, 2007 - 3:17 pm

This is not good, I don't like this :(

As we spoke a few weeks ago, the non-driver model users of sysfs are ok.
Yes, it's not trivial to use sysfs in this manner, and it should be made
easier, but we still need to keep our tree of objects.  Using a kobject
for sysfs access is a good thing as it provides a tiny grasp on keeping
the usage of sysfs under control.

So while I like the separation of sysfs and kobjects from an
architectural and conceptual level for testing and understanding, I do
not want to allow the use of sysfs without creating a backing kobject
like we do today.

I'm all for making the "raw" kobject access easier, cleaning up the
attribute "mess" that you need to go through.  The cleanups that Kay and
I have been doing in the kset and subsystem area are steps in that
direction and I have more I want to do there to help make it easier to
use and understand.

So, I'll try to pick and choose from this patchset what I feel is ok for
now.

Or does it depend on the second set of patches that are yet to be
applied due to disagreements about module lifetimes?

thanks,

greg k-h
-

From: Tejun Heo
Date: Thursday, September 27, 2007 - 4:35 am

Hello, Greg.

Sorry about the late reply.  I'm sandwiched between several release
dates (I bet you know) and sudden burst of family/personal events (all
kinds of them - good, annual and bad).



I suppose I failed to sell new sysfs_dirent based interface idea
face-to-face.  I'll try it one more time on-line.  I tend to do these
things better on-line, especially in English.  So, please spare some
more time on the subject.

IMHO, removal of kobject from sysfs interface is a logical and necessary
step toward easier driver model, not an unnecessary because-we-can
modification.  I need to go back to what a "kobject" is to explain this.

1. What is a kobject?

If I understood it correctly, kobject was separated out from device
driver model to allow entities outside of driver model to use sysfs, so
it's a part of device driver object which is necessary to interact with
sysfs.

Originally, driver model objects and their sysfs representation was
tightly coupled.  This is what made kobject a "kobject" not
"sysfs_something".  Driver model and sysfs shared the same object to
represent kernel and sysfs-side.  kobject was a base class of all driver
model objects, interaction with sysfs was through this base object and
implementation of sysfs also depended on kobject.

The functionality served by kobject can be broken down into the
following two.

F-a. To serve as an entity both subsystems can share lifespan
management.  ie. both subsystems reference count on kobject.

F-b. To serve as an entity both subsystems can base their internal
representation on.  (base object in OO term).

2. Implementation of immediate detach of sysfs nodes

Unfortunately, this tight coupling caused several problems.  One of the
most annoying problems was that userland was allowed to interfere
directly with lifespan management of kernel objects which formed basis
of driver model, causing quite some number of problems directly and
indirectly and unfortunately the problem couldn't be contained ...
From: Eric W. Biederman
Date: Thursday, September 27, 2007 - 12:25 pm

I still need to look at the code in detail but I have some concerns
I want to inject into this conversation of future sysfs architecture.

- If we want to carefully limit sysfs from going to wild code review
  is clearly not enough.  We need some technological measures to
  assist us.  As the experience with sysctl has shown.

  I discovered that something like 10% of the sysctl entries were
  buggy and had been for years when I added basic runtime sanity
  checks.

  I had also found one instance in the kernel and had one instance
  from outside the kernel where people had created files under
  /proc/sys not as sysctls but as using the infrastructure from
  proc_generic.c because it happened to work.

  So while it very well may be we don't want to use the kobject
  interface anymore.  I expect that we want to have the sysfs_dirent
  interface not exported to modules, and only allow direct 
  access from code compiled into the kernel.

  Mostly I am thinking that any non-object model users should have
  their own dedicated wrapper layer.  To help keep things consistent
  and to make it hard enough to abuse the system that people will
  find that it is usually easier to do it the right way.

- The network namespace work scheduled to be merged in 2.6.24 is 
  currently has a dependency in Kconfig that is "&& !SYSFS"
  because sysfs is currently very much a moving target.

  Does it look like we can resolve Tejun's work for 2.6.24?
  If not does it make sense to push my patches that allow
  multiple mounts of sysfs for 2.6.24?  So I can allow
  network namespaces in the presence of sysfs.

  Outside of sysfs and the device model I'm only talk maybe 30 lines
  of code...    So I could easily merge that patch later in the
  merge window after the other pieces have gone in.

- Farther down the road we have the device namespace.
  The bounding requirements are:
  - We want to restrict which set of devices a subset of process
    can access.
  - When we migrate an ...
From: Tejun Heo
Date: Saturday, September 29, 2007 - 3:06 pm

Hello, Eric.


Hmmm...  I think most current non-driver-model sysfs users are deep in
kernel anyway, but I think not exporting sysfs interface at all might be
a bit too restrictive.  I think we need to examine the current
non-driver-model sysfs users thoroughly to determine what to do about
this.  But, yes, I do agree that we need to put restrictions one way or

I think it would be better if namespace comes after interface update and
other new features, especially symlink renaming, but, under the current
circumstance, it might delay namespace unnecessarily and I have no
problem with your patches going in first.  My concerns are...

* Do you think you can use new rename implementation contained in this
series?  It borrows basic ideas from the implementation you used for
namespace but is more generic.  It would be great if you can use it
without too much modification.

* I'm still against using callbacks to determine namespace tags because
callbacks need to be coupled with sysfs internals more tightly and are

Ah... Coming few months will be fun, won't they?  :-)

-- 
tejun
-

From: Greg KH
Date: Thursday, October 4, 2007 - 11:23 pm

I totally agree.  You should see the ways that people have tried to
circumvent the current kobject/sysfs code over the past years.  It's so

I would be interested in seeing what your patches look like.  I don't
think that we should take any more sysfs changes for 2.6.24 as we do
have a lot of them right now, and I don't think that Tejun and I agree
on the future direction of the outstanding ones just yet.

But I don't think that your multiple-mount patches could make it into


That is going to be interesting to see how you come up with a way to do

Um, no, that's not going to happen.  /dev/sda will _always_ have the
same major:minor number, as defined by the LSB spec.  You can not break
that at all.  So while you might not want to show all mounts
/sys/devices/block/sda/ the ones that you do, will all have the LSB
defined major:minor number assigned to it.

thanks,

greg k-h
-

From: Eric W. Biederman
Date: Friday, October 5, 2007 - 5:12 am

Hmm.  If that is in the LSB it must come from
Documentation/devices.txt  I'm not after changing the user
visible major/minor assignments.

Let me see if a concrete example will help.  Suppose I have
have a SAN with two disks:  disk-1 and disk-2.  I have
two machines A and B.  On machine A I get the mapping:
sda -> disk-1, sdb ->disk-2.  On machine B I wind up with
a different probe order so I get the mapping: sda -> disk-2
sdb ->disk-1.

To be very clear by sda I mean the block device with major 8 and
minor 0, and by sdb I mean the block device with major 8 and minor
16.

So I decide I want an environment on machine B that looks just
like the environment on machine A, so I can bring transfer over
a running program or whatever.  So I run around looking at UUID
labels and what not and I discover that the machine B knows disk-1 as
sdb and that machine A knows disk-1 as sda.  So I want to say:
/sys/devices/block/sdb show up in this other device namespace as
/sys/devices/block/sda.

In that instance a running program won't notice the difference.

Eric
-

From: Kirill Korotaev
Date: Friday, October 5, 2007 - 6:03 am

Imho environments to be migratable should have no direct access to the devices.
You can use any of stacked virtual filesystems to hide real
device from container.
You will have problems much bigger than this one otherwise
(imagine access to video, sound etc.)

Kirill

-

From: Eric W. Biederman
Date: Friday, October 5, 2007 - 6:24 am

What I am primarily concern about is when you can make the case that
the hardware we are talking is present before and after the migration.

When you are directly accessing a device.  For times when it makes
sense to directly access hardware in a container (think infiniband
OS-bypass NICs).  We need to tell user space that the device was
unplugged and another one was plugged in.  If user space can cope with
that things should continue to work.

There are some very specific cases that we can support:
- Stateless devices like /dev/zero and dev/random.
- Virtual devices like ttys, ramdisks, loop devices
- Remote block devices like SCSI disks on a san, iSCSI, nbd, ATAoE.
- Local pseudo block devices like the backing devices for virtual
  filesystems.

There are very specific limits in which this can work and be useable,
and I don't claim to have looked at all of the details, but for the
block device case in particular we export the block device number
to user space in stat.  There are some common applications which do
memorize stat data for files things like: git, incremental backup
software, and intrusion detection software.

Frankly the times when this matters is rare enough I don't put a
big priority on getting this done quickly.  However a strong case
has been made for all of this filtering so we can run things like
udev in a container.  Initially I only expect stateless character
devices and ttys to be allowed in a namespace, and I don't have
a clue what device number we will use in st_dev for stat in the
case our block device isn't in the user space interface.  I just know
that it sounds like where we want to be eventually and thinking
about it now isn't a problem.

Eric

-

From: Greg KH
Date: Tuesday, October 9, 2007 - 3:51 pm

Ah, but if you do that then the "other" device namespace would have
/sys/devices/block/sda/dev be 8:16, right?  And that is not valid as sda
for that namespace must always map to the device with a 8:0 major:minor
as per the LSB spec.

So, no, that's not going to be allowed, sorry.

thanks,

greg k-h
-

From: Eric W. Biederman
Date: Wednesday, October 10, 2007 - 6:16 am

My above sentence is slightly misleading.  That should have been.
I am not after changing the device name to major:minor assignments
as specified in Documentation/devices.txt.

So within a single device namespace everything is normal and as it
always has been.  Weirdness only ensues when you look across device

No. The "other" device namespace I would construct on machine B to
look just like the device namespace that existed on machine A.
Making /sys/devices/block/sda would still be 8:0.

So to be very clear on machine B when talking about disk-1 I would have.
initial device namespace:
  /sys/devices/block/sdb
  /sys/devices/block/sdb/dev 8:16

"other" device namespace:
  /sys/devices/block/sda
  /sys/devices/block/sda/dev 8:0

Similarly on machine B when talking about disk-2 I would have.
initial device namespace:
  /sys/devices/block/sda
  /sys/devices/block/sda/dev 8:0

"other" device namespace:
  /sys/devices/block/sdb
  /sys/devices/block/sdb/dev 8:16

So between the two devices namespaces on machine B the two disks
would exchange their user visible identities.

Eric
-

From: Greg KH
Date: Wednesday, October 10, 2007 - 1:44 pm

Ah, ok, that makes more sense.

And seems quite difficult to do, good luck with that :)

greg k-h
-

From: Eric W. Biederman
Date: Wednesday, October 10, 2007 - 2:16 pm

Thanks.   At least now all I have to do is worry about the
details when we get that far instead of selling the big picture...

My gut feel is that sysfs is probably the hardest part to deal with,
and maybe half of the problem.  Just intercepting the lookup by
device number is fairly simple, I think there is one spot
for block devices and another for character devices.

Anyway once the network namespace support is in with any luck that
will have solved half the sysfs problem.

Eric
-

From: sukadev
Date: Tuesday, October 16, 2007 - 3:18 pm

Eric W. Biederman [ebiederm@xmission.com] wrote:
| Greg KH <greg@kroah.com> writes:
| 
| > On Fri, Oct 05, 2007 at 06:12:41AM -0600, Eric W. Biederman wrote:
| >> Greg KH <greg@kroah.com> writes:
| >> >
| >> >>   Also fun is that the dev file implementation needs to be able to
| >> >>   report different major:minor numbers based on which mount of
| >> >>   sysfs we are dealing with.
| >> >
| >> > Um, no, that's not going to happen.  /dev/sda will _always_ have the
| >> > same major:minor number, as defined by the LSB spec.  You can not break
| >> > that at all.  So while you might not want to show all mounts
| >> > /sys/devices/block/sda/ the ones that you do, will all have the LSB
| >> > defined major:minor number assigned to it.
| >> 
| >> Hmm.  If that is in the LSB it must come from
| >> Documentation/devices.txt
| >
| > Yes, that is the requirement.
| >
| >> I'm not after changing the user visible major/minor assignments.
| >
| > Oh, I misunderstood what you wrote above then.
| 
| My above sentence is slightly misleading.  That should have been.
| I am not after changing the device name to major:minor assignments
| as specified in Documentation/devices.txt.
| 
| So within a single device namespace everything is normal and as it
| always has been.  Weirdness only ensues when you look across device
| namespaces.
| 
| >> Let me see if a concrete example will help.  Suppose I have
| >> have a SAN with two disks:  disk-1 and disk-2.  I have
| >> two machines A and B.  On machine A I get the mapping:
| >> sda -> disk-1, sdb ->disk-2.  On machine B I wind up with
| >> a different probe order so I get the mapping: sda -> disk-2
| >> sdb ->disk-1.
| >
| > Ok.
| >
| >> To be very clear by sda I mean the block device with major 8 and
| >> minor 0, and by sdb I mean the block device with major 8 and minor
| >> 16.
| >
| > Ok.
| >
| >> So I decide I want an environment on machine B that looks just
| >> like the environment on machine A, so I can bring transfer over
| ...
From: Eric W. Biederman
Date: Tuesday, October 16, 2007 - 4:54 pm

No.  It is worse you need to access a filesystem and probably
a block device that is available on both machine A and machine B.
With care we can introduce appropriate namespaces and namespace semantics
so we can make the names be what we need.

For a classic tricky case think what it would require to migrate
a git archive with checked out files and not need to say
"git-update-index --refresh" before you work with the files.

I used names like disk-1 and disk-2 instead of UUIDs because it
was easier for me to type and think about.  You do need some
kind of absolute disk or filesystem identity you can refer back to.

Eric
-

From: Eric W. Biederman
Date: Friday, October 5, 2007 - 5:44 am

Well I have posted them all earlier.  At this point I it makes most
sense to wait until after the big merge happen and every rebases on
top of that.  Then everyone will have network namespace support and
it is easier to look through all of the patches.  Especially since
it looks like the merge window will open any day now.

I will quickly recap the essence of what I am looking at:
On directories of interest I tag all of their directory
entries with which namespace they belong to.  On a mount
of sysfs I remember which namespace we were in when we
mounted sysfs.  The I filter readdir and lookup based upon
the namespace I captured at mount time.  I do my best
to generalize it so that the logic can work for different
namespaces.

Currently the heart of the patch from the network namespace
is below (I sniped the part that does the capture at mount time).
Basically the interface to users of this functionality is just
providing some way to go from a super block or a kobject to
the tag sysfs is using to filter things.

So I get one sysfs_dirent tree, but each super_block has it's
own tree of dcache entries.

Everything else is pretty much details in checking and propagating
the tags into the appropriate places.

Eric

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 5adfdc2..a300f6e 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -435,6 +437,23 @@ static void netdev_release(struct device *d)
 	kfree((char *)dev - dev->padded);
 }
 
+static const void *net_sb_tag(struct sysfs_tag_info *info)
+{
+	return info->net_ns;
+}
+
+static const void *net_kobject_tag(struct kobject *kobj)
+{
+	struct net_device *dev;
+	dev = container_of(kobj, struct net_device, dev.kobj);
+	return dev->nd_net;
+}
+
+static const struct sysfs_tagged_dir_operations net_tagged_dir_operations = {
+	.sb_tag = net_sb_tag,
+	.kobject_tag = net_kobject_tag,
+};
+
 static struct class net_class = {
 	.name = "net",
 	.dev_release = netdev_release,
@@ -444,6 ...
From: Greg KH
Date: Tuesday, October 9, 2007 - 3:53 pm

thanks,

greg k-h
-

From: Greg KH
Date: Thursday, October 4, 2007 - 11:18 pm

Same here, I'm swamped with stuff, and now am on vacation for a few

Yes, it was done so that we could have /sys/block for block devices.  Al



Yes, but a kobject is still needed internally for the lifespan



No.  I think you are missing a number of things that kobjects provide
and allow:
	- a structurual heirachy of devices.  Combine kobjects with
	  ksets and ktypes, and you have a very powerful system to
	  categorize objects and their representation to each other.
	- a consistant and easy interface to userspace through
	  uevents/hotplug of the creation and removal of these objects.
	  This keeps the different parts of the kernel that need this
	  interface from having to create it every time on their own.
	- a way to easily create and export attributes in sysfs
	  automatically.
	- a way to provide working reference counting for a variety of
	  different objects.


I don't mind the removal of kobjects from sysfs in order to make sysfs
and kobjects work better/simpler.  However the majority of the patches
you created to do this end up with more code overall, and are of no


I will not deny that the current use of kobjects/ksets/ktypes (subsystem
is now gone) is difficult and extreemly painful.  I am currently working
to fix this issue.  But don't think that the reason this is hard to use
means that it should be abolished alltogether.

Rather, it means that this interface to using kobjects needs to be fixed

No, I think that we have been lucky so far that it is so hard to get
sysfs representation working properly for "raw" kobjects.  It has made
people really think why they want to add things there, and usually just
give up and go and put things into the proper place in the /sys/devices/
tree.

Also, not everything that people keep wanting to put in /sys should go
there.  The perfict example of that is the recent BDI stuff.  It belongs
in the driver tree, not in a new /sys/bdi/ location.  If sysfs were
"easier" to use, then it would be abused this ...
From: Tejun Heo
Date: Friday, October 5, 2007 - 1:00 am

Hello, Greg.

I think this definitely needs more discussion, so here we go...


Yes, exactly - "internally" to the driver model (or drivers which ride
along).  To sysfs, it has no function other than being an opaque token.


Yes, which only needs to be used _inside_ the driver model

Things can be much easier than now.  Also, the above paragraph is
inconsistent with the rest of your argument or am I misunderstanding

This is and should be the function of the driver model not kobjects.

To me, this just feels wrong and does more harm than it helps.  I really
think we shouldn't have multi-role flash light at the core of our driver
model (inside driver model proper, no problem, but not as exported

For #1 and #3, I agree if you limit the scope to driver model proper but
I am not arguing kobject and all its friends should be abolished.  I'm
arguing that there is no reason to export it as API because it doesn't
add any value exported.

For #4, I don't know.  This can be a matter of taste but I don't think
#4 alone stands as justification for kobject as external API.

For #2, I might be misundertanding.  Please elaborate if I am.  For
uevent issue, I'll talk about more later.

So, I honestly don't think the above four arguments successfully counter
the original arguments.  If I'm missing something, feel free to hammer

There is no code reduction or functionality improvement yet because all
of them are still using the compatibility interface.  Properly
converted, sysfs handling code all over the kernel can be _much_
simplified and more robust.  I bet there are numerous bugs in sysfs
creation failure handling path all over mid/low level drivers.  New

The thing is that functionality-wise, kobject and its friends don't
serve anything anymore outside of driver model implementation proper
(I'll talk about uevent later) and thus there is no reason to use it
outside of driver model implementation anymore in the long term.

If something is needed but bypassed, it's circumvented ...
From: Cornelia Huck
Date: Tuesday, October 9, 2007 - 2:29 am

On Fri, 05 Oct 2007 17:00:48 +0900,


I think that's the heart of the question: We first need to agree what
use the different components should have.

(a) The driver model

The driver model serves as a unified layer for all devices managed by
the kernel, organized in trees, and the drivers handling them. This
includes busses, matching of devices and drivers, attributes and so on.
Userspace expects to see these devices, drivers, busses and attributes
by looking under /sys/devices/. /sys/class/ and /sys/bus/ provide
additional views on this data.

(b) kobjects, ktypes, ksets

kobjects provide a mechanism to arrange kernel objects into a tree-like
structure. ktypes and ksets are mechanisms to further order these
objects. Changes in the kobject hierarchy are communicated to userspace
via uevents.

The driver core is implemented using this infrastructure.

(c) krefs

krefs provide a generic reference counting mechanism.

The kobject infrastructure uses krefs for its reference counting needs.

(d) sysfs

sysfs is a virtual filesystem. It exports information on kernel objects
to user space. (IMO, that's the key: sysfs is userspace representation.)

That said, it is logical that kobjects are made visible to userspace
via sysfs. If someone is trying to make things show up in sysfs and has
to jump through hoops to cook up kobjects, they're probably using the
wrong infrastructure.

There are two big problems with the tight coupling of sysfs and kobjects:

- lifetime rules; but this fortunately hugely improved with the
previous patches :)

- relaying implementation details to userspace so that they cannot be
easily changed. We would need to allow kobjects not showing up in sysfs
and making symlinks a sysfs facility not relying on kobjects to help

But krefs are used for kobject reference counting, or am I

There are use cases outside of the driver model prober where you may

I see uevents as a notifier for changes in the kobject hierarchy, so
they belong to that ...
From: Greg KH
Date: Tuesday, October 9, 2007 - 3:26 pm

Huh?  Why would you want a kobject to not show up in sysfs?

And yes, I agree we could use some more "automatic" help in regards to
symlinks in sysfs when we change kobjects around.  That would make






i agree.

thanks,

greg k-h
-

From: Roland Dreier
Date: Tuesday, October 9, 2007 - 4:20 pm

> > - relaying implementation details to userspace so that they cannot be
 > > easily changed. We would need to allow kobjects not showing up in sysfs
 > > and making symlinks a sysfs facility not relying on kobjects to help
 > > there.
 > 
 > Huh?  Why would you want a kobject to not show up in sysfs?

I think the text you replied to explained it perfectly: becase you
don't want internal implementation details to be exposed the userspace
and become an unchangeable part of the kernel/userspace interface.
-

From: Greg KH
Date: Tuesday, October 9, 2007 - 4:28 pm

But a kobject represents "something" in the kernel.  If it's there, then
it shows up in sysfs.  But if it isn't, or it changes somehow, then it
no longer is in sysfs, which is fine, and your userspace tools have to
be able to handle that by virtue of the rules of how to use sysfs from
userspace.

thanks,

greg k-h
-

From: Cornelia Huck
Date: Wednesday, October 10, 2007 - 2:11 am

On Tue, 9 Oct 2007 16:28:04 -0700,

The rules for using sysfs information are currently only laid out for
the driver model. We would need similar rules for every other user of
kobjects.

This only works if we make the following the general rules:

- You may not rely on any information in sysfs to be present.
- Exceptions to that rule are documented for that special user of sysfs.
-

From: Cornelia Huck
Date: Wednesday, October 10, 2007 - 2:05 am

On Tue, 9 Oct 2007 15:26:38 -0700,

If we consider everything that shows up in sysfs as API, we cannot go
ahead and change it without risking userspace breakage. The hierarchy
of objects may very well be an "implementation detail". Otherwise, we
would need to lay out for every user of the kobject infrastructure
which information userspace can rely on (like it is documented for the

Yes, non-automatic symlink handling is a large source of pain and
errors. They should probably be tied to sysfs_dirents instead of

That's what I tried to say as well :)
-

From: Greg KH
Date: Tuesday, October 9, 2007 - 3:48 pm

I think you must be misunderstanding what I mean here.

In the past, to add something like "userspace notification of hotplug"
to different kernel subsystems, we had to add it all over the place.
Now, with a unified way to represent objects in the kernel (kobjects and
then 'struct device') we only have to add the logic in one place for the
core code, and all subsystems have automatic access to it.  That's one

No, kobjects need to do this, not just the driver model.  What about
things that do not currently use the driver model (like block devices).

Well, that's what struct kref does, as Cornelia pointed out.  That's


Sorry, struct kref does that.  You want a kobject to show hierarchy and

The "original" argument was that I don't want to allow users of sysfs
who are not using a struct kobject.  Perhaps we should just focus on
that issue instead of debating the relative merits of kobject and struct

But as long as we stick with using kobjects for the sysfs interface, it


I don't think that things under /sys/fs/ or /sys/module/ or /sys/kernel/
or /sys/firmware/ should be forced to use the driver model.  Instead,
they should use kobjects, which make much more sense there to show the

Well, what is the probem with the current driver core code (becides the
whole scsi mess which I think is now cleaned up a bunch) that is keeping
libata from doing what it wants to do?

It might be easier to focus on this, than debating the relative merits
of splitting sysfs away from kobjects, as specific examples are much

Heh, ok :)

I really don't want the kobject interface to be such an obfuscated mess,
and am trying to fix that.  It's just taking time, as it is such a gross
mess in places.  But we have the framework layed out for where we want

The rules for sysfs files are the following:
	- one value, in text format, per file.
	- no action apon open/close
	- binary files are only allowed for "pass-through" type files
	  that the kernel does not touch (like for firmware and ...
From: Alan Stern
Date: Wednesday, October 10, 2007 - 8:38 am

You have to stretch this a little for the power/ subdirectory every 
device gets.  The only kobject it corresponds to is the one in the 
device, which already corresponds to the parent directory.

Alan Stern

-

From: Cornelia Huck
Date: Wednesday, October 10, 2007 - 9:16 am

On Wed, 10 Oct 2007 11:38:50 -0400 (EDT),

Yes, this should be "directories should be associated with a kobject or
generated by an attribute group". This way you won't get deep nesting
either.
-

From: Martin Bligh
Date: Wednesday, October 10, 2007 - 10:24 am

So you'll be removing:

/sys/devices/system/node/node?/meminfo

then?

along with:

/sys/devices/system/node/node?/distance
/sys/devices/system/node/node?/numastat

and all the other things that violate the rules?

(which I do agree with ... I just don't think sysfs works for
performance stats as we've discussed several times before ;-))

M.
-

From: Greg KH
Date: Wednesday, October 10, 2007 - 10:30 am

I would love to do that :)

And that goes to show how trying to enforce these kinds of rules is damm
hard.  Things slip by that I never notice because they are only for odd

I agree that this doesn't work too, but also that if it's really needed,
it can be done, just let us know about it (like
/sys/block/BLOCKDEV/stat)

thanks,

greg k-h
-

From: Martin Bligh
Date: Wednesday, October 10, 2007 - 11:26 am

Is there no way to enforce that in the sysfs interface?

OK. Would be nice if we could get rid of /sys/devices/system at
some point, which seems like a fairly crazy path, but still ;-)

M.
-

From: Greg KH
Date: Wednesday, October 10, 2007 - 11:44 am

Hm, we could parse the buffer from the response and complain if we

I agree, getting rid of the sysdev stuff is on the list of things I want
to see changed, that code is not nice...

thanks,

greg k-h
-

Previous thread: [PATCHSET 2/4] sysfs: allow suicide by Tejun Heo on Thursday, September 20, 2007 - 12:26 am. (31 messages)

Next thread: none