jcochett's blog

Last Block on an Ext2 Inode

Submitted by jcochett
on August 13, 2006 - 12:39am

This was a very complex task to complete. An even now I am not 100% sure that my solution is correct. The task was is to get a pointer to the the last block of data in an ext2 inode. This means I need to transverse the direct and indirect blocks of the ext2 inode. It took me a solid week and some serious brain segment faults but I think I have my solution. If you are reading this and know a better way, please post a comment.

The first concept was to search from begining to end. But to do this is a serious waste of time and cpu. The second concept was to start from the end and work my way up to the begining, skipping any indirect pointers that were zero. That means if the triple indirect pointer was zero skip it and look at the double indirect pointer. But this was still a waste of cpu cycles. Then I re-found the i_blocks variable in the ext2 inode structure. It states that:

include/linux/ext2_fs.h
   __le32  i_blocks;                /* Blocks count */

Blocks count is the number of blocks that inode is using. So all I would have to do is calculate where that block is. I could not find code that did this within the linux kernel. After a quick glance at the code, I decided to knock it out myself real quick. That was 8 days ago.

Lets start with i_blocks, it really is the number of blocks of size 512, but the block size of the standard installition of my linux is 4096. So the number of blocks (bn) is really i_blocks divided by eight.

bn = i_blocks / 8;

Next we know that the ext2 inode has direct blocks, indirect blocks, double indirect blocks, and triple indirect blocks. They are all contained in the i_block structure.

#define EXT2_NDIR_BLOCKS 12
#define EXT2_IND_BLOCK   EXT2_NDIR_BLOCKS      // 12
#define EXT2_DIND_BLOCK (EXT2_IND_BLOCK + 1)   // 13
#define EXT2_TIND_BLOCK (EXT2_DIND_BLOCK + 1)  // 14
#define EXT2_N_BLOCKS   (EXT2_TIND_BLOCK + 1)  // 15
   __le32  i_block[EXT2_N_BLOCKS];  /* Pointers to blocks */

So from i_block[0] to i_block[11] are the direct blocks, i_block[12] is the indirect block pointer, i_block[13] is the double indirect pointer, and i_block[14] is the triple indirect pointer. What this means that that the block number if greater than 12 is offset by 12 within each indirect section. The indirect block group holds 4096 pointers, the double indirect holds 4096 x 4096 pointers, and the triple indirect holds 4096 x 4096 x 4096 pointers. To normalize the we have to add EXT2_NDIR_BLOCKS plus the previous indirect size as needed to the block number (bn) defined above. Thus we get the normalized block number (nbn).

if (bn < db)                             // direct
   nbn = bn;
else if (bn < 4096 + db)                 // indirect
   nbn = bn – db;
else if (bn < (4096 << 12) + 4096 + db)  // double indirect
   nbn = bn – pagesize – db;
else                                     // triple indirect
   nbn = bn – (4096 << 12) – 4096 – db;

Haveing the normalize block number allows us to calculate the indirect, double indirect, and triple indirect index.

if (bn < db)                              // direct
{
   index = bn;
}
else if (bn < 4096 + db)              // indirect
{
   nbn = bn – db;
   index = nbn;
}
else if (bn < (4096 << 12) + 4096 + db)  // double indirect
{
   nbn = bn – 4096 – db;
   doubleindex = nbn / 4096;
   index = nbn % 4096;
}
else                                      // triple indirect
{
   nbn = bn – (4096 << 12) – 4096 – db;
   tripleindex = nbn / (4096 << 12);
   doubleindex = nbn / 4096;
   index = nbn % 4096;
}

If these indexes calculations are correct then we can pull out the block.

struct buffer_head *bh;
char *lbp;  // Last block pointer

if (bn < db)                              // direct
{
   index = bn;
   bh = sb_bread(sb, e->block[index]);
}
else if (bn < 4096 + db)              // indirect
{
   nbn = bn – db;
   index = nbn;
   bh = sb_bread(sb, e->block[EXT2_IND_BLOCK]);
   bh = sb_bread(sb, ((__u32 *)bh->b_data)[index]));
}
else if (bn < (4096 << 12) + 4096 + db)  // double indirect
{
   nbn = bn – 4096 – db;
   doubleindex = nbn / 4096;
   index = nbn % 4096;
   bh = sb_bread(sb, e->block[EXT2_DIND_BLOCK]);
   bh = sb_bread(sb, ((__u32 *)bh->b_data)[doubleindex]));
   bh = sb_bread(sb, ((__u32 *)bh->b_data)[index]));
}
else                                      // triple indirect
{
   nbn = bn – (4096 << 12) – 4096 – db;
   tripleindex = nbn / (4096 << 12);
   doubleindex = nbn / 4096;
   index = nbn % 4096;
   bh = sb_bread(sb, e->block[EXT2_TIND_BLOCK]);
   bh = sb_bread(sb, ((__u32 *)bh->b_data)[tripleindex]));
   bh = sb_bread(sb, ((__u32 *)bh->b_data)[doubleindex]));
   bh = sb_bread(sb, ((__u32 *)bh->b_data)[index]));
}

lbp = bh->b_data;

Well that is it. I am still checking to see if it works.

Inode number to struct ext2_inode and beyond

Submitted by jcochett
on August 5, 2006 - 9:31pm

This is a repeat of code, but it is basically the ext2_get_inode function from the fs/ext2/inode.c file. I inserted another function and got rid of the goto's.

To call the function you will need a pointer to the VFS super block, an inode number, an empty pointer to a buffer_head struct. Basically the can be achieved like this:

struct ext2_inode *e;
struct buffer_head *bh;

ts = get_current();
e = ino_to_ext2inode(ts->fs->root->d_sb, inode_number, &bh);

Here is the code ino_to_ext2inode:

#include < linux/fs.h >
#include < linux/ext2_fs.h >
#include < linux/buffer_head.h >

static struct ext2_inode *ino_to_ext2inode( struct super_block *sb, ino_t ino, struct buffer_head **p) { struct buffer_head * bh; unsigned long block_group; unsigned long block; unsigned long offset; struct ext2_group_desc * gdp; unsigned long group_desc; struct ext2_group_desc * desc; struct ext2_sb_info *sbi = EXT2_SB(sb); *p = NULL; if ((ino != EXT2_ROOT_INO && ino < EXT2_FIRST_INO(sb)) || ino > le32_to_cpu(EXT2_SB(sb)->s_es->s_inodes_count)) return NULL; block_group = (ino - 1) / EXT2_INODES_PER_GROUP(sb); if (block_group >= sbi->s_groups_count) return NULL; group_desc = block_group >> EXT2_DESC_PER_BLOCK_BITS(sb); offset = block_group & (EXT2_DESC_PER_BLOCK(sb) - 1); if (!sbi->s_group_desc[group_desc]) return NULL; desc = (struct ext2_group_desc *) sbi->s_group_desc[group_desc]->b_data; bh = sbi->s_group_desc[group_desc]; gdp = (struct ext2_group_desc *) (desc + offset); if (!gdp) return NULL; offset = ((ino - 1) % EXT2_INODES_PER_GROUP(sb)) * EXT2_INODE_SIZE(sb); block = le32_to_cpu(gdp->bg_inode_table) + (offset >> EXT2_BLOCK_SIZE_BITS(sb)); if (!(bh = sb_bread(sb, block))) return NULL; // include linux/buffer_head.h *p = bh; offset &= (EXT2_BLOCK_SIZE(sb) - 1); return (struct ext2_inode *) (bh->b_data + offset); }

Now with a ext2 inode pointer we can find out what type of inode it is. But really we are only concerned with getting regular files. Additionally we should make sure that the file doesn't have a deletion time, because the functin above will retreive a pointer to an inode even if it was deleted. As a side note, ext2 does not delete the inode and its data, it just marks it as free.

if (S_ISREG(ext2inode->i_mode) && !ext2inode->i_dtime)
   printk(“Useable inode number!\n”);

If it passes this test we can do a simple check to see if it has enough slack space, in the case of this project we just need 1024 or more.

if (ts->fs->root->d_sb->s_blocksize – 
   (ext2inode->i_size % ts->fs->root->d_sb->s_blocksize) > 1024)
   printk(“Enough slack space\n”);

Coding Continues

Submitted by jcochett
on August 2, 2006 - 10:53pm

It has been April since my last post. I have been coding up the structures to handle the basic FAT filesystem. I am at the point where all I have left is complex kernel code to figure out. These issues are as follows.

  1. Bringing an inode by number into memory from disk, or accessing it already there. Up till now I had a file struct pointer.
  2. Identifying what that inode is (regular file, directory, irregular file,...:)
  3. Determining if slack space exists
  4. Getting to the last logical block to read or write
  5. Writing the inode in cache back down to disk (aka The Cache Incoherency Problem)

I have a few other issues that are minor and are not worth mentioning unless they really start to cause problems. More later.

SFS Coding: user and kernel interaction

Submitted by jcochett
on April 24, 2006 - 9:54pm

It has been a while, and coding has had it's ups and down, so have I. I am currently wrapping up the code between the userspace program and the kernel function that communicate to each other via a single proc file. What I decided to do is to have the kernel module keep state so that it can expect what the next read or write to proc will be if it needs to. More details later.

SFS Coding Begins

Submitted by jcochett
on April 4, 2006 - 9:56am

I have started some software engineering to get ready to start coding up my Slack Filesystem. Basically it will be a virtual filesystem meaning that it will reside on top of ext2. I have been told that I should use the term "virtual filesystem" because linux uses this to term to mean a layer of the kernel subsystem that facilitates an interface for other filesystems. What I mean when I use Virtual Filesystem, that in essence or effect is a filesystem but not to the fact that it is formatted on the disk itself. Thus the harddisk or even parts of the harddisk are not formatted. The Slack Filesystem is virtually there.

But I can see confusion with using the same term, so I will create a new one for the Slack Filesystem. Let's say that is a parasitic pseudo-filesystem. Parasitic cause without the slack space of another filesystem it can't exist, and pseudo because it is "apparently similar" to a real filesystem.

On to my point. I will be finishing my 'software engineering' phase of the project and soon be moving into the actually coding. I am taking my time and attempting to think ahead, which the lack of has created problems in past projects.

I still have the issue with the cache incoherency. I will squeeze in some time to figure it out.

System Call Redirection 2.6 Kernel Code That Works

Submitted by jcochett
on March 30, 2006 - 9:12pm

I have read a lot of code on system call redirection. None of it worked. I narrowed down my problem to the calling of the original function. After several weeks of misery, I found that the problems was merely a casting problems. I was very please to have some working code. It is a big relief. My project was basically at a brick wall. My teacher stated that I could take the easy way out, I am glad I stuck to it. I can't take full credit for the code, the casting solution I found with in some code called sandbox.c. I will research the product later. You will have to substitute your System Map address. I have code that does it, i will post it laster too. The inode is the number of a file I was testing, it is really unneeded, but it was good for testing purposes. Right now how 'bout some working code:


/****************************************************************************** 
 * Name: mod.c
 * Info: A few system call hijackings for a Linux Kernel 2.6 LKM.
 *
 * Usage: insmod mod.ko && tail /var/log/messages 
 *
 * Compile: Use Makefile
 * Makefile
 *
 * obj-m += mod.o
 * compile:
 *      make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
 * clean:
 * 	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
 * 
 *
 *
 ******************************************************************************/ 

#include < linux/module.h > 
#include < linux/kernel.h > 
#include < linux/init.h > 
#include < asm/unistd.h > 
#include < linux/syscalls.h >		// sys_close

MODULE_LICENSE("GPL"); 

void **sys_call_table; 

// -----------------------------------------------------------------------------
// Sys Call Table Address 
// -----------------------------------------------------------------------------
unsigned long **find_sys_call_table(void) 
{
   unsigned long **sctable;
   unsigned long ptr;
   extern int loops_per_jiffy;

   sctable = NULL;
   for (ptr = (unsigned long)&loops_per_jiffy;
        ptr < (unsigned long)&boot_cpu_data; 
        ptr += sizeof(void *))
   {
      unsigned long *p;
      p = (unsigned long *)ptr;
      if (p[__NR_close] == (unsigned long) sys_close)
      {
         sctable = (unsigned long **)p;
         return &sctable[0];
      }
   }
   return NULL;
}


// -----------------------------------------------------------------------------
// WRITE
// -----------------------------------------------------------------------------
asmlinkage int (*o_write) (int fd, const void *buf, size_t count); 
asmlinkage int my_write(int fd, const void *buf, size_t count) 
{ 
   int r;
   printk(KERN_INFO "Syscall WRITE Hijacked\n"); 
   r = o_write(fd, buf, count); 
   return r;
} 



// -----------------------------------------------------------------------------
// UNLINK
// -----------------------------------------------------------------------------
asmlinkage int (*o_unlink) (const char *pathname); 
asmlinkage int my_unlink(const char *pathname) 
{ 
   int r;
   printk(KERN_INFO "Syscall UNLINK Hijacked\n"); 
   r = o_unlink(pathname); 
   return r;
} 



// -----------------------------------------------------------------------------
// CHMOD
// -----------------------------------------------------------------------------
asmlinkage int (*o_chmod) (const char *filename, mode_t mode); 
asmlinkage int my_chmod(const char *filename, mode_t mode)
{
   int r;
   printk(KERN_INFO "Syscall CHMOD Hijacked\n"); 
   r = o_chmod(filename, mode);
   return r;
}



// -----------------------------------------------------------------------------
// READ
// -----------------------------------------------------------------------------
asmlinkage int (*o_read) (int fd, void *buf, size_t count);
asmlinkage int my_read(int fd, void *buf, size_t count)
{
   int r;
   printk(KERN_INFO "Syscall READ Hijacked\n"); 
   r = o_read(fd, buf, count);
   return r;
}



// -----------------------------------------------------------------------------
// INIT
// -----------------------------------------------------------------------------
static int __init init_lkm (void) 
{ 
   sys_call_table = find_sys_call_table();
   if (sys_call_table == NULL)
   {
      printk(KERN_ERR "Cannot find the system call address\n"); 
      return -1;  // do not load
   }

   //WRITE
   o_write = (int(*)(int fd, const void *buf, size_t count))(sys_call_table[__NR_write]); 
   sys_call_table[__NR_write] = (unsigned long) my_write; 

   //UNLINK
   //o_unlink = (int(*)(const char *pathname))(sys_call_table[__NR_unlink]); 
   //sys_call_table[__NR_unlink] = (unsigned long) my_unlink; 

   //CHMOD
   //o_chmod = (int(*)(const char *filename, mode_t mode))(sys_call_table[__NR_chmod]); 
   //sys_call_table[__NR_chmod] = (unsigned long) my_chmod; 

   //READ
   //o_read = (int(*)(int fd, void *buf, size_t count))(sys_call_table[__NR_read]); 
   //sys_call_table[__NR_read] = (unsigned long) my_read; 

   return 0; 
} 



// -----------------------------------------------------------------------------
// EXIT
// -----------------------------------------------------------------------------
static void __exit exit_lkm (void) 
{ 
   //WRITE
   sys_call_table[__NR_write] = (unsigned long) o_write; 

   //UNLINK
   //sys_call_table[__NR_unlink] = (unsigned long) o_unlink; 

   //CHMOD
   //sys_call_table[__NR_chmod] = (unsigned long) o_chmod; 

   //READ
   //sys_call_table[__NR_read] = (unsigned long) o_read; 
} 

module_init (init_lkm); 
module_exit (exit_lkm); 

2.6 Syscall Redirection

Submitted by jcochett
on March 26, 2006 - 3:28pm

I have shifted gears from finishing my cache coherency problems, to working on a fundamental part of the project: syscall redirection. Originally this was semi-finished, since we were using a stock 2.4 kernel. Now is has become a big obstacle now that we are working on a 2.6 kernel. After doing some research, I am unable at this time to fully implement syscall redirection (syscall hijacking) for the 2.6 kernel. The basic algorithm looks like this (for the 2.6 kernel):

Cache Coherence

Submitted by jcochett
on March 19, 2006 - 6:09pm

If you are keeping up to date on my blogging then you will know I have a program that opens a file, pulls the file's contents into memory (pulls a full block), gets a pointer to the file's contents, and then reads or overwrites at that location. The problem is that I need to get the data that is overwritten back down to the disk.

The procedure to do this has to be: 1. Mark the buffer (page?) dirty, then 2.

Porting to 2.6

Submitted by jcochett
on March 15, 2006 - 10:33pm

I wrote a small kernel module to bring a file's contents into memory. Once there, I verify the contents and attempt to make a small change (i.e. like replace a's with b's). After I change the contents, I mark the buffer dirty and attempt to write the buffer back to disk. Unfortunately it currently does not write it back to disk right now. My teacher stated he wanted to look at my code. I sent it to him but he attempted to compile it on a 2.6 kernel, but it was written for 2.4. I eventually was able to port it to 2.6, but it wasn't easy. The 2.4 kernel provided a very easy way of getting the logical block number of a file. It was removed in the 2.6 kernel. I had to pull out code from the ext2 inode.c file to convert a regular inode to an ext2 inode. I was sure that it wasn't going to work, but luckly it did. Finally I replaced the 2.4 bread() function with 2.6 __bread() function. Great, now I can read the file's contents in memory again.

The Successful Writing of Slack Within the Kernel

Submitted by jcochett
on March 6, 2006 - 11:44pm

Well I did write into slack before my failure of not access the file data in memory. With a kernel module (2.6) it worked like this:

1. Save the origial size of the file
2. Open the file to write and append
3. Write something
4. Close file
5. Open file for read
6. Change the size file back to original
7. Close file
8. Invoke the write_inode_now() function

I am not sure why this approached was overlooked as an option, but it did work very nicely. And I had to use the the get_fs() and set_fs() functions:

Direct Memory Access and Back to Bmap

Submitted by jcochett
on March 6, 2006 - 11:33pm

After some research and a host of failed attempts, I was able to find a simple procedure to read the contents of a file sitting in memory. It should be noted that I am currently working with a 2.4.18-3 or otherwise known as redhat 7.3 stock kernel. After opening a file, I use the function bread() to pull the buffer_head structure into memory. The bread() function looks like:

struct buffer_head *bread(parition device, logical block number, block size);

Bmap

Submitted by jcochett
on March 5, 2006 - 9:11pm

After a few weeks of research, my teacher and i discovered very few slack space tools. The one worth mentioning is bmap. It basically opens up the partition (i.e. /dev/hda2), lseek to the logical block number of a file passed in from the commandline, and read/writes to the slack space of that block. All from user space. This is the basis of my approach for my graduate project except I want to do it within the kernel.

What has been done prior to the blog's beginnings.

Submitted by jcochett
on March 5, 2006 - 8:15pm

It is the beginning of march. Since about the begining of the year my teacher and grad advisor have been solidfy the basic structure of the slack space filesytem. I originally based one the ext2 file system but in the end we decided to go with one that resembled a FAT system. Basically there are four main structures a 'superblock', a bitmap for which inodes have slack, a file allocation table establishing a slack block link-list, and a file table for cross-referencing a filename and its starting inode.

The blog begins

Submitted by jcochett
on March 4, 2006 - 10:16pm

I really don't understand the fascination with a blog. Not sure why I am starting one within kerneltrap except to kind-of journel my progress to the end of my project. I have embarked on a project to write a slack space file system as part of a graduate project. When I tell people about what I am doing it is usually followed by "why?" or "what is it good for?". Those questions delayed my development for over a year, before i talked with a very smart teacher who told me that the answer to those questions.