Hi,
i have a problem with vmalloc() and vm_ops.page_mkwrite().
ReadOnly access works, but on a write access the VM will
endless invoke the vm_ops.page_mkwrite() handler.I tracked down the problem to the
struct page.mapping pointer,
which is NULL.The problem original occurs with the fb_defio driver (driver/video/fb_defio.c).
This driver use the vm_ops.page_mkwrite() handler for tracking the modified pages,
which will be in an extra thread handled, to perform the IO and clean and
write protect all pages with page_clean().I am not sure if the is a feature of the new rmap code or a bug.
Is there an way to get a similar functionality? Currently, i have no
idea
how to get the ptep from a page alloced with vmalloc().Greetings,
StefaniHere is a small sample driver:
#include <linux/module.h>
#include <linux/errno.h>
#include <linux/string.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>
#include <linux/fs.h>#define DEVICE_NAME "mydrv"
#define DEVICE_MAJOR 240static u8 *mydrv_memory;
static const u_long mydrv_memory_size=1024*1024; // 1 Megabyte#ifdef MODULE
static u_long vmas=0;
#endifstatic void mydrv_vma_open(struct vm_area_struct *vma)
{
#ifdef MODULE
if (vmas++==0)
try_module_get(THIS_MODULE);
#endif
}static void mydrv_vma_close(struct vm_area_struct *vma)
{
#ifdef MODULE
if (--vmas==0)
module_put(THIS_MODULE);
#endif
}struct page *mydrv_vma_nopage(struct vm_area_struct *vma,unsigned long
address,int *type)
{
unsigned long offset;
struct page *page;offset = (address - vma->vm_start) + (vma->vm_pgoff << PAGE_SHIFT);
printk("--> mydrv_vma_nopage:%lu(%08lx)\n",offset,offset);if (offset >= mydrv_memory_size)
return NOPAGE_SIGBUS;page = vmalloc_to_page(mydrv_memory + offset);
if (!page)
return NOPAGE_OOM;get_page(page);
if (type)
*type = VM_FAULT_MINOR;
return page;
}int mydrv_mkwrite(struct vm_area_struct *vma,struct page *page...
(cc's added)
-
Hi,
An aside, I just tested that deferred IO works fine on 2.6.22.10/pxa255.
I understood from the thread that PeterZ is looking into page_mkclean
changes which I guess went into 2.6.23. I'm also happy to help in any
way if the way we're doing fb_defio needs to change.Thanks,
jaya
-
OK, seems I can't read. Or at least, I missed a large part of the
problem.page_mkclean() hasn't changed, it was ->page_mkwrite() that changed. And
looking at the fb_defio code, I'm not sure I understand how its
page_mkclean() use could ever have worked.The proposed patch [1] only fixes the issue of ->page_mkwrite() on
vmalloc()'ed memory. Not page_mkclean(), and that has never worked from
what I can make of it.Jaya, could you shed some light on this? I presume you had your display
working.[1] which I will clean up and resend after this issue is cleared up -
and preferably tested by someone who has this hardware.-
I thought I had it working. I saw the display update after each
mmap/write sequence to the framebuffer. I need to check if there's an
munmap or anything else going on in between write sequences that would
cause it to behave like page_mkclean was working.Is it correct to assume that page_mkclean should mark the pages
read-only so that the next write would again trigger mkwrite? Even if
the page was from a vmalloc_to_page()?Thanks,
jaya
-
That is the crux, I only ever implemented it for file pages.
-
Hmm, so these vmalloc pages are mapped into user-space with
remap_pfn_range(), which doesn't have any form of rmap. That is, given a
pfn there is no way to obtain all ptes for it. So the interface to
page_mkclean() could never work for these (as it only provides a struct
page *).[ also, remap_vmalloc_range() suffers similar issues, only file and anon
have proper rmap ]I'm not sure we want full rmap for remap_pfn/vmalloc_range, but perhaps
we could assist drivers in maintaining and using vma lists.I think page_mkclean_one() would work if you'd manually set page->index
and iterate the vmas yourself. Although atm I'm not sure of anything so
don't pin me on it.-
In the case of defio, I think it's no trouble to build a list of vmas
at mmap time and then to iterate through them when it's ready for
mkclean time as you suggested. I don't fully understand page->index
yet. I had thought it was only used by swap cache or file map.On an unrelated note, I was looking for somewhere to stuff a 16 bit
offset (so that I have a cheap way to know which struct page
corresponds to which framebuffer block or offset) for another driver.
I had thought page->index was it but I think I am wrong now.Thanks,
jaya
-
Yeah, page->index is used along with vma->vmpgoff and vma->vm_start to
determine the address of the page in the given vma:address =3D vma->vm_start + ((page->index - vma->vm_pgoff) << PAGE_SHIFT)=
;and from that address the pte can be found by walking the vma->vm_mm
page tables.So page->index does what you want it to, identify which part of the
framebuffer this particular page belongs to.
Ok. I'm attempting to walk the code sequence. Here's what I think:
- driver loads
- driver vmalloc()s its fb
- this creates the necessary pte entries
then...
- app mmap(/dev/fb0)
- vma is created
- defio mmap adds this vma to private list (equivalent of
address_space or anon_vma)
- app touches base + pixel(128,128) = base + 16k
- page fault
- defio nopage gets called
- defio nopage does vmalloc_to_page(base+16k)
- that finds the correct struct page corresponding to that vaddr.
page->index has not been set by anyone so far, right?
* ah... i see, you are suggesting that this is where I could set the
index since i know the offset i want it to represent. right?
- defio mkwrite get called. defio adds page to its list. schedules delayed work
- app keeps writing the page
- delayed work occurs
- foreach vma { foreach page { page_mkclean_one(page, vma) }
- cycle repeats...Thanks,
jaya
-
well, one set thereof, the kernel mappings, which for this purpose are
this installs a user space page table entry for your page; this is the
Not quite, you would set that right after vmallocing, just set an
increasing page->index starting with 0 for the first page.Then ensure your vma->vm_pgoff is 0 (which should be the case since
userspace will most likely mmap the whole thing, and if not it stillYeah, page_mkclean_one(page, vma) will use vma_address() to obtain an
user-space address for the page in this vma using page->index and the
formula from the last email, this address is then used to walk the page
tables and obtain a pte.This will be the user-space pte installed by your nopfn handler. Not the
Oops, sorry that I missed that. Now I understand. I think:
page->mapping = vma->vm_file->f_mapping
page->index = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoffat nopage time and then before the driver vfrees, I'll clear mapping
for all those pages.Thanks,
jaya
-
Hmm, there is a thought. I had not considered that mapping a chardev
would have that effect.I'd have to have a look at the actual code, but yeah, that might very
well work out. How silly of me.Thanks!
Hi,
the question is how can i get all pte's from a vmalloc'ed memory. Due to
the zeroed mapping pointer i dont see how to do this?-
The mapping pointer is zeroed because you've done nothing to set it.
Below is how I answered you a week ago. But this is new territory
(extending page_mkclean to work on more than just pagecache pages),
I'm still unsure what would be the safest way to do it.Interesting. You need to ask Jaya (CC'ed) since he's the one
who put that code into fb_defio.c, exported page_mkclean, andpage_mkclean was written in the belief that it was being used on
pagecache pages. I'm not sure how deeply engrained that belief is.If it can easily and safely be used on something else, that may be
nice: though there's a danger we'll keep breaking and re-breaking
it if there's only one driver using it in an unusual way. CC'edA pagecache page would have page->mapping initialized to point to
the struct address_space of the vma, and page->index to the offset
(in PAGE_SIZE units): see mm/filemap.c:add_to_page_cache. Without
page->mapping set, page_mkclean_file won't be able to find the vmas
in which the page might appear; and without page->index set, it
won't be able to find where the page should be in those vmas.If such a driver does not put its pages into the page cache (the
safer course? I'm unsure), then it needs to set page->mapping and
page->index appropriately (and reset page->mapping to NULL before
freeing).Hugh
-
Quite, I think manual usage of page_mkclean_one() on the vma gotten from
mmap() along with properly setting page->index is the simplest solution
to make work.Making page_mkclean(struct page *) work for remap_pfn/vmalloc_range()
style mmaps would require extending rmap to work with those, which
includes setting page->mapping to point to a anon_vma like object.But that sounds like a lot of work, and I'm not sure its worth the
overhead, because so far all users of remap_pfn/vmalloc_range() have
survived without.
Yeah, its the truncate race stuff introduced by Nick in
d0217ac04ca6591841e5665f518e38064f4e65bdI'm a bit at a loss on how to go around fixing this. One ugly idea I had
was to check page->mapping before going into page_mkwrite() and when
that is null, don't bother with the truncate check.-
Something like this
---
mm/memory.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -2300,6 +2300,8 @@ static int __do_fault(struct mm_struct *
* to become writable
*/
if (vma->vm_ops->page_mkwrite) {
+ struct address_space *mapping = page->mapping;
+
unlock_page(page);
if (vma->vm_ops->page_mkwrite(vma, page) < 0) {
ret = VM_FAULT_SIGBUS;
@@ -2314,7 +2316,7 @@ static int __do_fault(struct mm_struct *
* reworking page_mkwrite locking API, which
* is better done later.
*/
- if (!page->mapping) {
+ if (mapping != page->mapping) {
ret = 0;
anon = 1; /* no anon but release vmf.page */
goto out;-
I think it's a fine minimal patch. Maybe add a comment to say exactly
what we're doing here (pagecache generally just uses !mapping to test
for truncate).-
| Artem Bityutskiy | [PATCH 10/44 take 2] [UBI] debug unit implementation |
| Greg Kroah-Hartman | [PATCH 004/196] Chinese: add translation of SubmittingPatches |
| Trent Piepho | [PATCH] [POWERPC] Improve (in|out)_beXX() asm code |
| Dave Young | Re: Linux v2.6.24-rc1 |
git: | |
| Gerrit Renker | [PATCH 27/37] dccp: Integration of dynamic feature activation - part 2 (server side) |
| Linus Torvalds | Re: [GIT]: Networking |
| David Miller | Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Natalie Protasevich | [BUG] New Kernel Bugs |
