On Wed, May 14, 2008 at 06:11:22AM +0200, Nick Piggin wrote:

I assume by coherent domains, you are actually talking about system images. Our memory coherence domain on the 3700 family is 512 processors on 128 nodes. On the 4700 family, it is 16,384 processors on 4096 nodes. We extend a "Read-Exclusive" mode beyond the coherence domain so any processor is able to read any cacheline on the system. We also provide uncached access for certain types of memory beyond the coherence domain.

For the other partitions, the exporting partition does not know at what virtual address the imported pages are mapped. The pages are frequently mapped in a different order by the MPI library to help with MPI collective operations.

For the exporting side to do those TLB flushes, we would need to replicate all that importing information back to the exporting side.

Additionally, the hardware that does the TLB flushing is protected by a spinlock on each system image. We would need to change that simple spinlock into a type of hardware lock that would work (on 3700) outside the processor's coherence domain. The only way to do that is to use uncached addresses with our Atomic Memory Operations, which do the cmpxchg at the memory controller. The uncached accesses are an order of magnitude or more slower.

But it isn't that we are having a problem adapting to just the hardware. One of the limiting factors is Linux on the other partition.

zap_page_range calls unmap_vmas, which walks to vma->next. Are you saying that can be walked without grabbing the mmap_sem, at least for read? I feel my understanding of list management and locking completely shifting.

Are you suggesting the sending side would not need to sleep, or the receiving side? Assuming you meant the sender, it spins waiting for the remote side to acknowledge the invalidate request?

We place the data into a previously agreed upon buffer and send an interrupt. At this point, we would need to start spinning and waiting for completion.
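To make that sender-side flow concrete, here is a minimal sketch in C11 of "place the request in an agreed buffer, signal, then spin for the acknowledgement". It is an illustration only, not XPC or XPMEM code: the struct inval_mailbox layout and the functions post_invalidate, ack_invalidate and wait_for_ack are hypothetical names, the cross-partition interrupt is reduced to a comment, and ordinary cached C11 atomics stand in for whatever uncached accesses real hardware would need. It also assumes a single outstanding request.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mailbox both sides agreed on in advance (illustrative only). */
struct inval_mailbox {
    uint64_t start;          /* first address of the range to invalidate */
    uint64_t end;            /* one past the last address of the range */
    atomic_uint seq_posted;  /* bumped by the sender for each request */
    atomic_uint seq_acked;   /* set by the receiver once the request is handled */
};

/* Sender: publish the range and return the sequence number to wait on. */
static unsigned post_invalidate(struct inval_mailbox *mb,
                                uint64_t start, uint64_t end)
{
    mb->start = start;
    mb->end = end;
    /* Release ordering: the range is visible before the new sequence number. */
    unsigned seq = atomic_fetch_add_explicit(&mb->seq_posted, 1,
                                             memory_order_release) + 1;
    /* A real sender would raise the cross-partition interrupt here. */
    return seq;
}

/* Receiver: stand-in for the work done in response to the interrupt. */
static void ack_invalidate(struct inval_mailbox *mb)
{
    unsigned seq = atomic_load_explicit(&mb->seq_posted, memory_order_acquire);
    /* ... the TLB flush for [mb->start, mb->end) would happen here ... */
    atomic_store_explicit(&mb->seq_acked, seq, memory_order_release);
}

/* Sender: spin until the request is acknowledged or the spin budget runs out. */
static bool wait_for_ack(struct inval_mailbox *mb, unsigned seq,
                         unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; i++) {
        if (atomic_load_explicit(&mb->seq_acked, memory_order_acquire) == seq)
            return true;
    }
    return false; /* caller must decide what to do when the remote side is slow */
}
```

The bounded spin is the point of the sketch: the sender cannot sleep, so how long the receiver takes to reach ack_invalidate directly determines how long the sender burns a CPU.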
Let's assume we never run out of buffer space.

The receiving side receives an interrupt. The interrupt currently wakes an XPC thread to do the work of transferring and delivering the message to XPMEM. The transfer of the data which XPC does uses the BTE engine, which takes up to 28 seconds to time out (a hardware timeout before raising an error), and the BTE code automatically does a retry for certain types of failure. We currently need to grab semaphores which _MAY_ be able to be reworked into other types of locks.

Thanks,
Robin
