Hi,

Foreword: I'm the OP of bug #7372.

I just want to add that:

1) I have been running the per-bdi patch for about 30 days on a master mysql server under somewhat mild load, without any adverse effect I could notice.

2) I _still_ don't get the "performance" of 2.6.17, but since that's the best combination I could get, I think there is IMHO progress in the right direction (compared to no progress since 2.6.18, that's better :-)).

To be honest, a vanilla 2.6.17 not tuned at all (i.e. without touching vfs_cache_pressure and other knobs in /proc/sys/vm like swappiness and dirty_*) is still better than any later kernel I tested. Thus I still think 2.6.18 added a big regression (which unfortunately I couldn't find).

Read the full bug report for background information if needed.

Unfortunately it isn't practical to git-bisect my issue, as the server is a production server that can't be rebooted/stopped whenever I want (and since I found workarounds for the issue...).

Thanks for showing interest in this issue. Please CC me on any answers as I'm not subscribed to the list.

--
Brice Figureau
If you could characterize your workload well (e.g. how many disks, what file systems, what load on mysql) perhaps it would be possible to reproduce the problem with a test program or a mysql driver. Then it could be bisected.

-Andi
Hi Andi,

My server is a Dell Poweredge 2850 (dual Xeon EM64T 3GHz running without HT, 4GB of RAM), with a Perc 4/Di (an LSI megaraid with a BBU of 256MB).

The hardware RAID card has 2 channels. One is connected to two 10k RPM 146GB SCSI disks that are mirrored in a RAID 1 array on which the system resides (/dev/sda). The second channel is connected to four 10k RPM 146GB disks in a RAID 10 array which contains the database files and database logs (/dev/sdb).

The kernel and userspace are 64-bit.

Above the hardware RAID arrays there is LVM2 with two physical groups (one per array). The RAID10 has only one logical volume. The database volume (the RAID10) is an ext3 volume mounted with rw,noexec,nosuid,nodev,noatime,data=writeback.

The I/O scheduler on all arrays is deadline.

/proc knobs with values other than defaults are:
/proc/sys/vm/swappiness = 2
/proc/sys/vm/dirty_background_ratio = 1
/proc/sys/vm/dirty_ratio = 2
/proc/sys/vm/vfs_cache_pressure = 1

The only thing running on the server is mysql. Mysql's memory footprint is about 90% of physical RAM. Mysql is configured to use InnoDB exclusively. Mysql accesses its database files in O_DIRECT mode.

Since the database fits in RAM, the only kind of access Mysql is doing is writing to the innodb log, the mysql binlog and finally to the innodb database files. There is certainly a whole lot of fsync'ing happening. All the database reads are done from the innodb in-RAM cache.

During all my kernel tests (see the original bug report) the machine was not swapping (so that's not the reason for the stuttering).

If that helps:
db1:~# cat /proc/meminfo
MemTotal: 4052420 kB
MemFree: 23972 kB
Buffers: 54420 kB
Cached: 168096 kB
SwapCached: 1541744 kB
Active: 3723468 kB
Inactive: 157180 kB
SwapTotal: 11863960 kB
SwapFree: 10193064 kB
Dirty: 320 kB
Writeback: 0 kB
AnonPages: 3657744 kB
Mapped: 20508 kB
Slab: 119964 ...
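For anyone trying to reproduce this setup, the four non-default knobs listed above can be checked on a test machine with something like the following. This is only a minimal sketch: the helper name `read_vm_knobs` and its `base` parameter are made up for illustration, and the only thing taken from the message is the list of knob names under /proc/sys/vm.

```python
from pathlib import Path

# Knob names taken from the message above; everything else here is illustrative.
KNOBS = ("swappiness", "dirty_background_ratio", "dirty_ratio", "vfs_cache_pressure")


def read_vm_knobs(base="/proc/sys/vm", names=KNOBS):
    """Return {knob: int value}, or None for a knob that cannot be read."""
    values = {}
    for name in names:
        path = Path(base) / name
        try:
            values[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            # Knob missing on this kernel, unreadable, or not a plain integer.
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_vm_knobs().items():
        print(f"{name} = {value}")
```

The `base` argument exists only so the helper can be pointed at a scratch directory; on a real machine the default path is the one that matters.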
binlog is written using buffered IO.

for InnoDB, the binlog is synced first, then the innodb log. on restart (in 5.0) these are synced back up so you don't get inconsistencies.

and from a quick look at the innobase source, only the data file is using O_DIRECT.

yes. Keep in mind that the binlog grows in file size too... so this has to sync all the metadata as well (ick, i know).

--
Stewart Smith, Senior Software Engineer
MySQL AB, www.mysql.com
Office: +14082136540 Ext: 6616
VoIP: 6616@sip.us.mysql.com
Mobile: +61 4 3 8844 332
Jumpstart your cluster: http://www.mysql.com/consulting/packaged/cluster.html
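The binlog pattern described above (buffered appends to a growing file, each followed by a sync) could be approximated by a small test program of the kind suggested earlier in the thread. What follows is a rough sketch under stated assumptions, not a model of what mysqld actually does: the record count, record size, file name and the function names `append_and_sync` and `summarize` are all invented for illustration.

```python
import os
import time


def append_and_sync(path, records=200, record_size=512):
    """Append fixed-size records to path, fsync after each one.

    Returns the per-record latency (write + fsync) in seconds.
    """
    payload = b"x" * record_size
    latencies = []
    # O_APPEND makes the file grow on every write, so each fsync also has
    # to flush the updated file size, not just the data blocks.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for _ in range(records):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return latencies


def summarize(latencies):
    """Reduce a list of latencies to count, median and maximum (in ms)."""
    ordered = sorted(latencies)
    return {
        "count": len(ordered),
        "median_ms": ordered[len(ordered) // 2] * 1000,
        "max_ms": ordered[-1] * 1000,
    }


if __name__ == "__main__":
    # Hypothetical file name; point it at the file system under test.
    print(summarize(append_and_sync("binlog-sim.dat")))
```

Run on the same volume under different kernels, the maximum latency is probably the more interesting number here, since the complaint is about stalls rather than average throughput.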
It might be an interesting experiment to see if it still happens with the file system remounted as ext2. ext2 has a much more benign fsync than ext3.

-Andi
Back in the first days of my original bug report I moved the binlogs to another disk, and it didn't change anything about my issue.

Is it possible to perform a live remount of the fs as ext2?

Besides that, the RAID card has battery-backed RAM in write-back mode; I was told that fsync doesn't really hurt in this case (moreover the fs is mounted in data=writeback mode).

I'll soon post blktrace files in the original bug report. These will show exactly what the disk workload is in the baseline case _and_ in the atypical under-load case. Maybe that will help shed some light on the issue?

Anyway, thanks,
--
Brice Figureau <brice+lklm@daysofwonder.com>
