Re: [PATCH 00/23] per device dirty throttling -v8

Previous thread: [Resend][PATCH] PM: Fix dependencies of CONFIG_SUSPEND and CONFIG_HIBERNATION (updated) by Rafael J. Wysocki on Sunday, August 5, 2007 - 10:06 am. (10 messages)

Next thread: Re: Page Cache question by Adnan Khaleel on Sunday, August 5, 2007 - 10:24 am. (1 message)
From: Brice Figureau
Date: Sunday, August 5, 2007 - 10:22 am

Hi,



Foreword: I'm the OP of bug #7372. 

I just want to say/add that:
 1) I have been running the per-bdi patch for about 30 days on a master mysql
server under somewhat mild load without any adverse effect I could notice.

 2) I _still_ don't get the "performance" of 2.6.17, but since this is the
best combination I could get, I think there is IMHO progress in the right
direction (compared to no progress since 2.6.18, that's better :-)).

To be honest, a vanilla 2.6.17 that is not tuned at all (i.e. with
vfs_cache_pressure and other knobs in /proc/sys/vm like swappiness and dirty_*
left at their defaults) is still better than any later kernel I tested. Thus I
still think 2.6.18 introduced a big regression (which unfortunately I couldn't
find).
Read the full bug report for any background information if needed.

Unfortunately it isn't practical to git-bisect my issue, as the server is a
production server that can't be rebooted/stopped whenever I want (and since I
found workarounds for the issue...).

Thanks for showing interest in this issue.

Please CC: me on any answers as I'm not subscribed to the list.

--
Brice Figureau

-

From: Andi Kleen
Date: Sunday, August 5, 2007 - 3:17 pm

If you could characterize your workload well (e.g. how many disks,
what file systems, what load on mysql) perhaps it would be possible
to reproduce the problem with a test program or a mysql driver.
Then it could be bisected.

-Andi
-

From: Brice Figureau
Date: Monday, August 6, 2007 - 1:40 am

Hi Andi,


My server is a Dell Poweredge 2850 (dual Xeon EM64T 3GHz running without
HT, 4GB of RAM), with a Perc 4/Di (an LSI megaraid with a BBU of 256MB).
The hardware RAID card has 2 channels. One is connected to two 10k RPM
146GB SCSI disks that are mirrored in a RAID 1 array on which the system
resides (/dev/sda). The second channel is connected to four 10k RPM 146GB
disks in a RAID 10 array which contains the database files and database
logs (/dev/sdb).

The kernel and userspace are 64bits.
Above the hardware RAID arrays there is LVM2 with two physical groups
(one per array). The RAID10 has only one logical volume.

The database volume (the RAID10) is an ext3 volume mounted with
rw,noexec,nosuid,nodev,noatime,data=writeback.

The I/O scheduler on all arrays is deadline.

/proc knobs with values other than defaults are:
/proc/sys/vm/swappiness = 2
/proc/sys/vm/dirty_background_ratio = 1
/proc/sys/vm/dirty_ratio = 2
/proc/sys/vm/vfs_cache_pressure = 1
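
For anyone trying to reproduce this setup, the settings listed above can be compared against a running machine with a small script. This is only a sketch: the knob names and values are the ones reported above, while the helper names `read_vm_knobs` and `diff_knobs` and the `base` parameter are made up for illustration.

```python
import os

# Knob values as reported in this message (not kernel defaults).
REPORTED = {
    "swappiness": 2,
    "dirty_background_ratio": 1,
    "dirty_ratio": 2,
    "vfs_cache_pressure": 1,
}


def read_vm_knobs(names, base="/proc/sys/vm"):
    """Return {name: int or None} for each knob found under `base`."""
    values = {}
    for name in names:
        try:
            with open(os.path.join(base, name)) as f:
                values[name] = int(f.read().strip())
        except (OSError, ValueError):
            # Knob missing, unreadable, or not a plain integer on this kernel.
            values[name] = None
    return values


def diff_knobs(current, expected):
    """Return {name: (current, expected)} for every knob that differs."""
    return {
        name: (current.get(name), want)
        for name, want in expected.items()
        if current.get(name) != want
    }


if __name__ == "__main__":
    running = read_vm_knobs(REPORTED)
    for name, (have, want) in diff_knobs(running, REPORTED).items():
        print(f"{name}: running={have} reported={want}")
```

The script prints nothing when the running values already match the reported ones.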

The only thing running on the server is mysql. 
Mysql memory footprint is about 90% of physical RAM. Mysql is configured
to use exclusively InnoDB.

Mysql accesses its database files in O_DIRECT mode.
Since the database fits in RAM, the only kind of access Mysql is doing
is writing to the innodb log, the mysql binlog and finally to the innodb
database files.
There are certainly a whole lot of fsync'ing happening.
All the database reads are done from the innodb in-RAM cache.
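
The log-style part of the write pattern described above (small appends, each followed by an fsync) can be approximated outside mysql with a short test program. The following is a rough sketch and not a reproduction of InnoDB's actual I/O: it uses buffered appends only and does not emulate the O_DIRECT data file writes. The function names, record size, record count and file name are arbitrary assumptions.

```python
import os
import time


def fsync_append_latencies(path, records=100, record_size=512):
    """Append `records` blocks to `path`, fsync after each one,
    and return the per-record latencies in seconds."""
    payload = b"x" * record_size
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for _ in range(records):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force data (and file size metadata) to stable storage
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return latencies


def summarize(latencies):
    """Return the count, median and worst-case latency of a run."""
    ordered = sorted(latencies)
    return {
        "count": len(ordered),
        "median": ordered[len(ordered) // 2],
        "max": ordered[-1],
    }


if __name__ == "__main__":
    # "fsync_test.log" is a placeholder; point it at the volume under test.
    stats = summarize(fsync_append_latencies("fsync_test.log", records=1000))
    print(stats)
```

Running it on the same file system under different kernels and comparing the maximum latency against the median would be one way to look for stalls of the kind reported here.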

During all my kernel tests (see the original bug report) the machine was
not swapping (so that's not the reason for the stuttering).

If that helps:
db1:~# cat /proc/meminfo 
MemTotal:      4052420 kB
MemFree:         23972 kB
Buffers:         54420 kB
Cached:         168096 kB
SwapCached:    1541744 kB
Active:        3723468 kB
Inactive:       157180 kB
SwapTotal:    11863960 kB
SwapFree:     10193064 kB
Dirty:             320 kB
Writeback:           0 kB
AnonPages:     3657744 kB
Mapped:          20508 kB
Slab:           119964 ...
From: Stewart Smith
Date: Monday, August 13, 2007 - 6:44 pm

binlog is written using buffered IO.

for InnoDB, binlog is synced first, then innodb log. on restart (in 5.0)
these are synced back up so you don't get inconsistencies.

and from a quick look at the innobase source, only data file is using

yes. Keep in mind that the binlog grows in file size too... so this has
to sync all the metadata as well (ick, i know).
-- 
Stewart Smith, Senior Software Engineer
MySQL AB, www.mysql.com
Office: +14082136540 Ext: 6616
VoIP: 6616@sip.us.mysql.com
Mobile: +61 4 3 8844 332

Jumpstart your cluster:
http://www.mysql.com/consulting/packaged/cluster.html
From: Andi Kleen
Date: Monday, August 13, 2007 - 7:25 pm

It might be an interesting experiment to see if it still happens
with the file system remounted as ext2. ext2 has a much more 
benign fsync than ext3.

-Andi
-

From: Brice Figureau
Date: Tuesday, August 14, 2007 - 12:59 am

Back in the first days of my original bug report I moved the binlogs to
another disk and it made no difference to my issue.


Is it possible to perform a live remount of the fs as ext2?

Besides that, the RAID card has battery-backed RAM in write-back mode, and
I was told that fsync doesn't really hurt in this case (moreover the fs is
mounted in data=writeback mode).

I'll soon post blktrace files to the original bug report; these will show
exactly what the disk workload is in the baseline case _and_ in the
underload atypical case. Maybe that will help to shed some light on the
issue?

Anyway, thanks,
-- 
Brice Figureau <brice+lklm@daysofwonder.com>

-
