PostgreSQL ships with a simple database benchmarking tool named pgbench, in what's labeled the contrib section (in many distributions it's a separate package from the main server/client ones). I see there's been some work done already on improving how the PostgreSQL server works under the new scheduler (the "Poor PostgreSQL scaling on Linux 2.6.25-rc5" thread). I wanted to provide you a different test case using pgbench, one whose results have taken a sharp dive starting with 2.6.23; the server improvement changes in 2.6.25 actually made this problem worse. I think it will be easy for someone else to replicate my results, and I'll go over the exact procedure below.

To start with a view of how bad the regression is, here's a summary of the results on one system, an AMD X2 4600+ running at 2.4GHz, with a few interesting kernels. I threw in results from Solaris 10 on this system as a nice independent reference point. The numbers here are transactions/second (TPS) running a simple read-only test over a 160MB data set; I took the median from 3 test runs:

Clients   2.6.9  2.6.22  2.6.24  2.6.25  Solaris
1         11173   11052   10526   10700     9656
2         18035   16352   14447   10370    14518
3         19365   15414   17784    9403    14062
4         18975   14290   16832    8882    14568
5         18652   14211   16356    8527    15062
6         17830   13291   16763    9473    15314
8         15837   12374   15343    9093    15164
10        14829   11218   10732    9057    14967
15        14053   11116    7460    7113    13944
20        13713   11412    7171    7017    13357
30        13454   11191    7049    6896    12987
40        13103   11062    7001    6820    12871
50        12311   11255    6915    6797    12858

That's the CentOS 4 2.6.9 kernel there, while the rest are stock ones I compiled with a minimum of fiddling from the defaults (just adding support for my SATA RAID card). You can see a major drop with the recent kernels at high client loads, and the changes in 2.6.25 seem to have really hurt even the low client count ones.

The other recent hardware I have here, an Intel Q6600 based system, gives even more maddening results.
On successive benchmark runs, you can watch it break down only sometimes once you get just above 8 clients. At 10 and 15 clients, when I run it a few times, some runs give results in the good 25-30K TPS range, while others give the 10K slow case. It's not a smooth drop off like in the AMD case; the results from 10-15 clients are really unstable. I've attached some files with 5 quick runs at each client load so you can see what I'm talking about. On that system I was also able to test 2.6.26-rc2, which doesn't look all that different from 2.6.25.

All these results are running everything on the server using the default local sockets-based interface, which is relevant in the real world because that's how a web app hosted on the same system will talk to the database. If I switch to connecting to the database over TCP/IP and run the pgbench client on another system, the extra latency drops the single client case to ~3100 TPS. But the high client load cases are great: about 26K TPS at 50 clients. That result is attached as q6600-remote-2.6.25.txt; the remote client was running 2.6.20. Since recent PostgreSQL results were also fine with sysbench as the benchmark driver, this suggests the problem here is actually related to the pgbench client itself and how it gets scheduled relative to the server backends, rather than being inherent to the server.

Replicating the test results
----------------------------

Onto replicating my results, which I hope works because I don't have too much time to test potential fixed kernels myself (I was supposed to be working on the PostgreSQL code until this sidetracked me). I'll assume you can get the basic database going; if anybody needs help with that let me know.

There is one server tunable that needs to be adjusted before you can get useful PostgreSQL benchmarks from this (and many other) tests. In the root of the database directory, there will be a file named postgresql.conf.
Edit that and change the setting for the shared_buffers parameter to 256MB to mimic my test setup. You may need to bump up shmmax (this is the one list where I'm happy I don't have to explain what that means!). Restart the server and check the logs to make sure it came back up; if shmmax is too low it will just tell you how big it needs to be and not start.

Now the basic procedure to run this test is:

- dropdb pgbench (if it's already there)
- createdb pgbench
- pgbench -i -s 10 pgbench (makes about a 160MB database)
- pgbench -S -c <clients> -t 10000 pgbench

The idea is that you'll have a data set large enough to not fit in L2 cache, but small enough that it all fits in PostgreSQL's dedicated memory (shared_buffers) so that it never has to ask the kernel to read a block. The "pgbench -i" initialization step will populate the server's memory, and while that's all written to disk, it should stay in memory afterwards as well. That's why I use this as a general CPU/L2/memory test as viewed from a PostgreSQL context, and as you can see from my results with this problem it's pretty sensitive to whether your setup is optimal or not.

To make this easier to run against a range of client loads, I've attached a script (selecttest.sh) that does the last two steps in the above. That's what I used to generate all the results I've attached. If you've got the database set up such that you can run the psql client and pgbench is in your path, you should just be able to run that script and have it give you a set of results in a couple of minutes. You can adjust which client loads it tests and how many times it runs each by editing the script.

Addendum: how pgbench works
----------------------------

pgbench works off "command scripts", which are a series of SQL commands with some extra benchmarking features implemented as a really simple programming language.
For example, the SELECT-only test run above, what you get when passing -S to pgbench, is implemented like this:

\set naccounts 100000 * :scale
\setrandom aid 1 :naccounts
SELECT abalance FROM accounts WHERE aid = :aid;

Here :scale is detected automatically by doing a count of a table in the database.

The pgbench client runs as a single process. When pgbench starts, it iterates over each client, parsing the script until it hits a line that needs to be sent to the server. At that point, it issues that command as an asynchronous request, then returns to the main loop. Once every client is primed with a command, it enters a loop where it just waits for responses from them.

The main loop has all the open client connections in a fd_set. Each time a select() on that set says the server has responded to at least one of the clients, it sweeps through all the clients and feeds the next script line to any that are ready for one. This proceeds until the target transaction count is reached.

This design is recognized as being only useful for smallish client loads. The results start dropping off very hard even on a fast machine with >100 simulated clients, as the single pgbench process struggles to respond to everyone who is ready on each pass through all the clients who got responses. This makes pgbench particularly unsuitable for testing on systems with a large number of CPUs. I find pgbench just can't keep up with the useful number of clients possible somewhere between 8 and 16 cores. I'm hoping the PostgreSQL community can rewrite it in a more efficient way before the next release comes out, now that such hardware is starting to show up more running this database. If that's the only way to resolve the issue outlined in this message, that's not intolerable, but a kernel fix would obviously be better.
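
For anyone who wants to poke at this scheduling pattern without a database, the single-process select() loop described above can be roughly sketched as follows. This is not pgbench's code (pgbench is written in C and talks to real server backends); it is a minimal Python stand-in under stated assumptions: each "backend" is a thread on the far end of a local socket pair that answers one byte per request, and the names backend and run_clients are made up for illustration.

```python
import select
import socket
import threading


def backend(conn):
    """Hypothetical stand-in for one server backend: one reply byte per request byte."""
    while True:
        data = conn.recv(1)
        if not data:          # client closed its end, so this backend exits
            break
        conn.sendall(b"r")
    conn.close()


def run_clients(nclients, transactions_per_client):
    """Drive every simulated client from one thread using select().

    Assumes transactions_per_client >= 1. Returns the number of
    completed transactions.
    """
    remaining = {}            # client socket -> transactions left to finish
    threads = []
    for _ in range(nclients):
        client_sock, server_sock = socket.socketpair()
        worker = threading.Thread(target=backend, args=(server_sock,))
        worker.start()
        threads.append(worker)
        client_sock.sendall(b"q")      # prime each client with its first request
        remaining[client_sock] = transactions_per_client

    completed = 0
    while remaining:
        # Block until at least one client has a response waiting.
        readable, _, _ = select.select(list(remaining), [], [])
        ready = set(readable)
        # Sweep through all clients, feeding the next request to any that are ready.
        for sock in list(remaining):
            if sock not in ready:
                continue
            sock.recv(1)               # consume the reply
            completed += 1
            remaining[sock] -= 1
            if remaining[sock] > 0:
                sock.sendall(b"q")
            else:
                sock.close()
                del remaining[sock]

    for worker in threads:
        worker.join()
    return completed
```

Calling run_clients(50, 10000) would mimic the shape of the 50 client case: one process has to wake up, sweep, and resend before any backend can do more work, so how quickly the kernel schedules that one process against the many backends likely matters a lot.
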
I wanted to submit this here regardless because I'd really like for current versions to not have a big regression just because they were using a newer kernel, and it provides an interesting scheduler test case to add to the mix. The fact that earlier Linux kernels and alternate ones like Solaris give pretty consistent results here says this programming approach isn't impossible for a kernel to support well; I just don't think this specific type of load has been considered in the test cases for the new scheduler yet.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
