Back in March I posted some MySQL benchmarks after we switched to a 1:1
threading model in -current *. I've spent a lot of time tuning the pthread
library so I thought I'd post a followup. The original benchmark that I used
(supersmack) now performs much better on -current than it did a few months
ago, so I picked something else this time: MySQL sysbench.
Most of the sysbench runs that I've seen to date have sysbench running on
the same machine as the database. That's a good test but with the exception
of small installations and out-of-band activity, production setups rarely
look like that. So I ran sysbench itself on a separate dual core system.
Here are the results, comparing NetBSD 3 with NetBSD-current:
http://www.netbsd.org/~ad/sysbench/netbsd.png
And NetBSD-current compared to other systems:
http://www.netbsd.org/~ad/sysbench/netbsd-and-others.png
Note this is stock NetBSD-current with FreeBSD's malloc() (jemalloc) in
libc. I'll be merging that some time soon.
With the vmlocking CVS branch and Mindaugas' new scheduler NetBSD peaks
around 500 TPS. There is a very gradual fall off in the number of TPS
achieved as the number of connections begins to ramp up. I suspect that
could be due to a weakness somewhere in the network stack, so I'm hopeful
that a bit of time spent profiling with large numbers of connections could
yield good results.
Thanks,
Andrew
* http://mail-index.netbsd.org/tech-kern/2007/03/02/0005.html
Can you talk more about the malloc replacement? Also, an interesting thing about benchmarks in the past was the long-running stability of NetBSD. Did you see anything like that?
There's a good bit of information at the URL below, and the implementation
is in FreeBSD's CVS. The main advantage of jemalloc is that it works well
with large numbers of threads.

http://people.freebsd.org/~jasone/jemalloc/

Joerg has suggested that we try a few other BSD licensed allocators and see

Well, what do you mean by stability? :-) The majority of the kinks have
been ironed out of the scheduler and thread library now, so the results on
NetBSD are constant given the same test setup and conditions. The one issue
that exists is that we are dropping a few TPS for every connection that's
added. NetBSD holds up like that until 900 simultaneous client threads. At
around 900 threads, some quite odd (and as yet unknown) behaviour is
tickled and the rate collapses to about 100 TPS.

Thanks,
Andrew
Something interesting's happening in the Linux line on the graph right at the right edge of the plotted region (20 threads). Could you perhaps run NetBSD-current against Linux again with the maximum number of threads ramping up to 40, to see what the two curves look like as we head in that direction? Either we degrade a lot more gracefully than Linux under load, or there's an artifact in the Linux graph. The current plot makes it impossible to tell which, though. Thor
I have also tried 10-100 and 100-1000 client connections. I don't have the
numbers at hand, but Linux peaks around 550 TPS somewhere around 100 client
connections. The numbers I was getting from Linux were quite erratic and I
had to throw out a few sets of results where the downward spikes were so bad

In the long run Linux will beat NetBSD. That said, the behaviour I saw on
this test cannot be called graceful!

Thanks,
Andrew
I think that this is because the Linux graph is so unpredictable - it is all over the place in the graphs collected by Andy, and which someone else agreed was the case under Linux - it has more spikes than my son's hair. Anyway, because it's so unpredictable, it can be used to prove that Linux performs better than any other operating system at any point in the graph (whilst at the same time handwaving away that it's performing worse). QED. Regards, Al the statistician
In message <Pine.NEB.4.64.0710012230280.900@S.culver.net>,

Hi,

There are at least two ways to take Andrew's quoted text. One way is that
over time, Linux will do better than NetBSD (in the long haul, which is how
I first read the quoted line). The other way is that at this point in time,
if we look at datapoints beyond the right edge of Andrew's graphs from last
week, Linux does better than NetBSD.

From off-list discussion with Andrew, I am sure he means the second, not
the first. Andrew's comments in another message, referring to a gradual
drop-off with increasing number of connections and suggesting kernel
profiling in that regime to find the source of the gradual drop-off, also
support the second reading. Andrew can say more on this score if he chooses.
Right, it was badly worded. With the later peaks that Linux shows, and with NetBSD's gradual fall off, continuing on to (say) 500 threads will show Linux achieving a higher transaction rate. Andrew
Which kernel config did you use for the FreeBSD results? In tests that have been run on p4 hardware, the FreeBSD system's graph looks more like NetBSD's than the one presented here. FreeBSD's kernel has a lot of debugging options that hurt performance on by default. Also, FreeBSD's malloc defaults to 'AJ' in head, which would result in reduced performance. Warner
I took the generic config, removed the debugging options (INVARIANTS,

I can try turning off debugging in the allocator. What else would you like
me to try? I would like to provide remote access to the two systems but
unfortunately my Internet link is unreliable and I'm not in a position to
leave them on 24x7.

Some details on the test. I grabbed my.cnf from Jeff Roberson's weblog:

http://people.freebsd.org/~jeff/bsd.cnf

Relevant bits of dmesg from the MySQL host:

total memory = 2047 MB
avail memory = 2008 MB
cpu0: Intel Pentium III Xeon (686-class), 701.64 MHz, id 0x6a1
cpu0: features 383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features 383fbff<PGE,MCA,CMOV,PAT,PSE36,MMX>
cpu0: features 383fbff<FXSR,SSE>
cpu0: I-cache 16 KB 32B/line 4-way, D-cache 16 KB 32B/line 4-way
cpu0: L2 cache 1 MB 32B/line 8-way
cpu0: ITLB 32 4 KB entries 4-way, 2 4 MB entries fully associative
cpu0: DTLB 64 4 KB entries 4-way, 8 4 MB entries 4-way
fxp0 at pci1 dev 6 function 0: i82559 Ethernet, rev 8
fxp0: interrupting at ioapic0 pin 3 (irq 3)
fxp0: Ethernet address 00:02:a5:45:a6:48
inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 4
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto

The disk subsystem doesn't matter since I was running the read-only test,
and with 10000 rows everything fits in core.

I compiled MySQL by hand on each system:

	./configure --prefix=/local/mysql --with-pthread --with-innodb

All but the necessary processes were killed on the two systems, so they
were running at most sshd, screen, sysbench and the minimum to be able to
log in. I did a warm-up run and then started testing:

	for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20; do
		echo "=> ${i} THREADS"
		sysbench --test=oltp --db-driver=mysql --mysql-host=${HOST} \
		    --mysql-user=root --mysql-table-engine=innodb --num-threads=${i} \
		    --max-time=60 --max-requests=0 --oltp-read-only=on run | \
		    tee -a ...
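
For anyone repeating a run like the one above, here is a rough sketch of turning the accumulated log into a threads-versus-TPS table for plotting. It is not the script used for the graphs in this thread. It assumes each run is preceded by the "=> N THREADS" marker echoed by the loop, and that the sysbench report contains a line of the form "transactions: <count> (<rate> per sec.)". The function name extract_tps and the log file name are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print "threads tps" pairs from a sysbench log.
# ASSUMPTION: runs are introduced by "=> N THREADS" and the report has a
# line such as
#     transactions:      12345  (205.75 per sec.)
# Adjust the patterns if your sysbench version prints something different.
extract_tps() {
    awk '
        /^=> [0-9]+ THREADS/ { threads = $2 }   # remember the current run
        /transactions:/ {
            tps = $3                # third field, e.g. "(205.75"
            sub(/^[(]/, "", tps)    # strip the leading parenthesis
            print threads, tps
        }
    ' "$1"
}
```

Calling `extract_tps results.log` (results.log being whatever file the loop appended to) gives two whitespace-separated columns that gnuplot or a spreadsheet can read directly.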
You should rebuild malloc with MALLOC_PRODUCTION defined (edit
lib/libc/stdlib/malloc.c) as well as making sure that either
/etc/malloc.conf is removed or symlinked to 'aj'. This is pretty important.
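
A minimal sketch of the symlink step, assuming (as described above) that FreeBSD's malloc takes its run-time options from the target of the /etc/malloc.conf symlink and that lower-case 'aj' switches the debugging options off. The function name and the optional path argument are hypothetical; the path argument is only there so the step can be tried outside /etc. Rebuilding libc with MALLOC_PRODUCTION is a separate step and is not shown.

```shell
#!/bin/sh
# Sketch: point a malloc.conf-style symlink at an option string.
# ASSUMPTION: the allocator reads its options from the symlink target;
# 'aj' is the value suggested in the message above.
set_malloc_conf() {
    opts=$1
    conf=${2:-/etc/malloc.conf}   # hypothetical override for testing
    ln -sf "$opts" "$conf"        # replace any existing link
}
```

Run as root, `set_malloc_conf aj` would replace an existing /etc/malloc.conf link; `readlink /etc/malloc.conf` shows what is currently in effect.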
Could you also provide a copy of your FreeBSD kernel configuration file
OK, the only difference to my config is that I have
innodb_log_file_size=900M
OK. The FreeBSD port also defines
--enable-thread-safe-client
--without-debug
--enable-assembler
(and some other options that don't look relevant). --with-pthread might
enable the first option but if not it could cause performance
anomalies (i.e. this is relevant for the client, of course). For
example I accidentally built postgresql without threaded client support
recently and spent a while trying to work out why sysbench suddenly ran
I use
sysbench --test=oltp --num-threads=$1 --mysql-user=root --max-time=120
--max-requests=0 --oltp-read-only=on --db-driver=mysql
--mysql-host=192.168.5.120 run
which seems to be equivalent (the default table engine is innodb in our
config).
Can you run 'vmstat -w 1' for e.g. 30 seconds on your FreeBSD system
when the test is running? I see total CPU usage at 100%, with system at
I tested on a quad 500 MHz p3 (i.e. 30% slower clock speed than your
system), via 100Mbps em0. Performance was already at the level of the
FreeBSD curve on your graph (about 320 tps across a range of loads), and
if I scale up by 700/500 then it's about the same as your NetBSD curve.
I suspect that this will actually underestimate performance a bit
because the CPU is an older generation than yours, so the difference is
not just clock speed. One thing that is kind of interesting is that
some of the locking optimizations that we have not yet committed don't
make a difference on this machine and workload; apparently they are only
important at 8 CPUs and above.
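
The clock-speed scaling above is simple proportional arithmetic. As a sketch (the function name is made up, and linear scaling with clock speed is only the rough approximation used in the estimate, not a measured result):

```shell
#!/bin/sh
# Naive linear clock-speed scaling: TPS * TO_MHZ / FROM_MHZ.
# scale_tps is a hypothetical helper name.
scale_tps() {
    awk -v tps="$1" -v from="$2" -v to="$3" \
        'BEGIN { printf "%.0f\n", tps * to / from }'
}

# About 320 tps measured at 500 MHz, scaled to a 700 MHz part:
scale_tps 320 500 700
```

This prints 448, the figure being compared against the NetBSD curve.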
Anyway, this all suggests to me that something is going wrong on your ...

When does this get turned on for normal FreeBSD builds? Just those that are "releases" (vs current)? Darren
In message: <47020C5F.3060703@netbsd.org>

Yes. -HEAD has that turned off so that we get maximum sanity testing during
development cycles. After we branch, one of the things done on the branch
before a release is to turn off all the performance-degrading
debugging/sanity code. Once off on a branch, it stays off for the life of
the branch.

Warner
It turns out that this was due to debugging in malloc(). As suggested I
recompiled FreeBSD's libc without the debugging, and FreeBSD's performance
is much better: as of right now, NetBSD and FreeBSD are fairly closely
matched on my 4 way system.

From two single runs with both NetBSD and FreeBSD using SCHED_4BSD:

http://www.netbsd.org/~ad/sysbench/sysbench-4bsd.png

Here with SCHED_ULE and with NetBSD using Mindaugas' experimental
scheduler. Like ULE, it uses per-CPU run queues. Among other things that
means threads tend to migrate less.

http://www.netbsd.org/~ad/sysbench/sysbench-pcpu.png

Thanks,
Andrew
That certainly does look impressive. Good work. Do you have any indication of performance scaling vs number of processors?

-- 
Brett Lymn
