> [ I'll post per function analysis as i complete them, as a reply to
[ I'll do a separate mail for every function analyzed; the discussion
spreads better that way. ]
This is the well-known pattern of user-copy overhead, which centers
around this single REP MOVS instruction:
                    nr-of-hits
                    ..........
ffffffff80341eea:           42   83 e2 07    and    $0x7,%edx
ffffffff80341eed:       677398   f3 48 a5    rep movsq %ds:(%rsi),%es:(%rdi)
ffffffff80341ef0:         3642   89 d1       mov    %edx,%ecx
ffffffff80341ef2:        16260   f3 a4       rep movsb %ds:(%rsi),%es:(%rdi)
ffffffff80341ef4:         6554   31 c0       xor    %eax,%eax
ffffffff80341ef6:         1958   c3          retq
ffffffff80341ef7:            0   90          nop
ffffffff80341ef8:            0   90          nop
That's to be expected - tbench shuffles 3.5 GB of effective data
to/from sockets. That's 7.5 GB due to double-copy. So for every 64
bytes of data transferred we spend 1.4 CPU cycles in this specific
function - that is OK-ish.
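
As a rough cross-check of how concentrated the hits are, here is a small
sketch that tallies the per-instruction counts from the annotated listing.
Python and the variable names are my choice for illustration, not part of
the profiling setup. It only covers the instructions shown in the excerpt,
so the share it prints is relative to this window, not to the whole
function or the whole profile.

```python
# Hit counts copied from the annotated listing (address -> nr-of-hits).
# Assumption: only the instructions shown in the excerpt are included.
hits = {
    0xffffffff80341eea: 42,      # and    $0x7,%edx
    0xffffffff80341eed: 677398,  # rep movsq %ds:(%rsi),%es:(%rdi)
    0xffffffff80341ef0: 3642,    # mov    %edx,%ecx
    0xffffffff80341ef2: 16260,   # rep movsb %ds:(%rsi),%es:(%rdi)
    0xffffffff80341ef4: 6554,    # xor    %eax,%eax
    0xffffffff80341ef6: 1958,    # retq
    0xffffffff80341ef7: 0,       # nop
    0xffffffff80341ef8: 0,       # nop
}

# Total hits across the listed instructions.
total = sum(hits.values())

# Instruction with the most hits, and its share of the listed total.
hottest_addr, hottest_hits = max(hits.items(), key=lambda kv: kv[1])
share = hottest_hits / total

print(f"total hits in listed window: {total}")
print(f"hottest: {hottest_addr:#x} with {hottest_hits} hits ({share:.1%})")
```

With these counts the rep movsq line accounts for roughly 96% of the hits
in the excerpt, which is consistent with the copy itself being the cost.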
Ingo