I have a loaded router (~650 Mbps in+out) based on a Sun Fire X4100 with
2x AMD Opteron 248. An HPET timer is available (TSC does not seem to be
available on this platform).
The network interfaces are onboard, connected over PCI-X.
Right now I am using only one processor, because I am using only one
interface and its interrupts stick to that processor. The other one is
almost unused.
At peak time I notice in mpstat that this processor is almost "dead", and if
I run even a minor application that consumes resources, ping over this router
becomes terrible. For me it is clear: the system is overloaded. I ran
oprofile, and here is the result (taken at low load time, but at peak time it
is very similar).
CPU: AMD64 processors, speed 2193.74 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
mask of 0x00 (No unit mask) count 100000
CPU_CLK_UNHALT...|
samples| %|
------------------
2679376 71.9851 vmlinux
287212 7.7163 e1000
278674 7.4870 ip_tables
259923 6.9832 nf_conntrack
29699 0.7979 iptable_nat
26752 0.7187 nf_nat
26093 0.7010 nf_conntrack_ipv4
16525 0.4440 iptable_mangle
14988 0.4027 oprofiled
CPU: AMD64 processors, speed 2193.74 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
mask of 0x00 (No unit mask) count 100000
samples % symbol name
1031727 37.1736 getnstimeofday
230457 8.3035 __napi_schedule
122154 4.4013 __do_softirq
110036 3.9647 dev_queue_xmit
88800 3.1995 net_rx_action
71163 2.5640 ip_route_input
52232 1.8819 local_bh_enable
43804 1.5783 get_next_timer_interrupt
43387 1.5633 ip_forward
35501 1.2791 nf_iterate
35212 1.2687 __slab_alloc
34652 1.2485 default_idle
32375 1.1665 kfree
28127 1.0134 kmem_cache_alloc
What is bothering me is: why is getnstimeofday called so much? Even if I
remove the HTB shaper, it still takes 30-40% of the whole vmlinux time. Of
other applications, only zebra is running.
Any ideas?
--
On Friday 22 August 2008, Denys Fedoryshchenko wrote:
Most significant event types where i notice getnstimeofday at top of list.
Additions:
Counted MEMORY_REQUESTS events (Memory requests by type) with a unit mask of
0x01 (Requests to non-cacheable (UC) memory) count 5000
samples  %        samples  %        symbol name
129      31.0843  596      31.1879  getnstimeofday
54       13.0120  251      13.1345  __napi_schedule
36       8.6747   178      9.3145   default_idle
34       8.1928   164      8.5819   irq_entries_start
23       5.5422   143      7.4830   __do_softirq
and
CPU: AMD64 processors, speed 2193.74 MHz (estimated)
Counted INTERRUPTS_MASKED_CYCLES events (Cycles with interrupts masked (IF=0))
with a unit mask of 0x00 (No unit mask) count 5000
samples  %        symbol name
630015   62.4741  getnstimeofday
28634    2.8394   get_next_timer_interrupt
23279    2.3084   __slab_alloc
15775    1.5643   schedule
14765    1.4641   __slab_free
11154    1.1061   native_read_tsc
10953    1.0861   kmem_cache_alloc
10918    1.0827   tick_nohz_stop_sched_tick
10752    1.0662   update_wall_time
10430    1.0343   net_rx_action
10220    1.0134   __do_softirq
9895     0.9812   __update_sched_clock
--
This function is really used in many places, and these profiles are not enough at least to me, but it seems you could have a lot of softirqs (and probably hrtimers) scheduling, so maybe you should try if e.g. disabling hrtimers or changing kernel HZ makes any difference. Jarek P. --
One user is the shapers, which is OK for me. I am not sure, but maybe another user is the softlockup debug option... and if there are a lot of task switches, maybe that causes excessive load on the slow timer? --
The question is if you really need so exact shaping at a cost of Maybe. Anyway, you could try if lower HZ (with longer jiffies) can help with processing more skbs without rescheduling. Jarek P. --
That's maybe another reason to have your patch in mainline :-) I will try it today on this case, to see if it helps. Maybe it can be optional, enabled via a kernel parameter and /sys, so it can be useful in case of crashes when TSC is used and when the timer is too slow. Because it is not so useful to just disable hrtimers completely if you need them for some other task... --
Maybe it could be enough to use current parameters like: "highres=off" according to Documentation/kernel-parameters.txt? Jarek P. --
Hmm.. it isn't actually answer to your question, sorry. As I said before I think we need to have more people interested in using such additional options, and btw. I understood from your message that disabling htb didn't solve the problem? Jarek P. --
Only HTB - no. If I disable softlockup debug, the load seems to be lower (I must make sure), and if I remove HTB it becomes low. I will try to give exact numbers in the coming days. --
So maybe you could try again this htb patch for limiting qdisc_watchdog_schedule()? Jarek P. --
Yes, and I am going to take snapshots of the system load with different boot flags. It will take time, though, because it is a major router. --
Do you have any packet sockets in this system? Like running dhcp daemon? -- Evgeniy Polyakov --
Another way to see this problem can be to start a sniffer on the machine, even with a restrictive pcap filter, and check whether performance changes or not (it should decrease). For example, I believe that running "ping" could have the same effect (it increases the netstamp_needed variable, so every incoming packet has to be timestamped). So beware of pings, traceroute and other networking tools... --
Or just check /proc/net/packet iirc. Anyway, having at least one packet socket ends up with timestamping of Yup, these innocent toys can end up causing such behaviour on modern highly loaded machines. -- Evgeniy Polyakov --
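The /proc/net/packet check suggested above can also be scripted. What follows is only a rough sketch in C, assuming the file consists of one header line followed by one line per open packet socket; the function names are made up for illustration and do not come from any existing tool.

```c
#include <stdio.h>
#include <string.h>

/* Count the entries in a /proc/net/packet-style listing.
 * Assumption: the first non-empty line is a header and every further
 * non-empty line describes one open packet socket.
 * Returns the number of sockets, or -1 if the stream is NULL. */
int count_packet_sockets(FILE *f)
{
    char buf[256];
    int lines = 0;
    int at_line_start = 1;

    if (f == NULL)
        return -1;
    while (fgets(buf, sizeof buf, f) != NULL) {
        size_t len = strlen(buf);

        /* Count a line once, even if fgets() returns it in pieces. */
        if (at_line_start && buf[0] != '\n')
            lines++;
        at_line_start = (len > 0 && buf[len - 1] == '\n');
    }
    return lines > 0 ? lines - 1 : 0;   /* drop the header line */
}

/* Convenience wrapper for the real file named in the thread. */
int count_open_packet_sockets(void)
{
    FILE *f = fopen("/proc/net/packet", "r");
    int n = count_packet_sockets(f);

    if (f != NULL)
        fclose(f);
    return n;
}
```

A result greater than zero from count_open_packet_sockets() would mean that some process holds a packet socket, which is the situation being asked about here.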
Yes, when I run tcpdump even without promisc at peak time, the machine will be almost dead. Transit traffic will be 100ms+. I know that it is timestamping There is a very short list of tasks. Attached.
Can you put debug print into net_enable_timestamp()/net_disable_timestamp() to determine if someone -- Evgeniy Polyakov --
It is busybox udhcpd... I guess it is innocent. Even if I kill it, it doesn't change anything at all. The only one that could possibly be listening on a multicast socket is ripd, and I cannot kill it. But I think it doesn't matter much either... --
It depends... If it turns timestamps on, then you will have this behaviour. Please check if timestamps are actually enabled, so we could remove one (im)possible case. -- Evgeniy Polyakov --
I and also other people had some patches to move the time stamp measuring into the socket. This way the time stamping didn't need to be enabled on all packets, only on those that actually end up at a socket that requires the time stamp. Unfortunately DaveM didn't like it because some bank wanted different semantics, see the discussion in http://thread.gmane.org/gmane.linux.network/91679 Perhaps you can find out which bank it was and send them a bill for your CPU time ;-) -Andi -- ak@linux.intel.com --
Those banks really want to crank down on latency - to the point they start disabling interrupt coalescing. I bet they'd toss anything out they could to shave another microsecond. rick jones --
This change would actually likely lower their latency. -Andi --
I'm guessing you mean increase their latency? I agree, it could - depends entirely on the PPS in production I suspect. rick jones ftp://ftp.cup.hp.com/dist/networking/briefs/nic_latency_vs_tput.txt I should probably refresh/update that one of these days --
No, moving the time stamps into the socket decreases latency for all packets that don't need time stamps. And they likely have some packets which don't need time stamps too. As a secondary effect if they use a RT kernel it might be also beneficial to do the (depending on the platform) costly time stamp in the lower priority socket context than in the high priority interrupt thread. -Andi --
Ah, since that part of the discussion wasn't in the quoted text I assumed you were talking about the disabling of interrupt coalescing. --
Doing the expensive timestamping in a possibly delayed thread (i.e. some milliseconds after hardware notification) is wrong/useless. Better use plain xtime instead of getnstimeofday() in this case. We could provide a sysctl setting so that the admin can choose between precise timestamps (current behavior) or fast but low resolution timestamping (xtime based) --
We had this discussion earlier, please review the thread I linked to. Note that interrupts can be arbitrarily delayed too (both by cli and by interrupt mitigation), even on a non RT kernel. If you want exact notification (packet arriving at your NIC's buffers) you need NIC hardware support (and more and more NICs have it[1]). If you do it in software then even the interrupt is at the end of a long queue with a pretty much arbitrary delay. Doing it in socket context is just one queue more. It's pretty much all arbitrary. The argument for doing it as late as possible is the prohibitive cost on some systems as people notice all the time. -Andi [1] Unfortunately not necessarily synchronized with system time. --
From: Andi Kleen <andi@firstfloor.org> This is a much different kind of delay compared to sleeping for seconds or longer on the socket lock while a GFP_KERNEL allocation is being satisfied by swapping tons of crap out to disk. Your socket solution is not a workable scheme. --
When this happens then new incoming packets will be lost anyways because there will be no new packets fed back into the RX ring because their allocation will either stall or fail too. I don't think time stamps of dropped packets are very useful ;-) -Andi -- ak@linux.intel.com --
From: Andi Kleen <andi@firstfloor.org> They want the timestamps, but they want it to match when the packet arrived at their system as closely as is reasonably possible. Socket based solutions don't do that, because we can be sleeping on GFP_KERNEL memory or similar with the socket locked, and thus not be able to set the timestamp until the task wakes up and processes the backlog. --
Then they should use hardware time stamps which are increasingly available (e.g. current Intel e1000 design has them and I expect others too). -Andi --
Would it make sense to make a new option for these socket timestamps and encourage some apps to move over to it? --
From: Nick Piggin <nickpiggin@yahoo.com.au> We don't have support for using these specific hardware-provided timestamp sources yet, so it's kind of premature to recommend the facility to applications. :) --
Dang, that was really badly quoted. I was reading the thread and got to the end and just fired off my reply from there... Sorry -- what I meant to ask was, would it make sense to have a new option to enable time stamp measuring in the socket receive layer as in the patchset that Andi referenced, but without removing existing support for early timestamping? --
On Wed, 27 Aug 2008 14:54:12 +0200
Look at /proc/net/ptype to see if any AF_PACKET sockets are open.
There are several causes of this:
* Applications like DHCP use AF_PACKET when they could use something else
* AF_PACKET API was poorly designed and always has timestamps
* The choice was made to get more accurate timestamps by stamping early in
receive code. A better alternative would be to do it in protocol handler
after the socket filter. Sorry, Andi socket layer is too late.
* No driver is using hardware mechanisms to get accurate/free timestamps.
I was working on sky2, but it never was stable/complete.
Easiest advice now is to fix userspace.
--
And what is the working advice? Why exactly can't the admin choose between the 2 alternatives here? Jarek P. --
From: Andi Kleen <andi@firstfloor.org> By the time you get to the socket, it might be eons (relatively speaking) later, decreasing the usefulness of the timestamp. As just an odd example if the TCP socket is user locked at the moment, because the user is blocked on a GFP_KERNEL allocation, it could be a very long time before we actually process the packet and timestamp it. UDP now does similar socket locking so could potentially hit the same kind of problem. That was my argument against such a change. I find it amusing that nobody is talking about fixing the tools that are creating the timestamp requests when they have no real reason for having them in the first place. --
It's a *socket* option. It's named SO_TIMESTAMP. Users of it ought to *expect* that it records the time the packet hits the socket, not the time the frame hits the device. If banks want to know when frames are hitting their devices, that's fine, but setsockopt() is the wrong layer for controlling that sort of I don't agree that the tools are broken. Some of them may have frivolous reasons for wanting timestamps, but they're asking for something at the socket layer, with the scope of a single socket, and it's hardly their fault that we respond to that by doing something expensive and global at a much lower level. --
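For readers who have not used the option, here is roughly what an application does to request and read these timestamps. This is a minimal sketch using the standard setsockopt()/recvmsg() interface on a datagram socket; the two helper names are invented for the example, and error handling is kept to the bare minimum.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef SCM_TIMESTAMP
#define SCM_TIMESTAMP SO_TIMESTAMP   /* same value on Linux */
#endif

/* Ask the kernel to attach a receive timestamp to each datagram. */
int enable_rx_timestamp(int sock)
{
    int on = 1;

    return setsockopt(sock, SOL_SOCKET, SO_TIMESTAMP, &on, sizeof on);
}

/* Receive one datagram and copy its kernel timestamp into *stamp.
 * Returns the payload length, or -1 on error or if the kernel did not
 * attach a timestamp control message. */
ssize_t recv_with_timestamp(int sock, void *buf, size_t len,
                            struct timeval *stamp)
{
    union {
        char buf[CMSG_SPACE(sizeof(struct timeval))];
        struct cmsghdr align;            /* forces correct alignment */
    } ctrl;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;
    struct cmsghdr *c;
    ssize_t n;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    n = recvmsg(sock, &msg, 0);
    if (n < 0)
        return -1;

    /* The timestamp travels as ancillary data next to the payload. */
    for (c = CMSG_FIRSTHDR(&msg); c != NULL; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_TIMESTAMP) {
            memcpy(stamp, CMSG_DATA(c), sizeof *stamp);
            return n;
        }
    }
    return -1;
}
```

Note that the request is made per socket, while the cost discussed in this thread is paid on the receive path for every incoming packet.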
From: Jason Uhlenkott <juhlenko@akamai.com> When expectations equal reality, and then we change reality, that's called breaking things. What might (and I do mean "might") save us is how other systems implement this. A quick check of BSD shows that at least OpenBSD fetches the timestamp inside of the RAW and UDP usrreq handler, which is basically socket receive. Our man pages simply say "reception" as when the timestamp is from, which may also give us some more leeway. From: Jason Uhlenkott <juhlenko@akamai.com> Every application using AF_PACKET sockets gets timestamps by default. And we do know of several specific cases where the timestamps are unnecessary. Even for other cases, why in the world does a DHCP client need accurate timestamps? Give me a break. :) --
I've worked with systems where SO_TIMESTAMP has been used for H.323 videoconferencing systems to synchronize audio and video where remote systems' timestamps on the protocol streams proved to be inaccurate (based off of different, unsynchronized clocks). I can't see any other realistic use of this, but trying to get timestamps for quasi-realtime protocols may be an important use case - and in that case, you want the time when it hits the interface, NOT when it hits the socket. What utility does the time of hitting the socket get you? --
But didn't you really want an "end2end" time stamp in this case, as in really at the end of all kernel/hardware queues on your side? A packet roughly travels this way on a normal NIC before it hits recvmsg():

wire -> NIC on die buffers -> NIC RX ring -> interrupt handler -> NAPI or per CPU queue -> softirq socket lookup -> socket queue -> recvmsg

These all do their own queuing and all queues can add delays depending on the load. Right now SO_TIMESTAMP is in the interrupt handler, but it's just an arbitrary position in a multitude of queues. For video conferencing (or e.g. in general if you implement a retransmit timeout in user space) scheduling delays on the local box surely need to be taken into account too because they all add to the final timing of the packets on the wire. The queues inside the system are really part of the network too. In Linux for example the algorithms that size the TCP buffer space know that and especially take account for it

I think it's the other way round. Why would the real time protocol care when it hits some arbitrary queue in the network stack instead of the time when the application can really

SO_TIMESTAMP was originally invented for passive network monitoring as in tcpdump (for which PACKET sockets were designed originally, DHCP is really just abusing them imho). There it makes some sense to do the time stamp as near on the wire as possible but really a hardware time stamp would be better because it is even nearer. But for anything that does end2end it's the wrong semantics anyways because ignoring local queueing delays would be just a bug, and SO_TIMESTAMP ignores them currently. -Andi -- ak@linux.intel.com --
On Thursday 28 August 2008, Andi Kleen wrote: I hit one more bug: while deleting the root class for htb on ifb0, tc got stuck (along with all operations related to tc). There were some fixes for these things in net-2.6, so I tried to update my git tree. It seems I cannot test current net-2.6, because it is broken for me on the USB part (fixed by a workaround in init scripts), and HPET is totally broken in net-2.6, but works with the latest main git from the torvalds tree. I have to wait until net-2.6 is rebased to the current torvalds tree, then I will try to test. --
You could always pull net-2.6 to Linus' tree by yourself. ...And about the workflow, net-2.6 isn't rebased, instead Linus just pulls it in to his tree. -- i. --
From: Denys Fedoryshchenko <denys@visp.net.lb> Make a clone of Linus's tree, then pull in the net-2.6 tree. This is always how you should test things especially if you want to make sure you have whatever non-networking bug fixes your machine might require. --
My small IMHO regarding SO_TIMESTAMP.

1) Right now I have 400-500 Mbps passing through the router. If I run 5 simultaneous "pings" under _USER_ privileges, then instead of 20% free CPU time I will have 1-2% free CPU time. Sure, I know ping is a suid program, but it has been "like this" for a long time. Security psychos would call it a DoS.
2) Usefulness of this option. What is the difference whether, on an almost idle machine, the timestamp is retrieved at a higher or a lower level? And why do we need such a high-precision timestamp (with an expensive timer) on a highly loaded server, if in my case enabling any socket with SO_TIMESTAMP creates delays of more than 10ms (up to 100ms)?
3) Who are the main users of SO_TIMESTAMP? iputils, which is installed on almost _ANY_ Linux machine? busybox, which uses the same option? Many other multiplatform userspace applications? Or banks? I don't take dhcpd much into account, which is maybe abusing this option.

So there are a few good solutions available (IMHO):
1) Introduce some SO_REALTIMESTAMP (anyway, even SO_TIMESTAMP is not defined in any standard) for the banks and ntp folks who need it. And even give them timespec instead of timeval, so they will be even happier with the resolution.
2) Provide a sysctl, kernel boot, or even "build time" option for "banks" to have a high-resolution (and expensive) SO_TIMESTAMP. --
The skb timestamp overhead does not add up, it's either on or off. If multiple pings make the router slower it must be something else. -Andi -- ak@linux.intel.com --
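The "on or off" behaviour can be pictured with a toy model. The code below is a user-space illustration, not kernel code: it assumes, as the earlier mentions of netstamp_needed and net_enable_timestamp()/net_disable_timestamp() suggest, that a global counter records how many users want timestamps and that the receive path reads the clock once per packet whenever that counter is non-zero. All names are hypothetical.

```c
/* Toy model of a global "timestamps needed" switch (hypothetical names). */
struct rx_model {
    int netstamp_needed;        /* how many users asked for timestamps */
    unsigned long clock_reads;  /* how often the expensive clock was read */
};

/* A socket starts wanting timestamps. */
void model_enable_timestamp(struct rx_model *m)
{
    m->netstamp_needed++;
}

/* A socket stops wanting timestamps. */
void model_disable_timestamp(struct rx_model *m)
{
    if (m->netstamp_needed > 0)
        m->netstamp_needed--;
}

/* Called once per received packet. The clock is read at most once per
 * packet, no matter how many users are registered. */
void model_receive_packet(struct rx_model *m)
{
    if (m->netstamp_needed > 0)
        m->clock_reads++;   /* stands in for the getnstimeofday() call */
}
```

In this model one registered user already costs one clock read per packet, and a second or fifth user adds nothing to the per-packet cost, which is the point made above.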
So... if using ping on your machine has a direct and noticeable effect on CPU load, the problem is elsewhere
(if no ping is running, you don't have skb timestamping, but getnstimeofday() is still the top function in oprofile)
1) Do you have any netfilter rule using xt_time ?
(This module also calls __net_timestamp(skb))
2) You may have a bad program that does something expensive relative to kernel time services.
#include <sys/time.h>
void bad_program(void)
{
    while (1) {
        struct timeval tv0, tv1;
        gettimeofday(&tv0, NULL); // or whatever function that calls getnstimeofday()
        do_small_work();          // placeholder: a tiny unit of real work
        gettimeofday(&tv1, NULL); // or whatever function that calls getnstimeofday()
        add_stat_event(&tv1, &tv0); // placeholder: record the elapsed time
    }
}
Your setup is probably not common.
You want a PersonalComputer class machine to act as a SuperCiscoDevice(TM),
while most PC machines don't use more than 10% of CPU power on average...
Many existing programs depend on current SO_TIMESTAMP.
kernel already provides nanosecond resolution :)
Check SO_TIMESTAMPNS and SCM_TIMESTAMPNS
--
No, the process list is very short. It is a custom semi-embedded Linux distro I made, so I know each process running there. Here is the process list (kernel processes/threads and the running shell (busybox ash) removed):

1    root  /bin/sh /init
1119 root  init
2451 root  /sbin/syslogd -R 80.83.17.2
2453 root  /sbin/klogd
3168 squid /usr/sbin/zebra -d
3175 squid /usr/sbin/ripd -d
3195 root  /usr/sbin/snmpd -c /config/snmpd.conf
3208 root  udhcpd /config/udhcp.office.conf -S
3550 root  /usr/sbin/sshd -b /etc/banner
3566 root  /sbin/getty 38400 tty1
3567 root  /sbin/getty 38400 tty2
3570 root  /sbin/getty 38400 tty3

I don't think I am alone, and I am almost sure there are many guys trying to run Linux as a high-performance router. But most of them don't know about netdev@ :-)

Well, that's called "Increasing resources use efficiency and system productivity". It is never a shame to utilize resources more efficiently. Plus, I am not using a PC class machine. For example this one with HPET is a Sun Fire X4100, which cost us a lot of bucks at the time, mostly because it is reliable hardware (very good IPMI/remote kvm/... onboard, good cooling, 4 e1000, dual power supply). I can also use PC class, but I will face some issues, like building a proper cooling system, and maybe it will not even work well, because some chips are not designed for "heavy duty", and under load they will not be able to dissipate heat inside the chip and will break soon. But sometimes it is even worth a try.

And most important, many routers are already "soft"-routers. What is a Cisco 7206+NPE G1/G2? It is a MIPS CPU with a relatively large L2 cache. There seems to be no ASIC for routing offloading. That means Linux can do the same or a better job. And it means Vyatta can beat Cisco in this market, and be far ahead of Cisco soon. As a result, more jobs for opensource guys. Linux must enter "heavy

I think it wouldn't break. But sure we must be very careful and on my side I can test all ...
Nope... the contrary :) Kernel timestamping has nanosec resolution. SO_TIMESTAMP needs a divide (by 1000), while SO_TIMESTAMPNS is native. --
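The "divide by 1000" is just the conversion from a nanosecond-resolution value to the microsecond-resolution struct timeval that SO_TIMESTAMP hands out. A small sketch, with a helper name chosen for this example:

```c
#include <sys/time.h>
#include <time.h>

/* Convert a nanosecond-resolution timestamp (what SO_TIMESTAMPNS reports)
 * into the microsecond-resolution form used by SO_TIMESTAMP. */
struct timeval ts_to_tv(struct timespec ts)
{
    struct timeval tv;

    tv.tv_sec = ts.tv_sec;
    tv.tv_usec = ts.tv_nsec / 1000;   /* the divide by 1000; sub-us part is lost */
    return tv;
}
```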
I did already. Most of the programs (except ripd/zebra) can be killed, and when I kill them it changes almost nothing. It seems the heavy things causing instability are: 1) HTB (resolution can be lowered to improve performance, I will try Jarek's patch soon) 2) occasionally ping/tcpdump and other SO_TIMESTAMP users 3) Probably softlockup detection. Disabled already, I will come back to it soon, if it is required. One of the other issues I notice: "CACHE MISS" causes maybe almost 5-10% in oprofile in u32, but I am not sure it is an interesting subject to discuss. I have to optimize all my iproute2 rules first. --
On Thu, 28 Aug 2008 22:55:29 +0300 If you are doing HTB, it also calls the clock to get timing information. Each packet dequeue in htb calls psched_get_time(), and that becomes another call to the nanosecond real-time clock. If your embedded processor has a really expensive clock, you probably just want to provide an alternative, cheaper time source with less resolution. --
From: Denys Fedoryshchenko <denys@visp.net.lb> The performance hit hurts, but changing the default to lower resolution after it having been high resolution for 10+ years is a regression and something we really can't do. --
Agree. Then maybe add a way to choose, because the choice is high resolution vs performance. For example, Intel dynamically throttles interrupts on e1000*, and it saves me in this case. They also leave an option for users who want low latency/high throughput. So maybe there must be a way for specific functions that use get(ns)timeofday to use specific timers (cheap and less precise), by option. Or to limit the number of calls they make to the timer. --
No. That adds variance, and packets aren't comparable because they may suffer different kernel/hardware delays. The goal is to approximate original sendtime when the application-level timestamps are unreliable. The more queueing delays that can be For retransmit timeouts, that might be interesting, and might be one case where it is interesting. But then what value does SO_TIMESTAMP have, since you could call gettimeofday() immediately after receipt, and also include application scheduling delays? For videoconferencing, one wants to know when to display a packet Because one would want to ignore even network scheduling delays Why would you want to do end-to-end with SO_TIMESTAMP, vs. gettimeofday after recv? --
And there are no "different kernel/hardware delays" in the network? If your RTT measurement method cannot handle some variance (using standard sampling and data smoothing techniques similar to TCP) then it just needs to be fixed. Besides measuring in the interrupt handler doesn't protect you against local variances anyways because the interrupt timing has variability (e.g due to irq off regions or due to interrupt mitigation or The local delays add to the user experience too. It's unclear why you want to ignore those. -Andi -- ak@linux.intel.com --
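The "smoothing techniques similar to TCP" can be illustrated with the estimator that TCP's retransmission timer uses, as specified in RFC 6298 (gains of 1/8 for the smoothed value and 1/4 for the deviation). The sketch below applies it to arbitrary delay samples; the struct and function names are chosen for this example only.

```c
/* TCP-style smoothing of noisy delay samples (gains as in RFC 6298). */
struct rtt_estimator {
    int have_sample;   /* 0 until the first sample has been seen */
    double srtt;       /* smoothed delay */
    double rttvar;     /* smoothed mean deviation */
};

/* Fold one new measurement into the estimator. */
void rtt_update(struct rtt_estimator *e, double sample)
{
    double err;

    if (!e->have_sample) {
        /* The first sample initialises both values. */
        e->srtt = sample;
        e->rttvar = sample / 2.0;
        e->have_sample = 1;
        return;
    }
    err = e->srtt - sample;
    if (err < 0)
        err = -err;
    /* The deviation is updated first, using the old smoothed value. */
    e->rttvar = 0.75 * e->rttvar + 0.25 * err;
    e->srtt = 0.875 * e->srtt + 0.125 * sample;
}
```

A single outlier moves srtt by only one eighth of its distance from the current estimate, which is how such an estimator tolerates variance in individual timestamps.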
Joe Malicki
Software Engineer
MetaCarta, Inc.
No one's measuring RTT... whatever made you think that?
I should explain the application of SO_TIMESTAMP better.
Video camera -> Video jack -> Digitization -> Compression ->
Packetization -> NIC -> Ethernet -> NIC -> Interrupt Handler -> Queue -> Application
Microphone -> MIC jack -> Digitization -> Compression ->
Packetization -> NIC -> Ethernet -> NIC -> Interrupt Handler -> Queue -> Application
One wants to know the original time sound and light waves hit the camera
and microphone, because one wants to know when they should hit the soundcard and
video on the other end (i.e. any delays should be synchronized) but one only has control
over the receiving system. There are timestamps at the application level for this...
unfortunately, many implementations in the real world have independent clocks that skew
relative to each other, with little correction on the sending system.
Yeah, that's broken, but one has to be liberal in what one accepts from popular products.
One way to mitigate the skew between the clocks is to take measurements on the receiving
host, which you do control, and compare the average skew between the two streams and
correct for it. Interrupt handler time has variance, but it's less than application-level
You don't want to ignore them, you want to compensate for them
by getting an earlier timestamp.
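One possible reading of "compare the average skew between the two streams" is sketched below: for each stream, average the difference between the local receive timestamp and the sender's own timestamp, then take the difference of the two averages as the correction. This is an interpretation for illustration, not the actual implementation being described; all names and the sample layout are assumptions.

```c
#include <stddef.h>

/* One received packet: the sender's media timestamp and the local
 * receive timestamp (e.g. from SO_TIMESTAMP), both in seconds. */
struct sample {
    double sender_ts;
    double rx_ts;
};

/* Average of (local receive time - sender timestamp) over a stream. */
double mean_offset(const struct sample *s, size_t n)
{
    double sum = 0.0;
    size_t i;

    if (n == 0)
        return 0.0;
    for (i = 0; i < n; i++)
        sum += s[i].rx_ts - s[i].sender_ts;
    return sum / (double)n;
}

/* Difference between the per-stream average offsets. A receiver could
 * shift one stream by this amount when aligning audio with video. */
double relative_offset(const struct sample *audio, size_t n_audio,
                       const struct sample *video, size_t n_video)
{
    return mean_offset(video, n_video) - mean_offset(audio, n_audio);
}
```

The earlier the local timestamp is taken, the less local queueing noise ends up in each rx_ts, which is the argument made above for stamping in the interrupt handler.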
--
Just a note from one who really developed real-time audio and video processing engines: _no_one_ really relies on the timestamps attached to the received packet. By no one I really mean NO ONE. It is just wrong, broken and stupid. There are so many queues in the data path that it just cannot be reliable by definition.

Instead, the sending path encapsulates a packet sequence number into the appropriate packet header (like, and in most cases only, the RTP header), and the receiving path just multiplies this sequence number by the compression rate and size of the packet. These numbers differ from design to design, but the overall approach is the same: no one really depends on the hardware timestamp attached on the receiver, only the sender's data is reliable. If someone depends on it, it is broken and just waits for the appropriate attack vector to inject broken data into the dataflow (such users do not use tcp, since it "introduces unneeded delays" or similar marketing and completely untested things).

So this overall discussion of the timestamp option is meaningless: we just bloody cannot change it as is, since so many applications really depend on it (even if they should not). We can force lower resolution in terms of xtime or a similar counter, which will be the default timestamp in case of some syscall (turned off by default), but since so far no one sent a patch, this looks very subtle. -- Evgeniy Polyakov --
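The sequence-number approach amounts to simple arithmetic on the sender's data. A minimal sketch, assuming fixed-size audio packets and an already unwrapped sequence number that starts at 0; the function name and parameters are illustrative, and real designs differ in the details:

```c
#include <stdint.h>

/* Media time of packet `seq` relative to packet 0, in microseconds.
 * Assumes every packet carries the same number of samples and that the
 * intermediate product fits in 64 bits, which holds for typical audio
 * framing. Returns 0 for an invalid (zero) sample rate. */
uint64_t media_time_us(uint32_t seq, uint32_t samples_per_packet,
                       uint32_t sample_rate_hz)
{
    if (sample_rate_hz == 0)
        return 0;
    return (uint64_t)seq * samples_per_packet * 1000000ull / sample_rate_hz;
}
```

For example, with 160 samples per packet at 8000 Hz each packet covers 20 ms, so the playout position follows from the sequence number alone, without any receive timestamp.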
The earliest time the application could have been expected to start processing the request. Until it hits the socket, it might as well be somewhere in the cloud. By that reasoning of course, one could argue that a gettimeofday() call immediately following recv() would suffice. Earlier in the thread mention was made of financial services types. If someone has knowledge of the (probably) arcane rules under which they must operate it would be great to hear more. Does some entity like the SEC (Securities and Exchange Commission in the United States) mandate some sort of timestamp for when the trading request "arrives at the trading system" and do they define what "arriving at the trading system" means? rick jones --
From: Rick Jones <rick.jones2@hp.com> The issue is the ordering of processing the requests. So if request A arrived on interface 1 before request B arrived on interface 2, the trade described in A should be performed before the one in B. This is not "arcane" as you seem to suppose it might be, but rather pretty clear fair handling of requests sent between trading desks. --
Has the request "hit the trading system" when it hits the NIC, or when it hits the application executing the trade? If the SEC calls for when it hits the NIC, then none of what is done today is really accurate/correct and one would need to start using NIC HW timestamps, synchronized with the host and the other NICs in the system no? The way things are today, there really isn't much guarantee that hitting NIC 1 before NIC 2 will result in a driver-generated timestamp for the NIC 1 packet which is before the driver-generated timestamp for the NIC 2 packet. It will be luck of the interrupt coalescing interaction with other traffic on the NIC and/or polling out of NAPI right? rick jones --
From: Rick Jones <rick.jones2@hp.com> The SEC isn't mandating anything here, stop framing it that way :-) People simply won't trade with a firm if they find out that trades there are executed out of order. They are simply trying to make things as fair as possible. --
Must be my DC upbringing. I figured that if the logic wasn't 100% But that is the very crux of the question - exactly where is "in order" to be determined? Is it supposed to be arrival time at the NIC HW, initial notice by the driver, or initial notice by the trading application? Given that there are no guarantees that a packet arriving on NIC 1 and timestamped either by the NIC HW or the driver will actually hit the application before a packet arriving on NIC2, just how long are these financial services applications going to wait around before executing the trade carried in the packet arriving on NIC1? rick jones --
From: Rick Jones <rick.jones2@hp.com> I have no idea. They also care about trade processing latency btw. --
As a totally pragmatic point - if the market is in such free-fall that it matters that your order got in a 10 thousandth of a second after somebody else's, instead of before, you probably lost at least 5 to 10 times as much during the time it took somebody to type the damn order in and hit enter. At that point, you have *bigger* things to worry about.
From: Valdis.Kletnieks@vt.edu Many trades are made programmatically using formulas and computer algorithms in response to market activity and other trades of the same security. There is no typing involved :) I don't think anyone in this thread can even pretend to understand how any of this stuff works, that's why I'm starting to consider this thread completely pointless. If the financial folks say they need this stuff, then unless we're prepared to become experts in financial markets and how the IT stuff for them are designed and run, we might as well just trust them on this one. --
Still the same issue - the time delay in getting the ticker tape values back, making the decision, and launching the transaction are *way* bigger than the packet queueing order. Think - trading a few billion shares a day, if that ticker is even 5 seconds behind, *that* is a much bigger issue than The toughest part of systems analysis is getting the user to shut up about what they say they need long enough for you to find out what it is they are actually trying to do. Quite frankly, unless somebody *IS* planning to become an expert on how the IT stuff for them are designed and run, we *should not* be doing any code changes that we don't understand, just because they say so and we should trust them...
From: Valdis.Kletnieks@vt.edu Exactly! We should not change where and how packet timestamps are taken! Thanks! It is what I have been advocating this whole thread :-) --
So last week I started asking FSI contacts I've built up while answering their "how to improve latency" questions. The intent is to get some direct input from them on just what their and their regulators' expectations are wrt timestamping of packets and/or trades. rick jones --
On Fri, 22 Aug 2008 04:57:40 +0300 What kernel version is this? There was a fix to AF_PACKET about a year ago to reduce this. --
git net-2.6 based on 2.6.27-rc3. Means very fresh. --
