On Monday 29 October 2007, David Brownell wrote:

I was asked just what that overhead *is* ... and it surprised me. A summary of the results is appended to this note.

Fortunately, it turns out those problems all go away if the gpiolib code uses a *raw* spinlock to guard its table lookups. With a raw spinlock, any performance impact of gpiolib seems to be well under a microsecond in this bitbang context (and not objectionable). Preempt became free; enabling debug options had only a minor cost.

That's as it should be, since the only substantive changes were to grab and release a lock, do one table lookup a bit differently, and add one indirect function call ... changes which should not have any visible performance impact on per-bit codepaths, and which one might expect to cost on the order of a dozen instructions.

So the next version of this code will include a few minor bugfixes, and will also use a raw spinlock to protect that table. A raw lock seems appropriate there in any case, since non-sleeping GPIOs should be accessible from hardirq contexts even on RT kernels.

If anyone has any strong arguments against using a raw spinlock to protect that table, it'd be nice to know them sooner rather than later.

- Dave

SUMMARY: Using the i2c-gpio driver on a preempt kernel with all the usual kernel debug options enabled, the per-bit times (*) went up in a bad way: from about 6.4 usec/bit (original GPIO code on this board) to about 11.2 usec/bit (just switching to gpiolib), which is well into "objectionable overhead" territory for bit access.

Just enabling preempt shot the time up to 7.4 usec/bit ... which is also objectionable (it's all-the-time overhead that is clearly needless), but much less so.

Converting the table lock to a raw spinlock essentially removed all non-debug overheads. It took enabling all those debug options plus internal gpiolib debugging overhead to get the times up to the 7.4 usec/bit that previously applied even with just preempt.
(*) Those times are eyeballed medians; I didn't make time to find a way to export a few thousand measurements from the tool and do the math. The typical range was +/- one usec. The numbers include udelay() calls, so the relevant point is the time *delta* attributable only to increased gpiolib costs, not the base time (with udelays). The delta probably reflects on the order of four GPIO calls: set two different bits, clear one of them, and read it back to make sure it cleared.
