Re: [RFC] globmatch() helper function

Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
From: George Spelvin
Date: Thursday, December 18, 2008 - 1:55 am

Eureka!

Never mind all of this angst; I've figured out a non-recursive way
to do it.  Thanks for pushing me to think a bit harder and find it.
(I was actually looking for an example of the exponential pathology I
kept referring to when it dawned on me that it's actually impossible
without the additional expressive power of regexps.)

Somewhat abusing TeX's line-breaking terminology, consider a shell glob to
consist of a series of "boxes" and "glue"  A box is a run of character
classes (of which literal characters and ? are degenerate cases).  The
point is, a box has a well-defined width, the number of characters in the
string that it must match.

A *, on the other hand, is infinitely elastic "glue" between boxes that
matches an unknown number of characters.

So consider a pattern of the form box1*box2*box3*...
Assuming box1 has matched, we search forward, trying various widths of
the glue *, to find a point where box2 matches.  Then we start searching
for the width of the second *.  But!  If we can't match the tail of the
pattern box3*... at any position to the right of the end of box2, there
is no point trying to move box2 forward and search again.  Any solution
where box2*box3*... would match starting k characters later in the string
would also be a match with box2 (only) shifted k characters left, i.e.
in its original position.

So we can match boxes greedily: once we've found the leftmost place
where one matches (that is still to the right of all previous boxes),
we never need to try any other positions.  Moving a box to the right
can never produce a match which doesn't exist in its previous position.

I'll rewrite it to be non-recusrive and quadratic in the worst case.


Thanks; I feel stupid for not maving looked this up, but smart for having
thought of it myself.

(For a discussion of how to get exponential backtracking in a regular
expression, see http://swtch.com/~rsc/regexp/regexp1.html)
--
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
[RFC] globmatch() helper function, George Spelvin, (Wed Dec 17, 3:42 am)
Re: [RFC] globmatch() helper function, Andi Kleen, (Wed Dec 17, 6:28 am)
Re: [RFC] globmatch() helper function, Peter Zijlstra, (Wed Dec 17, 8:15 am)
Re: [RFC] globmatch() helper function, George Spelvin, (Wed Dec 17, 8:37 am)
Re: [RFC] globmatch() helper function, Steven Rostedt, (Wed Dec 17, 8:47 am)
Re: [RFC] globmatch() helper function, George Spelvin, (Wed Dec 17, 9:04 am)
Re: [RFC] globmatch() helper function, Steven Rostedt, (Wed Dec 17, 9:13 am)
Re: [RFC] globmatch() helper function, Andi Kleen, (Wed Dec 17, 9:15 am)
Re: [RFC] globmatch() helper function, Tejun Heo, (Wed Dec 17, 9:22 am)
Re: [RFC] globmatch() helper function, Steven Rostedt, (Wed Dec 17, 9:31 am)
Re: [RFC] globmatch() helper function, Tejun Heo, (Wed Dec 17, 9:33 am)
Re: [RFC] globmatch() helper function, Peter Zijlstra, (Wed Dec 17, 9:36 am)
Re: [RFC] globmatch() helper function, Steven Rostedt, (Wed Dec 17, 9:37 am)
Re: [RFC] globmatch() helper function, Tejun Heo, (Wed Dec 17, 9:45 am)
Re: [RFC] globmatch() helper function, Andi Kleen, (Wed Dec 17, 9:51 am)
Re: [RFC] globmatch() helper function, Steven Rostedt, (Wed Dec 17, 9:54 am)
Re: [RFC] globmatch() helper function, George Spelvin, (Thu Dec 18, 1:00 am)
Re: [RFC] globmatch() helper function, George Spelvin, (Thu Dec 18, 1:55 am)
Re: [RFC] globmatch() helper function, Casey Dahlin, (Thu Dec 18, 12:53 pm)
Re: [RFC] globmatch() helper function, George Spelvin, (Thu Dec 18, 2:53 pm)