Re: New thread RDSL, post-2.6.20 kernels and amanda (tar) miss-fires

Previous thread: LSM Stacking by JanuGerman on Tuesday, March 13, 2007 - 12:44 am. (2 messages)

Next thread: [PATCH] Remove CHILD_MAX by Roland McGrath on Tuesday, March 13, 2007 - 1:42 am. (1 message)
From: Gene Heskett
Date: Tuesday, March 13, 2007 - 1:28 am

Greetings;
Someone suggested a fresh thread for this.

I now have my scripts more or less under control, and I can report that 
kernel-2.6.20.1 with no other patches does not exhibit the undesirable 
behaviour where tar thinks it's all new, even when told to do a level 2 on 
a directory tree where nothing has been touched or updated in months.

Next up, 2.6.20.2, plain and with the latest RDSL-0.30 patch.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
[..]

No I didnt.  Someone else wrote that.  Please keep attributions
straight.
	-- From linux-kernel
-

From: Gene Heskett
Date: Tuesday, March 13, 2007 - 11:36 am

And amanda/tar worked normally for 2.6.20.2 plain.

Next up, 2.6.21-rc1 if it will build here.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Politics:  A strife of interests masquerading as a contest of principles.
The conduct of public affairs for private advantage.
-- Ambrose Bierce
-

From: Gene Heskett
Date: Tuesday, March 13, 2007 - 8:31 pm

It built, it booted, and it's busted big time.  First, with an amdump 
running in the background, the machine is so close to unusable that I 
considered rebooting, but I needed the data to show the problem.  I am 
losing the keyboard and mouse for a minute or more at a time but the 
keystrokes seem to be being registered so it eventually catches up.

Disk i/o seems to be the killer according to gkrellm.

But to give one an idea of the fits this is giving tar, I'll snip a line 
or 2 from an amstatus report here:
coyote:/GenesAmandaHelper-0.6 1 planner: [dumps way too big, 138200 KB, 
must skip incremental dumps]

Huh?  138.2GB?  A 'du -h .' in that dir says 766megs.

coyote:/root                  1     4426m wait for dumping
du -h says 5.0GB so that's ballpark, but it's also a level 1, so maybe 20 
megs is actually new since 15:57 this afternoon local.  kmail's final 
maildir is in that dir.

This goes on for much of the amstatus report; very few of the reported 
sizes are close to sane.

Now, can someone suggest a patch I can revert that might fix this?  The 
total number of patches between 2.6.20 and 2.6.21-rc1 will have me 
building kernels to bisect this till the middle of June at this rate.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Is a tattoo real, like a curb or a battleship?  Or are we suffering in 
Safeway?
-

From: William Lee Irwin III
Date: Tuesday, March 13, 2007 - 10:07 pm

4 billion patches could be bisected in 34 boots. Between 2.6.20 and
2.6.21-rc1 there are only:

$ git rev-list --no-merges v2.6.20..v2.6.21-rc1  |wc -l
3118

patches, requiring 14 boots. In general ceil(log(n)/log(2))+2 boots.

Of course, this is a little optimistic because it assumes no additional
breakage occurring at the various bisection points. In any event,
assuming (pessimistically) 10 minutes per build, this is 280 minutes or
4 hours and 40 minutes of build time. I estimate the process should
complete well before Friday of this week, never mind June.


-- wli
-

From: William Lee Irwin III
Date: Tuesday, March 13, 2007 - 10:44 pm

33 boots for 4 billion, 13 boots for 3118, ceil(log(n)/log(2))+1 boots
in general, 10 minutes/build gives 130 minutes or 2 hours, 10 minutes
for 13 boots. I have no plausible explanation for these errors, and
don't care to be told of any, either.
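
For anyone who wants to check the corrected figures, here is a minimal shell sketch of the ceil(log(n)/log(2))+1 estimate. The function name bisect_boots is made up for illustration, and it assumes a POSIX shell with 64-bit integer arithmetic.

```shell
# Number of test boots needed to bisect n candidate patches,
# using the corrected formula: ceil(log2(n)) + 1.
bisect_boots() {
    n=$1
    steps=0
    span=1
    # Double span until it covers n; steps is then ceil(log2(n)).
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        steps=$((steps + 1))
    done
    echo $((steps + 1))
}

bisect_boots 3118          # patch count from git rev-list; prints 13
bisect_boots 4000000000    # "4 billion"; prints 33
```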


-- wli
-

From: Gene Heskett
Date: Tuesday, March 13, 2007 - 11:09 pm

Chuckle, sorry to disappoint you wli.  On that 32 cpu Niagara Con was 
calling 'poor equipment', maybe.

Even using ccache, it's about 15-18 minutes per build, with another 10 to 
edit my build script and construct the kernel tree with the proper 
patches applied.  Then a reboot, probably 10 minutes by the time I get 
the nvidia driver installed for the new kernel and get startx'd, then it's 
another 2 hours or a bit less for an amanda run to test it.

I've posted to the amanda lists too, so they will be aware of it.  And 
because an ls -lc returns perfectly sane values for the mtimes and sizes, 
I suspect the real problem may not necessarily be 100% kernel related.  I 
have been intermittently ranting because both the tar api stir and the 
return from tar are such a moving target that the developers are having a 
hard time staying ahead of the changes to tar.  Backward compatibility, it 
seems, is the furthest thing from the tar maintainers' minds.  The most 
recent change that I'm aware of is that tar now returns a 1 for success! 
What the heck were those guys at gnu.org thinking?  Or smoking, as the 
case may be.

I obviously have a copy of the -rc1 patch in its entirety that I could 
peruse, but I'm not sure I would recognize a change that would affect tar 
if it bit me, hence the questions here to those who are far more 
conversant than I.

As I've said on several occasions William, at my age, the best part I can 
play here is the Canary, in the coal mine scene, and something strange in 
the air of 2.6.21* just killed me.  It's now up to the coroner(s) to 
determine the cause, and he has several dozen very very able assistants 
monitoring this list in my NSH opinion.  Whatever, either tar, or this 
particular board in the kernel's architecture surely needs fixed before 
2.6.21 final.  I don't even know if gnu.org has a bugzilla setup, but 
I'll look around tomorrow night as I'm tied up now till late tomorrow.  
If they do, I'll file it.

But, I'm also amazed that no one else ...
-

From: William Lee Irwin III
Date: Tuesday, March 13, 2007 - 11:19 pm

2 hours, 48 minutes times 13 boots (see the correction post) is 36
hours, 24 minutes. One attempt a day (24 hours instead of 2 hours, 48
minutes) yields 2 weeks. So you're still done by April, not June.
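
The same arithmetic as a small shell sketch, in case anyone wants to plug in their own per-round cost. The function name bisect_time is hypothetical; the 168 minutes is simply the 2 hours, 48 minutes per round used above.

```shell
# Total wall-clock time for a bisection: rounds x minutes per round,
# printed as hours and minutes.
bisect_time() {
    rounds=$1
    minutes_per_round=$2
    total=$((rounds * minutes_per_round))
    echo "$((total / 60))h $((total % 60))m"
}

bisect_time 13 168    # 2h48m per round; prints 36h 24m
bisect_time 13 10     # 10-minute builds only; prints 2h 10m
```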


-- wli
-

From: Gene Heskett
Date: Wednesday, March 14, 2007 - 7:54 am

Back to the original theme of this thread, 2.6.20.3 works, now I'm booted 
to 2.6.20.3-rdsl-0.30 and trying that.  And 10 minutes into it, it's going 



-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
"Ahead warp factor 1"
- Captain Kirk
-

From: Ray Lee
Date: Wednesday, March 14, 2007 - 2:49 pm

In a previous email, you said you were using ext3. If that's the case,
there doesn't appear to be much going on in terms of patches between
2.6.20 and 2.6.21-rc1. The only one that even comes close to looking
like it might have an effect would only come into play if you have a
filesystem that has ACL information, but is mounted by a kernel that
doesn't have ACL support.

I have to echo wli here, I'm afraid, and recommend at least a *few*
bisections to help narrow down the list of suspect patches.

There are tutorials out there for git users. I use the mercurial
repository, as I find the mercurial interface and workflow a lot more
intuitive, but it has the same capability.

Even 2-5 bisections will greatly help others hunt the bug down.

Ray
-

From: Gene Heskett
Date: Wednesday, March 14, 2007 - 8:12 pm

Probably.  But I've now put a week into this, and from some other clues 
I've collected, I'm beginning to think tar has a tummy ache. After all, 
an ls -lc reports totally sane mtimes.  So why is tar going bonkers 
under kernels 2.6.21-rc*, with or without Con's patches?

I've also spent a day now looking for a valid place to put a bugzilla 
entry against tar, but Google's search results are sending me to 
gcc.gnu.org and this is NOT the correct bugzilla for a tar problem.

It's no secret that with all the churn in tar over the last 5 years, worse 
churn than the kernel IMO in going from 2.0 to 2.6, that I'm not a fan of 
yet another _new_ version of tar, when what we just need is _one_ that 
works.  It is not capable of executing the recovery command listed in the 
first block of every amdump file it (amdump) ever built right now, and 
I've played the equivalent of the 10,000 monkeys writing Shakespeare for 
several hours trying.  Damned frustrating is what it is.

The error it reports seems to indicate that it cannot write through the 
pipes involved.  But with tar's error reporting, who the hell knows for 
sure.

Here is an example
[root@coyote data]# dd if=00010.coyote._lib.1 bs=32k count=1
AMANDA: FILE 20070314104344 coyote /lib  lev 1 comp .gz program /bin/tar
To restore, position tape at start of file and run:
 dd if=<tape> bs=32k skip=1 |  /bin/gzip -dc |  /bin/tar -f - ...

And the ellipsis is an error if not removed.  Then one is supposed to be 
able to redirect tar's output with the usual >/tmp/test/ syntax.

So:
[root@coyote data]# dd if=00010.coyote._lib.1 bs=32k skip=1 |  /bin/gzip -dc |  /bin/tar -f - >/tmp/test/
-bash: /tmp/test/: Is a directory

which is the return from any variation in how the redirect is done.

So what is it that I am doing wrong in the above command line?  I'd like 
to add it to my helper scripts to be published eventually on zmanda.org.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. ...
-

From: Ray Lee
Date: Wednesday, March 14, 2007 - 9:45 pm

One of us is confused, and it may very well be me, but...

The /bin/tar -f - >/tmp/test/ looks to me like it should fail exactly as
bash says it does. The output redirect (>) will only write out to a
file, not a directory. (So, /tmp/file should work, /tmp/file/ won't.)

Are you trying to redirect where the files get restored? That should be
done with a cd before doing the uncompress.

Or am I misunderstanding what you're telling me?

Ray
-

From: Gene Heskett
Date: Wednesday, March 14, 2007 - 11:30 pm

No, apparently it's me that's been running with a fubar'd understanding.
I was certain that tar (or bash) should have been able to put the 
recovered files IN the directory /tmp/test but that turns out to need 
more options after the '-f -' section of that sample line I posted.

Thanks.  A bunch..

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Mal: "You are very much lacking in imagination."

Zoe: "I imagine that's so, sir."
				--Episode #8, "Out of Gas"
-

From: Willy Tarreau
Date: Wednesday, March 14, 2007 - 10:28 pm

With "/bin/tar -f - >/tmp/test/", you ask bash to open the file "/tmp/test/"
for write, then start tar and pass this file as its stdout. Obviously this
is wrong. I think that what you're trying to do is send extracted files to
/tmp/test, which is what '-C' is for. Also, you need to specify a command
for tar. You didn't. I bet if you do the following, it will work:

[root@coyote data]# dd if=00010.coyote._lib.1 bs=32k skip=1 |
    /bin/gzip -dc |  /bin/tar -C /tmp/test/ -xf -
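
For anyone wanting to try the same pipeline without a real amanda dump, here is a self-contained sketch. The restore_demo function, the temporary paths and the hello.txt file are invented for the demo, and the 32k block of zeros merely stands in for the amanda header that "skip=1" steps over.

```shell
# Build a fake dump (32k header + gzip'd tar stream), then restore it
# into a separate directory with tar -C, as in the command above.
restore_demo() {
    work=$(mktemp -d) || return 1
    mkdir "$work/src" "$work/dest"
    echo "hello from the archive" > "$work/src/hello.txt"

    # Stand-in dump image: one 32k header block, then the compressed archive.
    {
        dd if=/dev/zero bs=32k count=1 2>/dev/null
        tar -C "$work/src" -cf - . | gzip -c
    } > "$work/dump"

    # Skip the header, decompress, and let tar (not the shell) place the
    # extracted files: -C names the target directory, -x is the command.
    dd if="$work/dump" bs=32k skip=1 2>/dev/null |
        gzip -dc | tar -C "$work/dest" -xf -

    cat "$work/dest/hello.txt"
    rm -rf "$work"
}

restore_demo    # prints: hello from the archive
```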

Now, Gene, this is becoming totally off-topic right here.

Regards,
Willy

-

From: Gene Heskett
Date: Wednesday, March 14, 2007 - 11:33 pm

On Thursday 15 March 2007, Willy Tarreau wrote:


My apologies, I've been corrected, thanks for your patience.  And I'll see 



-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
It takes less time to do a thing right than it does to explain why you
did it wrong.
		-- H.W. Longfellow
-
