With new gitweb and new git it is not that expensive. It is now one call to git-for-each-ref per repository. Besides, we can't rely on .git/info/refs being up to date, or even existing. It is for dumb protocols, not for gitweb. -- Jakub Narebski Warsaw, Poland ShadeHawk on #git -
Well, SOMETHING needs to be done for this page, since it can take 15 minutes or more to generate. Caching doesn't help one iota, since it's stale before being generated. -hpa -
Hi, To me, it seems like it all boils down to caching parsed data structures. I.e. parse the config, then serialize the parsed data to a file. Don't reparse the config unless it is 1 hour older than the config. Likewise, run for-each-ref, and serialize the parsed data into a file. Don't rerun for-each-ref if that file is younger than 15 minutes. Maybe the same for the first 200 commits of each branch. (I made those times up, but you get the idea.) Ciao, Dscho -
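The age-based caching described above can be sketched roughly as follows. This is only an illustration, not gitweb code: `load_cached`, its parameters, and the JSON file format are hypothetical names and choices made for the example, and the `regenerate` callback stands in for whatever expensive step (parsing the config, running for-each-ref) produces the data.

```python
import json
import os
import time


def load_cached(cache_path, max_age_seconds, regenerate, now=None):
    """Return cached data if the cache file is fresh enough, else rebuild it.

    Hypothetical helper: `regenerate` is any callable returning
    JSON-serializable data, e.g. the parsed output of for-each-ref.
    """
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(cache_path)
    except OSError:
        age = None  # cache file is missing or unreadable

    if age is not None and age < max_age_seconds:
        with open(cache_path, "r", encoding="utf-8") as fh:
            return json.load(fh)

    # Cache is stale or absent: do the expensive work once and store it.
    data = regenerate()
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(data, fh)
    os.replace(tmp_path, cache_path)  # swap the new cache into place
    return data
```

With the made-up times from the message, the for-each-ref data would be loaded with `max_age_seconds=15 * 60`.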
A much better idea is to have that data structure updated on repository updates, which is the whole point behind .git/info/refs. On kernel.org, at least, if you don't keep .git/info/refs up to date you need to get your fingers whacked anyway, since it damages usability for one particular class of users. -hpa -
Hi, Granted, for some things this might work. However, I would not wreak havoc by changing the format of .git/info/refs, rather put the details you wanted into .git/info/refs-details. However, for other things (like showing a certain number of commits), it _might_ make sense to cache them (e.g. when literally thousands of people look at the 100 last commits of linux-2.6.git), but not for others (e.g. the 100th last to the 200th last commit of git-tools.git). Having said that, it should be relatively easy to store the (parsed, or at least easily parseable) 500 last commits of a branch into .git/info/commits-<branch>. This would raise the cost of publishing a branch, easing the overall load on the server. Jakub? Ciao, Dscho -
It's not clear to me if it would be wreaking havoc. After all, if a format can't be expanded *at all*, there is something wrong, and adding things to the end of a line is a common structured way of expansion. Any query that's within a repository is fairly easily cacheable post-generation. The front page (and its RSS variant) is a bit of an exception, because it involves all repositories at once. Doesn't mean we couldn't do better, but... -hpa -
Hi, The idea of .git/info/refs is to enable dumb transports to fetch something akin to intelligently. They don't need that information, and frankly, I don't think they should need to understand it. I also expect that they interpret everything after the sha1 as refname, what with our having become quite liberal with refnames (they can contain spaces, tabs, and even a small amount of special K). So I don't see a way to upgrade the file format. But as should be clear by now, I'd prefer additional information -- that is of no interest to dumb transports anyway -- to be put in its own file. That also opens the possibility of, say, .git/info/perl/, which contains _only_ serialized perl objects! I imagine this could be a performance ... and here we have a problem, right? No single update hook can update the _whole_ information. Ciao, Dscho -
The simple and fast solution would be to make the post-update hook contain the git-for-each-ref call with parameters like in git_get_last_activity, saving e.g. to .git/info/last-committer, and in gitweb read this file if it exists, run git-for-each-ref otherwise (similar to what we used to do with .git/info/refs and git-peek-remote in gitweb). -- Jakub Narebski Poland -
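The read-the-file-or-fall-back logic proposed above could look roughly like this. It is a sketch under assumptions, not gitweb's implementation: the function names are hypothetical, and the for-each-ref arguments are only assumed to resemble what git_get_last_activity passes. The file name info/last-committer is the one suggested in the message.

```python
import os
import subprocess


def run_for_each_ref(git_dir):
    """Ask git for the committer line of the most recently committed branch.

    The arguments are an assumption modelled on gitweb's
    git_get_last_activity, not a copy of it.
    """
    result = subprocess.run(
        ["git", "--git-dir=" + git_dir, "for-each-ref",
         "--format=%(committer)", "--sort=-committerdate", "--count=1",
         "refs/heads"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def last_activity(git_dir, fallback=run_for_each_ref):
    """Prefer the file written by the post-update hook; otherwise ask git."""
    cache_path = os.path.join(git_dir, "info", "last-committer")
    try:
        with open(cache_path, "r", encoding="utf-8") as fh:
            return fh.read().strip()
    except FileNotFoundError:
        # No hook has written the file yet: pay for one for-each-ref call.
        return fallback(git_dir)
```

The hook side would simply redirect the same for-each-ref output into info/last-committer after each update, so the expensive call happens once per push rather than once per page view.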
Right, this is basically what I'm saying; the question is only whether or
not this fits into .git/info/refs or should be a separate file.
Either way, I think git-update-server-info should generate all these files.
-hpa
-
Hi, Well, no. At least not per default. What you want is _very_ special to gitweb. It is _only_ needed by gitweb. And .git/info/refs is for _dumb transports_, _not_ for gitweb. That said, I think it makes sense _in your setup_ to trigger updating _another_ file for use in gitweb. Remember, this is all very, very special for gitweb. So let's separate it cleanly from all which is not special for gitweb. I hope I have made it clear why (at least IMHO) it would be wrong, wrong, wrong to change the format of .git/info/refs _only_ for gitweb, which it is not meant for to begin with. So let's introduce another file in .git/info/ especially dedicated to gitweb. Then we are free to introduce real cool performance hacks, like using Storable to store the parsed data structures (I was alluding to this in an earlier reply, as "serializing"). Then you just retrieve the file -- if it exists -- or call for-each-ref (like Jakub said). By separating this gitweb-special thing cleanly, maybe into a hook, we can have a perl script which writes this file. We can write a simple hash, which may or may not contain keys, thus being of "extensible format". By having this perl script, you can -- as root -- run it as the appropriate user for each repository where it does not exist yet. Remains the problem: how do we _force_ this hook enabled site-wide, i.e. in _all_ repos? But that is too easy: just edit the existing template, and then replace the update hooks in all repos (possibly verifying that the existing update hook indeed matches the old template). So what problems remain with this approach? Ciao, Dscho -
Hi, No. Once again, .git/info/refs is _not_ for gitweb. But I will stop arguing about that topic, because I don't have enough time for that. Ciao, Dscho -
Well, I think it was Johannes that said once for each ref. But either which way, it's a totally unacceptable load with resulting unacceptable latency. -hpa -
Hi, No. I would never say that you have to run for-each-ref for each ref. That's plain stupid. BTW I take some satisfaction in that you finally agreed (in another email) that some post-creation caching is necessary. I would be even more satisfied if you finally agreed that it is a good practice to separate conceptually different things, and not continued ad infinitum (and ad nauseam) arguing that .git/info/refs should serve dumb transports, and gitweb, and eventually bring peace to everybody on this planet. Ciao, Dscho -
I went back and looked at the thread, and I had indeed misread the original message, which was from Jakub, not you. I think I got in the "this is surreal" mode as a result of that (invoking for-each-ref 250 [...] I don't believe I have ever disputed that (in fact, I have pushed very [...] I've already said I think it's an aesthetic argument, but I don't really care either way, as long as there is only one hook that updates all the caches. I don't want the user to have to juggle an arbitrary and increasing number of hooks. Fair? -hpa -
Normally I'm not interested in the "Last Change" column, I just want to go to the project summary page, and normally I'm not interested in the last 16 tags (the last three are just enough). For me they should be shown only when explicitly asked. Santi -
Hi, I just had another idea: why not generate the content of the "cover page" in a cron job, every minute or so, and save it into a static index.html? This should take quite a load from the server, since not even Perl has to be started to serve that page. Ciao, Dscho -
Ehm... because it often takes longer than that to generate the page? We can pre-generate the page before the first hit, but that's not a replacement for update-time caching. -hpa -
Hi,
Sorry, I should have been clearer. Plan:
1. echo "Generating" > /htdocs/git/index.html
2. edit crontab to do this every minute:
2.1 gitweb is called _directly_, to generate /htdocs/git/index.html.new
2.2 /htdocs/git/index.html.new is _moved_ into /htdocs/git/index.html,
overwriting the existing one.
Yes, there could be two instances of this task concurrently. No, it does
not matter.
It was only meant as a quick fix for the horrible workload.
Just a thought, feel free to ignore me,
Dscho
-
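The generate-then-move step from the plan above can be sketched as follows. This is a minimal illustration: `publish_page` is a hypothetical name, and the page content is passed in as a string rather than produced by actually invoking gitweb.

```python
import os


def publish_page(html, target_path):
    """Write the page to <target>.new, then move it over the live file.

    Readers of target_path see either the old page or the new one,
    never a half-written file, because the rename happens in one step.
    """
    new_path = target_path + ".new"
    with open(new_path, "w", encoding="utf-8") as fh:
        fh.write(html)
        fh.flush()
        os.fsync(fh.fileno())  # make sure the bytes are on disk first
    os.replace(new_path, target_path)  # overwrites the existing page
```

As the plan itself concedes, two overlapping runs would both write to the same .new file; this sketch does nothing to prevent that.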
Yes, it does matter, because it drives the load up further. If you start having this going on in overlapping instances, then you're soon on [...] And we have already experimented with it. It unfortunately doesn't help much, it only makes matters worse. -hpa -
We already cache it with a forced duration of some 15 minutes. The end result is exactly the same. -hpa -
