Re: git filter-branch --subdirectory-filter

To: James Sadler <freshtonic@...>
Cc: <git@...>
Date: Saturday, May 10, 2008 - 1:53 am

On Sat, May 10, 2008 at 01:31:37PM +1000, James Sadler wrote:


This is only lightly tested, but the script below should do the trick.
It works as an index filter which munges all content in such a way that
a particular line is always given the same replacement text. That means
that diffs will look approximately the same, but will add and remove
lines that say "Fake line XXX" instead of the actual content.
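To make the mapping concrete, here is a minimal sketch of the same idea
using a plain in-memory hash instead of the on-disk DB_File caches. The
names munge_lines and $next_id, and the two sample file versions, are
made up for illustration and are not part of the script below.

```perl
#!/usr/bin/perl
# Sketch of the line-replacement idea with an in-memory hash.
# (The real script persists its caches on disk with DB_File.)
use strict;
use warnings;

my %line_cache;   # original line => replacement line
my $next_id = 0;  # number handed to the next unseen line

# Replace each line with a stable "Fake line N" placeholder.
sub munge_lines {
  my @out;
  for my $line (@_) {
    $line_cache{$line} = 'Fake line ' . $next_id++ . "\n"
      unless exists $line_cache{$line};
    push @out, $line_cache{$line};
  }
  return @out;
}

# Two hypothetical versions of a file; only the middle line differs.
my @v1 = munge_lines("int x;\n", "x = 1;\n", "return x;\n");
my @v2 = munge_lines("int x;\n", "x = 2;\n", "return x;\n");

print "v1:\n", @v1, "v2:\n", @v2;
```

Here v1 becomes fake lines 0, 1, 2 and v2 becomes fake lines 0, 3, 2, so
a diff between the two removes "Fake line 1" and adds "Fake line 3",
mirroring the shape of the real change.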

You can munge the commit messages themselves by just replacing them with
some unique text; in the example below, we just replace each one with the
md5sum of the original message.

This will leave the original author, committer, and date intact, which
are presumably non-proprietary.

-- >8 --
#!/usr/bin/perl
#
# Obscure a repository while still maintaining the same history
# structure and diffs.
#
# Invoke as:
#   git filter-branch \
#     --msg-filter md5sum \
#     --index-filter /path/to/this/script

use strict;
use IPC::Open2;
use DB_File;
use Fcntl;
tie my %blob_cache, 'DB_File', 'blob-cache', O_RDWR|O_CREAT, 0666;
tie my %line_cache, 'DB_File', 'line-cache', O_RDWR|O_CREAT, 0666;

open(my $lsfiles, '-|', qw(git ls-files --stage))
  or die "unable to open ls-files: $!";
open(my $update, '|-', qw(git update-index --index-info))
  or die "unable to open update-index: $!";

while(<$lsfiles>) {
  my ($mode, $hash, $path) = /^(\d+) ([0-9a-f]{40}) \d\t(.*)/
    or die "bad ls-files line: $_";
  $blob_cache{$hash} = munge($hash)
    unless exists $blob_cache{$hash};
  print $update "$mode $blob_cache{$hash}\t$path\n";
}

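close($lsfiles);
close($update);
# $? is a raw wait status (exit code << 8); passing it straight to exit
# would truncate a failure such as 256 to 0, so map it explicitly.
exit($? ? 1 : 0);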

sub munge {
  my $h = shift;

  open(my $in, '-|', qw(git show), $h)
    or die "unable to open git show: $!";
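  # $hash reads hash-object's stdout; $out feeds its stdin.
  my $pid = open2(my $hash, my $out, qw(git hash-object -w --stdin));

  while(<$in>) {
    # Identical lines always map to the same "Fake line N" text.
    $line_cache{$_} ||= 'Fake line ' . $line_cache{CURRENT}++ . "\n";
    print $out $line_cache{$_};
  }

  close($in);
  close($out);

  my $r = <$hash>;
  chomp $r;
  waitpid($pid, 0);  # reap hash-object so finished children do not pile up
  return $r;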
}
Messages in current thread:
git filter-branch --subdirectory-filter, James Sadler, (Thu May 8, 9:01 pm)
Re: git filter-branch --subdirectory-filter, Jeff King, (Thu May 8, 9:33 pm)
Re: git filter-branch --subdirectory-filter, James Sadler, (Fri May 9, 3:38 am)
Re: git filter-branch --subdirectory-filter, Jeff King, (Fri May 9, 4:00 am)
Re: git filter-branch --subdirectory-filter, James Sadler, (Fri May 9, 11:31 pm)
Re: git filter-branch --subdirectory-filter, Jeff King, (Sat May 10, 1:53 am)
Re: git filter-branch --subdirectory-filter, James Sadler, (Sat May 10, 7:38 am)
Re: git filter-branch --subdirectory-filter, Jeff King, (Sat May 10, 7:44 am)
Re: git filter-branch --subdirectory-filter, James Sadler, (Sat May 10, 3:10 am)
Re: git filter-branch --subdirectory-filter, Johannes Sixt, (Fri May 9, 3:57 am)