I would like to trim a file by removing the columns that have the same value on every line.
    # the file I have (tab-delimited, millions of columns)
    jack	1	5	9
    john	3	5	0
    lisa	4	5	7

    # the file I want (remove the columns with the same value in all lines)
    jack	1	9
    john	3	0
    lisa	4	7
Could you give me any pointers on this problem? I would prefer a sed or awk solution, or possibly a perl solution.
Thanks in advance.
Best,
Solution
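    #!/usr/bin/perl
    $/ = "\t";    # read one tab-terminated field at a time
    open(R, "<", "/tmp/filename") || die "Cannot open /tmp/filename: $!";
    while (<R>) {
        # The newline is not a record separator here, so each 4-column
        # line yields three records and the third column is every third one.
        next if (($. % 3) == 0);
        print;
    }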
Well, that assumes it is the third column. If it is by value:
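    #!/usr/bin/perl
    $/ = "\t";    # read one tab-terminated field at a time
    open(R, "<", "/tmp/filename") || die "Cannot open /tmp/filename: $!";
    while (<R>) {
        # Skip any field that is numerically equal to 5.
        next if ($_ == 5);
        print;
    }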
With the edit to the question, what the OP wants has become clear. How about:
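    #!/usr/bin/perl
    use strict;
    use warnings;

    # First pass: find the columns whose value is identical on every line.
    open(R, "<", "/tmp/filename") || die "Cannot open /tmp/filename: $!";
    my (@cols);
    while (<R>) {
        chomp;
        my (@this) = split(/\t/, $_, -1);
        if ($. == 1) {
            @cols = @this;    # candidate values come from the first line
        } else {
            for (my $x = 0; $x <= $#cols; $x++) {
                # A column stops being a candidate once any line differs.
                if (defined($cols[$x]) && !(defined($this[$x]) && $cols[$x] eq $this[$x])) {
                    $cols[$x] = undef;
                }
            }
        }
    }
    close(R);

    # Collect the indices to delete. Each index is shifted down by the number
    # of columns already removed, because splice() renumbers the array.
    my (@del);
    print STDERR "Deleting columns: ";
    for (my $x = 0; $x <= $#cols; $x++) {
        if (defined($cols[$x])) {
            print STDERR "$x ($cols[$x]),";
            push(@del, $x - scalar(@del));
        }
    }
    print STDERR "\n";

    # Second pass: print every line without the constant columns.
    open(R, "<", "/tmp/filename") || die "Cannot open /tmp/filename: $!";
    while (<R>) {
        chomp;
        my (@this) = split(/\t/, $_, -1);
        foreach my $col (@del) {
            splice(@this, $col, 1);
        }
        print join("\t", @this) . "\n";
    }
    close(R);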
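Since the question asked for awk, the same two-pass idea could probably be written in awk as well. This is only a sketch under a few assumptions: the input is a regular file (it is read twice), fields are separated by single tabs, and the function name `drop_constant_columns` is made up for illustration. The first pass flags every column that differs from the first line at least once, and the second pass prints only the flagged columns.

```shell
#!/bin/sh
# Hypothetical helper: print FILE without the columns that hold the same
# value on every line. FILE is passed to awk twice, once per pass.
drop_constant_columns() {
    awk -F'\t' -v OFS='\t' '
        # Pass 1 (NR == FNR): remember the first line, then flag every
        # column that differs from it on any later line.
        NR == FNR {
            if (FNR == 1) {
                n = split($0, first, FS)
            } else {
                # Appending "" forces a string comparison, so 1 and 1.0
                # are treated as different values.
                for (i = 1; i <= n; i++)
                    if (($i "") != (first[i] "")) varies[i] = 1
            }
            next
        }
        # Pass 2: print only the columns flagged as varying (plus any
        # extra columns beyond the width of the first line).
        {
            out = ""; sep = ""
            for (i = 1; i <= NF; i++)
                if ((i in varies) || i > n) { out = out sep $i; sep = OFS }
            print out
        }
    ' "$1" "$1"
}
```

It would be called as `drop_constant_columns /tmp/filename > /tmp/trimmed`. Only the first line and one flag per varying column are kept in memory, which should matter with millions of columns.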