How to split a file by percentage of lines?
Suppose I want to split a file into 3 parts (60% / 20% / 20%). I can do it manually, -_- :
    $ wc -l brown.txt
    57339 brown.txt
    $ bc <<< "57339 / 10 * 6"
    34398
    $ bc <<< "57339 / 10 * 2"
    11466
    $ bc <<< "34398 + 11466"
    45864
    $ bc <<< "34398 + 11466 + 11475"
    57339
    $ head -n 34398 brown.txt > part1.txt
    $ sed -n 34399,45864p brown.txt > part2.txt
    $ sed -n 45865,57339p brown.txt > part3.txt
    $ wc -l part*.txt
    34398 part1.txt
    11466 part2.txt
    11475 part3.txt
    57339 total
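The manual steps above can at least be scripted: bash's integer arithmetic truncates exactly like bc with its default scale of 0, so the same boundaries come out without any hand-copied numbers. A minimal sketch, assuming bash and a file of at least 10 lines; `split_60_20_20` is a hypothetical helper name, and the output names follow the question.

```shell
#!/bin/bash
# Sketch: the question's manual 60/20/20 split, with the bc arithmetic
# replaced by bash integer arithmetic. split_60_20_20 is a hypothetical name.
# Assumes the file has at least 10 lines (same limitation as len / 10 * 6).
split_60_20_20 () {
    local file=$1 len n1 n2 end2
    len=$(wc -l < "$file")
    n1=$(( len / 10 * 6 ))      # lines in part1, like bc's 57339 / 10 * 6
    n2=$(( len / 10 * 2 ))      # lines in part2
    end2=$(( n1 + n2 ))         # last line that goes into part2

    head -n "$n1" "$file" > part1.txt
    sed -n "$(( n1 + 1 )),${end2}p" "$file" > part2.txt
    sed -n "$(( end2 + 1 )),\$p" "$file" > part3.txt   # remainder to the end
}
```

Running `split_60_20_20 brown.txt` should reproduce the three files from the session above; the rounding remainder still ends up entirely in part3.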
But I'm sure there is a better way!
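There is a utility that takes as arguments the line numbers that should become the first line of each new file: csplit. Here is a wrapper around its
POSIX version: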
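To make csplit's line-number semantics concrete: each number names the first line of the next piece, so N numbers produce N + 1 files. A small demonstration, assuming a scratch directory; `demo_csplit`, `letters.txt` and the `piece` prefix are hypothetical names chosen for illustration.

```shell
#!/bin/bash
# Sketch: csplit line numbers mark where each new piece starts.
# demo_csplit, letters.txt and the "piece" prefix are hypothetical names.
demo_csplit () {
    printf '%s\n' a b c d e > letters.txt   # 5 lines
    # -s: don't print byte counts; -f piece: output files piece00, piece01, ...
    # New pieces start at line 3 and line 5:
    csplit -s -f piece letters.txt 3 5
    # Result: piece00 = a b, piece01 = c d, piece02 = e
}
```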
    #!/bin/bash

    usage () {
        printf '%s\n' "${0##*/} [-ks] [-f prefix] [-n number] file ratio1 ratio2..." >&2
    }

    # Collect csplit options
    while getopts "ksf:n:" opt; do
        case "$opt" in
            k|s) args+=(-"$opt") ;;            # k: keep files on error, s: silent
            f|n) args+=(-"$opt" "$OPTARG") ;;  # f: filename prefix, n: digits in number
            *) usage; exit 1 ;;
        esac
    done
    shift $(( OPTIND - 1 ))

    # Need a file name and at least two ratios
    if (( $# < 3 )); then
        usage
        exit 1
    fi

    fname=$1
    shift
    ratios=("$@")
    len=$(wc -l < "$fname")

    # Sum of ratios and array of cumulative ratios
    for ratio in "${ratios[@]}"; do
        (( total += ratio ))
        cumsums+=("$total")
    done

    # Don't need the last element (quoted so it is never treated as a glob)
    unset 'cumsums[-1]'

    # Array of numbers of first line in each split file
    for sum in "${cumsums[@]}"; do
        linenums+=( $(( sum * len / total + 1 )) )
    done

    csplit "${args[@]}" "$fname" "${linenums[@]}"
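To see exactly which line numbers the wrapper hands to csplit, the cumulative-sum step can be run on its own. A minimal sketch, assuming bash 4.3 or later for the negative subscript; `split_points` is a hypothetical helper that mirrors the script's two loops but takes the line count directly instead of reading a file.

```shell
#!/bin/bash
# Sketch: compute the csplit arguments the way the wrapper does.
# split_points is a hypothetical helper: $1 = number of lines, rest = ratios.
split_points () {
    local len=$1 total=0 sum ratio
    local -a cumsums=() linenums=()
    shift
    for ratio in "$@"; do
        (( total += ratio ))
        cumsums+=("$total")          # running totals, e.g. 60 80 100
    done
    unset 'cumsums[-1]'              # the last total is the end of the file
    for sum in "${cumsums[@]}"; do
        # integer division truncates; +1 gives the first line of the next piece
        linenums+=( $(( sum * len / total + 1 )) )
    done
    echo "${linenums[@]}"
}

split_points 57339 60 20 20   # prints: 34404 45872
```

These match the sizes in the usage example: part0 ends at line 34403, and part1 ends at line 34403 + 11468 = 45871.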
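After the name of the file to split, the script (called percsplit in the examples) takes the sizes of the split files as ratios relative to their sum, i.e.,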
    percsplit brown.txt 60 20 20
    percsplit brown.txt 6 2 2
    percsplit brown.txt 3 1 1
are all equivalent.
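Why scaling the ratios changes nothing: with n lines, ratios r_1, ..., r_m, total T = r_1 + ... + r_m and cumulative sums S_i = r_1 + ... + r_i, the script starts piece i + 1 at line floor(S_i n / T) + 1 (bash's `/` truncates, which equals the floor for non-negative operands). Multiplying every ratio by a positive integer k multiplies both S_i and T by k, and the factor cancels before any rounding happens:

```latex
\left\lfloor \frac{(k S_i)\, n}{k T} \right\rfloor + 1
  = \left\lfloor \frac{S_i\, n}{T} \right\rfloor + 1,
  \qquad i = 1, \dots, m-1
```

For brown.txt, 60 20 20 and 6 2 2 are 3 1 1 scaled by k = 20 and k = 2, and all three give floor(3 · 57339 / 5) + 1 = 34404 and floor(4 · 57339 / 5) + 1 = 45872.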
Usage similar to the case in the question looks like this:
    $ percsplit -s -f part -n 1 brown.txt 60 20 20
    $ wc -l part*
    34403 part0
    11468 part1
    11468 part2
    57339 total
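However, the numbering starts at zero and there is no .txt extension. The GNU version supports a --suffix-format option that would allow a .txt extension and could be added to the accepted arguments, but that would require parsing them more elaborately than getopts does.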
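If staying with POSIX-only options, one workaround is to rename the pieces afterwards. A minimal sketch, assuming the pieces were created with `-f part -n 1` as in the usage example (so single-digit suffixes, at most part9); `rename_parts` is a hypothetical helper name, not part of csplit.

```shell
#!/bin/bash
# Sketch: rename csplit output part0, part1, ... to part1.txt, part2.txt, ...
# rename_parts is hypothetical; assumes single-digit suffixes (-n 1).
rename_parts () {
    local prefix=${1:-part} f n
    for f in "$prefix"[0-9]; do
        [ -e "$f" ] || continue          # glob matched nothing: skip the literal
        n=${f#"$prefix"}                 # numeric suffix, e.g. 0
        # targets end in .txt, so no unrenamed piece is ever overwritten
        mv -- "$f" "$prefix$(( n + 1 )).txt"
    done
}
```

Usage: `percsplit -s -f part -n 1 brown.txt 60 20 20 && rename_parts part`. With GNU csplit, passing `-b '%d.txt'` (the short form of `--suffix-format`) instead yields part0.txt, part1.txt, part2.txt directly, though still numbered from zero.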