perl – How to parse a file, create records, and perform operations on the records, including term frequency and distance calculations

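I am a student in an introductory Perl course, looking for suggestions and feedback on my approach to writing a small (but tricky) program that analyzes data about atoms. My professor encourages using forums. I have not worked with Perl subs or modules (including Bioperl), so please keep responses at an appropriate "beginner level" so that I can understand and learn from your suggestions and/or code (and please limit the "magic" too).

The requirements of the program are as follows: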

  1. Read a file (containing data about Atoms) from the command line & create an array of atom records (one record/atom per newline). For each record the program will need to store:

    • The atom’s serial number (cols 7 – 11)
    • The three-letter name of the amino acid to which it belongs (cols 18 – 20)
    • The atom’s three coordinates (x,y,z) (cols 31 – 54 )
    • The atom’s one- or two-letter element name (e.g. C,O,N,Na) (cols 77-78 )

  2. Prompt the user for one of three commands: freq, length, or density d (where d is some number):

    • freq – how many atoms of each type are in the file (for example, nitrogen, sulfur, etc. would be displayed like this: N: 918 S: 23)
    • length – the distances among the coordinates
    • density d (where d is a number) – the program prompts for the name of a file in which to save its computations. For each atom, it computes the distance between that atom and every other atom; whenever a distance is less than or equal to d, it increments that atom's count of atoms within that distance. Each count is written to the file unless it is zero. The output will look something like:
    1: 5
    2: 3
    3: 6
    … (very big file) and will close when it finishes.

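    I am looking for feedback on what I have written (and still need to write) in the code below. I would especially appreciate any feedback on how to write my subs. I have included sample input data at the bottom.

    The program structure and function descriptions as I see them: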

    $^W = 1; # turn on warnings
    use strict; # behave!
    
    my @fields;
    my @recs;
    
    while ( <DATA> ) {
     chomp;
     @fields = split(/\s+/);
     push @recs,makeRecord(@fields);
    }
    
    for (my $i = 0; $i < @recs; $i++) {
     printRec( $recs[$i] );
    }
    my %command_table = (
     freq    => \&freq,
     length  => \&length,
     density => \&density,
     help    => \&help,
     quit    => \&quit,
     );
    
    print "Enter a command: ";
    while ( <STDIN> ) {
     chomp; 
     my @line = split( /\s+/);
     my $command = shift @line;
     if ($command !~ /^freq$|^density$|length|^help$|^quit$/ ) {
        print "Command must be: freq,length,density or quit\n";
        }
      else {
        $command_table{$command}->();
        }
     print "Enter a command: ";
     }
    
    sub makeRecord 
        # Read the entire line and make records from the lines that contain the 
        # word ATOM or HETATM in the first column. Not sure how to do this:
    {
     my %record = 
     (
     serialnumber => shift,
     aminoacid    => shift,
     coordinates  => shift,
     element      => [ @_ ],
     );
     return \%record;
    }
    
    sub freq
        # take an array of atom records,return a hash whose keys are 
        # distinct atom names and whose values are the frequencies of
        # these atoms in the array.  
    
    sub length
        # take an array of atom records and return the max distance 
        # between all pairs of atoms in that array. My instructor
        # advised this would be constructed as a for loop inside a for loop. 
    
    sub density
        # take an array of atom records and a number d and will return a
        # hash whose keys are atom serial numbers and whose values are 
        # the number of atoms within that distance from the atom with that
        # serial number. 
    
    sub help
    {
        print "To use this program, type either\n",
              "freq\n",
              "length\n",
              "density followed by a number, d,\n",
              "help\n",
              "quit\n";
    }
    
    sub quit
    {
     exit 0;
    }
    
    # truncating for testing purposes. Actual data is approx. 100 columns 
    # and starts with ATOM or HETATM.
    __DATA__
    ATOM   4743  CG  GLN A 704      19.896  32.017  54.717  1.00 66.44           C  
    ATOM   4744  CD  GLN A 704      19.589  30.757  55.525  1.00 73.28           C  
    ATOM   4745  OE1 GLN A 704      18.801  29.892  55.098  1.00 75.91           O

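Solution

It looks like your Perl skills are advancing nicely: you are already using references and complex data structures. Here are a few tips and pieces of general advice.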

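• Enable warnings with use warnings rather than $^W = 1. The former is self-documenting and has the advantage of being local to the enclosing block rather than a global setting.
• Use well-named variables, which help document the program's behavior, rather than relying on Perl's special $_. For example:

while (my $input_record = <DATA>){
}

• In a user-input scenario, an endless loop provides a way to avoid repeating instructions such as "Enter a command". See below.
• Your regular expression can be simplified to avoid the need for repeated anchors. See below.
• As a general rule, affirmative tests are easier to understand than negative tests. See the revised if-else structure below.
• Enclose each part of the program in its own subroutine. This is good general practice for a number of reasons, so I would simply start building the habit.
• A related good practice is to minimize the use of global variables. As an exercise, you could try to write the program so that it uses no global variables at all; instead, any needed information would be passed between subroutines. For small programs one does not necessarily need to be rigid about avoiding globals, but it is not a bad idea to keep the ideal in mind.
• Give your length subroutine a different name. That name is already used by the built-in length function.
• Regarding your question about makeRecord, one approach is to ignore the filtering problem inside makeRecord. Instead, makeRecord could store one additional hash field (the record type), and the filtering logic would live elsewhere. For example:

my $record = makeRecord(@fields);
push @recs, $record if $record->{type} =~ /^(ATOM|HETATM)$/;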
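
The makeRecord suggestion above can be sketched as follows. This is only a minimal sketch under stated assumptions, not the definitive way to do it: it reads fields by column position with substr (following the column ranges listed in the question) instead of split, it assumes each of the three coordinates occupies eight columns, and the hash keys type, serial, amino_acid, x, y, z, and element, as well as the helpers trim and load_records, are hypothetical names invented for illustration. The sample line is built with sprintf purely so that the columns line up.

```perl
use strict;
use warnings;

# Remove leading and trailing whitespace from a string.
sub trim {
    my ($text) = @_;
    $text =~ s/^\s+//;
    $text =~ s/\s+$//;
    return $text;
}

# Build one record (a hash reference) from one fixed-width line.
# substr takes a zero-based offset, so column 7 is offset 6.
# The hash keys are hypothetical names chosen for this sketch.
sub makeRecord {
    my ($line) = @_;
    $line = sprintf '%-78s', $line;    # pad short lines so every substr is in range
    my %record = (
        type       => trim( substr($line,  0, 6) ),    # cols  1 -  6
        serial     => trim( substr($line,  6, 5) ),    # cols  7 - 11
        amino_acid => trim( substr($line, 17, 3) ),    # cols 18 - 20
        x          => trim( substr($line, 30, 8) ),    # cols 31 - 38 (assumed width)
        y          => trim( substr($line, 38, 8) ),    # cols 39 - 46 (assumed width)
        z          => trim( substr($line, 46, 8) ),    # cols 47 - 54 (assumed width)
        element    => trim( substr($line, 76, 2) ),    # cols 77 - 78
    );
    return \%record;
}

# Keep only ATOM and HETATM lines; the filter lives outside makeRecord.
sub load_records {
    my @lines = @_;
    my @recs;
    for my $line (@lines) {
        chomp $line;
        my $record = makeRecord($line);
        push @recs, $record if $record->{type} =~ /^(ATOM|HETATM)$/;
    }
    return \@recs;
}

# Made-up input: one non-atom line plus one atom line laid out in fixed columns.
my $atom_line = sprintf
    '%-6s%5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f          %2s',
    'ATOM', 4743, ' CG', 'GLN', 'A', 704, 19.896, 32.017, 54.717, 1.00, 66.44, 'C';
my $recs = load_records("REMARK not an atom line\n", "$atom_line\n");

for my $rec (@$recs) {
    print "$rec->{serial} $rec->{amino_acid} $rec->{element} ",
          "($rec->{x}, $rec->{y}, $rec->{z})\n";
}
```

Reading by column rather than splitting on whitespace is a design choice for this sketch: it keeps working if two adjacent fields happen to touch with no space between them, which split(/\s+/) cannot handle.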

An illustration of some of the points above:

use strict;
use warnings;

run();

sub run {
    my $atom_data = load_atom_data();
    print_records($atom_data);
    interact_with_user($atom_data);
}

...

sub interact_with_user {
    my $atom_data = shift;
    my %command_table = (...);

    while (1){
        print "Enter a command: ";
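        my $reply = <STDIN>;
        last unless defined $reply;    # stop at end of input instead of looping forever
        chomp $reply;

        # split ' ' ignores leading whitespace, so $command is the first word.
        my ($command, @line) = split ' ', $reply;

        if ( defined $command and $command =~ /^(freq|density|length|help|quit)$/ ) {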
            # Run the command.
        }
        else {
            # Print usage message for user.
        }
    }
}

...
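
The three stubbed subroutines in the question (freq, length, and density) could be filled in along these lines. This is a rough sketch rather than a finished solution. It assumes each record is a hash reference with the hypothetical keys serial, element, x, y, and z, and the subroutine names element_frequencies, distance, max_distance, and neighbor_counts are invented here (max_distance avoids the clash with the built-in length). Writing the density counts to a file is left out. The three sample atoms reuse the coordinates from the question's data.

```perl
use strict;
use warnings;

# freq: count how many atoms of each element appear in the records.
sub element_frequencies {
    my ($atoms) = @_;
    my %count;
    for my $atom (@$atoms) {
        $count{ $atom->{element} }++;
    }
    return \%count;
}

# Straight-line (Euclidean) distance between two atom records.
sub distance {
    my ($p, $q) = @_;
    my $dx = $p->{x} - $q->{x};
    my $dy = $p->{y} - $q->{y};
    my $dz = $p->{z} - $q->{z};
    return sqrt( $dx * $dx + $dy * $dy + $dz * $dz );
}

# length: largest distance over all pairs, as a for loop inside a for loop.
sub max_distance {
    my ($atoms) = @_;
    my $max = 0;
    for ( my $i = 0; $i < @$atoms; $i++ ) {
        for ( my $j = $i + 1; $j < @$atoms; $j++ ) {
            my $d = distance( $atoms->[$i], $atoms->[$j] );
            $max = $d if $d > $max;
        }
    }
    return $max;
}

# density: for each atom, count the other atoms no farther than $limit away.
# Atoms whose count is zero are left out of the returned hash.
sub neighbor_counts {
    my ($atoms, $limit) = @_;
    my %count;
    for ( my $i = 0; $i < @$atoms; $i++ ) {
        my $near = 0;
        for ( my $j = 0; $j < @$atoms; $j++ ) {
            next if $i == $j;    # do not compare an atom with itself
            $near++ if distance( $atoms->[$i], $atoms->[$j] ) <= $limit;
        }
        $count{ $atoms->[$i]{serial} } = $near if $near > 0;
    }
    return \%count;
}

# Sample records using the coordinates from the question's data.
my @atoms = (
    { serial => 4743, element => 'C', x => 19.896, y => 32.017, z => 54.717 },
    { serial => 4744, element => 'C', x => 19.589, y => 30.757, z => 55.525 },
    { serial => 4745, element => 'O', x => 18.801, y => 29.892, z => 55.098 },
);

my $freq = element_frequencies( \@atoms );
for my $element ( sort keys %$freq ) {
    print "$element: $freq->{$element}\n";
}

printf "longest distance: %.3f\n", max_distance( \@atoms );

my $near = neighbor_counts( \@atoms, 1.6 );
for my $serial ( sort { $a <=> $b } keys %$near ) {
    print "$serial: $near->{$serial}\n";
}
```

Each subroutine receives a reference to the array of records and returns its result, so none of them depends on a global variable. Both distance-based subroutines compare every pair of atoms, so their running time grows with the square of the number of atoms.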
