I have this raw text:
________________________________________________________________________________________________________________________________
Pos Car Competitor/Team Driver Vehicle Cap CL Laps Race.Time Fastest...Lap
1 6 Jason Clements Jason Clements BMW M3 3200 10 9:48.5710 3 0:57.3228*
2 42 David Skillender David Skillender Holden VS Commodore 6000 10 9:55.6866 2 0:57.9409
3 37 Bruce Cook Bruce Cook Ford Escort 3759 10 9:56.4388 4 0:58.3359
4 18 Troy Marinelli Troy Marinelli Nissan Silvia 3396 10 9:56.7758 2 0:58.4443
5 75 Anthony Gilbertson Anthony Gilbertson BMW M3 3200 10 10:02.5842 3 0:58.9336
6 26 Trent Purcell Trent Purcell Mazda RX7 2354 10 10:07.6285 4 0:59.0546
7 12 Scott Hunter Scott Hunter Toyota Corolla 2000 10 10:11.3722 5 0:59.8921
8 91 Graeme Wilkinson Graeme Wilkinson Ford Escort 2000 10 10:13.4114 5 1:00.2175
9 7 Justin Wade Justin Wade BMW M3 4000 10 10:18.2020 9 1:00.8969
10 55 Greg Craig Grag Craig Toyota Corolla 1840 10 10:18.9956 7 1:00.7905
11 46 Kyle Orgam-Moore Kyle Organ-Moore Holden VS Commodore 6000 10 10:30.0179 3 1:01.6741
12 39 Uptiles Strathpine Trent Spencer BMW Mini Cooper S 1500 10 10:40.1436 2 1:02.2728
13 177 Mark Hyde Mark Hyde Ford Escort 1993 10 10:49.5920 2 1:03.8069
14 34 Peter Draheim Peter Draheim Mazda RX3 2600 10 10:50.8159 10 1:03.4396
15 5 Scott Douglas Scott Douglas Datsun 1200 1998 9 9:48.7808 3 1:01.5371
16 72 Paul Redman Paul Redman Ford Focus 2lt 9 10:11.3707 2 1:05.8729
17 8 Matthew Speakman Matthew Speakman Toyota Celica 1600 9 10:16.3159 3 1:05.9117
18 74 Lucas Easton Lucas Easton Toyota Celica 1600 9 10:16.8050 6 1:06.0748
19 77 Dean Fuller Dean Fuller Mitsubishi Sigma 2600 9 10:25.2877 3 1:07.3991
20 16 Brett Batterby Brett Batterby Toyota Corolla 1600 9 10:29.9127 4 1:07.8420
21 95 Ross Hurford Ross Hurford Toyota Corolla 1600 8 9:57.5297 2 1:12.2672
DNF 13 Charles Wright Charles Wright BMW 325i 2700 9 9:47.9888 7 1:03.2808
DNF 20 Shane Satchwell Shane Satchwell Datsun 1200 Coupe 1998 1 1:05.9100 1 1:05.9100
Fastest Lap Av.Speed Is 152kph,Race Av.Speed Is 148kph
R=under lap record by greatest margin,r=under lap record,*=fastest lap time
________________________________________________________________________________________________________________________________
Issue# 2 - Printed Sat May 26 15:43:31 2012 Timing System By NATSOFT (03)63431311 www.natsoft.com.au/results Amended
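I need to parse it into objects with distinct fields such as position, car, driver, and so on. The problem is that I don't know what strategy to use. If I split each line on whitespace, I end up with a list like this: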
["1","6","Jason","Clements","BMW","M3","3200","10","9:48.5710","3","0:57.3228*"]
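Do you see the problem? I can't interpret this list by position, because a person may have a single name or three words in their name, and the vehicle may consist of any number of words. That makes it impossible to pick out fields by index alone.
What about using the offsets defined by the column names? I'm not quite sure how to do that.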
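One way to read "offsets defined by the column names" is to record where each header label starts and slice every row between consecutive starts. The sketch below is only an illustration under assumptions: the sample header and rows are hypothetical and built with `format` so that each value is left-aligned under its label, no label contains a space, and the helper names `column_offsets` and `parse_row` are made up for this example. Whether the real file is aligned this way would need checking.

```ruby
# Hypothetical sample: every value is left-aligned under its header label.
ROW_FORMAT = "%-4s%-5s%-20s%s"
header = format(ROW_FORMAT, "Pos", "Car", "Driver", "Vehicle")
rows = [
  format(ROW_FORMAT, "1", "6", "Jason Clements", "BMW M3"),
  format(ROW_FORMAT, "2", "42", "David Skillender", "Holden VS Commodore")
]

# Return [label, start_index] pairs, one per header label.
def column_offsets(header)
  offsets = []
  header.scan(/\S+/) { |label| offsets << [label, $~.begin(0)] }
  offsets
end

# Slice a row from each label's start up to the start of the next label.
def parse_row(row, offsets)
  offsets.each_with_index.map do |(label, start), i|
    stop = offsets[i + 1] ? offsets[i + 1][1] : row.length
    [label.downcase.to_sym, row[start...stop].to_s.strip]
  end.to_h
end

offsets = column_offsets(header)
parsed = rows.map { |row| parse_row(row, offsets) }
p parsed
```

Because the slices are taken by character position, multi-word names and vehicles stay in one field.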
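EDIT: The algorithm I'm currently using works like this:
> Split the text on newlines, giving a collection of lines.
> Find the furthest-right common whitespace characters across the lines, i.e. the positions (indexes) at which every line contains whitespace.
> Split the lines at those common positions.
> Trim the resulting pieces.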
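There are several problems:
If the names happen to have the same lengths, like this:
Jason Adams
Bobby Sacka
Jerry Louis
then every line has a space at the same index, so each name is split into two separate items (["Jason", "Adams", "Bobby", "Sacka", "Jerry", "Louis"]).
However, if the names all differ in length:
Dominic Bou
Bob Adams
Jerry Seinfeld
then it splits correctly after the final 'd' of Seinfeld, and we get a collection of three names (["Dominic Bou", "Bob Adams", "Jerry Seinfeld"]).
It is also fragile. I'm looking for a better solution.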
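The algorithm from the edit, and both cases just described, can be sketched roughly as follows. This is a reconstruction under assumptions rather than the original code: the method names are invented, a line that has already ended is treated as blank at that index, and the sample data pairs the names above with a hypothetical vehicle column padded to 16 characters.

```ruby
# Indexes at which every line has a space (or has already ended).
def common_blank_columns(lines)
  width = lines.map(&:length).max
  (0...width).select { |i| lines.all? { |line| line[i].nil? || line[i] == " " } }
end

# Keep only the furthest-right index of each run of blank columns.
def split_points(lines)
  blanks = common_blank_columns(lines)
  blanks.reject { |i| blanks.include?(i + 1) }
end

# Split every line at the shared split points and trim the pieces.
def split_lines(lines)
  points = split_points(lines)
  starts = [0] + points.map { |point| point + 1 }
  lines.map do |line|
    stops = points + [line.length]
    starts.zip(stops).map { |from, to| line[from..to].to_s.strip }
  end
end

# Hypothetical two-column sample: name padded to 16 characters, then vehicle.
def sample(pairs)
  pairs.map { |name, vehicle| name.ljust(16) + vehicle }
end

different = sample([["Dominic Bou", "BMW M3"], ["Bob Adams", "Ford Escort"], ["Jerry Seinfeld", "Mazda RX7"]])
same      = sample([["Jason Adams", "BMW M3"], ["Bobby Sacka", "Ford Escort"], ["Jerry Louis", "Mazda RX7"]])

p split_lines(different) # names stay whole
p split_lines(same)      # every name is cut at the shared space
```

The second call shows why the approach is fragile: the split positions depend on the data rather than on the layout of the report.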
Solution
You can use the fixed_width gem.
require 'fixed_width'
require 'pp'

FixedWidth.define :cars do |d|
  d.head do |head|
    head.trap { |line| line !~ /\d/ }
  end
  d.body do |body|
    body.trap { |line| line =~ /^(\d|DNF)/ }
    body.column :pos, 4
    body.column :car, 5
    body.column :competitor, 31
    body.column :driver, 25
    body.column :vehicle, 21
    body.column :cap, 5
    body.column :cl_laps, 11
    body.column :race_time, 11
    body.column :fast_lap_no, 4
    body.column :fast_lap_time, 10
  end
end

pp FixedWidth.parse(File.open("races.txt"), :cars)
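The trap method identifies which lines belong to each section. I used regular expressions:
> The head regex looks for lines that contain no digits.
> The body regex looks for lines that start with a digit or "DNF".
Each section must include the line immediately following the end of the previous one. The column definitions simply state how many characters to grab for each column. The library strips the whitespace for you. If you wanted to generate a fixed-width file you could add alignment parameters, but it doesn't look like you need that here.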
The result is a hash that starts like this:
{:head=>[{},{},{}],
 :body=>
  [{:pos=>"1",:car=>"6",:competitor=>"Jason Clements",:driver=>"Jason Clements",:vehicle=>"BMW M3",:cap=>"3200",:cl_laps=>"10",:race_time=>"9:48.5710",:fast_lap_no=>"3",:fast_lap_time=>"0:57.3228"},
   {:pos=>"2",:car=>"42",:competitor=>"David Skillender",:driver=>"David Skillender",:vehicle=>"Holden VS Commodore",:cap=>"6000",:cl_laps=>"10",:race_time=>"9:55.6866",:fast_lap_no=>"2",:fast_lap_time=>"0:57.9409"},
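If adding a gem is not an option, the same column widths can probably be applied with Ruby's built-in `String#unpack`, and the string fields turned into an object, which is what the question asks for. The following is a minimal sketch, not part of the answer above: the `Result` struct, the `to_seconds` and `parse_line` helpers, and the sample row (padded left-aligned with `format`) are all assumptions made for illustration, and the real file's alignment may differ.

```ruby
# Column names and widths copied from the FixedWidth definition above.
COLUMNS = {
  pos: 4, car: 5, competitor: 31, driver: 25, vehicle: 21,
  cap: 5, cl_laps: 11, race_time: 11, fast_lap_no: 4, fast_lap_time: 10
}.freeze

# "A4A5A31...": each A<n> reads n characters and drops trailing spaces.
UNPACK_FORMAT = COLUMNS.values.map { |width| "A#{width}" }.join

# Hypothetical result object; one member per column.
Result = Struct.new(*COLUMNS.keys)

# Convert "m:ss.ffff" (optionally flagged with "*") into seconds.
def to_seconds(time)
  minutes, seconds = time.delete("*").split(":")
  minutes.to_i * 60 + seconds.to_f
end

# Cut one body line into fields and convert the two time columns.
def parse_line(line)
  fields = line.unpack(UNPACK_FORMAT).map(&:strip)
  result = Result.new(*fields)
  result.race_time = to_seconds(result.race_time)
  result.fast_lap_time = to_seconds(result.fast_lap_time)
  result
end

# Hypothetical sample row, left-aligned and padded to the widths above.
sample = format(COLUMNS.values.map { |width| "%-#{width}s" }.join,
                "1", "6", "Jason Clements", "Jason Clements", "BMW M3",
                "3200", "10", "9:48.5710", "3", "0:57.3228*")

row = parse_line(sample)
puts row.driver
puts row.vehicle
puts row.race_time
```

Storing the times as seconds makes the rows easy to sort and compare; the remaining numeric columns are left as strings here and could be converted the same way.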