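An earlier section showed one way to find every match in a given string one at a time, but that approach is fairly cumbersome.
The standard library therefore provides: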
- std::regex_iterator
- template<class BidirIt,class CharT = typename std::iterator_traits<BidirIt>::value_type,class Traits = std::regex_traits<CharT>
- > class regex_iterator
std::regex_iterator::regex_iterator
- regex_iterator();
- regex_iterator(BidirIt a,BidirIt b,const regex_type& re,std::regex_constants::match_flag_type m =
- std::regex_constants::match_default);
- regex_iterator(const regex_iterator&);
- regex_iterator(BidirIt,BidirIt,const regex_type&&,std::regex_constants::match_flag_type =
- std::regex_constants::match_default) = delete;
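One point deserves attention:
std::regex_iterator is an ordinary class, so a default-constructed object such as std::regex_iterator<BidirIt> iter; is valid. It represents the end-of-sequence iterator and must not be dereferenced.
To walk the matches, construct the iterator from a range and a regex: std::regex_iterator<BidirIt> iter(first, last, re); where first and last are bidirectional iterators and re is a std::basic_regex.
The regex must be a named object (an lvalue) that outlives the iterator, because the overload taking a temporary regex is deleted.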
An example makes this clear:
- #include <iostream>
- #include <iterator>
- #include <regex>
- #include <string>
- #include <type_traits>
- int main()
- {
- std::basic_string<char> data("Name: Tang Tong Chinese: 90 Math: 100 English: 100");
- // Match one or more consecutive digits.
- std::basic_regex<char> regex("[0-9]{1,}");
- std::basic_string<char>::const_iterator pos = data.cbegin();
- std::basic_string<char>::const_iterator end = data.cend();
- std::regex_iterator<std::basic_string<char>::const_iterator> regItr =
- std::regex_iterator<std::basic_string<char>::const_iterator>(pos,end,regex);
- // The name `end` is already taken by the string iterator above, so the end-of-sequence
- // regex_iterator is written as a default-constructed temporary below.
- // std::distance therefore counts the matches.
- std::cout << std::distance(regItr,std::regex_iterator<std::basic_string<char>::const_iterator>()) << std::endl;
- for (; regItr != std::regex_iterator<std::basic_string<char>::const_iterator>(); ++regItr) {
- // Dereferencing yields a const std::match_results<...>&, so this prints true.
- std::cout << std::boolalpha << std::is_same<decltype(*regItr),const std::match_results<std::basic_string<char>::const_iterator>&>::value << std::endl;
- // Print the text of the current match.
- std::cout << regItr->str() << std::endl;
- }
- return 0;
- }
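Next is the tokenizer that the standard library provides:
1. A tokenizer uses a regular expression to match the parts of a given string that we do not want (the separators) and then extracts whatever lies between two such unwanted parts.
2. When the regular expression matches content in the string but we only care about certain sub-strings of each match, we can specify submatches so that only the useful sub-strings are kept.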
- std::regex_token_iterator
- template<class BidirIt,class CharT = typename std::iterator_traits<BidirIt>::value_type,class Traits = std::regex_traits<CharT>
- > class regex_token_iterator
We must supply a bidirectional iterator type (BidirectionalIterator) as the template argument.
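- // The default constructor is normally used as the end iterator.
- regex_token_iterator();
- // Takes two bidirectional iterators, a std::basic_regex, and a single submatch index (explained later).
- regex_token_iterator( BidirectionalIterator a,BidirectionalIterator b,const regex_type& re,int submatch = 0,std::regex_constants::match_flag_type m =
- std::regex_constants::match_default );
- // Takes two bidirectional iterators and a regex; a vector names the sub-strings we are interested in.
- regex_token_iterator( BidirectionalIterator a,BidirectionalIterator b,const regex_type& re,const std::vector<int>& submatches,std::regex_constants::match_flag_type m =
- std::regex_constants::match_default );
- // Takes two bidirectional iterators and a regex; an initializer_list names the sub-strings we are interested in.
- regex_token_iterator( BidirectionalIterator a,BidirectionalIterator b,const regex_type& re,std::initializer_list<int> submatches,std::regex_constants::match_flag_type m =
- std::regex_constants::match_default );
- // Takes two bidirectional iterators and a regex; an array names the sub-strings we are interested in.
- template <std::size_t N>
- regex_token_iterator( BidirectionalIterator a,BidirectionalIterator b,const regex_type& re,const int (&submatches)[N],std::regex_constants::match_flag_type m =
- std::regex_constants::match_default );
- // Copy constructor.
- regex_token_iterator( const regex_token_iterator& other );
- // An rvalue std::basic_regex is not accepted: every overload taking one is deleted.
- regex_token_iterator( BidirectionalIterator a,BidirectionalIterator b,const regex_type&& re,int submatch = 0,std::regex_constants::match_flag_type m =
- std::regex_constants::match_default ) = delete;
- regex_token_iterator( BidirectionalIterator a,BidirectionalIterator b,const regex_type&& re,const std::vector<int>& submatches,std::regex_constants::match_flag_type m =
- std::regex_constants::match_default ) = delete;
- regex_token_iterator( BidirectionalIterator a,BidirectionalIterator b,const regex_type&& re,std::initializer_list<int> submatches,std::regex_constants::match_flag_type m =
- std::regex_constants::match_default ) = delete;
- template <std::size_t N>
- regex_token_iterator( BidirectionalIterator a,BidirectionalIterator b,const regex_type&& re,const int (&submatches)[N],std::regex_constants::match_flag_type m =
- std::regex_constants::match_default ) = delete;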
Demo for std::regex_token_iterator:
- #include <iostream>
- #include <type_traits>
- #include <regex>
- #include <string>
- #include <vector>
- int main()
- {
- std::basic_string<char> data = "<person>\n"
- "<first>Nico</first>"
- "<last>Josuttis</last>"
- "</person>";
- std::basic_regex<char> regex("<(.*)>(.*)</\\1>");
- std::regex_token_iterator<std::basic_string<char>::const_iterator> beg(data.cbegin(),data.cend(),regex,{ 0,2 });
- std::regex_token_iterator<std::basic_string<char>::const_iterator> end;
- std::cout << std::boolalpha << std::is_same<decltype(*beg),const std::sub_match<std::basic_string<char>::const_iterator>&>::value << std::endl;
- for (; beg != end; ++beg) {
- std::cout << beg->length() << std::endl;
- std::cout << beg->str() << std::endl;
- }
- std::cout << "----------------------" << std::endl;
- std::basic_string<char> names("nico,jim,helmut,paul,tim,john,rita");
- std::basic_regex<char> regex_("[,:.]+");
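- // The regular expression above serves as the separator.
- // It matches one or more of the characters , : . in the given string (names).
- // What we actually want is everything except those , : . characters, and the tokenizer provides exactly that.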
- std::basic_string<char>::const_iterator _beg = names.cbegin();
- std::basic_string<char>::const_iterator _end = names.cend();
- // Note that the last argument below is -1.
- std::regex_token_iterator<std::basic_string<char>::const_iterator> pos(_beg,_end,regex_,-1);
- std::regex_token_iterator<std::basic_string<char>::const_iterator> end_;
- for (; pos != end_; ++pos) {
- std::cout << pos->str() << std::endl;
- }
- return 0;
- }
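Besides std::regex_constants::match_flag_type, the std::regex_token_iterator constructors above accept one extra argument that selects the submatches:
1. It can be an int.
2. It can be a std::vector<int>.
3. It can be a std::initializer_list<int>.
4. It can be an array of int.
In the Demo above, for example:
{0,2} : for every match we want the whole match (index 0) followed by its second capture group (index 2).
-1 : we want the text that lies between consecutive matches of the regular expression, so the regex acts as a separator.
0 : we want the whole match; this is the default.
n (>0) : we want the n-th capture group of every match.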