假设我有一个我想分析的域名列表.除非域名是连字符,否则我看不到一种特别简单的方法来“提取”域中使用的关键字.但我看到它在DomainTools.com,Estibot.com等网站上完成.例如:
ilikecheese.com becomes "i like cheese" sanfranciscohotels.com becomes "san francisco hotels" ...
有效和有效地实现这一目标的任何建议?
编辑:我想用PHP编写.
好吧,我运行了我为
this SO question编写的脚本,进行了一些小的更改 – 使用日志概率来避免下溢,并修改它以读取多个文件作为语料库.
原文链接:https://www.f2er.com/php/133672.html对于我的语料库,我从项目Gutenberg下载了一堆文件 – 没有真正的方法,只需从etext00,etext01和etext02中获取所有英语文件.
以下是结果,我保存了每个组合的前三名.
expertsexchange: 97 possibilities - experts exchange -23.71 - expert sex change -31.46 - experts ex change -33.86 penisland: 11 possibilities - pen island -20.54 - penis land -22.64 - pen is land -25.06 choosespain: 28 possibilities - choose spain -21.17 - chooses pain -23.06 - choose spa in -29.41 kidsexpress: 15 possibilities - kids express -23.56 - kid sex press -32.65 - kids ex press -34.98 childrenswear: 34 possibilities - children swear -19.85 - childrens wear -25.26 - child ren swear -32.70 dicksonweb: 8 possibilities - dickson web -27.09 - dick son web -30.51 - dicks on web -33.63