smallseg: Chinese word segmentation in Python

fxsjy 2009-10-13
http://code.google.com/p/smallseg/

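Features: supports a user-defined dictionary; segmentation returns a list of in-dictionary (registered) words and a list of out-of-dictionary (unregistered) words; has some ability to recognize new words.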
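
To illustrate what "a registered-word list plus an unregistered-word list" can look like, here is a minimal sketch of dictionary-based segmentation. It uses plain forward maximum matching, which is an assumption made for illustration and not necessarily how smallseg works. The function name `cut`, the `max_len` parameter, and the `custom_dict` set are hypothetical and are not smallseg's API. The sketch ignores punctuation and does not emit the overlapping sub-pieces (such as 紫烟 alongside 生紫烟) that smallseg reports for unregistered text.

```python
# Toy sketch only: forward maximum matching, NOT smallseg's actual algorithm.
# All names here (cut, max_len, custom_dict) are hypothetical.

def cut(text, dictionary, max_len=4):
    """Return (registered, unregistered) word lists for `text`."""
    registered = []    # words found in the dictionary
    unregistered = []  # maximal runs of characters no dictionary word covers
    pending = ""       # the unmatched run currently being collected
    i = 0
    while i < len(text):
        # Try the longest candidate first, down to 2 characters.
        for size in range(min(max_len, len(text) - i), 1, -1):
            word = text[i:i + size]
            if word in dictionary:
                if pending:
                    unregistered.append(pending)
                    pending = ""
                registered.append(word)
                i += size
                break
        else:
            # No dictionary word starts here; extend the unmatched run.
            pending += text[i]
            i += 1
    if pending:
        unregistered.append(pending)
    return registered, unregistered


# A custom dictionary is simply a set of words supplied by the caller.
custom_dict = {"永和", "服装", "饰品", "有限", "公司"}
r, u = cut("永和服装饰品有限公司", custom_dict)
print("r:", r)
print("u:", u)
```

With this dictionary every character is covered, so `r` holds the five dictionary words in order and `u` is empty. Removing a word from `custom_dict` moves the corresponding characters into `u`.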

Performance (segmenting an article of roughly 2,000 characters):
Load dict...
Dict is OK.
1 times, cost: 0.0160000324249
2 times, cost: 0.0150001049042
3 times, cost: 0.0159997940063
4 times, cost: 0.0310001373291
5 times, cost: 0.0309998989105
6 times, cost: 0.0320000648499
7 times, cost: 0.0460000038147
8 times, cost: 0.0469999313354
9 times, cost: 0.0629999637604
10 times, cost: 0.0620000362396
11 times, cost: 0.0780000686646
12 times, cost: 0.0789999961853
13 times, cost: 0.0780000686646
14 times, cost: 0.0929999351501
15 times, cost: 0.0940001010895
16 times, cost: 0.109999895096
17 times, cost: 0.108999967575
18 times, cost: 0.108999967575
19 times, cost: 0.125
20 times, cost: 0.125
********************************
fxsjy 2009-10-14
System.out.println(seg.cut("日照香炉生紫烟,遥看瀑布挂前川。飞流直下三千尺,疑是银河落九天。"));
System.out.println(seg.cut("伊藤洋华堂总府店"));
System.out.println(seg.cut("永和服装饰品有限公司"));

r:[日照, 香炉, 瀑布, 飞流, 直下, 疑是, 银河, 落九天]
u:[生紫烟, 紫烟, 遥看, 挂前川, 前川, 三千尺, 千尺]
r:[洋华堂]
u:[伊藤, 总府店, 府店]
r:[永和, 服装, 饰品, 有限, 公司]
u:[]