smallseg: Python word segmentation
fxsjy
2009-10-13
http://code.google.com/p/smallseg/
Features: supports a custom dictionary; after cutting, returns a list of in-dictionary (registered) words and a list of out-of-dictionary (unregistered) words; has some ability to recognize new words.

Performance (segmenting an article of about 2,000 characters):

Load dict...
Dict is OK.
1 times, cost: 0.0160000324249
2 times, cost: 0.0150001049042
3 times, cost: 0.0159997940063
4 times, cost: 0.0310001373291
5 times, cost: 0.0309998989105
6 times, cost: 0.0320000648499
7 times, cost: 0.0460000038147
8 times, cost: 0.0469999313354
9 times, cost: 0.0629999637604
10 times, cost: 0.0620000362396
11 times, cost: 0.0780000686646
12 times, cost: 0.0789999961853
13 times, cost: 0.0780000686646
14 times, cost: 0.0929999351501
15 times, cost: 0.0940001010895
16 times, cost: 0.109999895096
17 times, cost: 0.108999967575
18 times, cost: 0.108999967575
19 times, cost: 0.125
20 times, cost: 0.125
********************************
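A log of this shape could be produced by a small timing loop like the one below. This is only a sketch: it assumes the reported cost is the cumulative time after N calls (the numbers grow roughly linearly, which suggests this, but the post does not say so), and `placeholder_cut` is a hypothetical stand-in, not smallseg's segmenter. `time.perf_counter` is used because the logged values move in steps of about 0.016, which would be consistent with a coarse clock.

```python
import time


def time_repeated(func, text, runs=20):
    """Call func(text) `runs` times; record cumulative elapsed time after each call."""
    elapsed = []
    start = time.perf_counter()
    for i in range(1, runs + 1):
        func(text)
        cost = time.perf_counter() - start  # cumulative, not per-call (an assumption)
        elapsed.append(cost)
        print(f"{i} times, cost: {cost}")
    return elapsed


def placeholder_cut(text):
    # Hypothetical stand-in for a real segmenter: splits text into 2-character chunks.
    return [text[i:i + 2] for i in range(0, len(text), 2)]


# A 2,000-character sample built by repeating a 10-character string.
sample = "永和服装饰品有限公司" * 200
costs = time_repeated(placeholder_cut, sample)
```

To time a real segmenter, pass its cut function in place of `placeholder_cut`.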
fxsjy
2009-10-14
System.out.println(seg.cut("日照香炉生紫烟,遥看瀑布挂前川。飞流直下三千尺,疑是银河落九天。"));
System.out.println(seg.cut("伊藤洋华堂总府店"));
System.out.println(seg.cut("永和服装饰品有限公司"));

Output:

r:[日照, 香炉, 瀑布, 飞流, 直下, 疑是, 银河, 落九天] u:[生紫烟, 紫烟, 遥看, 挂前川, 前川, 三千尺, 千尺]
r:[洋华堂] u:[伊藤, 总府店, 府店]
r:[永和, 服装, 饰品, 有限, 公司] u:[]
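In the output, `r` and `u` presumably correspond to the registered (in-dictionary) and unregistered (out-of-dictionary) word lists described in the first post. The idea of splitting text into those two lists can be illustrated with a minimal forward maximum matching sketch. This is not smallseg's algorithm; the function `cut`, the `max_len` parameter, and `toy_dict` are all hypothetical names introduced for illustration.

```python
def cut(text, dictionary, max_len=4):
    """Greedy forward maximum matching (illustrative only, not smallseg's method).

    Returns (recognized, unrecognized): dictionary words found in the text,
    and the leftover runs of characters that no dictionary word covered.
    """
    recognized, unrecognized = [], []
    pending = ""  # current run of characters not covered by any dictionary word
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, down to 2 characters.
        for size in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                match = candidate
                break
        if match:
            if pending:
                unrecognized.append(pending)
                pending = ""
            recognized.append(match)
            i += len(match)
        else:
            pending += text[i]
            i += 1
    if pending:
        unrecognized.append(pending)
    return recognized, unrecognized


# Toy dictionary, assumed for this example only.
toy_dict = {"永和", "服装", "饰品", "有限", "公司", "洋华堂"}

r, u = cut("永和服装饰品有限公司", toy_dict)
print("r:", r, "u:", u)

r2, u2 = cut("伊藤洋华堂总府店", toy_dict)
print("r:", r2, "u:", u2)
```

With this toy dictionary the first call yields the same five words as the posted output and an empty unrecognized list. The second call yields `伊藤` and `总府店` as unrecognized runs, but unlike the posted output it does not also emit the sub-fragment `府店`, so smallseg evidently does more than this sketch.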