NLTK Notes


Loading a custom corpus:

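from nltk.corpus import PlaintextCorpusReader
corpus_root = '/tmp'  # directory that holds the corpus files
wordlists = PlaintextCorpusReader(corpus_root, '.*')  # second argument is a regex over file names; a single name such as 'a.txt' also works
wordlists.fileids()  # list the files that matched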
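
Once the reader is built, the files can be read back as raw text or as word tokens. The following is a minimal sketch of that, not part of the original note. It assumes a throwaway directory created with tempfile and two hypothetical sample files (a.txt, b.txt) so that it does not depend on whatever happens to be in /tmp.

```python
import os
import tempfile

from nltk.corpus import PlaintextCorpusReader

# Build a temporary corpus directory with two small files.
# The file names and contents are made up for illustration.
corpus_root = tempfile.mkdtemp()
samples = {
    "a.txt": "Hello world. This is a test.",
    "b.txt": "Another small file.",
}
for name, text in samples.items():
    with open(os.path.join(corpus_root, name), "w", encoding="utf-8") as f:
        f.write(text)

# Restrict the reader to .txt files instead of matching everything with '.*'.
wordlists = PlaintextCorpusReader(corpus_root, r".*\.txt")

file_ids = wordlists.fileids()            # names of the matched files
words_a = list(wordlists.words("a.txt"))  # word and punctuation tokens of one file
raw_b = wordlists.raw("b.txt")            # unprocessed text of one file

print(file_ids)
print(words_a)
print(raw_b)
```

words() relies on the reader's default word tokenizer, which is regex based and needs no extra NLTK data download. sents() does need the Punkt sentence tokenizer data, so it is left out here.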