谷粒商城 Study Notes, P124: Custom Extension Dictionary for the es IK Analyzer
The ik analyzer installed in P122 ships with a default dictionary that does not cover some newer vocabulary. To fix this, we modify the ik analyzer's configuration file and point it at a remote dictionary that extends the word list. The ik analyzer sends requests to that remote address to fetch the latest words, which are then used as an additional word source during tokenization. Here we will have it send the request to nginx, and nginx will return the latest dictionary file.
The VM does not have enough memory to also run nginx, so shut the VM down first, open its settings, raise the memory to 3075 MB, and then boot it again.
Next, increase the memory available to es.
Because the volume mappings were set up earlier, the quickest way to give es more memory is to delete the es container and create it again; nothing is lost, since all the data lives on the host. Use docker ps to find the es container's id, then docker stop and docker rm to stop and remove it.
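A sketch of those commands (this assumes the container still has the name elasticsearch from the earlier docker run; substitute the id shown by docker ps if yours differs):

docker ps                  # find the elasticsearch container
docker stop elasticsearch  # stop it
docker rm elasticsearch    # remove it; the data survives in the mapped /mydata directories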
Change into the data directory and recreate the es container:
[root@10 data]# pwd
/mydata/elasticsearch/data
[root@10 data]# docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
> -e "discovery.type=single-node" \
> -e ES_JAVA_OPTS="-Xms64m -Xmx512m" \
> -v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
> -v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
> -v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
> -d elasticsearch:7.4.2
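Once the container is up, a quick sanity check against the mapped 9200 port; es answers with a JSON document containing the cluster name and version:

curl http://localhost:9200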
Create nginx
Create an nginx folder under /mydata.
Start a throwaway nginx instance first, just to copy its default configuration out:
[root@10 mydata]# docker run -p 80:80 --name nginx -d nginx:1.10
If the image is not present locally, this command downloads it automatically and then starts the nginx container.
Copy the configuration files from inside the container above into the current directory:
docker container cp nginx:/etc/nginx .
cp: copy files between a container and the host
nginx:/etc/nginx: the source, written as container-name:path
. : the destination, here the current directory
Then stop and remove the temporary nginx container and rearrange the directory structure:
[root@10 nginx]# docker stop nginx
nginx
[root@10 nginx]# docker rm nginx
nginx
[root@10 nginx]# cd ../
[root@10 mydata]# mv nginx conf
[root@10 mydata]# mkdir nginx
[root@10 mydata]# mv conf/ nginx
[root@10 mydata]# ls
elasticsearch  mysql  nginx  redis
Create the new nginx container with the following command:
docker run -p 80:80 --name nginx \
-v /mydata/nginx/html:/usr/share/nginx/html \
-v /mydata/nginx/logs:/var/log/nginx \
-v /mydata/nginx/conf:/etc/nginx \
-d nginx:1.10
-p 80:80 maps the container's port 80 to port 80 on the host
\ continues the command onto the next line
-v /mydata/nginx/html:/usr/share/nginx/html maps nginx's static html resources to /mydata/nginx/html on the host
-v /mydata/nginx/logs:/var/log/nginx maps nginx's log directory to /mydata/nginx/logs
-v /mydata/nginx/conf:/etc/nginx maps nginx's configuration to /mydata/nginx/conf
With that, nginx is installed.
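A quick check that the container is answering; note that the mapped html directory starts out empty, so requesting / will likely return 403 Forbidden until content is added, but getting any HTTP response at all confirms nginx is serving:

curl -I http://localhost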
Put the words that should be recognized during tokenization into html/es/fenci.txt:
[root@10 nginx]# cd html
[root@10 html]# mkdir es
[root@10 html]# cd es
[root@10 es]# ls
[root@10 es]# vi fenci.txt
Enter:
张亚南
尚硅谷
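Before wiring this into ik, it is worth confirming that nginx actually serves the file; the URL path mirrors the html/es/fenci.txt layout just created:

[root@10 es]# curl http://localhost/es/fenci.txt
张亚南
尚硅谷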
Next, configure the ik analyzer's remote dictionary address in /usr/share/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml (already mapped to /mydata/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml on the host):
[root@10 config]# pwd
/mydata/elasticsearch/plugins/ik/config
[root@10 config]# vi IKAnalyzer.cfg.xml
Enable the remote extension dictionary entry (remote_ext_dict) and point it at the fenci.txt served by nginx, as in the sketch below.
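For reference, a sketch of the edited IKAnalyzer.cfg.xml. The only change is the remote_ext_dict entry; the address 192.168.56.10 is an assumed VM IP, so substitute whatever host serves fenci.txt:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- a local extension dictionary could be configured here (unused in this setup) -->
    <entry key="ext_dict"></entry>
    <!-- a local extension stopword dictionary could be configured here (unused) -->
    <entry key="ext_stopwords"></entry>
    <!-- remote extension dictionary: the fenci.txt served by nginx; 192.168.56.10 is an assumed address -->
    <entry key="remote_ext_dict">http://192.168.56.10/es/fenci.txt</entry>
    <!-- the remote extension stopword dictionary stays commented out -->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>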
Then restart es:
[root@10 config]# docker restart elasticsearch
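Optionally tail the es logs to confirm it restarted cleanly and ik loaded its dictionaries without errors (a generic check; the exact log lines vary by version):

docker logs --tail 50 elasticsearch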
Test ik_smart:
POST _analyze
{
  "tokenizer": "ik_smart",
  "text": "张亚南喜欢尚硅谷"
}
Result:
{ "tokens" : [ { "token" : "张亚南", "start_offset" : 0, "end_offset" : 3, "type" : "CN_WORD", "position" : 0 }, { "token" : "喜欢", "start_offset" : 3, "end_offset" : 5, "type" : "CN_WORD", "position" : 1 }, { "token" : "尚硅谷", "start_offset" : 5, "end_offset" : 8, "type" : "CN_WORD", "position" : 2 } ] }
Test ik_max_word:
POST _analyze
{
  "tokenizer": "ik_max_word",
  "text": "张亚南喜欢尚硅谷"
}
Result:
{ "tokens" : [ { "token" : "张亚南", "start_offset" : 0, "end_offset" : 3, "type" : "CN_WORD", "position" : 0 }, { "token" : "喜欢", "start_offset" : 3, "end_offset" : 5, "type" : "CN_WORD", "position" : 1 }, { "token" : "尚硅谷", "start_offset" : 5, "end_offset" : 8, "type" : "CN_WORD", "position" : 2 }, { "token" : "硅谷", "start_offset" : 6, "end_offset" : 8, "type" : "CN_WORD", "position" : 3 } ] }
At this point the extension dictionary is configured: both tokenizers now recognize 张亚南 and 尚硅谷 as whole words, with ik_max_word additionally emitting the finer-grained 硅谷.
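A side benefit of using a remote dictionary: ik's documented hot-update mechanism re-polls the URL periodically (roughly every minute, using the Last-Modified/ETag response headers to detect changes), so new words can be added without restarting es. A sketch, using the paths created above and 谷粒商城 as an example word:

echo "谷粒商城" >> /mydata/nginx/html/es/fenci.txt

After ik's next poll, re-running the _analyze request on text containing the new word should yield it as a single token.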
Set es to start automatically on boot:
[root@10 config]# docker update elasticsearch --restart=always
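Since ik now fetches its dictionary from nginx, it is worth giving the nginx container the same restart policy so the word list is reachable after a reboot:

docker update nginx --restart=always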