谷粒商城 Study Notes — P124: ES Custom Extension Dictionary


The IK analyzer installed in P122 ships with a default dictionary that doesn't cover newer vocabulary, so we need to edit the IK analyzer's configuration file and point it at a remote dictionary to extend the built-in one. IK sends HTTP requests to the remote endpoint to fetch the latest words, and those words are then used as new dictionary entries during tokenization. We can have it send those requests to nginx, and nginx returns the latest dictionary.
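
IK polls that remote URL periodically and, per the plugin's documentation, relies on the Last-Modified and ETag response headers to decide whether the word list has changed, so any static file served over HTTP works. Once the nginx setup below is done, you can see exactly what IK fetches with a plain curl (192.168.56.10 is assumed to be this VM's IP; substitute your own):

curl -i http://192.168.56.10/es/fenci.txt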

The VM doesn't have enough memory to also run nginx, so shut it down first, open its settings, raise the memory to 3072 MB, then start it again.

Then increase the memory available to ES.

Because of the volume mappings we set up earlier, the quickest way to enlarge the ES heap is to delete the container and recreate it; nothing is lost, since the data and plugins live on the host. Use docker ps to find the ES container ID, then docker stop and docker rm to stop and remove the ES container.
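
As a concrete sketch of that step (the container is assumed to be named elasticsearch, matching the run command below, so the name works in place of the ID):

docker ps                   # find the running ES container
docker stop elasticsearch   # stop it
docker rm elasticsearch     # remove it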

Change to the data directory and recreate ES:

[root@10 data]# pwd
/mydata/elasticsearch/data
[root@10 data]# docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
> -e  "discovery.type=single-node" \
> -e ES_JAVA_OPTS="-Xms64m -Xmx512m" \
> -v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
> -v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
> -v  /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
> -d elasticsearch:7.4.2
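
A quick sanity check once the container is up (standard commands, nothing specific to this setup): give ES a few seconds to boot, then hit port 9200.

docker ps                    # the elasticsearch container should show as Up
curl http://localhost:9200   # should return a JSON banner reporting version 7.4.2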

Create nginx

Create an nginx folder under /mydata.

Start a throwaway nginx instance first, just to copy its configuration out:

[root@10 mydata]# docker run -p 80:80 --name nginx -d nginx:1.10

Docker will pull the image automatically if it isn't present, then start the nginx container.

Copy the configuration files from the container above into the current directory:

docker container cp nginx:/etc/nginx .

cp: copy
nginx:/etc/nginx: the source, i.e. the /etc/nginx folder inside the container named nginx
.: the destination, i.e. the current directory

Then stop and remove the nginx container and rearrange the nginx directory structure:

[root@10 nginx]# docker stop nginx
nginx
[root@10 nginx]# docker rm nginx
nginx
[root@10 nginx]# cd ../
[root@10 mydata]# mv nginx conf
[root@10 mydata]# mkdir nginx
[root@10 mydata]# mv conf/ nginx
[root@10 mydata]# ls
elasticsearch  mysql  nginx  redis
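
As a sanity check, the configuration copied out of the container should now sit directly under /mydata/nginx/conf:

ls /mydata/nginx/conf
# expect nginx.conf, conf.d/, mime.types, among others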

Create the new nginx container with the following command:

docker run -p 80:80 --name nginx \
-v /mydata/nginx/html:/usr/share/nginx/html \
-v /mydata/nginx/logs:/var/log/nginx \
-v /mydata/nginx/conf:/etc/nginx \
-d nginx:1.10
-p 80:80 maps the container's port 80 to the host's port 80.
\ continues the command on the next line.
Maps nginx's static HTML resources at /usr/share/nginx/html to /mydata/nginx/html.
Maps nginx's log directory /var/log/nginx to /mydata/nginx/logs.
Maps nginx's configuration /etc/nginx to /mydata/nginx/conf.
With that, nginx is installed.
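
To confirm nginx is serving, you can drop a test page into the mapped html directory and request it (192.168.56.10 is assumed to be the VM's IP):

echo '<h1>hello nginx</h1>' > /mydata/nginx/html/index.html
curl http://192.168.56.10   # should return the test page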

Put the words to be added to the dictionary into html/es/fenci.txt:
[root@10 nginx]# cd html
[root@10 html]# mkdir es
[root@10 html]# cd es
[root@10 es]# ls
[root@10 es]# vi fenci.txt

Enter:

张亚南

尚硅谷
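
Before touching ES, verify that nginx serves the dictionary file (again assuming the VM's IP is 192.168.56.10):

curl http://192.168.56.10/es/fenci.txt
# should print the two words above, one per line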

Modify the IK analyzer's remote dictionary address in /usr/share/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml (already mapped on the host to /mydata/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml):

[root@10 config]# pwd
/mydata/elasticsearch/plugins/ik/config
[root@10 config]# vi IKAnalyzer.cfg.xml

Uncomment the remote extension dictionary entry and point it at the fenci.txt served by nginx.
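
For reference, a sketch of what the edited IKAnalyzer.cfg.xml looks like; remote_ext_dict is the entry to uncomment and fill in, and the URL assumes the VM's IP is 192.168.56.10:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- local extension dictionary, unused here -->
    <entry key="ext_dict"></entry>
    <!-- local extension stopword dictionary, unused here -->
    <entry key="ext_stopwords"></entry>
    <!-- remote extension dictionary: the file served by nginx -->
    <entry key="remote_ext_dict">http://192.168.56.10/es/fenci.txt</entry>
    <!-- remote extension stopword dictionary, left commented out -->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>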

Then restart ES:

[root@10 config]# docker restart elasticsearch
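
If the remote dictionary is picked up, IK logs the dictionary load from that URL shortly after startup; tailing the container log is an easy way to confirm ES came back healthy:

docker logs -f --tail 50 elasticsearch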

Test ik_smart:

POST _analyze
{
  "tokenizer": "ik_smart",
  "text": "张亚南喜欢尚硅谷"
}

Result:

{
  "tokens" : [
    {
      "token" : "张亚南",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "喜欢",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "尚硅谷",
      "start_offset" : 5,
      "end_offset" : 8,
      "type" : "CN_WORD",
      "position" : 2
    }
  ]
}

Test ik_max_word:

POST _analyze
{
  "tokenizer": "ik_max_word",
  "text": "张亚南喜欢尚硅谷"
}

Result:

{
  "tokens" : [
    {
      "token" : "张亚南",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "喜欢",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "尚硅谷",
      "start_offset" : 5,
      "end_offset" : 8,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "硅谷",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "CN_WORD",
      "position" : 3
    }
  ]
}

At this point, the extension dictionary configuration is complete.

Set the ES container to restart automatically with Docker (so it survives reboots):

[root@10 config]# docker update elasticsearch --restart=always
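
The same command applies to the nginx container, if you want it to come back up automatically as well:

docker update nginx --restart=always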