solr - Installing the IK Chinese analyzer and initializing rich-text search

1. Download the packages

https://repo1.maven.org/maven2/org/apache/solr/solr-dataimporthandler/7.4.0/solr-dataimporthandler-7.4.0.jar

https://repo1.maven.org/maven2/org/apache/tika/tika-app/1.19.1/tika-app-1.19.1.jar


https://repo1.maven.org/maven2/org/apache/solr/solr-dataimporthandler-extras/7.4.0/solr-dataimporthandler-extras-7.4.0.jar

The IK analyzer jar, which I have put on GitHub:
https://github.com/cen-xi/netty-tcp-spring-boot/tree/0458cb5626dcde976270b5351d67e96b162356d0/src/main/resources

That makes four jars in total.

Put the IK jar in E:\plug\solr\solr-7.7.3\server\solr-webapp\webapp\WEB-INF\lib

Put the others in E:\plug\solr\solr-7.7.3\contrib\extraction\lib

Create a classes folder under E:\plug\solr\solr-7.7.3\server\solr-webapp\webapp\WEB-INF

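Then create IKAnalyzer.cfg.xml inside it:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- extension dictionary -->
    <entry key="ext_dict">hotword.dic;</entry>
    <!-- extension stopword dictionary -->
    <entry key="ext_stopwords">stopword.dic;</entry>
</properties>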

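The other two files, hotword.dic and stopword.dic, can be created with Sublime Text. They must be saved as UTF-8 without a BOM, otherwise they do not take effect.

After changing hotword.dic or stopword.dic, Solr has to be restarted for the change to take effect.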
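If you would rather not rely on an editor setting, the dictionary files can also be written and checked from a script. The following is only a minimal sketch: the function names are made up for illustration, and it assumes one word per line.

```python
import codecs


def write_dictionary(path, words):
    """Write one word per line as UTF-8 without a BOM."""
    # encoding="utf-8" never emits a BOM (encoding="utf-8-sig" would).
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for word in words:
            f.write(word + "\n")


def has_bom(path):
    """Return True if the file starts with the UTF-8 byte order mark."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


def strip_bom(path):
    """Rewrite the file without its leading BOM, if it has one."""
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(codecs.BOM_UTF8):
        with open(path, "wb") as f:
            f.write(data[len(codecs.BOM_UTF8):])
```

Running has_bom on an existing hotword.dic is a quick way to rule out the encoding when a new word is not being picked up.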

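2. Load a data source if needed, to populate the rich-text search data. [In application code, Tika is normally used at upload time to extract the searchable content and store it in Solr. The import described here is for initializing Solr's search data, for example after data loss: the search data is usually backed up in MySQL and reloaded at startup.]

Go into the conf directory of the core you created, E:\plug\solr\solr-7.7.3\server\solr\mycore1\conf, and find solrconfig.xml

Add:

  <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">tika-data-config.xml</str>
    </lst>
  </requestHandler>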

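Add a tika-data-config.xml file in the same directory

with the following content:

<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
    <dataSource type="BinFileDataSource"/>
    <document>
        <!-- fileName is a regular expression: the extensions are grouped behind a literal dot -->
        <entity name="file" processor="FileListEntityProcessor" dataSource="null"
                baseDir="E:/plug/solr/testfile/" fileName=".*\.(doc|pdf|docx|txt|csv|json|xml|pptx|ppt|xls|xlsx)"
                rootEntity="false">
            <field column="file" name="id"/>
            <field column="fileSize" name="fileSize"/>
            <field column="fileLastModified" name="fileLastModified"/>
            <field column="fileAbsolutePath" name="fileAbsolutePath"/>
            <entity name="pdf" processor="TikaEntityProcessor" url="${file.fileAbsolutePath}" format="text">
                <field column="Author" name="author" meta="true"/>
                <field column="title" name="title" meta="true"/>
                <field column="text" name="text"/>
            </entity>
        </entity>
    </document>
</dataConfig>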

E:/plug/solr/testfile/ is the directory that holds the files to be indexed.
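Because fileName is a regular expression, it is worth checking a pattern against a few sample names before running an import. The sketch below uses Python's re module purely as an illustration (Solr itself uses Java regular expressions, which share this basic syntax), and the sample file names are invented. It shows why the extensions should be grouped: an ungrouped alternation such as .(doc)|(pdf) splits the whole pattern at each bar, so any name that merely contains "pdf" matches.

```python
import re

EXTENSIONS = ["doc", "pdf", "docx", "txt", "csv", "json",
              "xml", "pptx", "ppt", "xls", "xlsx"]

# Ungrouped: read as ".(doc)" OR "(pdf)", so "pdf" anywhere in the name matches.
UNGROUPED = re.compile(r".(doc)|(pdf)")

# Grouped: a literal dot followed by one of the extensions at the end of the name.
# The trailing "$" is an addition made for this check only.
GROUPED = re.compile(r".*\.(" + "|".join(EXTENSIONS) + r")$")


def matches(pattern, file_name):
    """Return True if the pattern is found anywhere in file_name."""
    return pattern.search(file_name) is not None


if __name__ == "__main__":
    for name in ["report.pdf", "slides.pptx", "pdf_notes.exe", "archive.zip"]:
        print(name, matches(UNGROUPED, name), matches(GROUPED, name))
```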

In E:\plug\solr\solr-7.7.3\server\solr\mycore1\conf, find managed-schema

Add the field type and the fields. Which fields you add depends on your needs, but the following must be present:

      <field name="title" type="text_ik" indexed="true" stored="true"/>
      <field name="pdf" type="text_ik" indexed="true" stored="true"/>
      <field name="mytab666" type="text_ik" indexed="true" stored="true"/>
      <field name="text" type="text_ik" indexed="true" stored="true"/>
      <field name="author" type="text_ik" indexed="true" stored="true"/>
      <field name="fileSize" type="plong" indexed="true" stored="true"/>
      <field name="fileLastModified" type="pdate" indexed="true" stored="true"/>
      <field name="fileAbsolutePath" type="string" indexed="true" stored="true"/>
      <fieldType name="text_ik" class="solr.TextField">
          <analyzer type="index" isMaxWordLength="false" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
          <analyzer type="query" isMaxWordLength="true" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
      </fieldType>

After saving, open the admin console at http://localhost:8983/solr/, find this core under Core Admin and click Reload; otherwise the changes do not take effect.
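The reload can also be triggered without the console, through Solr's CoreAdmin API. A minimal sketch, assuming the default address and the core name mycore1 used in this post; the function names are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR = "http://localhost:8983/solr"  # assumed default address


def reload_url(core, base=SOLR):
    """Build the CoreAdmin RELOAD URL for the given core."""
    query = urlencode({"action": "RELOAD", "core": core, "wt": "json"})
    return f"{base}/admin/cores?{query}"


def reload_core(core, base=SOLR):
    """Send the reload request and return the raw response body."""
    with urlopen(reload_url(core, base), timeout=30) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    print(reload_core("mycore1"))
```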

With the data source configured, open Dataimport for the core and run the import; otherwise the search data is not updated.
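The same import can be started over HTTP by calling the /dataimport handler registered in solrconfig.xml. The sketch below assumes the default address and the core mycore1; the helper names are hypothetical. Note that clean=true asks the handler to delete the existing index contents before importing.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR = "http://localhost:8983/solr"  # assumed default address


def dataimport_url(core, command, base=SOLR, **params):
    """Build a URL for the /dataimport handler of the given core."""
    query = urlencode({"command": command, "wt": "json", **params})
    return f"{base}/{core}/dataimport?{query}"


def run_full_import(core):
    """Start a full import: clear the index first, commit when finished."""
    url = dataimport_url(core, "full-import", clean="true", commit="true")
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def import_status(core):
    """Ask the handler how the current or last import went."""
    with urlopen(dataimport_url(core, "status"), timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(run_full_import("mycore1"))
    print(import_status("mycore1"))
```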

Test
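One way to test the result is to query the text field and to see how the text_ik type splits a sample string. This is only a sketch under assumptions: it uses the default address, the core mycore1, Solr's standard /select handler and its /analysis/field handler, and the search term and helper names are invented for the example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR = "http://localhost:8983/solr"  # assumed default address


def select_url(core, field, term, base=SOLR):
    """Build a /select query for one term in one field."""
    query = urlencode({"q": f"{field}:{term}", "fl": "id,title", "wt": "json"})
    return f"{base}/{core}/select?{query}"


def analysis_url(core, field_type, text, base=SOLR):
    """Build a field-analysis request showing how the text is tokenized."""
    query = urlencode({
        "analysis.fieldtype": field_type,
        "analysis.fieldvalue": text,
        "wt": "json",
    })
    return f"{base}/{core}/analysis/field?{query}"


def fetch_json(url):
    """GET the URL and parse the JSON response."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical search term; use a word that occurs in your own files.
    result = fetch_json(select_url("mycore1", "text", "检索"))
    print(result["response"]["numFound"])
    print(fetch_json(analysis_url("mycore1", "text_ik", "中文分词检索")))
```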
