Understanding Natural Language Queries over Relational Databases: Paper Study Notes


Research Background

  • NLIDBs (natural language interfaces to databases) have many advantages over other widely accepted query interfaces: keyword-based search, form-based interfaces, and visual query builders.
  • Despite these advantages, NLIDBs have not been widely adopted. The fundamental problem is that understanding natural language is hard.

Query Mechanism

The query mechanism of NALIR (Natural Language Interface to Relational databases) lets the system and the user collaborate in processing a natural language query. First, the system explains how it interprets the query, from each ambiguous word or phrase up to the meaning of the whole sentence. Second, for each ambiguous part, it offers multiple likely interpretations for the user to choose from.
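
The notes do not say how candidate interpretations are scored. The sketch below assumes a plain string-similarity score (Python's difflib.SequenceMatcher) against a made-up schema, only to show the pattern: rank the candidates for an ambiguous word, use the best one as the default, and show the rest to the user as alternatives. The schema, the function names, and the similarity measure are all hypothetical and are not NALIR's actual method.

```python
from difflib import SequenceMatcher

# Hypothetical schema elements (table.column); a toy schema, not taken from the paper.
SCHEMA_ELEMENTS = ["publication.title", "author.name", "journal.name", "conference.name"]

def similarity(word: str, element: str) -> float:
    """Stand-in similarity: best string match of the word against the table or column name."""
    table, column = element.split(".")
    return max(SequenceMatcher(None, word, table).ratio(),
               SequenceMatcher(None, word, column).ratio())

def candidate_interpretations(word: str, k: int = 3) -> list:
    """Rank schema elements for one word and keep the top k for the user to choose from."""
    scored = [(element, similarity(word.lower(), element)) for element in SCHEMA_ELEMENTS]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

def explain(word: str, k: int = 3) -> str:
    """Build a user-facing explanation: the default reading first, then the alternatives."""
    options = candidate_interpretations(word, k)
    lines = [f'"{word}" is interpreted as {options[0][0]} (default)']
    lines += [f"  alternative: {element} (score {score:.2f})" for element, score in options[1:]]
    return "\n".join(lines)

print(explain("author"))
```

Keeping the explanation separate from the ranking means the same candidate list can drive both the default choice and the menu the user picks from.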

System Architecture

The entire system consists of three main parts: the query interpretation part, the interactive communicator, and the query tree translator. The query interpretation part, which includes the parse tree node mapper and the structure adjustor, is responsible for interpreting the natural language query and representing the interpretation as a query tree. The interactive communicator is responsible for communicating with the user to ensure that the interpretation process is correct. The query tree, possibly verified by the user, is translated into a SQL statement by the query tree translator and then evaluated against an RDBMS.
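
The data flow between the three parts can be sketched as a small pipeline. The query tree representation, the hard-coded interpreter output, the publication table, and the function names are all hypothetical; the point is only the order of the stages: interpretation produces ranked candidate query trees, the user may override the default choice, and the chosen tree is translated into SQL and run on an RDBMS (here an in-memory SQLite database).

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class QueryTree:
    """Hypothetical, heavily simplified query tree: SELECT <select> FROM <table> WHERE col = val AND ..."""
    select: str
    table: str
    where: dict = field(default_factory=dict)

def interpret(nl_query: str) -> list:
    """Stand-in for the query interpretation part (node mapper + structure adjustor).
    A real interpreter would parse nl_query; this toy returns two fixed readings, best first:
    "2013" read as a publication year, or as a citation count."""
    return [QueryTree("title", "publication", {"year": 2013}),
            QueryTree("title", "publication", {"citation_count": 2013})]

def translate(tree: QueryTree) -> tuple:
    """Query tree translator: table/column names come from the query tree, values are bound as parameters."""
    sql = f"SELECT {tree.select} FROM {tree.table}"
    if tree.where:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in tree.where)
    return sql, tuple(tree.where.values())

def answer(nl_query: str, conn: sqlite3.Connection,
           choose: Callable = lambda candidates: candidates[0]) -> list:
    """Interpret, let the user verify (choose), translate, and evaluate against the RDBMS."""
    candidates = interpret(nl_query)
    tree = choose(candidates)  # interactive communicator: the default is the top candidate
    sql, params = translate(tree)
    return conn.execute(sql, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE publication (title TEXT, year INTEGER, citation_count INTEGER)")
conn.executemany("INSERT INTO publication VALUES (?, ?, ?)",
                 [("Paper A", 2013, 10), ("Paper B", 2012, 2013)])
print(answer("titles of papers from 2013", conn))                         # [('Paper A',)]
print(answer("titles of papers from 2013", conn, choose=lambda c: c[1]))  # [('Paper B',)]
```

Passing `choose` as a callback keeps the translator independent of the user interface: a non-interactive run simply accepts the top-ranked tree.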

  • The structure of the parse tree is adjusted in two steps. In the first step, the nodes of the parse tree are reformulated so that the tree falls within the system's syntactic coverage (a valid parse tree). If the query has multiple candidate valid parse trees, the best one is chosen as the default input for the second step, and the top k are reported to the interactive communicator. In the second step, the chosen (or default) valid parse tree is analyzed semantically, and implicit nodes are inserted to make it more semantically reasonable. This process is also supervised by the user.
  • Interactive communication is organized in three steps, which verify the intermediate results of parse tree node mapping, parse tree structure reformulation, and implicit node insertion, respectively.
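
The two structure-adjustment steps can be sketched as follows. Everything concrete here is an assumption for illustration: parse trees are (label, children) tuples, the "syntactic coverage" check is a one-line toy rule, the ranking simply prefers smaller trees, and implicit node insertion is reduced to a single hypothetical rule for comparisons such as "more papers than Bob". The `verify` callback stands in for the user's supervision through the interactive communicator.

```python
import heapq
from typing import Callable

# A parse tree is modeled as (label, [children]); the labels and rules below are hypothetical.
AGGREGATES = {"COUNT", "SUM", "AVG", "MAX", "MIN"}

def count_nodes(tree: tuple) -> int:
    label, children = tree
    return 1 + sum(count_nodes(child) for child in children)

def is_valid(tree: tuple) -> bool:
    """Toy coverage rule: the root is ROOT and its first child is a SELECT node."""
    label, children = tree
    return label == "ROOT" and bool(children) and children[0][0] == "SELECT"

def choose_valid_tree(candidates: list, k: int = 3,
                      score: Callable = lambda tree: -count_nodes(tree)) -> tuple:
    """Step 1: keep valid reformulations; return (default, top_k).
    Preferring smaller trees is an assumption made for this sketch."""
    valid = [tree for tree in candidates if is_valid(tree)]
    if not valid:
        raise ValueError("no candidate falls within the syntactic coverage")
    top_k = heapq.nlargest(k, valid, key=score)
    return top_k[0], top_k

def insert_implicit(tree: tuple) -> tuple:
    """Step 2 (toy rule): if a comparison has an aggregate on the left and a bare value
    on the right, copy the aggregate to the right so both sides are comparable."""
    label, children = tree
    children = [insert_implicit(child) for child in children]
    if label == "COMPARE" and len(children) == 2:
        left, right = children
        if left[0] in AGGREGATES and left[1] and not right[1]:
            attribute = left[1][0][0]
            children = [left, (left[0], [(attribute, [right])])]
    return (label, children)

def adjust(candidates: list, k: int = 3,
           verify: Callable = lambda default, options: default) -> tuple:
    """Both steps; verify models the user picking among the top k (default: accept)."""
    default, top_k = choose_valid_tree(candidates, k)
    return insert_implicit(verify(default, top_k))

# "authors who have more papers than Bob": Bob alone is not comparable with COUNT(papers).
query_tree = ("ROOT", [("SELECT", [("author", [])]),
                       ("COMPARE", [("COUNT", [("papers", [])]), ("Bob", [])])])
invalid_tree = ("ROOT", [("COMPARE", [])])
result = adjust([invalid_tree, query_tree])
print(result[1][1])
# ('COMPARE', [('COUNT', [('papers', [])]), ('COUNT', [('papers', [('Bob', [])])])])
```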

Experimental Setup

  • Two crucial aspects had to be evaluated: the quality of the returned results (effectiveness) and whether the system is easy for non-technical users to use (usability).
  • The experiment was a user study in which participants were asked to complete query tasks designed by the authors.
  • The authors used the Microsoft Academic Search (MAS) data set and compared NALIR with the faceted interface of the MAS website.

Related