Threshold-Based Retrieval and Textual Entailment Detection on
Legal Bar Exam Questions
Sabine Wehnert (sabine.wehnert@ovgu.de)
Sayed Anisul Hoque (sayed.hoque@st.ovgu.de)
Wolfram Fenske (wolfram.fenske@ovgu.de)
Gunter Saake (saake@iti.cs.uni-magdeburg.de)
Otto von Guericke University Magdeburg, Germany

arXiv:1905.13350v1 [cs.IR] 30 May 2019

ABSTRACT
Getting an overview over the legal domain has become challenging, especially in a broad, international context. Legal question answering systems have the potential to alleviate this task by automatically retrieving relevant legal texts for a specific statement and checking whether the meaning of the statement can be inferred from the found documents. We investigate a combination of the BM25 scoring method of Elasticsearch with word embeddings trained on English translations of the German and Japanese civil law. For this, we define criteria which select a dynamic number of relevant documents according to threshold scores. Exploiting two deep learning classifiers and their respective prediction bias with a threshold-based answer inclusion criterion has been shown to be beneficial for the textual entailment task, when compared to the baseline.

CCS CONCEPTS
· Information systems → Question answering; Similarity measures; Relevance assessment; · Computing methodologies → Neural networks.

KEYWORDS
legal text retrieval, textual entailment, stacked encoder, explainable artificial intelligence, threshold-based relevance scoring

ACM Reference Format:
Sabine Wehnert, Sayed Anisul Hoque, Wolfram Fenske, and Gunter Saake. 2019. Threshold-Based Retrieval and Textual Entailment Detection on Legal Bar Exam Questions. In Proceedings of COLIEE 2019 workshop: Competition on Legal Information Extraction/Entailment (COLIEE 2019). ACM, New York, NY, USA, 9 pages.

1 INTRODUCTION
Nowadays, globalization poses a challenge for many international organizations, since they need to ensure compliance with the laws of all jurisdictions falling under the scope of their activities. Tracking changes in law is a challenging task, especially in statutory law, where a single modification may affect the applicability of several legal articles due to implicit co-dependencies between these documents. While domain experts are mostly required to ensure a reliable assessment of relationships among laws and their implications, the amount of legal documents is hard to oversee for a single person. Therefore, a decision support system can help in finding relevant laws and applying them to a specific question or statement.¹ Finding out whether a statement is true, given a corpus of legal text, falls under the task of legal question answering. A legal question answering system consists of two major parts: document retrieval and textual entailment recognition. In the retrieval phase, relevant law articles are selected for a query, which has the form of a statement that shall be supported or contradicted by the law articles from the document collection. During the textual entailment phase, the query and the retrieved legal documents are processed by a classification algorithm which returns "yes" in case of positive textual entailment or "no" otherwise. This work is a contribution to the Competition on Legal Information Extraction/Entailment (COLIEE), which provides a dataset of Japanese bar exam questions (translated to English) for evaluating system performance on both tasks, retrieval and entailment classification. Our contribution involves the following methods:

• We combine results from BM25 scoring with word embedding-based retrieval.
• We develop a stacked encoder ensemble for entailment detection.
• We use thresholding for both approaches.

The remainder of this work is structured as follows: Section 2 outlines related work for both tasks with respect to their achievements using methods similar to our approach. In Section 3, we describe basic concepts for string representation in machine learning models, scoring methods, and stacked encoders. We explain our approach in detail in Section 4 and show evaluation results in Section 5. After discussing those results, we conclude our findings and mention our future work considerations.

¹ The work is supported by Legal Horizon AG, Grant No.: 1704/00082

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
COLIEE 2019, June 21, 2019, Montreal, Quebec
© 2019 Copyright held by the owner/author(s).
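As a rough illustration of the retrieval idea described in the introduction (combining BM25 scores with embedding-based similarity and selecting a dynamic number of documents via thresholds), the following Python sketch shows one way such a combination could look. All names, the max-based score scaling, the linear mixing weight, and the threshold value are illustrative assumptions, not the system's actual implementation.

```python
def combine_scores(bm25_scores, embed_scores, weight=0.5):
    """Blend lexical (BM25) and semantic (embedding-similarity) evidence.

    Both inputs map document ids to scores; each score list is scaled by
    its maximum so the two ranges become comparable. The linear mix and
    `weight` are illustrative choices, not the paper's exact formula.
    """
    def scale(scores):
        hi = max(scores.values(), default=0.0) or 1.0
        return {doc: s / hi for doc, s in scores.items()}

    lexical, semantic = scale(bm25_scores), scale(embed_scores)
    return {doc: weight * lexical.get(doc, 0.0)
                 + (1 - weight) * semantic.get(doc, 0.0)
            for doc in set(lexical) | set(semantic)}


def dynamic_selection(combined, threshold=0.8):
    """Select a dynamic number of documents: every document whose
    combined score reaches `threshold` times the best combined score
    (the threshold value here is a placeholder)."""
    if not combined:
        return []
    best = max(combined.values())
    return sorted((doc for doc, s in combined.items() if s >= threshold * best),
                  key=lambda doc: -combined[doc])
```

For example, with BM25 scores `{"a": 10.0, "b": 5.0}` and embedding similarities `{"a": 0.9, "b": 0.8}`, document "b" ends up with a combined score of about 0.69 and only "a" survives the default threshold, so the size of the result set adapts to the score distribution rather than being a fixed top-k.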


2 RELATED WORK
The related work for our approach is divided into two parts: the legal information retrieval task and the entailment detection task. The first part consists of approaches using BM25 scoring or word embeddings, as well as similarity thresholding for a retrieval task. We further present deep learning methods, followed by approaches using thresholds for a textual entailment task.

2.1 Legal Information Retrieval
2.1.1 BM25-Based Solutions. In the COLIEE '16 competition, Onodera and Yoshioka apply BM25 scoring for information retrieval with several extensions using query keyword expansion. Their best result was an F-measure of 54.5% [11]. Arora et al. observe the best score with the BM25 scoring method on a different task of legal document retrieval [1], compared to language models and term frequency-inverse document frequency (TF-IDF) weighting. This finding contradicts previous observations from the COLIEE competitions and the FIRE 2017 IRLeD Track, where ranking SVMs [12] or language models [17, 32] performed better than mere BM25 scoring. Despite those observations, BM25 has been shown to provide at least competitive results in many cases, so we consider it as part of our approach.

2.1.2 Word Embeddings. Word embeddings have proven to be useful in many natural language processing contexts. We outline several works which have used this document feature representation for legal information retrieval. During the COLIEE '18 competition, the SPABS team was able to overcome vocabulary mismatch in some cases using an RNN-based solution with Word2Vec embeddings trained on English legal documents [34]. Team UB used word embeddings with PL2 term weighting [34]. Yoshioka et al. suggest using semantic matching techniques for hard questions involving vocabulary mismatch, combined with more reliable lexical methods for easy questions [34]. This is the main motivation for our retrieval system, which incorporates lexical BM25 scoring and word embeddings as a semantic representation, respectively.

2.1.3 Thresholding. Thresholding based on similarity values can improve retrieval results by filtering out low-scoring matches. Islam and Inkpen use similarity thresholds to increase the precision of text matching [10]. Stein et al. also employ thresholds for plagiarized document retrieval [29]. In the COLIEE '18 competition, team UBIRLED uses a similarity threshold for filtering out irrelevant case judgments [13]. Nanda et al. select the top-5 matching documents from a topic clustering approach [20]. Given the document with the highest similarity score to the query, they apply thresholding such that any further document is incorporated into the result set if its distance to the topmost document is less than 15%. Our approach uses a similar criterion for document inclusion.

2.2 Legal Textual Entailment
2.2.1 Deep Learning Approaches. Deep learning approaches have been used by several authors for entailment detection, starting with an application of a single-layered long short-term memory network (LSTM) for input encoding by Bowman et al. [3]. The encoded features from both texts are concatenated and passed through three 200-dimensional tanh layers to a softmax classifier for predicting the entailment relationship. The task is performed on the SNLI² dataset, which is based on image captioning. Rocktäschel et al. apply neural attention [2] for entailment recognition on the same SNLI corpus [26]. Two LSTM networks are employed for encoding the query and the document, whereby the output vectors from the document are used by an attention mechanism for each word in the respective query. Their method achieves 83.5% accuracy, an improvement of 3.3 percentage points over the results by Bowman et al. Liu et al. use a bidirectional LSTM with an attention mechanism [16] and obtained 85% accuracy on the SNLI dataset. A stacked encoder architecture developed by Nie and Bansal achieved 86.1% accuracy on the SNLI dataset. Considering that result as the state of the art, we adapt the main idea to our task in the legal domain and further explain this architecture in Section 3.3. Do et al. use a convolutional neural network (CNN) with word embeddings [6]. They incorporate additional features from a TF-IDF and latent semantic indexing (LSI) representation of the sentences. Finally, they feed these features in conjunction with the output of the CNN model into a multi-layer perceptron (MLP) network to predict the answer.

We are inspired by the work of Chen et al., which focuses on a factoid question answering system [5]. Their goal is to predict a sequence in the document that answers the query, as opposed to our task of detecting an entailment relationship. They trained two multi-layer bi-directional LSTMs to encode the articles and the query. For encoding the article, they extract multiple features from the query and document pairs: word embeddings of the document (300-dimensional GloVe embeddings), an exact matching flag, token features (part-of-speech tags, named entity tags, normalized term frequencies), and attention scores for the similarity of a document and the aligned query. These features are concatenated to form the input vector for the LSTM that encodes the article. The question is encoded without extracting any features. Their evaluation is based on the top five pages returned by the algorithm and results in 77.8% correct answers on the SQuAD [23] dataset.

Nanda et al. apply a hybrid network of LSTM networks coupled with a CNN, with the final prediction based on a softmax classifier [20]. They use pre-trained general-purpose word embeddings from the Google News corpus, consisting of 3 billion words. Their accuracy in the COLIEE '17 competition was 53.8%, which they attribute to the general-purpose embeddings, which may not capture important semantic relationships needed for the legal domain.

From these works, we conclude that LSTM architectures are suitable for entailment detection in open-domain tasks. However, the COLIEE dataset poses a challenge for deep learning models due to the specific meaning of terms in the legal domain and the rather small size of the dataset. Therefore, we refrain from training word embeddings on the statute law competition corpus only, but consider using other general-purpose word embeddings and a slightly different architecture compared to the previous work. We also find in the related work that extracting additional features from the documents can improve classifier performance.

2.2.2 Thresholding. Thresholding for the entailment task is applied in two cases: First, the entailment detection can be done by using a similarity threshold. This works similarly to an attention layer

² nlp.stanford.edu/projects/snli/
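The relative-distance inclusion criterion described in Section 2.1.3 for Nanda et al., which our retrieval approach adapts, can be sketched in a few lines of Python. The function name, the (document, score) pair representation, and the exact definition of "distance" are illustrative assumptions; the sketch also assumes positive similarity scores.

```python
def select_documents(scored_docs, rel_threshold=0.15):
    """Keep the top-scoring document plus every further document whose
    relative score distance to the topmost document is below
    `rel_threshold` (15% in the criterion described above).

    `scored_docs` is a list of (doc_id, similarity_score) pairs with
    positive scores; the distance definition is an assumption.
    """
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    if not ranked:
        return []
    top_score = ranked[0][1]
    return [doc for doc, score in ranked
            if (top_score - score) / top_score < rel_threshold]
```

For instance, `select_documents([("a", 1.0), ("b", 0.9), ("c", 0.5)])` keeps "a" and "b" (distances 0% and 10%) but drops "c" (distance 50%), so the result set grows or shrinks with how tightly the scores cluster around the best match.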
