Automatic Detection of Answers to Research Questions from Medline
Abstracts
Abdulaziz Alamri and Mark Stevenson
Department of Computer Science
The University of Sheffield
Sheffield, UK
adalamri1@sheffield.ac.uk; mark.stevenson@sheffield.ac.uk


Proceedings of the 2015 Workshop on Biomedical Natural Language Processing (BioNLP 2015), pages 141–146,
Beijing, China, July 30, 2015. © 2015 Association for Computational Linguistics

Abstract

Given a set of abstracts retrieved from a search engine such as Pubmed, we aim to automatically identify the claim zone in each abstract and then select the best sentence(s) from that zone that can serve as an answer to a given query. The system can provide a fast access mechanism to the most informative sentence(s) in abstracts with respect to the given query.

1 Introduction

The large amount of medical literature hinders professionals from analyzing all the knowledge relevant to particular medical questions. Search engines are increasingly used to access such information. However, such systems retrieve documents based on the appearance of the query terms in the text, even though those documents may describe another problem.

The search engine Pubmed®, for example, is a well-known IR system that provides access to more than 24 million abstracts from the biomedical literature, including Medline® (Wheeler et al., 2008). The engine takes a query from the user and returns a list of abstracts that can be relevant or partially irrelevant to the query, which requires the user to go through each abstract for further analysis and evaluation.

Researchers who conduct a systematic review (Gough et al., 2012) tend to use the same approach to collect the studies of interest; however, they are found to spend significant effort identifying the studies that are relevant to the research question. Relevancy is usually measured by scanning the result and conclusion sections to identify the authors' claim and then comparing the claim with the review question, where a claim can be defined as the summary of the main points presented in a research argument.

Incorporating a middle-tier system between the search engine and the user will be useful to minimize the effort required to filter the results. This research presents a system that aids those searching for studies that discuss a particular research question. The system acts as a mediator between the search engine and the user. It interprets the search engine results and returns the most informative sentence(s) from the claim zone of each abstract that are potential answers to the research question. The system reduces the cognitive load on the user by assisting their identification of relevant claims within abstracts.

The system comprises two components. The first component identifies the claim zone in each abstract using the rhetorical moves principle (Teufel and Moens, 2002), and the second component uses the sentences in the claim zone to predict the most informative sentence(s) from each abstract with respect to the given query.

This paper makes three contributions: presenting a new set of features to build a classifier that identifies the structural role of sentences in an abstract and performs at least similarly to current systems; building a classifier to detect the best sentence(s) (lexically) that can be an answer to a given query; and introducing a new feature (Z-score) for this task.

2 Related Work

We are not aware of any work that has explicitly discussed the detection of the claim sentence most related to a predefined question; however, studies have discussed related research.

Ruch et al. (2007), for example, used the rhetorical moves approach to identify the conclusion sentences in abstracts. Their system was based on a Bayesian classifier with normalized n-grams and relative position features. The main objective of that research was to identify sentences that belong to the conclusion sections of abstracts; they regarded such information as key information to determine the research topic. Our research is similar to that work since we use the conclusion section to identify the key information in an abstract with respect to a query, but we also include the result sections.

Hirohata et al. (2008) showed a similar system using CRFs to classify the abstract sentences into four categories: objective, methods, results, and conclusions. That classifier takes into account the neighbouring features of sentence S_n, such as the n-grams of the previous sentence S_{n-1} and the next sentence S_{n+1}.

Agarwal et al. (2009) described a system that automatically classifies sentences appearing in full biomedical articles into one of four rhetorical categories: introduction, methods, results and discussions. The best system was achieved using Multinomial Naive Bayes. They reported that their system outperformed their baseline system, which was rule-based.

Recently, Yepes et al. (2013) described a system to index Gene Reference Into Function (GeneRIF) sentences that show novel functionality of genes mentioned in Medline. The goal of that work was to choose the most likely sentences to be selected for GeneRIF indexing. The best system was achieved using a Naive Bayes classifier and various features including the discourse annotations (the NLM category labels) for the abstracts' sentences.

Our research is close to the system of Hirohata et al. (2008) since we use the same algorithm, but we use a different set of features to build the model. Moreover, it is similar to the system of Yepes et al. (2013) since we use the value of the nlmCategory attribute rather than the labels provided by the authors to learn the role of sentences.

3 Method

3.1 Claim Zoning Component

This component is based on the hypothesis that the contribution of a research paper tends to be found within the result or conclusion sections of its abstract (Lin et al., 2009). Identifying these sections manually, especially in unstructured abstracts, is a tedious task. Medical abstracts tend to have a logical structure (Orasan, 2001) in which each section represents a different role.

Unfortunately, about 70% of Medline abstracts are unstructured (have no section labels). Structured abstracts use a variety of these labels. The National Library of Medicine (NLM) has reported that 2,779 headings have been used to label abstract sections in Medline (Ripple et al., 2012).

Relying on the labels provided by the abstracts' authors to identify the roles of the sentences could be useful for research purposes; but in practice this means all Medline abstracts, even the structured ones, need to be re-annotated to guarantee that they are labelled with the same set of annotations. This is not efficient, especially when we consider the huge volume of the Medline repository.

To accommodate that problem, we use the NLM category value assigned to each section in the XML abstract (nlmCategory attribute). The NLM assigns five possible values (categories): Objective, Background, Methods, Results and Conclusions. This research uses these categories as an alternative way to learn the roles of abstract sentences. This resolves two problems: first, the roles of sentences in structured abstracts can be automatically learned from the value of the nlmCategory attribute without any further processing; consequently, the roles of sentences in 30% of the Medline abstracts can be accurately identified; second, those labels can be used to build a machine learning classifier to predict the roles of sentences in the unstructured abstracts in Medline.

The claim zoning component regards identifying the roles of sentences as a sequence labelling problem. This requires an algorithm that takes into account the neighbouring observations rather than only the current observation, as in other ordinary classifiers, e.g. SVM and Naive Bayes. The Conditional Random Fields (CRF) algorithm has been used successfully for such tasks (Hirohata et al., 2008; Lin et al., 2009). Therefore, we use the CRF algorithm along with lexical, structural and sequential features to build a classifier model to identify the claim zones in abstracts. The classifier is implemented using the CRFsuite library (Okazaki, 2007) with the L-BFGS method. Note that we modify the five NLM categories to become four, where the Background and Objective categories are merged into a new category called Introduction. That is because the background and objective sections in Medline tend to overlap with each other (Lin et al., 2009). Moreover, these sections usually appear sequentially, and merging them together is sensible to avoid the overlapping problem. Therefore, this component identifies the
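
The label preparation described in Section 3.1 can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a structured abstract is available as XML whose `AbstractText` elements carry an `NlmCategory` attribute with upper-case values, which is our assumption about the record layout rather than something stated in the paper. The sample abstract is invented for illustration, each section is treated as a single sentence for brevity, and the helper names (`labelled_sections`, `sentence_features`) and the unigram features are hypothetical stand-ins for the paper's lexical, structural and sequential features, which are not listed in this excerpt. No CRF is trained here; the sketch only shows the merge of five categories into four and one way to expose neighbouring sentences to a sequence labeller.

```python
import xml.etree.ElementTree as ET

# Five NLM categories collapsed to four, as the text describes:
# Background and Objective both become Introduction.
NLM_TO_ROLE = {
    "BACKGROUND": "INTRODUCTION",
    "OBJECTIVE": "INTRODUCTION",
    "METHODS": "METHODS",
    "RESULTS": "RESULTS",
    "CONCLUSIONS": "CONCLUSIONS",
}
# The claim zone is taken to be the result and conclusion sections.
CLAIM_ROLES = {"RESULTS", "CONCLUSIONS"}


def labelled_sections(abstract_xml):
    """Return (role, text) pairs for sections that carry a known NLM category."""
    root = ET.fromstring(abstract_xml)
    pairs = []
    for node in root.iter("AbstractText"):
        category = (node.get("NlmCategory") or "").upper()
        role = NLM_TO_ROLE.get(category)
        if role is not None:
            pairs.append((role, "".join(node.itertext()).strip()))
    return pairs


def sentence_features(sentences, i):
    """Features for sentence i: its unigrams, relative position, and neighbours' unigrams."""
    feats = {"bias": 1.0, "rel_pos": i / max(len(sentences) - 1, 1)}
    for offset, prefix in ((0, "w"), (-1, "prev_w"), (1, "next_w")):
        j = i + offset
        if 0 <= j < len(sentences):
            for token in sentences[j].lower().split():
                feats[f"{prefix}={token}"] = 1.0
        else:
            # Mark the beginning or end of the abstract explicitly.
            feats["BOS" if offset < 0 else "EOS"] = 1.0
    return feats


# Invented example record (not taken from Medline).
SAMPLE = """<Abstract>
  <AbstractText Label="BACKGROUND" NlmCategory="BACKGROUND">Drug A is widely used.</AbstractText>
  <AbstractText Label="AIM" NlmCategory="OBJECTIVE">We test drug A against placebo.</AbstractText>
  <AbstractText Label="METHODS" NlmCategory="METHODS">A randomised trial was run.</AbstractText>
  <AbstractText Label="RESULTS" NlmCategory="RESULTS">Drug A reduced symptoms.</AbstractText>
  <AbstractText Label="CONCLUSIONS" NlmCategory="CONCLUSIONS">Drug A is effective.</AbstractText>
</Abstract>"""

sections = labelled_sections(SAMPLE)
roles = [role for role, _ in sections]
texts = [text for _, text in sections]
claim_zone = [text for role, text in sections if role in CLAIM_ROLES]
features = [sentence_features(texts, i) for i in range(len(texts))]
```

The `roles` list and the `features` list are aligned item by item, which is the shape a sequence labeller generally expects for one training sequence. Keying the mapping on the NLM category rather than on the author-supplied `Label` is what lets differently worded headings (here `AIM`) fall into the same four roles.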

