Garantie de satisfaction à 100% Disponible immédiatement après paiement En ligne et en PDF Tu n'es attaché à rien
logo-home
Samenvatting Taaltheorie en Taalverwerking Deeltentamen 1 €4,99   Ajouter au panier

Resume

Samenvatting Taaltheorie en Taalverwerking Deeltentamen 1

2 revues
 233 vues  10 fois vendu
  • Cours
  • Établissement
  • Book

Dit is een samenvatting van het eerste deeltentamen van het vak Taaltheorie en Taalverwerking van de Universiteit van Amsterdam. De samenvatting is een mix van stof uit het boek (Jurafsky) en extra stof uit de colleges. De samenvatting is op volgorde van de colleges

Aperçu 3 sur 25  pages

  • Inconnu
  • 18 avril 2020
  • 25
  • 2015/2016
  • Resume

2  revues

review-writer-avatar

Par: ramtinbashardoost80 • 6 mois de cela

review-writer-avatar

Par: luuk90 • 2 année de cela

avatar-seller
Taaltheorie en Taalverwerking Samenvatting 1
Lecture 1a
Reasons for studying natural language processing (NLP) within AI:
 You want a computer to communicate with users in their terms.
 There is a vast store of information recorded in natural language that can be accessible via
computers: news, government reports, social media etc.
 Three main types of applications:
o Enabling human-computer communication
o Improving human-human communication
o Doing useful processing of text or speech

Formal (FLs) and Natural (NLs) languages:
 Formal (computer) languages: e.g. Java, Prolog, Python, HTML
 Natural (human) languages: e.g. Dutch, English, Spanish, German, Japanese
 ‘Languages’ that represent the behavior of a machine or a system: e.g. think about
‘communicating’ with a vending machine via coin insertions and button presses: pressButton1

Syntax trees:
 Syntax: Describes the structural properties of the language. Natural language is much more
complicated than the formal languages used for the artificial languages of logics and computer
programs.




Natural language semantics:
 Semantics: Represents the meaning of words and sentences. Logic is a good candidate.
 Consider the sentence:
o Every student has access to a computer
 The meaning of this can be expressed as two different logical formulas:
o ∀x.(student(x) ⇒ ∃y.(computer(y) ∧ hasAccessTo(x, y)))
 Every student has access to their own computer
o ∃y.(computer(y) ∧ ∀x.(student(x) ⇒ hasAccessTo(x, y)))
 There is a computer to which every student has access to
 Problem: How can (either of) these formulas be mechanically generated from a syntax tree for
the original sentence?

FLs and NLs:
 There are close relationships between FLs and NLs, but also important differences:
o FLs can be pinned down by a precise definition.
o NLs are fluid, fuzzy at the edges and constantly evolving.
o NLs are riddled with ambiguity at all levels.
 This is normally avoidable in FLs.

,Ambiguity in Natural Language:
 Phonological ambiguity: ‘an ice lolly’ vs. ‘a nice lolly’
 Lexical ambiguity: ‘bass’ has at least two meanings: fish and musical instrument
 Syntactic ambiguity: two possible syntax trees for ‘complaints about referees multiplying
o Could mean:
 There are more and more complaints about referees
 Referees are multiplying and there are complaints about that
 Semantic ambiguity: ‘Please use all available doors while boarding the train’ vs. ‘Please fill all
sections in the questionnaire’
o First sentence: Doesn’t mean that you personally have to use every single door
o Second sentence: Does mean that you have to fill in every single section

Levels of Language Complexity:
 Some languages features are ‘more complex’ (harder to describe, harder to process) than
others. We can classify languages on a scale of complexity known as the Chomsky Hierarchy:
o Regular languages: Those whose phrases can be ‘recognized’ by a finite state machine
o Context-free languages: Many aspects of NLs can be described at this level (also used
for most programming languages)
o Context-sensitive languages: Some NLs involve features at this level of complexity
o Recursively enumerable languages: All languages that can in principle be defined via
mechanical rules.

Formal Languages:
 A formal language is a set of strings
 Each string is composed of symbols from a set called an alphabet (or a vocabulary)
o Examples of alphabets:
 English letters: Σ = {a, b, c . . . z}
 Decimal digits: Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
 Programming language ‘tokens’: Σ = {if, while, x, ==}
 ‘Primitive’ actions performed by a machine or system, e.g. a vending machine
Σ = {insert50c, pressButton1, ...}
 Words in (some fragment of) a natural language: Σ = {the, an, a, dog, cat,
sleeps}
o Examples of strings over alphabets:
 Let Σ1 = {0, 1} be an alphabet. Then 01101, 000001, 1101 are strings over Σ1.
— in fact, all binary numbers are strings over Σ1
 Let Σ2 = {a, b, c, d, e, f, g} be an alphabet. Then bee, dad, cabbage, and face
are strings over Σ2, as are fffff and agagag.
 Let Σ3 = {ba, ca, fa, ce, fe, ge} be an alphabet. Then face is a string over Σ3 but
bee, dad or cabbage are not.
 Let Σ4 = {♠, ©, ♣} be an alphabet. Then ♠♠ and ♣©♣ are strings over Σ4.
 The length of a string is the number of token symbols from the alphabet it contains
o Examples:
 The length of ‘face’ over Σ2 = {a, b, c, d, e, f, g} is 4
 The length of ‘face’ over Σ3 = {ba, ca, fa, ce ,fe, ge} is 2
 The string of length 0 is called the empty string, denoted by ε (epsilon).
 Given a string s, a substring of s is a string formed by taking contiguous symbols of s in the
order in which they occur in s.
 An initial substring is called prefix and a final substring a suffix.

, o Let unthinkable be a string over Σ = {a, b, c . . . x, y, z} Then, ε, un, unth, unthinkable
are prefixes, while ε, e, able, thinkable, and unthinkable are suffixes. Other substrings
include nthi, inka, bl.
 Σ* denotes the set of all strings over an alphabet Σ
o Σ is always infinite, regardless of the number of symbols Σ contains.
 We may now define a formal language L over an alphabet Σ as any subset of Σ*: L ⊆ Σ*
o Examples: let Σ = (a, b, c … x, y, z}. Then Σ* is the set of strings over the Latin alphabet
and the following subsets of Σ* are possible formal languages:
 The set of strings consisting of consonants (medeklinker) only
 The set of strings containing at least one vowel (klinker) and one consonant
 The set of strings whose length is less than 9 symbols
 The set {one, two, tree, four, five, six, seven, eight, nine, ten}
 The set of all English words
 The empty set

Ways to define a Formal Language:
 Given an alphabet Σ and the infinite set Σ* of formal languages it can give rise to, how can we
select a particular formal language?
o Direct mathematical definition: Σ = {a, b, c}, L1 = {aa, bb, abc, abcc}, L2 = {a nbn |n > 0}
o Formalisms (formal expressions and grammars): sets of rules
o Automata: computational devices for computing languages
 Specify some machine for testing whether a string is legal or not
 Formalisms and automata allow us to distinguish a formal language of interest (a set of
strings) from other possible languages over a given alphabet
o They capture the patterns that characterize a language
o As such, they act as a definition of the language they capture
 From an abstract point of view, a natural language, like Dutch, is a set of strings
(sounds/letters/words etc.)
 Therefore, formalisms and automata can help us to model aspects of natural languages

Regular expressions:
 Regular expressions are a formal notation for characterizing sets of strings that follow a fairly
simple regular pattern.
 We can construct regular expressions over an alphabet Σ as follows:
(these regular expressions are written in mathematical notation)




o We often ignore the dot in concatenation, and simply write ab
o Disjunction (or union) may be written as a|b a+b or a∪b
o a+ is the set of a-strings with at least one a (same as a*a or aa*)
o an can be used to abbreviate the concatenation of a with itself n times
o the notation Σ* can be seen as abbreviating (a|b|…)* for any symbol a, b, … in Σ
 Examples: Let Σ = {a, b, c … x, y, z}
o me(o)*w mew, meow, meooow, meooooooooooow etc.

Les avantages d'acheter des résumés chez Stuvia:

Qualité garantie par les avis des clients

Qualité garantie par les avis des clients

Les clients de Stuvia ont évalués plus de 700 000 résumés. C'est comme ça que vous savez que vous achetez les meilleurs documents.

L’achat facile et rapide

L’achat facile et rapide

Vous pouvez payer rapidement avec iDeal, carte de crédit ou Stuvia-crédit pour les résumés. Il n'y a pas d'adhésion nécessaire.

Focus sur l’essentiel

Focus sur l’essentiel

Vos camarades écrivent eux-mêmes les notes d’étude, c’est pourquoi les documents sont toujours fiables et à jour. Cela garantit que vous arrivez rapidement au coeur du matériel.

Foire aux questions

Qu'est-ce que j'obtiens en achetant ce document ?

Vous obtenez un PDF, disponible immédiatement après votre achat. Le document acheté est accessible à tout moment, n'importe où et indéfiniment via votre profil.

Garantie de remboursement : comment ça marche ?

Notre garantie de satisfaction garantit que vous trouverez toujours un document d'étude qui vous convient. Vous remplissez un formulaire et notre équipe du service client s'occupe du reste.

Auprès de qui est-ce que j'achète ce résumé ?

Stuvia est une place de marché. Alors, vous n'achetez donc pas ce document chez nous, mais auprès du vendeur kimgouweleeuw. Stuvia facilite les paiements au vendeur.

Est-ce que j'aurai un abonnement?

Non, vous n'achetez ce résumé que pour €4,99. Vous n'êtes lié à rien après votre achat.

Peut-on faire confiance à Stuvia ?

4.6 étoiles sur Google & Trustpilot (+1000 avis)

80467 résumés ont été vendus ces 30 derniers jours

Fondée en 2010, la référence pour acheter des résumés depuis déjà 14 ans

Commencez à vendre!
€4,99  10x  vendu
  • (2)
  Ajouter