Audio-Visual Processing in Meetings:
Seven Questions and Current AMI Answers
Marc Al-Hames1 , Thomas Hain2 , Jan Cernocky3 , Sascha Schreiber1 ,
Mannes Poel4 , Ronald Müller1 , Sebastien Marcel5 , David van Leeuwen6 ,
Jean-Marc Odobez5 , Sileye Ba5 , Herve Bourlard5 , Fabien Cardinaux5 ,
Daniel Gatica-Perez5 , Adam Janin8 , Petr Motlicek3,5 , Stephan Reiter1 ,
Steve Renals7 , Jeroen van Rest6 , Rutger Rienks4 , Gerhard Rigoll1 ,
Kevin Smith5 , Andrew Thean6 , and Pavel Zemcik3 ⋆⋆
1 Institute for Human-Machine-Communication, Technische Universität München
2 Department of Computer Science, University of Sheffield
3 Faculty of Information Technology, Brno University of Technology
4 Department of Computer Science, University of Twente
5 IDIAP Research Institute and École Polytechnique Fédérale de Lausanne (EPFL)
6 Netherlands Organisation for Applied Scientific Research (TNO)
7 Centre for Speech Technology Research, University of Edinburgh
8 International Computer Science Institute, Berkeley, CA
Abstract. The project Augmented Multi-party Interaction (AMI) is
concerned with the development of meeting browsers and remote meeting
assistants for instrumented meeting rooms, and with the required component
technologies, organised into the R&D themes of group dynamics; audio, visual,
and multimodal processing; content abstraction; and human-computer interaction.
The audio-visual processing workpackage within AMI addresses automatic
recognition from the audio, video, and combined audio-video streams recorded
during meetings. In this article we describe the progress that has been made
in the first two years of the project. We show how the large problem of
audio-visual processing in meetings can be split into seven questions, such as
“Who is acting during the meeting?”. We then show which algorithms and methods
have been developed and evaluated to answer these questions automatically.
1 Introduction
Large parts of our working days are consumed by meetings and conferences.
Unfortunately, many of them are neither efficient nor especially successful.
In a recent study [12] people were asked to select emotion terms that they
thought would be frequently perceived in a meeting. The top answer, mentioned
by more than two thirds of the participants, was “boring”; furthermore, nearly
one third mentioned “annoyed” as a frequently perceived emotion. This suggests
that many people feel meetings are little more than flogging a dead horse.
⋆⋆ This work was partly supported by the European Union 6th FWP IST Integrated
Project AMI (Augmented Multi-party Interaction, FP6-506811).
Things go from bad to worse if transcriptions are required to recapitulate
decisions or to share information with people who have not attended the meeting.
There are different types of meeting transcript. The first is written by a
person involved in the meeting; it is therefore often not exhaustive, usually
reflects that person's particular perspective, and is sometimes only a
hand-written draft that cannot easily be shared. The second type is
professional minutes, written by a person specifically chosen to minute the
meeting and usually not involved in it; these require a lot of effort, but are
usually detailed and can be shared (if somebody actually takes the time to read
them). The third and most common type of transcript is no transcript at all.
Projects such as the ICSI Meeting Project [14], Computers in the Human Inter-
action Loop (CHIL) [29], and Augmented Multi-party Interaction (AMI) [7] try to
overcome these drawbacks of meetings, lectures, and conferences. They deal with
the automatic transcription, analysis, and summarisation of multi-party interac-
tions and aim both to improve efficiency and to allow later recapitulation of
the meeting content, e.g. with a meeting browser [30]. The AMI project is
especially concerned with the development of meeting browsers and remote
meeting assistants for instrumented meeting rooms, and with the required
component technologies, organised into the R&D themes of group dynamics; audio,
visual, and multimodal processing; content abstraction; and human-computer
interaction. “Smart meeting rooms” are equipped with audio-visual recording
equipment, and a huge range of data is captured during the meetings. A corpus
of 100 hours of meetings is being collected with a variety of microphones,
video cameras, electronic pens, and presentation slide and whiteboard capture
devices. For technical reasons, each meeting in the corpus involves a group of
four persons.
The first step in the analysis of this data is the processing of the raw audio-
visual streams, which involves various challenging tasks. In the AMI project we
address the audio-visual recognition problems by formulating seven questions:
1. What has been said during the meeting?
2. What events and keywords occur in the meeting?
3. Who and where are the persons in the meeting?
4. Who in the meeting is acting or speaking?
5. How do people act in the meeting?
6. What are the participants’ emotions in the meeting?
7. Where or what is the focus of attention in meetings?
The audio-visual processing workpackage within the AMI project aims to
develop algorithms that can automatically answer each of these questions from
the raw audio-visual streams. The answers can then be used either directly during
or after the meeting (e.g. in a meeting browser), or as input for higher-level
analysis (e.g. summarisation). In this article we describe the progress that has
been made in the first two years of the AMI project towards automatic recognition
from audio-visual streams, and thus towards answering these questions. Each of
the following sections discusses algorithms, methods, and evaluation standards for
one of the seven questions and summarises the lessons we have learned.