Opto-acoustic scene analysis for humanoid robots and man-machine interaction:
Motivation:
- A humanoid robot should be able to orient itself in the environment and fulfill its everyday tasks
- Known persons and objects should be recognized and information about unknown entities has to be learned
- A person describes objects in their spatial context, both verbally and non-verbally → Labeling and additional information
- Fusion of interactively learned information → Reduction of uncertainties
- Humans and the robot require a common language to communicate
- The robot has to identify multimodal object references (e.g. pointing gestures) and focus its attention on the object
- Achieving a shared point of reference is necessary to relate the verbal description to the referred-to object
- The relation between the linguistic description and the visual appearance of an object has to be learned
- Limited computational resources for tasks such as multimodal object and person recognition, tracking, viewing-direction estimation and gesture recognition
- Microphone array
- Localization of sound sources (e.g. objects or persons)
- Classification of objects (e.g. kitchen appliances) and persons
- Stereo camera
- Localization and classification of objects
- Detection and identification of persons
- Challenges
- Near distance ↔ Far distance
- Multiple targets, incl. occlusions
- Audio-visual fusion using particle filtering
- Efficient perception of all important persons and objects in the environment
- Hierarchical perception refinement
- More detailed information about entities over time
- Level of abstraction is reduced during exploration
- Fusion of different modalities for each attribute
- Automatic generation of multimodal models so that objects can be recognized again at a later time – labeling by humans
- New information can be added to any object at any time
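To illustrate the audio-visual fusion using particle filtering mentioned above, the following is a minimal sketch, not the project's actual implementation. It tracks a single source azimuth (in degrees) and assumes a random-walk motion model, Gaussian measurement noise, and a frontal field of view, so angle wrap-around is ignored. All function names and noise parameters are hypothetical and chosen only for illustration.

```python
import math
import random


def gaussian_likelihood(error, sigma):
    """Unnormalized Gaussian likelihood of a measurement error."""
    return math.exp(-0.5 * (error / sigma) ** 2)


def fuse_step(particles, audio_az, visual_az, rng,
              motion_sigma=1.0, audio_sigma=10.0, visual_sigma=3.0):
    """One predict/update/resample cycle over azimuth particles (degrees).

    audio_az or visual_az may be None when that modality has no detection,
    e.g. the target is silent or occluded. The sigma values are assumed.
    """
    # Predict: random-walk motion model.
    predicted = [p + rng.gauss(0.0, motion_sigma) for p in particles]

    # Update: multiply the likelihoods of whichever modalities are available.
    weights = []
    for p in predicted:
        w = 1.0
        if audio_az is not None:
            w *= gaussian_likelihood(audio_az - p, audio_sigma)
        if visual_az is not None:
            w *= gaussian_likelihood(visual_az - p, visual_sigma)
        weights.append(w)

    if sum(weights) <= 0.0:
        # All weights underflowed to zero: keep the predicted set unchanged.
        return predicted

    # Resample: draw particles in proportion to their weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def estimate(particles):
    """Point estimate of the azimuth: mean of the particle set."""
    return sum(particles) / len(particles)


def run_demo(seed=0, true_az=20.0, steps=30, n_particles=500):
    """Track a static source using simulated noisy audio and visual azimuths."""
    rng = random.Random(seed)
    particles = [rng.uniform(-90.0, 90.0) for _ in range(n_particles)]
    for t in range(steps):
        audio = true_az + rng.gauss(0.0, 10.0)
        # Simulate an occlusion: the camera misses the target every fifth frame.
        visual = None if t % 5 == 0 else true_az + rng.gauss(0.0, 3.0)
        particles = fuse_step(particles, audio, visual, rng)
    return estimate(particles)


if __name__ == "__main__":
    print(f"estimated azimuth: {run_demo():.1f} degrees")
```

In this sketch the coarse but always-available audio measurement keeps the estimate alive during visual occlusions, while the more precise visual measurement sharpens it whenever the target is in view.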
Are you interested in this research field and looking for a student project or thesis?
Go to the students section to find a suitable topic.
