The research seminar is aimed at interested students in the advanced phase of their studies (Hauptstudium), i.e. Master's students or Bachelor's students in higher semesters. Anyone else who is interested is welcome at any time!
Students and staff of the Professorship of Artificial Intelligence (Professur KI) present current research-oriented topics. Talks are usually given in English. The seminar takes place at irregular intervals in Room 336.
Please refer to the announcements on this page for the exact date of each session.
Information for Diplom and Master's students
The seminar talks that are part of the curriculum (the "Hauptseminar" in the Diplom-IF/AIF programme and the "Forschungsseminar" in the Master's programme) can also be given within this seminar. Both courses (Diplom-Hauptseminar and Master-Forschungsseminar) aim for participants to acquire research-relevant knowledge independently and then present it in a talk. Thematically, the seminars cover the field of artificial intelligence, with a focus on object recognition, neurocomputing on graphics cards and multi-core machines, reinforcement learning, and intelligent agents in virtual reality. Other topic proposals are equally welcome!
The seminar is organised by individual arrangement. Interested students can contact Prof. Hamker, without obligation, if they would like to complete one of the two seminar courses with us.
A network model of the function and dynamics of hippocampal place-cell sequences in goal-directed behavior
Mon, 21. 1. 2019, 13:00, Room 219
Hippocampal place-cell sequences observed during awake immobility often represent previous experience, suggesting a role in memory processes. However, recent reports of goals being overrepresented in sequential activity suggest a role in short-term planning, although a detailed understanding of the origins of hippocampal sequential activity and of its functional role is still lacking. In particular, it is unknown which mechanism could support efficient planning by generating place-cell sequences biased toward known goal locations, in an adaptive and constructive fashion. To address these questions, I propose a spiking network model of spatial learning and sequence generation as interdependent processes. Simulations show that this model explains the generation of never-experienced sequence trajectories in familiar environments and highlights their utility in flexible route planning. In addition, I report the results of a detailed comparison between simulated spike trains and experimental data, at the level of network dynamics. These results demonstrate how sequential spatial representations are shaped by the interaction between local oscillatory dynamics and external inputs.
Categorizing facial emotion expressions with attention-driven convolutional neural networks
Mon, 21. 1. 2019, 14:30, Room 219
The development of so-called deep machine learning techniques has brought new possibilities for the automatic processing of emotion-related information, which can have great benefits for human-computer interaction. Conversely, machine learning can profit from concepts known from human information processing (e.g., visual attention). Situated between human and artificial intelligence, the present thesis had a twofold aim: (a) to apply a classification algorithm, in the form of a deep neural network incorporating a spatial attention mechanism, to image data of facial emotion expressions and (b) to compare the output of the algorithm with results from human facial emotion recognition experiments. The results of this thesis show that such an algorithm can achieve state-of-the-art performance in a facial emotion recognition task. With regard to its visual search strategy, some similarities with human saccadic behavior emerged when the model's perceptive capabilities were restricted. However, there was only limited evidence for emotion-specific search strategies as can be found in humans.
Road scene semantic segmentation using residual factorized convnet and surround view fisheye cameras
Wed, 23. 1. 2019, 11:30, Room 367a
The automotive industry is continuously evolving, especially in the self-driving domain, which creates a demand for new concepts to be developed, implemented and tested. At present the only sensor capable of sensing the immediate surroundings of the vehicle is a camera. This thesis addresses the 360-degree road scene semantic segmentation problem for fisheye cameras. Present vehicles are equipped with distinct types of cameras used for various practical real-time applications, the most common camera model being the wide-angle fisheye camera, which is considered in this thesis. Using this camera brings two major challenges. Firstly, a CNN-based semantic segmentation task requires a huge amount of pixel-level annotated data, and so far there is no open-source annotated dataset available for wide-angle images. Secondly, a fisheye camera introduces severe distortions and negates the positional invariance offered by a conventional pinhole camera model. To overcome this, training the network on transformed images that are augmented using a fisheye filter is proposed. An approach that integrates blocks which improve the representational power of existing architectures by explicitly modelling interdependencies between channels of convolutional features has been tested. The experiments carried out demonstrate the effectiveness of these blocks when augmented data is used. Much of the work presented in the thesis was devoted to a rigorous comparison of the architectures. The evaluation is done on two different kinds of datasets, a real-world dataset and a synthetic dataset. The primary metric used for evaluation was the Intersection-over-Union (IoU). The results showed that a large amount of existing annotated data taken from pinhole cameras can be reused through augmentation, and that only a relatively small amount of annotated data from fisheye cameras is required to account for domain shift.
Further, the new architectures presented in this thesis show promising results when applied to augmented data.
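For readers unfamiliar with the metric, the following is a minimal sketch of how a per-class Intersection-over-Union can be computed from flattened label maps. It is not the evaluation code of the thesis; the function names `iou_per_class` and `mean_iou` and the choice to skip classes absent from both maps are assumptions made for illustration.

```python
def iou_per_class(pred, target, num_classes):
    """Return one IoU value per class for two equally long label sequences.

    pred and target are flattened label maps (one integer class id per pixel).
    A class that occurs in neither map has an empty union and yields None.
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth say class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where at least one of them says class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        ious.append(inter / union if union else None)
    return ious


def mean_iou(pred, target, num_classes):
    """Average the per-class IoU over the classes that actually occur."""
    valid = [v for v in iou_per_class(pred, target, num_classes) if v is not None]
    return sum(valid) / len(valid) if valid else 0.0


if __name__ == "__main__":
    prediction = [0, 0, 1, 1]
    ground_truth = [0, 1, 1, 1]
    print(iou_per_class(prediction, ground_truth, 2))
    print(mean_iou(prediction, ground_truth, 2))
```

In the small example, class 0 overlaps on one of two relevant pixels and class 1 on two of three, so the mean IoU is the average of 1/2 and 2/3.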
3D reconstruction with consumer depth cameras
Manh Ha Hoang
Wed, 9. 1. 2019, Room 132
In this thesis, we develop a system that is able to generate a 3D model of a single household object using a consumer depth (RGB-D) camera. The system then grabs textures of the object from a high-resolution DSLR camera and applies them to the reconstructed 3D model. Our approach specifically addresses generating a highly accurate 3D shape and recovering a high-quality appearance of the object within a short time interval. The resulting high-quality textured 3D object models can be used for online shopping products, augmented reality, and further research in 3D machine learning.
Hierarchical representations of actions in multiple basal ganglia loops
Wed, 5. 12. 2018, Room 132
I will introduce here three novel concepts, tested and evaluated by means of a neuro-computational model that brings together ideas regarding the hierarchical organization of the basal ganglia and particularly assigns a prominent role to plasticity. I will show how this model reproduces the results of two cognitive tasks used to measure the development of habitual behavior and introduce a model prediction.
Investigating reservoir-based reinforcement learning for robotic control
Wed, 28. 11. 2018, Room 132
Reservoir Computing is a relatively novel approach for training recurrent neural networks. It is based on generating a random recurrent reservoir as a part of the network and training only the readout of the reservoir. This separation makes the setup easy to implement and opens up different directions for further research. Existing methods for learning cognitive tasks often require continuous reward signals, which are not always available in cognitive tasks. However, this disadvantage can be avoided by using supralinear amplification on the trace of node-perturbation weight updates to suppress the relaxation effect, as proposed by Miconi (2017). In this presentation, I will show how such a network can be applied to a robotic control task and investigate the role of the different parameters.
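The core idea of a fixed random reservoir with a trained readout can be illustrated with a small echo state network. The sketch below is only an illustration under several assumptions: it trains the readout with a supervised normalized least-mean-squares rule on a toy sine-prediction task, which is not the reward-modulated node-perturbation rule of Miconi (2017), and all names, sizes and constants are chosen for the example.

```python
import math
import random


def make_reservoir(n, rng, scale=0.25):
    """Draw fixed random input and recurrent weights (never trained)."""
    w_in = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    w_res = [[rng.uniform(-scale, scale) for _ in range(n)] for _ in range(n)]
    return w_in, w_res


def step(state, u, w_in, w_res):
    """Advance the reservoir by one time step for scalar input u."""
    return [math.tanh(w_in[i] * u + sum(w * s for w, s in zip(w_res[i], state)))
            for i in range(len(state))]


def train_readout(inputs, targets, w_in, w_res, mu=0.5, washout=50):
    """Train only the linear readout with normalized LMS updates."""
    n = len(w_in)
    state = [0.0] * n
    w_out = [0.0] * n
    for t, (u, y) in enumerate(zip(inputs, targets)):
        state = step(state, u, w_in, w_res)
        if t < washout:
            continue  # let the initial transient die out before learning
        err = y - sum(w * s for w, s in zip(w_out, state))
        norm = 1e-8 + sum(s * s for s in state)
        w_out = [w + mu * err * s / norm for w, s in zip(w_out, state)]
    return w_out, state


def mse(inputs, targets, state, w_in, w_res, w_out):
    """Mean squared readout error, starting from a given reservoir state."""
    total = 0.0
    for u, y in zip(inputs, targets):
        state = step(state, u, w_in, w_res)
        total += (y - sum(w * s for w, s in zip(w_out, state))) ** 2
    return total / len(inputs)


def demo(seed=0, n=20, steps=2000):
    """Predict the next value of a sine wave; return (baseline, trained) MSE."""
    rng = random.Random(seed)
    w_in, w_res = make_reservoir(n, rng)
    signal = [math.sin(0.2 * t) for t in range(steps + 201)]
    w_out, state = train_readout(signal[:steps], signal[1:steps + 1], w_in, w_res)
    test_in = signal[steps:steps + 200]
    test_out = signal[steps + 1:steps + 201]
    baseline = mse(test_in, test_out, state, w_in, w_res, [0.0] * n)
    trained = mse(test_in, test_out, state, w_in, w_res, w_out)
    return baseline, trained


if __name__ == "__main__":
    print(demo())
```

The recurrent weights are drawn once and never updated; only `w_out` changes during training, which is what keeps the setup simple.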
Model Uncertainty estimation for a semantic segmentation network with a real time network deployment analysis on Nvidia Drive PX2 for Autonomous Vehicles
Mon, 19. 11. 2018, Room 132
Autonomous vehicles require a high degree of perception capabilities in order to perceive the environment and predict objects therein with high precision in real time. For such cases we use semantic segmentation networks. A major challenge in using semantic segmentation is determining how confident the network is in its prediction, or in other words how trustworthy classification outcomes are. Integrating uncertainty estimates with semantic segmentation helps us to understand the confidence with which a network predicts its output. Bayesian approaches along with dropout provide the necessary tool in deep learning to extract the uncertainty involved in a model's prediction. In Bayesian Neural Networks, we place a distribution over the weights, giving us a probabilistic interpretation of the classification. For such networks, multiple Monte Carlo samples are needed to generate a reliable posterior distribution from which we can infer uncertainty statistics. The serial nature of this sampling approach restricts its use in real-time environments. In this work, through in-depth analysis, we show the best possible places in a neural network to deploy dropout, along with the number of MC samples needed to obtain good uncertainty estimates. We also exploit the parallel capabilities of the GPU to realize certain neural operations such as convolution and dropout directly on embedded hardware with minimal abstraction. As a result, we propose the changes to the kernel functions needed to implement parallel Monte Carlo dropout sampling and estimate uncertainty in real time. Finally, we provide a brief benchmark comparison of the kernel implementations on a CPU (Intel Xeon processor) and on GPUs (Drive PX2 and Nvidia GeForce 1080 Ti).
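The principle of Monte Carlo dropout sampling can be sketched in a few lines: dropout stays active at inference time, several stochastic forward passes are averaged, and the spread of the averaged prediction serves as an uncertainty measure. The example below is a toy illustration, not the thesis implementation; the single linear layer, the names `forward` and `mc_dropout_predict`, and the use of predictive entropy are assumptions, and the passes run serially rather than in parallel GPU kernels.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def forward(x, weights, drop_p, rng):
    """One stochastic forward pass of a linear classifier with input dropout.

    Uses inverted dropout, so drop_p must be smaller than 1.
    """
    kept = [0.0 if rng.random() < drop_p else xi / (1.0 - drop_p) for xi in x]
    logits = [sum(w * k for w, k in zip(row, kept)) for row in weights]
    return softmax(logits)


def mc_dropout_predict(x, weights, drop_p=0.5, num_samples=50, seed=0):
    """Average num_samples stochastic passes; return (mean probs, entropy)."""
    rng = random.Random(seed)
    samples = [forward(x, weights, drop_p, rng) for _ in range(num_samples)]
    n_cls = len(weights)
    mean = [sum(s[c] for s in samples) / num_samples for c in range(n_cls)]
    # Predictive entropy of the averaged distribution as uncertainty measure.
    entropy = -sum(p * math.log(p) for p in mean if p > 0.0)
    return mean, entropy


if __name__ == "__main__":
    W = [[2.0, 0.0], [0.0, 2.0]]  # hypothetical weights of a 2-class model
    print(mc_dropout_predict([1.0, 0.0], W, drop_p=0.5, num_samples=200))
```

With `drop_p=0.0` every pass is identical and the result reduces to the ordinary deterministic prediction.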
Disentangling representations of grouped observations in adversarial autoencoders
Wed, 14. 11. 2018, Room 131
Being able to classify the shown emotion or facial action from mere pictures of faces is a challenging task in machine learning, since simple classification requires at least reliably labeled data, which is hard to get in sufficient quantity. Unsupervised learning methods can at least in part avoid the dependence on such data by finding representations that are meaningful. In my thesis I present an algorithm that teaches an Adversarial Autoencoder how to find representations of data. With careful management of the training process it is possible to strip information from the representation that would not be beneficial for specific tasks like classification. This process is called disentangling, and the management strategy is to find groups of data. I will show the results of some experiments that verify that the algorithm does what it promises and elaborate on where its weaknesses may be, by training an Adversarial Autoencoder on a colorful MNIST dataset and letting it produce disentangled representations that separate style from content.
Interpreting deep neural network-based models for automotive diagnostics
Wed, 7. 11. 2018, Room 131
With the breakthrough of artificial intelligence over the last few decades and extensive improvements in Deep Learning methodologies, the field of Deep Learning has gone through major changes. AI has outdone humans in complex tasks like object and image recognition, fault detection in vehicles, speech recognition, medical diagnosis etc. From a bird's-eye view, the models are basically algorithms which try to learn concealed patterns and relationships from the data fed into them without any fixed rules or instructions. Although these models' prediction accuracies may be impressive, the system as a whole is a black box (non-transparent). Hence, explaining the working of a model to the real world poses its own set of challenges. This work deals with interpreting a vehicle fault-detection model. Current fault detection approaches rely on model-based or rule-based systems. With the increasing complexity of vehicles and their subsystems, these approaches will reach their limits in detecting fault root causes in highly connected and complex systems. Furthermore, current vehicles produce rich amounts of data valuable for fault detection which cannot be considered by current approaches. Deep Neural Networks (DNNs) offer great capabilities to tackle these challenges and automatically train fault detection models using in-vehicle data. However, fault detection models based on DNNs (here, CNNs and LSTMs) are black boxes, so it is nearly impossible to back-trace their outputs. Therefore, the aim of this work is to identify, implement and evaluate available approaches to interpret decisions made by DNNs applied in vehicle diagnostics. With that, decisions made by the DNN diagnostics model can be better understood in order to (i) comprehend the model's outputs and thus increase model performance and (ii) enhance their acceptability in the vehicle development domain.
Learning the Motor Program of a Central Pattern Generator for Humanoid Robot Drawing
Thu, 1. 11. 2018, Room 132
In this research project, we present a framework in which a humanoid robot, NAO, acquires the parameters of a motor program in a task of drawing arcs in Cartesian space. A computational model based on a Central Pattern Generator is used. For the purpose of drawing a scene, geometrical features such as arcs are extracted from images using computer vision algorithms. We discuss the algorithm used in the project, which considers only features that are important for robot drawing. These arcs can be described as a feature vector. We discuss how genetic algorithms help in estimating the parameters of the motor representation for a selected feature vector. This understanding of the parameters is then used to generalize the acquired motor representation over the workspace. To obtain a general mapping between the feature vector and the motor program, we propose an approximation function using a multilayer perceptron (MLP). Once the network is trained, we present different scenarios to the robot and it draws the sketches. It is worth noting that our proposed model generalizes the motor features for a set of joint configurations, unlike the traditional way of robot drawing by connecting intermediate points using inverse kinematics.
Cortical routines - from experimental data to neuromorphic brain-like computation
Prof. Dr. Heiko Neumann (Ulm University, Inst. of Neural Information Processing)
Tue, 30. 10. 2018, Room 1/336
A fundamental task of sensory processing is to group feature items that form a perceptual unit, e.g., shapes or objects, and to segregate them from other objects and the background. In the talk a conceptual framework is provided, which explains how perceptual grouping at early as well as higher-level cognitive stages may be implemented in cortex. Different grouping mechanisms are implemented which are attuned to basic features and feature combinations and evaluated along the forward sweep of stimulus processing. More complex combinations of items require integration of contextual information along horizontal and feedback connections to bind neurons in distributed representations via top-down response enhancement. The modulatory influence generated by such flexible dynamic grouping and prediction mechanisms is time-consuming and is primarily sequentially organized. The coordinated action of feedforward, feedback, and lateral processing motivates the view that sensory information, such as visual and auditory features, is efficiently combined and evaluated within a multiscale cognitive blackboard architecture. This architecture provides a framework to explain form and motion detection and integration, higher-order processing of articulated motion, as well as scene segmentation and figure-ground segregation of spatio-temporal inputs which are labelled by enhanced neuronal responses. In addition to the activation dynamics in the model framework, steps are demonstrated how unsupervised learning mechanisms can be incorporated to automatically build early- and mid-level visual representations. Finally, it is demonstrated that the canonical circuit architecture can be mapped onto neuromorphic chip technology facilitating low-energy non-von Neumann computation.
Neural Reflexive Controller for Humanoid Robots Walking
Thu, 25. 10. 2018, Room 131
For nearly three decades, a great amount of research has been devoted to the study of robotic locomotion, where researchers have focused in particular on solving the problem of locomotion control for multi-legged humanoid robots. The task of imitating human walking has been the most challenging one, as bipedal humanoid robots are inherently unstable and tend to topple over. Recently, however, new machine learning algorithms have been applied to replicate sturdy, dexterous and energy-efficient human walking. Interestingly, many researchers have also proposed that locomotion principles, although they run on a centralized mechanism (central pattern generator) in conjunction with sensory feedback, can also run independently on a purely localized sensory-feedback mechanism. Therefore, this thesis aims at designing and evaluating two simple reflex-based neural controllers. The first controller generates a locomotion pattern for the humanoid robot by combining the sensory feedback pathways of the ground and joint sensors with the motor neuron outputs of the leg joints. The second controller makes use of Hebb's learning rule by first deriving locomotion patterns from the MLMP-CPG controller while simultaneously observing the sensory feedback, and finally generating motor neuron outputs associatively. In the end, this thesis also proposes a fast switching principle in which the output to the motor neurons is swiftly transferred, after a certain interval, from the MLMP-CPG to the associative reflex controller. This is implemented to observe the adaptive behavior present in centralized locomotor systems.
Improving autoregressive deep generative models for natural speech synthesis
Ferin Thunduparambil Philipose
Wed, 24. 10. 2018, Room 132
Speech synthesis, or Text-To-Speech (TTS) synthesis, is a domain that has been of research interest for several decades. A workable TTS system essentially generates speech from textual input. The quality of the synthesized speech is gauged by how similar it sounds to the human voice and how easily it can be understood. A fully end-to-end neural Text-To-Speech system has been set up and improved upon with the help of the WaveNet and Tacotron deep generative models. The Tacotron network acts as a feature prediction network that outputs log-mel spectrograms, which are in turn used by WaveNet as local conditioning features. Audio quality was improved by the log-mel local conditioning and by fine-tuning hyper-parameters such as mini-batch size and learning rate. Computational effort was reduced by compressing the WaveNet network architecture.
Fatigue detection using RNN and transfer learning
Wed, 24. 10. 2018, Room 132
Driving a car is a risky activity which requires full attention. Any distraction can lead to dangerous consequences, such as accidents. While driving, many factors are involved, such as fatigue, drowsiness and distractions. Drowsiness is a state between alertness and sleep. For this reason, it is important to detect drowsiness in advance, which will help to protect people from accidents. This research presents an implicit and efficient approach to detecting different levels of drowsiness. Every driver has different driving patterns, so the developed system should be able to adapt to changes in the driver's behavior. The aim of this thesis is to contribute to the study of detecting drivers' drowsiness levels while driving through different approaches which integrate two kinds of sensor data to improve detection performance.
Car localization in known environments
Tue, 2. 10. 2018, Room 131
Localization in a broader sense is a very wide topic. At present, basic localization takes place with the help of a GPS sensor, but it lacks the accuracy that is important for autonomous driving. To overcome this problem, different environmental sensors are used (typically sonar, lidar and camera). A lidar sensor is used here because it is very accurate for depth perception. In this thesis, a Simultaneous Localization And Mapping (SLAM) approach is selected. As the name suggests, localization and mapping form a chicken-and-egg problem; to solve it, we create a map of the environment before performing localization. Gmapping is selected for mapping, and Adaptive Monte Carlo Localization (AMCL) for localization within the map. AMCL is basically a particle filter. Given a map of the environment, the algorithm estimates the position and orientation of a car as it moves and senses the environment.
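Since AMCL is built on a particle filter, the predict, weight and resample cycle can be shown with a one-dimensional toy example. This sketch is a plain particle filter under strong simplifying assumptions (a scalar position, a direct noisy position measurement, a fixed particle count); it does not implement the adaptive sampling of AMCL or a lidar sensor model, and all names and noise values are hypothetical.

```python
import math
import random


def particle_filter_step(particles, control, measurement, rng,
                         motion_sigma=0.1, meas_sigma=0.2):
    """Run one predict / weight / resample cycle on 1D position particles."""
    # Predict: apply the commanded motion plus random motion noise.
    moved = [p + control + rng.gauss(0.0, motion_sigma) for p in particles]
    # Weight: Gaussian likelihood of the measurement given each particle.
    weights = [math.exp(-0.5 * ((measurement - p) / meas_sigma) ** 2)
               for p in moved]
    if sum(weights) == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        weights = [1.0] * len(moved)
    # Resample: draw a new particle set in proportion to the weights.
    return rng.choices(moved, weights=weights, k=len(moved))


def estimate(particles):
    """Use the particle mean as the position estimate."""
    return sum(particles) / len(particles)


def run_demo(seed=0, num_particles=500, steps=5):
    """Track a vehicle moving 1 unit per step; return (true, estimated)."""
    rng = random.Random(seed)
    true_pos = 2.0
    # Start with no knowledge: particles spread uniformly over the corridor.
    particles = [rng.uniform(0.0, 10.0) for _ in range(num_particles)]
    for _ in range(steps):
        true_pos += 1.0
        measurement = true_pos + rng.gauss(0.0, 0.2)
        particles = particle_filter_step(particles, 1.0, measurement, rng)
    return true_pos, estimate(particles)


if __name__ == "__main__":
    print(run_demo())
```

After a few cycles the particles concentrate around the true position, which is the behavior a map-based localizer relies on.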
Image anonymization using GANs
Mon, 24. 9. 2018, Room 131
Millions of images are being collected every day for applications that enable scene understanding, decision making, resource allocation and policing to ease human life. Most of these applications do not require the identity of the people in the images. There is increasing concern that these systems invade the privacy of users and the public. On the one hand, cameras and robots can assist a lot in everyday life, but on the other hand, the privacy of the user or the public should not be compromised. In this master's thesis, a novel approach was implemented to anonymize faces in datasets, which enables privacy protection for the individuals in those datasets. The Generative Adversarial Network (GAN) approach was extended and the loss function was formulated in a combined fashion. The performance of conventional image anonymization techniques such as blurring, cropping and pixelating was compared against GAN-generated images using autonomous driving applications like object detection and semantic segmentation.
Training approaches on semantic segmentation using transfer learning, dataset quality assessment and intelligent data augmentation
Mohamed Riyazudeen Puliadi Baghdad
Mon, 24. 9. 2018, Room 131
Data sparsity is one of the key problems that the automotive industry faces today. One way to overcome this is to use synthetic data generated by graphics engines or virtual world generators, which can be leveraged to train neural networks and accomplish tasks such as autonomous driving. The features learned from synthetic data yield better performance with a suitable training approach and some real data. The number of images in the synthetic dataset and its similarity to the real-world dataset play a major role in transferring the learned features effectively across domains. Similarity in the distribution of these datasets was achieved through different approaches, the most effective one being the Joint Adaptation Network approach. In addition, smart data augmentation could boost the performance achieved. Intelligent data augmentation was achieved using conditional Generative Adversarial Networks and a color augmentation technique. The findings of this research work provide a possible solution for tackling the data sparsity problem.
Investigating Model-based Reinforcement Learning Algorithms for Continuous Robotic Control
Wed, 19. 9. 2018, Room 368
Although model-free deep reinforcement learning can solve an ever wider range of tasks, the corresponding algorithms are highly inefficient with respect to the amount of data they require. Model-based reinforcement learning, which learns a dynamics model of the environment, promises a remedy. Recent research combines model-free algorithms with model-based approaches in order to exploit the strengths of both branches of reinforcement learning. In my defense, I give an introduction to model-based reinforcement learning and an overview of the possible uses of dynamics models found in the latest publications. We concentrate on environments with continuous action spaces, as encountered in robotics. The Temporal Difference Model is one such hybrid of model-free learning and model-based control. It is presented and evaluated in detail.
Sensor simulation and Depth map prediction on Automotive Fisheye camera using automotive deep learning
Deepika Gangaiah Prema
Wed, 12. 9. 2018, Room 131
The aim is to create a synthetic 3D environment with the Unity framework that makes it possible to obtain a supervised dataset by simulating different sensors, such as lidar and fisheye cameras, in the simulation environment. This dataset will be used to develop, test and validate different machine learning algorithms for automotive use cases. The big advantage of the simulation environment is the possibility to generate data from sensors which are still under development and whose final hardware is not yet available. Another advantage is that the ground truth of the simulation environment is known. This is much cheaper than equipping a vehicle with those sensors, recording lots of data and having humans label the ground truth manually. The 3D environment shall include urban and highway driving scenarios with balanced object categories like vehicles, pedestrians, trucks, terrain and street or free space to cover all levels of autonomous driving. The simulation of a fisheye camera as well as a next-generation lidar will be carried out in the thesis on the same Unity 3D framework, and the generated images and point cloud data are used to generate different datasets. The final goal is to use these for training different models and to test them in a real environment. Qualitative tests are carried out by benchmarking the datasets with the aid of different algorithms. The aim of this thesis is to study the different approaches with which CNNs could be used for depth estimation from a single fisheye camera image (180-degree FoV) for autonomous driving.
Humanoid robot learns walking by human demonstration
Tue, 14. 8. 2018, Room 131
In this thesis, a method for making a humanoid robot walk is developed using Q-learning based on MLMP-CPG and wrist sensors. Machine learning has shown promise in many fields, including robotics, where supervised learning algorithms are most often applied. However, supervised learning methods such as neural networks always need a massive amount of training data, which is sometimes not available in real situations. Although reinforcement learning does not require much data, it needs many attempts in its environment to derive a strategy. A humanoid robot cannot be allowed too many wrong attempts, because a fall may damage its joints. In this thesis, a method is proposed in which the robot learns to walk with the help of a human and can thereby avoid accidental falls.
Digital Twin Based Robot Control via IoT Cloud
Tue, 14. 8. 2018, Room 131
Digital Twin (DT) technology is a recent key technology for Industry 4.0 based monitoring and control of industrial manufacturing and production. A lot of research and development is happening on DT-based robot control. Monitoring and controlling a robot from a remote location is a complex process. In this research work, I have developed a prototype for controlling a robot using DT and cloud computing. Different technologies and techniques related to Digital Twin have been researched and analyzed to prepare an optimal solution based on this prototype. In this work, the latency of different types of machine-to-machine (M2M) communication protocols is observed. Different network protocols such as AMQP, MQTT and HTTP show a lot of latency variation in end-to-end data transfer. Furthermore, different external factors affect persistent communication. For example, the data processing and throughput of a cloud computing service such as Azure are not constant-time. A robot control mechanism expects a minimal, constant response time for quality of service. In this research, the main focus was to minimize communication latencies for a remote robot controller in cloud-based communication. Finally, an average quality of service in the range of 2-5 seconds for persistent robot communication has been achieved, depending on the setup.
Vision-based Mobile Robotics Obstacle Avoidance with Deep Reinforcement Learning
Wed, 8. 8. 2018, Room 131
Obstacle avoidance is a fundamental and challenging problem for autonomous navigation of mobile robots. In this thesis, the problem of obstacle avoidance in simple 3D environments where the robot has to rely solely on a single monocular camera is considered. Inspired by the recent advances of deep reinforcement learning (DRL) in Atari games and in understanding highly complex situations in Go, the obstacle avoidance problem is tackled in this thesis with a data-driven end-to-end deep learning approach. An approach which takes raw images as input and generates control commands as output is presented. Discrete and continuous control commands are compared. Furthermore, a method to predict depth images from monocular RGB images using conditional Generative Adversarial Networks (cGAN) is presented, and the increase in learning performance obtained by additionally fusing predicted depth images with monocular images is demonstrated.
Deep Convolutional Generative Adversarial Networks (DCGAN)
Tue, 24. 7. 2018, Room 132
Generative Adversarial Networks (GANs) have made great progress in recent years. Most of the established recognition methods are supervised and depend strongly on image labels. However, obtaining a large number of image labels is expensive and time-consuming. In this project, we investigate DCGAN, an unsupervised representation learning method. We base our work on the paper by Radford et al. and aim to replicate their results. When training our model on different datasets such as MNIST, CIFAR-10 and a vehicle dataset, we are able to replicate some results, e.g. smooth transitions.