Chair for Computer Graphics and Visualization
Topic Pool for Student Projects and Theses
The following list contains a selection of topics currently offered for student projects and theses. Further topics are available from the chair's staff.

Contact: M. Sc. Carsten Rudolph
Content:

Virtual augmentations make it possible to visually extend the perceived reality. As a rule, this extension is limited to inanimate objects or non-human avatars. Depth cameras, however, allow human bodies and their movements to be captured. The goal of this Praktikum is to couple a depth camera (Microsoft Kinect) with a stereoscopic augmented reality application (based on the HTC Vive) so that the person captured by the camera appears as a likeness within the augmented reality. This opens up various application scenarios: for example, the captured person does not necessarily have to be in the same place, which makes an AR chat room possible; alternatively, we can create a virtual hologram of ourselves this way. Both scenarios are to be investigated in the course of the Praktikum. The necessary equipment will be provided by us.
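
The fragment below is a minimal sketch of the capture side of such a pipeline, assuming the open-source libfreenect2 driver as the Kinect backend (the Praktikum may equally use the official Kinect SDK). It grabs one registered color/depth frame pair and back-projects it to a colored point cloud, which the AR renderer could then draw as the remote person's likeness:

  // Sketch (assumption: libfreenect2 as the Kinect v2 backend).
  #include <libfreenect2/libfreenect2.hpp>
  #include <libfreenect2/frame_listener_impl.h>
  #include <libfreenect2/registration.h>
  #include <cmath>
  #include <vector>

  struct ColoredPoint { float x, y, z, rgb; };

  int main() {
      libfreenect2::Freenect2 ctx;
      libfreenect2::Freenect2Device* dev = ctx.openDefaultDevice();
      if (!dev) return 1;

      libfreenect2::SyncMultiFrameListener listener(
          libfreenect2::Frame::Color | libfreenect2::Frame::Depth);
      dev->setColorFrameListener(&listener);
      dev->setIrAndDepthFrameListener(&listener);
      dev->start();

      // Maps raw depth pixels into the color camera frame.
      libfreenect2::Registration registration(dev->getIrCameraParams(),
                                              dev->getColorCameraParams());
      libfreenect2::Frame undistorted(512, 424, 4), registered(512, 424, 4);

      libfreenect2::FrameMap frames;
      listener.waitForNewFrame(frames);
      registration.apply(frames[libfreenect2::Frame::Color],
                         frames[libfreenect2::Frame::Depth],
                         &undistorted, &registered);

      // Back-project every valid depth pixel to a colored 3D point.
      std::vector<ColoredPoint> cloud;
      for (int r = 0; r < 424; ++r)
          for (int c = 0; c < 512; ++c) {
              float x, y, z, rgb;
              registration.getPointXYZRGB(&undistorted, &registered,
                                          r, c, x, y, z, rgb);
              if (std::isfinite(z)) cloud.push_back({x, y, z, rgb});
          }
      // ... upload `cloud` to the GPU and render it in the Vive's AR view.

      listener.release(frames);
      dev->stop();
      dev->close();
  }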

Contact: M. Sc. Tom Uhlmann
Content:

The Chair for Computer Graphics and Visualization at Chemnitz University of Technology is actively developing CrossForge, a platform-independent open-source visualization framework for scientific work and teaching. The framework is available online at https://github.com/Tachikoma87/CrossForge; it builds on OpenGL, is implemented in C++, and uses CMake and vcpkg for the build process.

For the further development of the framework, we are looking for dedicated students with a high degree of initiative, self-organization, and creativity. Possible tasks include:

1. Implementation of modern visualization and data processing techniques:
   - Extending the functionality by implementing techniques such as mesh processing and computer vision.

2. Creation of interactive teaching and learning units:
   - Development of interactive materials in the area of computer graphics to optimize the use of the framework in teaching.

3. Documentation of existing tutorials:
   - Creation of detailed documentation for the existing tutorials to improve usability. This task can also be carried out as part of a HiWi (student assistant) position.

If you are interested in contributing to our software, please get in touch with the contact person listed above. Particularly interesting and challenging project ideas can also be realized in the context of a final thesis.

Requirements:

- Good understanding of OpenGL and computer graphics (courses CG1 and CG2).
- Experience with C/C++.
- Good command of written and spoken English.

Contact: Amin Dadgar
Content:

1. What is Repetitive Training?

Repetitive training (Fig. 1) is a particular approach to training a neural network. In this approach, we divide our relatively large database into smaller subsets and train a sequence of networks, one per subset. During training, the network resulting from the previous subset serves as the backbone for training on the following subset. With this approach, we aim to address the issue of premature learning saturation.
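
The loop below is a minimal, self-contained sketch of this scheme, assuming LibTorch, a toy MLP, and random stand-in data (all our assumptions; the actual project uses detection/segmentation networks on our hand datasets). The essential point is that the weights are carried over from one subset to the next instead of being re-initialized:

  #include <torch/torch.h>
  #include <memory>

  // Toy stand-in for the actual detection/segmentation networks.
  struct Net : torch::nn::Module {
      torch::nn::Linear fc1{nullptr}, fc2{nullptr};
      Net() {
          fc1 = register_module("fc1", torch::nn::Linear(16, 32));
          fc2 = register_module("fc2", torch::nn::Linear(32, 2));
      }
      torch::Tensor forward(torch::Tensor x) {
          return fc2(torch::relu(fc1(x)));
      }
  };

  int main() {
      const int kSubsets = 4, kSamplesPerSubset = 256, kEpochs = 5;
      auto net = std::make_shared<Net>();

      for (int s = 0; s < kSubsets; ++s) {
          // Random data stands in for subset s of the large training database.
          auto x = torch::randn({kSamplesPerSubset, 16});
          auto y = torch::randint(0, 2, {kSamplesPerSubset});

          // Core of the repetitive scheme: the network trained on subset s-1
          // is kept as the backbone for subset s (no re-initialization).
          torch::optim::SGD opt(net->parameters(), /*lr=*/0.01);
          for (int e = 0; e < kEpochs; ++e) {
              opt.zero_grad();
              auto loss = torch::nn::functional::cross_entropy(net->forward(x), y);
              loss.backward();
              opt.step();
          }
      }
  }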

2. What is Learning Saturation?

Learning saturation is a state of training in which the training loss no longer decreases meaningfully. One reason for this is that the hidden units predominantly produce values close to the asymptotic ends of the activation function's range. In the loss curve, this state typically appears as undirected fluctuations following an initial sharp drop and a phase of fairly steady (directed) decrease. Such behavior can have many causes, such as overly large initial weights, a network that is too small (i.e., underfitting), or an inappropriate learning rate. However, if these parameters are chosen carefully, (most) networks reach this state only in the final stages of their training, and such saturation is not a problem.

3. What is Premature Learning Saturation?

If we have to employ relatively large datasets (for example, when using synthetic data as training sets), this learning saturation can occur prematurely; that is, the network enters the saturation state in the early stages of training. This is a problem because it appears to cause the network to learn features only from the initial portion of the training set and to largely ignore the rest of the data. For a more detailed explanation of these three terms, and for how repetitive training helped mitigate premature learning saturation in hand segmentation (on specific examples) when the majority of the training data are synthetic images, see [Dadgar and Brunnett, 2023].

4. Our Proposed Praktikum

Here, we propose a Praktikum, carried out in a small team, to investigate the influence of repetitive training on a broader and deeper scale. On the one hand, we attempt to broaden the scope of this training scheme by considering not only segmentation networks (as in the paper above) but also detection, pose estimation, and classification networks (preferably targeting hands; other objects of interest are negotiable). On the other hand, we deepen the scope by considering not only synthetic images but also real images, and by comparing the scheme with modern learning rate schedulers, such as cosine schedulers.

5. The Takeaways

There are several benefits for those of you who are interested in this project. First, you will gain invaluable insight and experience in the highly active field of convolutional neural networks, working with different networks, datasets (including those generated in our department), and our well-equipped facilities. Second, you will work in a small and friendly team, with each member focusing on one small aspect of the project in detail, thereby exercising and enhancing your teamwork skills in a professional setting. Third, the project as a whole will help us sketch out a detailed picture of the pros and cons of this training scheme; eventually, we may be able to formulate it as a theorem, with one (or more) publications along the way.

References

[Dadgar and Brunnett, 2023] Dadgar, A. and Brunnett, G. (2023). Hand Segmentation with Mask-RCNN using Mainly Synthetic Images as Training Sets and Repetitive Training Strategy. In VISAPP, Lisbon.

Figure 1: Repetitive Training

Contact: Dipl.-Inf. Martin Reber
Content:

VR applications using head-mounted displays (HMDs) suffer from the problem of "loneliness" in virtual space: as soon as the HMD is worn, the person in VR can no longer perceive other people in their surroundings. Using two HMDs for two people enables cooperation in the virtual environment, with both users tracked via the positions of their HMDs and, where applicable, their controllers. Bystanders, however, are still not captured.

The goal of this Praktikum is to develop a multi-user VR environment that enables cooperation within virtual reality. In addition, the positions of all bystanders are to be visualized in VR. The visual capture of the surroundings can be performed by one or more 360-degree cameras mounted in an elevated position. Machine learning methods are to be used to determine the positions of the detected persons.

Existing engines such as Unreal Engine or Unity, object detection methods such as YOLO, and pose estimation methods such as OpenPose can serve as a foundation.
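
As an illustration of the detection step, the sketch below runs a YOLO person detector on a live stream with OpenCV's DNN module. The file names, and the assumption that the Theta's live stream is exposed as a UVC webcam, are ours; in practice, the distortion of the equirectangular projection will likely require remapping to perspective views first:

  // Sketch: person detection on a live 360-degree stream (assumptions:
  // YOLOv4 config/weights on disk, Theta exposed as a UVC webcam).
  #include <opencv2/opencv.hpp>
  #include <vector>

  int main() {
      cv::dnn::Net net = cv::dnn::readNetFromDarknet("yolov4.cfg", "yolov4.weights");
      cv::VideoCapture cap(0);  // equirectangular live stream
      cv::Mat frame;
      while (cap.read(frame)) {
          cv::Mat blob = cv::dnn::blobFromImage(frame, 1.0 / 255.0,
                                                cv::Size(608, 608), cv::Scalar(),
                                                /*swapRB=*/true, /*crop=*/false);
          net.setInput(blob);
          std::vector<cv::Mat> outs;
          net.forward(outs, net.getUnconnectedOutLayersNames());
          // Each output row: [cx, cy, w, h, objectness, class scores...].
          for (const cv::Mat& out : outs)
              for (int i = 0; i < out.rows; ++i) {
                  float objectness  = out.at<float>(i, 4);
                  float personScore = out.at<float>(i, 5);  // COCO class 0 = person
                  if (objectness * personScore > 0.5f) {
                      float cx = out.at<float>(i, 0) * frame.cols;
                      float cy = out.at<float>(i, 1) * frame.rows;
                      // (cx, cy) in the equirectangular image maps to an
                      // azimuth/elevation in the room, which is what the
                      // VR visualization ultimately needs.
                  }
              }
      }
  }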

The chair's laboratory can provide HTC Vive Pro headsets and Ricoh Theta V/Z1 360-degree cameras, as well as the computers required for them.
Contact: Dipl.-Inf. Martin Reber
Content:

This master's thesis investigates the feasibility of using the Intel RealSense D435i depth camera for real-time detection of basic facial and head movements, with the goal of subsequently mapping these movements onto the head of a virtual character. The main objective is to examine whether the real-time depth data provided by the D435i camera can be used to achieve accurate capture of facial movements while guaranteeing real-time performance.

First, the quality of the D435i's depth data and its suitability for detecting relevant facial features is to be analyzed. Then, a computer vision pipeline is to be developed that uses the depth information to accurately capture facial features and head movements. The features obtained in this way are to be transferred onto a virtual character. Real-time capability must be considered in the implementation; computationally expensive neural networks should therefore be avoided where possible.
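
As a sketch of the data acquisition, the fragment below uses the official librealsense2 C++ API to grab depth frames aligned to the color stream; a facial landmark found at color pixel (u, v) can then be looked up in the depth image and lifted to 3D (the landmark detector itself is deliberately left open here):

  // Sketch: aligned color/depth capture from the D435i with librealsense2.
  #include <librealsense2/rs.hpp>

  int main() {
      rs2::pipeline pipe;
      rs2::config cfg;
      cfg.enable_stream(RS2_STREAM_COLOR, 640, 480, RS2_FORMAT_BGR8, 30);
      cfg.enable_stream(RS2_STREAM_DEPTH, 640, 480, RS2_FORMAT_Z16, 30);
      pipe.start(cfg);

      rs2::align alignToColor(RS2_STREAM_COLOR);  // register depth onto color pixels
      while (true) {
          rs2::frameset frames = alignToColor.process(pipe.wait_for_frames());
          rs2::video_frame color = frames.get_color_frame();
          rs2::depth_frame depth = frames.get_depth_frame();

          // For a landmark at color pixel (u, v), the aligned depth frame
          // yields its distance in meters; here a placeholder center pixel.
          int u = color.get_width() / 2, v = color.get_height() / 2;
          float z = depth.get_distance(u, v);
          // (u, v, z) can be deprojected to a 3D point and mapped onto the
          // virtual character's head rig.
          (void)z;
      }
  }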

A possible field of application would be online chat through a virtual avatar that reproduces the movements of the captured person, enabling the use of arbitrary, possibly even non-human, characters.


Contact: Dipl.-Inf. Martin Reber
Content:

A method is to be developed that enables the self-localization of 360-degree cameras in a known environment. Existing tracking approaches, such as ArUco markers, can serve as the basis for computing the camera's position and orientation relative to the room; the adaptations necessary to apply these methods to 360-degree projections are to be worked out. Ricoh Theta V and Ricoh Theta Z1 cameras are available for developing and testing the method. The cameras can be connected to a PC via USB or Wi-Fi, so that live transmission of the camera images is possible. The goal of the thesis is to investigate whether the self-localization of a 360-degree camera in a known room equipped with the necessary markers can be realized in real time, and to implement a corresponding pipeline.
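
A conceivable first building block is standard marker pose estimation, sketched below with OpenCV's ArUco module (an assumed toolchain). It works on perspective images; adapting it to equirectangular frames, e.g., by reprojecting view cones, is precisely the research part of the thesis:

  // Sketch: ArUco pose estimation with OpenCV (assumed toolchain);
  // intrinsics and file name are placeholders.
  #include <opencv2/opencv.hpp>
  #include <opencv2/aruco.hpp>
  #include <vector>

  int main() {
      cv::Mat image = cv::imread("frame.png");  // placeholder input
      cv::Mat K = (cv::Mat_<double>(3, 3) << 500, 0, 320, 0, 500, 240, 0, 0, 1);
      cv::Mat dist = cv::Mat::zeros(1, 5, CV_64F);  // assumed intrinsics

      cv::Ptr<cv::aruco::Dictionary> dict =
          cv::aruco::getPredefinedDictionary(cv::aruco::DICT_4X4_50);
      std::vector<int> ids;
      std::vector<std::vector<cv::Point2f>> corners;
      cv::aruco::detectMarkers(image, dict, corners, ids);

      if (!ids.empty()) {
          std::vector<cv::Vec3d> rvecs, tvecs;
          cv::aruco::estimatePoseSingleMarkers(corners, 0.10f /*edge length in m*/,
                                               K, dist, rvecs, tvecs);
          // rvecs[i]/tvecs[i] give marker pose in the camera frame; inverting
          // it (with the marker's known room position) localizes the camera.
      }
  }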

A possible field of application would be the use of a 360-degree camera and the corresponding localization to steer an autonomous robot through a known environment.

Contact: M. Sc. Carsten Rudolph
Content:

For many applications dealing with 3D models, surface curvature is a relevant and important feature. Applications include visualization, mesh simplification, and deformation, among others. Typically, surface curvature at a vertex is described by a pair of values, the principal curvatures, which capture the minimum and maximum change in direction along the 3D surface. Computing surface curvature can be very expensive, especially for large models. Fortunately, since only local surface neighborhoods are involved, the problem is very well suited to parallelization. This makes it possible to take advantage of modern GPGPU architectures, promising a significant reduction in computation time.

For a bachelor's or master's thesis, we are looking for a student interested in 3D computer graphics to develop an application prototype based on compute shaders that implements and benchmarks approaches for GPU-based surface curvature estimation. The application should be written in C++ and OpenGL. The thesis should contain a comprehensive review of the related literature as well as a benchmark comparing an existing CPU-based approach with the new implementation on different datasets.
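
To give an impression of the intended dispatch pattern, the fragment below estimates angle-deficit (Gaussian) curvature, a simpler quantity than the principal curvatures, with one compute shader thread per vertex over precomputed one-ring neighborhoods in CSR layout. All buffer layouts and names are our assumptions; an OpenGL 4.3 context and a function loader such as glad are presumed:

  #include <glad/glad.h>  // or any other GL loader

  // Per-vertex angle-deficit (Gaussian) curvature; the CPU builds the
  // one-ring fans (closed, i.e., first neighbor repeated at the end).
  static const char* kCurvatureCS = R"(
  #version 430
  layout(local_size_x = 64) in;
  layout(std430, binding = 0) readonly  buffer Positions   { vec4  pos[];  };
  layout(std430, binding = 1) readonly  buffer RingOffsets { uint  off[];  };
  layout(std430, binding = 2) readonly  buffer RingIndices { uint  ring[]; };
  layout(std430, binding = 3) writeonly buffer Curvature   { float K[];    };

  void main() {
      uint v = gl_GlobalInvocationID.x;
      if (v >= pos.length()) return;
      vec3 p = pos[v].xyz;
      float angleSum = 0.0, area = 0.0;
      for (uint i = off[v]; i + 1u < off[v + 1u]; ++i) {
          vec3 a = pos[ring[i]].xyz      - p;
          vec3 b = pos[ring[i + 1u]].xyz - p;
          angleSum += acos(clamp(dot(normalize(a), normalize(b)), -1.0, 1.0));
          area     += length(cross(a, b)) / 6.0;  // one third of triangle area
      }
      K[v] = (6.28318530718 - angleSum) / max(area, 1e-8);  // angle deficit
  })";

  GLuint buildCurvatureProgram() {
      GLuint cs = glCreateShader(GL_COMPUTE_SHADER);
      glShaderSource(cs, 1, &kCurvatureCS, nullptr);
      glCompileShader(cs);                 // production code: check the log here
      GLuint prog = glCreateProgram();
      glAttachShader(prog, cs);
      glLinkProgram(prog);
      glDeleteShader(cs);
      return prog;
  }

  void runCurvaturePass(GLuint prog, GLuint posSSBO, GLuint offSSBO,
                        GLuint ringSSBO, GLuint outSSBO, GLuint vertexCount) {
      glUseProgram(prog);
      glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, posSSBO);
      glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, offSSBO);
      glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 2, ringSSBO);
      glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 3, outSSBO);
      glDispatchCompute((vertexCount + 63) / 64, 1, 1);
      glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);  // make results visible
  }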

Requirements:
  • Ability to read and write C++, OpenGL, and GLSL compute shader code.
  • The thesis can be written in German or English.
Contact: Amin Dadgar
Content:

1 Finger Individuation

In most gestures, involuntary movements of some fingers, imposed by the voluntary movements of the other fingers, are unavoidable. In addition, a considerable number of virtual postures are anatomically impossible for real hands. The analysis of such involuntary, allowed, and forbidden movements, called finger individuation analysis [2, 3, 4], therefore has a great influence on the design of accurate hand posture estimation systems that return plausible postures. Finger individuation can be studied in three ways: dimensionality reduction analysis (DRA), digit independence analysis (DIA), or digit coupling analysis (DCA).


2 PoseDescriptor

In our analysis of the behavioral repertoire of the hand at the digit/finger level, we observed that every posture of every finger of a human hand can be characterized by a single, unique value. More precisely, the sum of the distances of the (movable) finger joints (or of the fingertip) to a locally fixed reference point on the hand (e.g., the wrist joint) has a specific value for each posture of that finger. This unique value, which we call the PoseDescriptor, reduces the dimensionality of the fingers' space from 16 to 5 (i.e., one degree of freedom per finger) [1]. To make the study and use of the PoseDescriptor more efficient, we consider three motion patterns introduced in that paper.
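
In code, the descriptor is essentially a one-liner per finger; the sketch below (our own illustration, with hypothetical joint data) computes it:

  // Illustration (hypothetical data): the PoseDescriptor of a finger is the
  // sum of distances of its movable joints (or just the fingertip) to a
  // fixed reference point on the hand, e.g., the wrist joint.
  #include <cmath>
  #include <vector>

  struct Vec3 { float x, y, z; };

  float distance(const Vec3& a, const Vec3& b) {
      const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
      return std::sqrt(dx * dx + dy * dy + dz * dz);
  }

  // One scalar per finger posture: five such values replace the
  // 16-dimensional joint configuration of the fingers.
  float poseDescriptor(const std::vector<Vec3>& fingerJoints, const Vec3& wrist) {
      float sum = 0.0f;
      for (const Vec3& joint : fingerJoints) sum += distance(joint, wrist);
      return sum;
  }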


3 Our Proposed Bachelor Thesis

In this context, we propose a bachelor thesis to study finger individuation and its effect on the flexion/extension of other fingers using our PoseDescriptor. After collecting (and reusing) existing data from the Internet and normalizing and preprocessing the gesture data, the student will analyze and illustrate the influence of finger coupling and enslavement with several computer vision and optimization algorithms (similar to [3]). Ultimately, the goal of this thesis is to investigate the benefits the PoseDescriptor can bring to the formulation of finger individuation and finger enslavement over the conventional joint-degree configuration.


References

[1] A. Dadgar and G. Brunnett. Using a 1D Pose-Descriptor on the Finger-Level to Reduce the Dimensions in Hand Posture Estimation. In ICPRAM, Lisbon, 2023.
[2] C. Hager-Ross and M. H. Schieber. Quantifying the independence of human finger movements: Comparisons of digits, hands, and movement frequencies. Journal of Neuroscience, 20(22):8542–8550, 2000.
[3] J. N. Ingram, K. P. Körding, I. S. Howard, and D. M. Wolpert. The statistics of natural hand movements. Experimental Brain Research, 188(2):223–236, 2008.
[4] M. Nakamura, C. Miyawaki, N. Matsushita, R. Yagi, and Y. Handa. Analysis of voluntary finger movements during hand tasks by a motion analyzer. Journal of Electromyography and Kinesiology, 8(5):295–303, 1998.

 

Figure 1: Two types of Fingers’ PoseDescriptors

Contact: Amin Dadgar
Content:

Our hands play significant roles in our daily lives. Examples of such roles are: 1) pointing to a person or an object; 2) conveying information about space, shapes, the number of objects, or the temporal characteristics of motions; 3) interacting relentlessly with objects in operating rooms, airplanes, laboratories, and factories; 4) carrying out unconscious gesticulation to express ideas; and 5) conducting conscious communication in sign language. Thus, a successful system design that encompasses the entire chain of automatic hand gesture recognition could be beneficial for many social sectors and would play a central role in many intelligent systems of our future world.

Though it is an immensely beneficial topic, the project calls for addressing many unforeseen challenges, and the specifics of those challenges must be redefined for each project design. However, within the vision-based class of technology, and in the scope of the analysis-by-synthesis approach, there is a general arrangement of modules that can address several significant obstacles. Those modules include, but are not limited to, the following:

  • Bottom-up modules such as tracking (Fig. a) and segmentation (Fig. b) to locate and extract human hands within 2D image scenes.
  • Top-down modules such as spatio-temporal models to effectively relate hand postures to mathematical representations (Fig. c), and optimization techniques to efficiently search the high-dimensional search space (Fig. d).
  • Classification frameworks for converting the estimated postures/gestures into semantically meaningful commands (Fig. e).

It goes without saying that the design of these modules can be viewed from several fascinating perspectives, such as computer vision, optimization, computer graphics, and machine learning. In this context, we welcome talented, creative, and hardworking students to complete their master's theses focusing on one of the above modules from a particular perspective. Finally, we are open to considering new insights and visions from the student's side on how to approach the modules above, subject to the novelty and feasibility of the proposals.

Contact: Amin Dadgar
Content:

1 What is SaneNet?

SaneNet is a type of convolutional neural network trained mainly on synthetic data (Synthetically-based Trained Artificial Neural Network). The idea is to use synthetic data as the training set, without introducing new architectural elements and with existing networks, to eliminate the burden of costly annotation of real data. However, creating photorealistic synthetic images can also be costly. To alleviate this problem, we exploited the invariancy concept of neural networks through specific rotations of the hand model around the Cartesian axes. This allowed us to use simplistic synthetic images in the training set instead of employing the conventional and expensive techniques of domain randomization to create realistic synthetic images. For a more detailed explanation of the invariancy concept and how we exploited it, see [Dadgar and Brunnett, 2020].
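
For illustration only, enumerating such rotated views amounts to sampling rotations about the coordinate axes before rendering each synthetic image; the sketch below assumes GLM for the math and is not our actual generator:

  // Illustration (assumption: GLM): enumerating model rotations about the
  // Cartesian axes, as used to exploit the invariancy concept when
  // rendering simplistic synthetic training images.
  #include <glm/glm.hpp>
  #include <glm/gtc/matrix_transform.hpp>
  #include <vector>

  std::vector<glm::mat4> axisRotations(int stepsPerAxis) {
      const glm::vec3 axes[] = { {1, 0, 0}, {0, 1, 0}, {0, 0, 1} };
      std::vector<glm::mat4> rotations;
      for (const glm::vec3& axis : axes)
          for (int s = 0; s < stepsPerAxis; ++s) {
              float angle = glm::radians(360.0f * s / stepsPerAxis);
              // Each matrix would be applied to the hand model before
              // rendering one synthetic training image.
              rotations.push_back(glm::rotate(glm::mat4(1.0f), angle, axis));
          }
      return rotations;
  }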


2 What are the achievements so far?

First, we addressed the hand detection problem by training the YOLO network on these images (generated with a single hand model, with shading but no shadowing, no texture, and a plain background) in combination with a few (100) real images. The results suggest that our simple and very inexpensive approach to generating synthetic training sets is successful for detecting hands in challenging scenarios [Dadgar and Brunnett, 2020]. Then, using a similar strategy and a new training scheme, we demonstrated the success of the approach in segmenting hands in real scenes, provided that the parameters of the networks and the specifications of our real data are chosen with meticulous care [Dadgar and Brunnett, 2023].


3 What is the problem?

As the tasks became more difficult (e.g., moving from detection to segmentation), we faced a generalization problem: for detection, the networks responded well across different scenarios and examples, but for segmentation they performed well only on certain examples and test sets. Therefore, we investigated the feasibility of increasing the generalization of the segmentation. To do this, we generated a new set of synthetic images, this time aiming for maximum diversity in the data while only slightly increasing the complexity of our rendering engine [Uhlmann et al., 2023]. We achieved this diversity by showing a human and their hands in a multitude of poses and with varying backgrounds, while keeping the number of subjects, scenes, and other costly graphical factors at a minimum (see Fig. 1).

 

4 What is next?

We propose two master projects, one focusing on hand segmentation and the other on pose estimation, to investigate the feasibility and success of our SaneNet approach on more general scenarios and test sets. We will provide our enhanced synthetic data for training the neural networks. Students may also consider creating a new set of synthetic images using a generative neural network (e.g., a GAN) if they can propose a novel approach to forming similarly rotated images (e.g., one that helps exploit the invariancy concept).

 

References

[Dadgar and Brunnett, 2020] Dadgar, A. and Brunnett, G. (2020). SaneNet: Training a Fully Convolutional Neural Network Using Synthetic Data for Hand Detection. IEEE SAMI, pages 251–256.
[Dadgar and Brunnett, 2023] Dadgar, A. and Brunnett, G. (2023). Hand Segmentation with Mask-RCNN using Mainly Synthetic Images as Training Sets and Repetitive Training Strategy. In VISAPP, Lisbon.
[Uhlmann et al., 2023] Uhlmann, T., Dadgar, A., Weigand, F., and Brunnett, G. (2023). A novel Framework for the Generation of Synthetic Datasets with Applications to Hand Detection and Segmentation. In CRC-Hybrid Society, Chemnitz.

 

Figure 1: Enhanced Synthetic Data
