Commit 7465a29e authored by Tediloma

paper ver 2.5.1

%
\subsubsection{Automatic augmentation}
%------Jonas
As described in chapter \ref{method}, inserting words with automatic augmentation methods does not necessarily yield grammatically correct sentences. However, since the semantic embeddings are generated with the SBERT mean-token strategy, the averaging still introduces diversity into the embeddings. The inserted information usually does not focus on the visual description of the classes and therefore differs from the manually created multiple labels; it can rather be modeled as adding noise to the embedding vectors. In contrast to purely random noise, however, it keeps the semantic information and the relations between different embeddings intact, which helps the network generalize its mapping. Experiments using only random noise to generate diverse label embeddings led to no improvement in top-1 unseen accuracy.
%------
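The intuition above can be illustrated with a minimal sketch: because the sentence embedding is the mean over token vectors, inserting a semantically related word only nudges the embedding, whereas random noise of comparable magnitude carries no semantic structure. The token vectors below are hypothetical stand-ins for SBERT token outputs, chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4-dimensional token embeddings (stand-ins for SBERT token outputs).
tokens = {
    "wave":   np.array([1.0, 0.2, 0.0, 0.1]),
    "hand":   np.array([0.9, 0.3, 0.1, 0.0]),
    "raised": np.array([0.8, 0.1, 0.2, 0.1]),
}

def mean_token_embedding(words):
    """Mean-pool the token vectors, mirroring the SBERT mean-token strategy."""
    return np.mean([tokens[w] for w in words], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

base = mean_token_embedding(["wave", "hand"])

# Inserting a related word perturbs the mean only slightly,
# so the relation to the original label embedding stays intact.
augmented = mean_token_embedding(["wave", "raised", "hand"])

# Random noise of similar size also diversifies the embedding,
# but without preserving any semantic structure.
noisy = base + rng.normal(scale=0.5, size=base.shape)

print(cosine(base, augmented))  # close to 1: semantic relation preserved
print(cosine(base, noisy))
```

This is only a toy analogy for the argument in the text, not the actual SBERT pipeline; in the real setup the token vectors are contextual model outputs and the insertions come from the automatic augmentation methods.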
\section{Conclusion}
In this work, we highlight the importance of the semantic embeddings in the context of skeleton-based zero-shot gesture recognition by showing how performance can be increased solely by augmenting those embeddings. By including more visual information in the class labels and combining multiple descriptions per class, we improve upon the model based on \cite{jasani2019skeleton} by a significant margin. The use of automatic text augmentation methods \cite{ma2019nlpaug} already reduces the manual annotation effort considerably while maintaining most of the performance gain.
Future work might investigate the following topics: First, generating descriptive sentences from the default labels, e.g. with methods from Natural Language Processing (NLP), could further reduce the manual annotation effort. Second, additional tests on different zero-shot architectures could verify the improvements shown in our work. Finally, different kinds and combinations of automatic text augmentation methods could be evaluated.
With these advances, data augmentation of the semantic embeddings can prove useful in optimizing the performance of any future zero-shot approach.