Commit 17f9bdd2 authored by uoega

update tables paper

parent 193298b2
......@@ -16,20 +16,21 @@
\gdef\HyperFirstAtBeginDocument#1{#1}
\providecommand\HyField@AuxAddToFields[1]{}
\providecommand\HyField@AuxAddToCoFields[2]{}
\@writefile{toc}{\contentsline {section}{\numberline {1}\hskip -1em.\nobreakspace {}Introduction}{4321}{section.1}}
\@writefile{toc}{\contentsline {subsection}{\numberline {1.1}\hskip -1em.\nobreakspace {}Zero-shot learning}{4321}{subsection.1.1}}
\@writefile{toc}{\contentsline {subsection}{\numberline {1.2}\hskip -1em.\nobreakspace {}Skeleton-based visual recognition}{4321}{subsection.1.2}}
\@writefile{toc}{\contentsline {subsection}{\numberline {1.3}\hskip -1em.\nobreakspace {}Data augmentation}{4321}{subsection.1.3}}
\@writefile{toc}{\contentsline {section}{\numberline {2}\hskip -1em.\nobreakspace {}Method}{4322}{section.2}}
\@writefile{lof}{\contentsline {figure}{\numberline {1}{\ignorespaces Architecture or other needed for method}}{4322}{figure.1}}
\newlabel{fig:long}{{1}{4322}{Architecture or other needed for method}{figure.1}{}}
\newlabel{fig:onecol}{{1}{4322}{Architecture or other needed for method}{figure.1}{}}
\@writefile{toc}{\contentsline {subsection}{\numberline {2.1}\hskip -1em.\nobreakspace {}Augmentations}{4322}{subsection.2.1}}
\@writefile{toc}{\contentsline {subsubsection}{\numberline {2.1.1}Automatic Augmentation}{4322}{subsubsection.2.1.1}}
\@writefile{lot}{\contentsline {table}{\numberline {1}{\ignorespaces Results. Ours is better.}}{4323}{table.1}}
\@writefile{toc}{\contentsline {subsection}{\numberline {2.2}\hskip -1em.\nobreakspace {}Experiments}{4323}{subsection.2.2}}
\@writefile{toc}{\contentsline {section}{\numberline {3}\hskip -1em.\nobreakspace {}Results}{4323}{section.3}}
\@writefile{toc}{\contentsline {subsection}{\numberline {3.1}\hskip -1em.\nobreakspace {}Discussion}{4323}{subsection.3.1}}
\@writefile{toc}{\contentsline {section}{\numberline {1}\hskip -1em.\nobreakspace {}Introduction}{1}{section.1}}
\@writefile{toc}{\contentsline {subsection}{\numberline {1.1}\hskip -1em.\nobreakspace {}Zero-shot learning}{1}{subsection.1.1}}
\@writefile{toc}{\contentsline {subsection}{\numberline {1.2}\hskip -1em.\nobreakspace {}Skeleton-based visual recognition}{1}{subsection.1.2}}
\@writefile{toc}{\contentsline {subsection}{\numberline {1.3}\hskip -1em.\nobreakspace {}Data augmentation}{1}{subsection.1.3}}
\@writefile{toc}{\contentsline {section}{\numberline {2}\hskip -1em.\nobreakspace {}Method}{2}{section.2}}
\@writefile{toc}{\contentsline {subsection}{\numberline {2.1}\hskip -1em.\nobreakspace {}Augmentations}{2}{subsection.2.1}}
\@writefile{toc}{\contentsline {subsubsection}{\numberline {2.1.1}Automatic Augmentation}{2}{subsubsection.2.1.1}}
\@writefile{lof}{\contentsline {figure}{\numberline {1}{\ignorespaces Architecture or other needed for method}}{3}{figure.1}}
\newlabel{fig:long}{{1}{3}{Architecture or other needed for method}{figure.1}{}}
\newlabel{fig:onecol}{{1}{3}{Architecture or other needed for method}{figure.1}{}}
\@writefile{toc}{\contentsline {subsection}{\numberline {2.2}\hskip -1em.\nobreakspace {}Experiments}{3}{subsection.2.2}}
\@writefile{lot}{\contentsline {table}{\numberline {1}{\ignorespaces ZSL and GZSL results for different approaches.}}{3}{table.1}}
\@writefile{lot}{\contentsline {table}{\numberline {2}{\ignorespaces Unseen top-1 and top-5 accuracies results in detail.}}{3}{table.2}}
\@writefile{toc}{\contentsline {section}{\numberline {3}\hskip -1em.\nobreakspace {}Results}{3}{section.3}}
\@writefile{toc}{\contentsline {subsection}{\numberline {3.1}\hskip -1em.\nobreakspace {}Discussion}{3}{subsection.3.1}}
\bibstyle{ieee_fullname}
\bibdata{egbib}
\bibcite{Alpher02}{1}
......@@ -37,4 +38,4 @@
\bibcite{Alpher04}{3}
\bibcite{Authors14}{4}
\bibcite{Authors14b}{5}
\@writefile{toc}{\contentsline {section}{\numberline {4}\hskip -1em.\nobreakspace {}Conclusion}{4324}{section.4}}
\@writefile{toc}{\contentsline {section}{\numberline {4}\hskip -1em.\nobreakspace {}Conclusion}{4}{section.4}}
This is pdfTeX, Version 3.14159265-2.6-1.40.19 (MiKTeX 2.9.6840 64-bit) (preloaded format=pdflatex 2018.10.16) 23 JUL 2021 14:58
This is pdfTeX, Version 3.14159265-2.6-1.40.19 (MiKTeX 2.9.6840 64-bit) (preloaded format=pdflatex 2018.10.16) 23 JUL 2021 16:22
entering extended mode
**./paper_working_design.tex
(paper_working_design.tex
......@@ -381,42 +381,72 @@ File: ot1pcr.fd 2001/06/04 font definitions for OT1/pcr.
)
LaTeX Font Info: Font shape `OT1/ptm/bx/n' in size <12> not available
(Font) Font shape `OT1/ptm/b/n' tried instead on input line 52.
[4321{C:/Users/XPS15/AppData/Local/MiKTeX/2.9/pdftex/config/pdftex.map}
[1{C:/Users/XPS15/AppData/Local/MiKTeX/2.9/pdftex/config/pdftex.map}
]
Underfull \hbox (badness 4859) in paragraph at lines 78--89
Underfull \hbox (badness 4859) in paragraph at lines 78--79
[]\OT1/ptm/m/n/10 Die gew[]ahlte Ar-chitek-tur f[]ur un-sere Ex-per-i-mente
[]
Underfull \hbox (badness 1038) in paragraph at lines 78--89
Underfull \hbox (badness 1038) in paragraph at lines 78--79
\OT1/ptm/m/n/10 seinen einzel-nen Mod-ulen zusam-menge-baut. Einzelne
[]
Underfull \hbox (badness 1062) in paragraph at lines 78--89
Underfull \hbox (badness 10000) in paragraph at lines 78--79
[]
LaTeX Font Info: Font shape `OT1/ptm/bx/n' in size <10> not available
(Font) Font shape `OT1/ptm/b/n' tried instead on input line 82.
Underfull \hbox (badness 10000) in paragraph at lines 80--85
[]
Underfull \hbox (badness 10000) in paragraph at lines 87--92
[]
Underfull \hbox (badness 1062) in paragraph at lines 94--100
\OT1/ptm/m/n/10 ab-u-lar, d.h. alle m[]oglichen Klassen-la-bels, in ein se-
[]
Underfull \hbox (badness 1127) in paragraph at lines 78--89
Underfull \hbox (badness 1127) in paragraph at lines 94--100
\OT1/ptm/m/n/10 die Ab-bil-dung der se-man-tis-chen Merk-male in den vi-
[]
Underfull \hbox (badness 1168) in paragraph at lines 78--89
Underfull \hbox (badness 1168) in paragraph at lines 94--100
\OT1/ptm/m/n/10 Net (RN), das im fol-gen-den Ab-schnitt n[]aher erl[]autert
[]
<Architektur.png, id=16, 817.527pt x 418.509pt>
File: Architektur.png Graphic file (type png)
<use Architektur.png>
Package pdftex.def Info: Architektur.png used on input line 93.
Package pdftex.def Info: Architektur.png used on input line 104.
(pdftex.def) Requested size: 189.70947pt x 97.11714pt.
LaTeX Font Info: Font shape `OT1/ptm/bx/n' in size <10> not available
(Font) Font shape `OT1/ptm/b/n' tried instead on input line 101.
[4322 <./Architektur.png>] [4323] (paper_working_design.bbl
Underfull \hbox (badness 10000) in paragraph at lines 113--114
[]
[2]
Underfull \hbox (badness 10000) in paragraph at lines 115--119
[]
Underfull \hbox (badness 10000) in paragraph at lines 121--125
[]
[3 <./Architektur.png>] (paper_working_design.bbl
Underfull \hbox (badness 2376) in paragraph at lines 9--12
[]\OT1/ptm/m/n/9 FirstName Alpher and First-Name Fotheringham-Smythe.
[]
......@@ -427,33 +457,34 @@ Underfull \hbox (badness 1132) in paragraph at lines 14--17
[]
)
Package atveryend Info: Empty hook `BeforeClearDocument' on input line 162.
[4324
]
Package atveryend Info: Empty hook `AfterLastShipout' on input line 162.
Package atveryend Info: Empty hook `BeforeClearDocument' on input line 202.
[4]
Package atveryend Info: Empty hook `AfterLastShipout' on input line 202.
(paper_working_design.aux)
Package atveryend Info: Executing hook `AtVeryEndDocument' on input line 162.
Package atveryend Info: Empty hook `AtEndAfterFileList' on input line 162.
Package atveryend Info: Empty hook `AtVeryVeryEnd' on input line 162.
Package atveryend Info: Executing hook `AtVeryEndDocument' on input line 202.
Package atveryend Info: Empty hook `AtEndAfterFileList' on input line 202.
Package atveryend Info: Empty hook `AtVeryVeryEnd' on input line 202.
)
Here is how much of TeX's memory you used:
6235 strings out of 492970
91639 string characters out of 3126593
187762 words of memory out of 3000000
9998 multiletter control sequences out of 15000+200000
26528 words of font info for 63 fonts, out of 3000000 for 9000
6242 strings out of 492970
91714 string characters out of 3126593
190756 words of memory out of 3000000
10004 multiletter control sequences out of 15000+200000
28580 words of font info for 67 fonts, out of 3000000 for 9000
1141 hyphenation exceptions out of 8191
32i,9n,27p,1165b,324s stack positions out of 5000i,500n,10000p,200000b,50000s
32i,13n,27p,1165b,324s stack positions out of 5000i,500n,10000p,200000b,50000s
{C:/Users/XPS15/AppData/Local/Programs/MiKTeX 2.9/fonts/enc/dvips/base/8r.enc
}<C:/Users/XPS15/AppData/Local/Programs/MiKTeX 2.9/fonts/type1/urw/courier/ucrr
8a.pfb><C:/Users/XPS15/AppData/Local/Programs/MiKTeX 2.9/fonts/type1/urw/times/
utmb8a.pfb><C:/Users/XPS15/AppData/Local/Programs/MiKTeX 2.9/fonts/type1/urw/ti
mes/utmr8a.pfb><C:/Users/XPS15/AppData/Local/Programs/MiKTeX 2.9/fonts/type1/ur
w/times/utmri8a.pfb>
Output written on paper_working_design.pdf (4 pages, 376297 bytes).
}<C:/Users/XPS15/AppData/Local/Programs/MiKTeX 2.9/fonts/type1/public/amsfonts/
cm/cmmi10.pfb><C:/Users/XPS15/AppData/Local/Programs/MiKTeX 2.9/fonts/type1/pub
lic/amsfonts/cm/cmr10.pfb><C:/Users/XPS15/AppData/Local/Programs/MiKTeX 2.9/fon
ts/type1/public/amsfonts/cm/cmsy10.pfb><C:/Users/XPS15/AppData/Local/Programs/M
iKTeX 2.9/fonts/type1/urw/courier/ucrr8a.pfb><C:/Users/XPS15/AppData/Local/Prog
rams/MiKTeX 2.9/fonts/type1/urw/times/utmb8a.pfb><C:/Users/XPS15/AppData/Local/
Programs/MiKTeX 2.9/fonts/type1/urw/times/utmr8a.pfb><C:/Users/XPS15/AppData/Lo
cal/Programs/MiKTeX 2.9/fonts/type1/urw/times/utmri8a.pfb>
Output written on paper_working_design.pdf (4 pages, 403380 bytes).
PDF statistics:
66 PDF objects out of 1000 (max. 8388607)
24 named destinations out of 1000 (max. 500000)
80 PDF objects out of 1000 (max. 8388607)
25 named destinations out of 1000 (max. 500000)
6 words of extra memory for PDF output out of 10000 (max. 10000000)
......@@ -21,7 +21,7 @@
% Pages are numbered in submission mode, and unnumbered in camera-ready
%\ifcvprfinal\pagestyle{empty}\fi
\setcounter{page}{4321}
\setcounter{page}{1}
\begin{document}
%%%%%%%%% TITLE
......@@ -75,11 +75,22 @@ We aim to provide the network more relevant semantic information about the diffe
\section{Method}
The architecture chosen for our experiments corresponds in large part to the architecture presented in the paper “Skeleton based Zero Shot Action Recognition in Joint Pose-Language Semantic Space”. The model presented there was rebuilt from its individual modules in a re-engineering fashion, using the information published in the paper. Individual modules were replaced or slightly modified in favor of better performance. Detailed information on the model used, which is illustrated in [Figure…], can be found in the publication by [authors of the paper]. Here we give only a brief overview of the principle by which the model attempts to solve the zero-shot task and of the changes made compared to [Paper].
The architecture consists of three parts:
1. A visual path
2. A semantic path
3. A comparison-learning part
The architecture chosen for our experiments corresponds in large part to the architecture presented in the paper “Skeleton based Zero Shot Action Recognition in Joint Pose-Language Semantic Space”. The model presented there was rebuilt from its individual modules in a re-engineering fashion, using the information published in the paper. Individual modules were replaced or slightly modified in favor of better performance. Detailed information on the model used, which is illustrated in [Figure…], can be found in the publication by [authors of the paper]. Here we give only a brief overview of the principle by which the model attempts to solve the zero-shot task and of the changes made compared to [Paper].\\
\noindent
The architecture consists of three parts:\medskip\\
{\bf 1. } A visual path\\
{\bf 2. } A semantic path\\
{\bf 3. } A comparison-learning part\medskip\\
The task of the visual path is to extract the features of the video sample to be classified. The Graph Convolutional Net (GCN) from [cite ST-GCN PAPER] is used as the feature extractor; in our case it was trained exclusively on the 80 unused classes of the NTU-RGB+D 120 dataset so as not to violate the zero-shot premise. This ensures that the unseen gestures to be classified have not already appeared anywhere in the training process before inference. The GCN receives the skeleton data of the video to be classified as input and outputs a 256-dimensional vector that represents the features of the gesture shown in the video. Further details can be found in the referenced paper. No substantial changes were made to this part of the network.
\newline
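As a rough, hedged sketch of this visual path (the \texttt{FeatureExtractor} stand-in below is our own simplification, not the cited ST-GCN implementation; the input layout follows the usual ST-GCN convention):
\begin{verbatim}
import torch
import torch.nn as nn

# Stand-in for the ST-GCN feature extractor (hypothetical simplification;
# the real model applies graph convolutions over the skeleton graph)
class FeatureExtractor(nn.Module):
    def __init__(self, out_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.LazyLinear(out_dim))

    def forward(self, x):
        return self.net(x)

# Skeleton input: (batch, channels, frames, joints, persons)
skeleton = torch.randn(1, 3, 300, 25, 2)
visual_features = FeatureExtractor()(skeleton)  # -> (1, 256)
\end{verbatim}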
The semantic path first has the task of converting the vocabulary, i.e., all possible class labels, into a semantic embedding. In contrast to our reference architecture, we use an sBert module for this instead of a Sent2Vec module. The details of this model, which translates the class labels into representative 768-dimensional vectors, can be found in [cite Bert-Paper]. This is followed by the mapping of the semantic features into the visual context. This task is handled by a multi-layer perceptron (MLP), referred to in the following as the Attribute Network (AN). The AN sits at the boundary between the semantic path and the similarity-learning part. It is introduced in [cite Learning2Compare], where, together with the Relation Net (RN), which is explained in more detail in the following section, it contributes a substantial part of the solution to the ZSL task. Small changes were also made to the AN: they concern the dimensionality of the individual layers and an added dropout with a dropout factor of 0.5.
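A minimal sketch of this semantic path, assuming the \texttt{sentence-transformers} package for the sBert module and a plain PyTorch MLP as the Attribute Network (the checkpoint name and the hidden layer size are our assumptions; the 768- and 256-dimensional interfaces follow the text above):
\begin{verbatim}
from sentence_transformers import SentenceTransformer
import torch.nn as nn

# sBert: class labels -> 768-dimensional semantic vectors
sbert = SentenceTransformer("all-mpnet-base-v2")  # assumed checkpoint
semantic = sbert.encode(["squat down", "wave hand"])  # shape (2, 768)

# Attribute Network (AN): maps semantic features into the visual space;
# includes the added dropout with factor 0.5 mentioned above
attribute_net = nn.Sequential(
    nn.Linear(768, 512), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(512, 256),
)
\end{verbatim}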
......@@ -99,10 +110,22 @@ Die zwei letztgenannten Module AN und RN aus [Learning2Compare] sind es auch, di
\subsection{Augmentations}
\subsubsection{Automatic Augmentation}
To reduce the manual annotation effort, we would like to generate additional labels automatically for the multi-label approach. Therefore, we use the ContextualWordEmbsAug augmenter with the RoBERTa [liu2019roberta] language model from nlpaug [CITATION] to insert words into a descriptive label. We decided on insertions rather than substitutions or deletions, since the latter did not perform well in our tests. (For substitutions with synonyms we would have expected better performance, but it turned out that there were not enough synonyms for the key words in our sentences.) For the class squat down, an example of the word insertions used would be:
Description: A human crouches down by bending their knees.
Augmentation 1: A small human crouches duck down by bending their knees.
Augmentation 2: A human crouches fall down somewhat by bending their knees.
To reduce the manual annotation effort, we would like to generate additional labels automatically for the multi-label approach. Therefore, we use the ContextualWordEmbsAug augmenter with the RoBERTa [liu2019roberta] language model from nlpaug [CITATION] to insert words into a descriptive label. We decided on insertions rather than substitutions or deletions, since the latter did not perform well in our tests. (For substitutions with synonyms we would have expected better performance, but it turned out that there were not enough synonyms for the key words in our sentences.) For the class squat down, an example of the word insertions used would be:\\
\noindent
{\bf Description:} A human crouches down by bending their knees.\\
{\bf Augmentation 1:} A \textit{small} human crouches \textit{duck} down by bending their knees.\\
{\bf Augmentation 2:} A human crouches \textit{fall} down \textit{somewhat} by bending their knees.\medskip\\
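A minimal sketch of this augmentation step with nlpaug (the checkpoint name \texttt{roberta-base} is an assumption; only the insert action is used, as motivated above):
\begin{verbatim}
import nlpaug.augmenter.word as naw

# ContextualWordEmbsAug with a RoBERTa language model, insert action only
aug = naw.ContextualWordEmbsAug(model_path="roberta-base", action="insert")

description = "A human crouches down by bending their knees."
augmented = aug.augment(description, n=2)  # two augmented variants
print(augmented)
\end{verbatim}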
One can see that the augmented sentences are not necessarily grammatically correct and are less human-readable. However, since our semantic embedding is generated as a weighted average over the SBERT tokens of every word with an attention mask, the insertions introduce some variance and diversity into the different embeddings of the descriptive labels. We expect this to perform worse than the approach with three manually created descriptive labels, but still to yield some improvement over using just a single descriptive label.
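The weighted token averaging mentioned above corresponds to standard attention-mask mean pooling; a sketch assuming Hugging Face style tensors:
\begin{verbatim}
import torch

def mean_pooling(token_embeddings, attention_mask):
    # token_embeddings: (batch, tokens, dim); attention_mask: (batch, tokens)
    mask = attention_mask.unsqueeze(-1).float()
    return (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
\end{verbatim}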
\subsection{Experiments}
......@@ -113,17 +136,34 @@ For evaluating our model, we do training runs on 8 random 35/5 splits, which inc
\begin{table}
\begin{center}
\begin{tabular}{|l|c|}
\begin{tabular}{|l|c|c|c|c|}
\hline
Approach & ZSL & Seen & Unseen & h\\
\hline\hline
baseline & 0.4739 & 0.8116 & 0.1067 & 0.1877\\
aug1 & 0.5186 & 0.8104 & 0.1503 & 0.2495\\
aug2 & \textbf{0.6558} & 0.8283 & \textbf{0.2182} & \textbf{0.3417}\\
aug3 & 0.5865 & \textbf{0.8290} & 0.1856 & 0.3003\\
\hline
\end{tabular}
\end{center}
\caption{ZSL and GZSL results for different approaches. ZSL is the zero-shot accuracy; Seen and Unseen are the GZSL accuracies and h is their harmonic mean.}
\end{table}
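Here h is the harmonic mean of the seen and unseen accuracies, i.e., assuming the standard GZSL definition:
\begin{equation}
h = \frac{2 \cdot \mathrm{acc}_{seen} \cdot \mathrm{acc}_{unseen}}{\mathrm{acc}_{seen} + \mathrm{acc}_{unseen}}
\end{equation}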
\begin{table}
\begin{center}
\begin{tabular}{|l|c|c|}
\hline
Method & Frobnability \\
Approach & top-1 ${\pm}$ std & top-5 ${\pm}$ std \\
\hline\hline
Theirs & Frumpy \\
Yours & Frobbly \\
Ours & Makes one's heart Frob\\
baseline & ${0.1067\pm 0.0246}$ & ${0.5428\pm 0.0840}$ \\
aug1 & ${0.1503\pm 0.0553}$ & ${0.6460\pm 0.1250}$ \\
aug2 & ${\textbf{0.2182}\pm 0.0580}$ & ${\textbf{0.8580}\pm 0.0657}$ \\
aug3 & ${0.1856\pm 0.0499}$ & ${0.8272\pm 0.0476}$ \\
\hline
\end{tabular}
\end{center}
\caption{Results. Ours is better.}
\caption{Unseen top-1 and top-5 accuracies in detail.}
\end{table}
All our results were generated following the procedure described in the Experiments section. In [TABLE], one can see the ZSL accuracies of our approaches with standard deviation/min-max. [TABLE] shows the seen accuracy, the unseen accuracy, and their harmonic mean. The aggregation over splits is sketched below.
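A hedged sketch of that aggregation over the 8 random 35/5 splits (the accuracy values here are placeholders for illustration, not our measurements):
\begin{verbatim}
import statistics

# unseen top-1 accuracies over the 8 random 35/5 splits (placeholder values)
split_accs = [0.08, 0.12, 0.10, 0.13, 0.09, 0.11, 0.10, 0.12]

mean = statistics.mean(split_accs)
std = statistics.stdev(split_accs)
print(f"{mean:.4f} +/- {std:.4f}")
\end{verbatim}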
......