Commit 8a38216e authored by uoega

update readme

parent 6469d96f
# Zero Shot Action Recognition
This is the git repo of the paper **Data Augmentation of Semantic Embeddings for Skeleton based Zero-Shot Gesture Recognition** by David Heiming, Hannes Uhl and Jonas Linkerhaegner from the summer term 2021. All implementations were done in PyTorch. The code follows a modular approach; the following sections describe how to use each module.
## ST-GCN
This module is located in the folder **st-gcn_original** and can be used in the same way as the original [st-gcn](https://github.com/yysijie/st-gcn) repository. We added the following files:
- **/config/st_gcn** contains two new folders with the config files for our splits
- **/tools** contains three python scripts "ntu_gendata.py", "ntu_gendata_zsasr.py" and "ntu_gendata_zsar_nearest_cos.py" to generate the training splits for the ST-GCN.
- **/processor** contains the python file "feature_extraction.py" to generate the 256-dimensional features of the classes the ST-GCN was not trained on
## SBERT
This module is located in the folder **Bert_language_embeddings**.
### Language Embeddings
The most important file to generate the [SBERT mean class label embedding](https://huggingface.co/sentence-transformers/bert-base-nli-mean-tokens) is the python file "class_label_embedding_bert_mean.py".
We additionally experimented with the [cls token](https://huggingface.co/sentence-transformers/bert-base-nli-cls-token) and [mpnet](https://huggingface.co/sentence-transformers/paraphrase-mpnet-base-v2) from [sentence transformers](https://www.sbert.net/).
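
As a rough illustration of how a class label embedding could be produced with the sentence-transformers package, consider the sketch below. It is not the code from "class_label_embedding_bert_mean.py": the helper names `l2_normalize`, `embed_class_labels` and `load_sbert_encoder` are hypothetical, and the L2 normalisation is only an assumption suggested by the `_norm` suffix of the embedding names used further down.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    values = [float(v) for v in vector]
    norm = math.sqrt(sum(v * v for v in values))
    return values if norm == 0.0 else [v / norm for v in values]


def embed_class_labels(labels, encode):
    """Map each class label to a normalised embedding.

    `encode` is any callable that turns a list of strings into one vector per
    string, for example the `encode` method of a SentenceTransformer model.
    """
    vectors = encode(labels)
    return {label: l2_normalize(vec) for label, vec in zip(labels, vectors)}


def load_sbert_encoder(name="sentence-transformers/bert-base-nli-mean-tokens"):
    """Return the encode function of the mean-pooled SBERT model.

    Requires `pip install sentence-transformers`; the model is downloaded on first use.
    """
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(name).encode
```

Calling `embed_class_labels(labels, load_sbert_encoder())` with a list of label strings would then return one unit-length vector per label. Passing the encoder as an argument keeps the sketch easy to try with the cls-token or mpnet models linked above.
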
You can find the different versions of the manually annotated descriptive labels in the .txt files "class_sentences_40_verx.txt", with x being the version number from one to five:
- Version 1: First version of the visually focussed descriptions.
- Version 2: Corrected Version 1
- Version 3: Descriptions focussed on the meaning of the gestures rather than their visual characteristics
- Version 4: Alternative version of the visually focussed descriptions.
- Version 5: Alternative version of the visually focussed descriptions.
### Automatic Augmentation
The files for the automatic augmentation can be found in the subfolder **/augmentation**. The augmentations tested are from [nlpaug](https://github.com/makcedward/nlpaug) and [textaugment](https://pypi.org/project/textaugment/).
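
The following sketch shows one way such an augmentation step could look. It is an illustration under assumptions, not the code in **/augmentation**: the helpers `augment_descriptions` and `make_synonym_augmenter` are hypothetical, and WordNet synonym replacement from nlpaug is used only as an example augmenter, since the README does not say which augmenters were tested.

```python
def augment_descriptions(descriptions, augment, n_variants=3):
    """Return, per class id, the original description plus up to n_variants unique augmented ones.

    descriptions: dict mapping class id -> description text
    augment: callable that takes a string and returns an augmented string (or a list of them)
    """
    augmented = {}
    for class_id, text in descriptions.items():
        variants = [text]
        for _ in range(n_variants):
            result = augment(text)
            # Depending on the nlpaug version, augment() returns a string or a list.
            candidate = result[0] if isinstance(result, (list, tuple)) else result
            if candidate not in variants:
                variants.append(candidate)
        augmented[class_id] = variants
    return augmented


def make_synonym_augmenter():
    """Build a WordNet synonym augmenter; needs nlpaug and the NLTK WordNet data."""
    import nlpaug.augmenter.word as naw

    return naw.SynonymAug(aug_src="wordnet").augment
```

Each variant could then be embedded like a manually written description, giving several label embeddings per class.
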
## Learning to Compare ZSL
This module is located in the folder **LearningToCompare_ZSL**. Our version is based on the implementation of the [original paper](https://arxiv.org/abs/1711.06025) from [LearningToCompare_ZSL](https://github.com/lzrobots/LearningToCompare_ZSL).
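
For orientation, the zero-shot relation score and the training objective of the original paper can be written as follows. The notation follows the paper, not this repository's code: $s_c$ is the label embedding of class $c$, $x_j$ is a visual sample with label $y_j$, $f_{\varphi_1}$ and $f_{\varphi_2}$ are the semantic and visual embedding networks, $\mathcal{C}$ denotes concatenation and $g_\phi$ is the relation module. That the visual embedding here corresponds to the 256-dimensional ST-GCN features is an assumption.

```latex
r_{c,j} = g_{\phi}\Big(\mathcal{C}\big(f_{\varphi_1}(s_c),\, f_{\varphi_2}(x_j)\big)\Big),
\qquad
\varphi, \phi \leftarrow \arg\min_{\varphi,\,\phi}
  \sum_{c} \sum_{j} \big(r_{c,j} - \mathbf{1}(y_j = c)\big)^2
```

At test time a sample is assigned to the class whose label embedding yields the highest relation score.
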
The main file to train the zero-shot part of the architecture is "NTU_RelationNet_copy.py". It takes several input arguments, most importantly the unseen classes (`-u`, class numbers from 1 to 40) and the label embedding to use (`-s`). For example:
```bash
python NTU_RelationNet_copy.py -u 2 9 11 18 38 -s sentence_40_mean_ver1_norm
```
This uses the classes 2, 9, 11, 18 and 38 as unseen classes and the ver1 descriptive embedding as the label embedding.
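
A minimal sketch of how the `-u` and `-s` arguments from the command above might be parsed and turned into a seen/unseen split is given below. Only the short flags come from this README; the long option names, the range check and the helper `split_classes` are assumptions, and the actual script may handle its arguments differently.

```python
import argparse


def parse_args(argv=None):
    """Parse the unseen classes and the label embedding name(s)."""
    parser = argparse.ArgumentParser(description="Zero-shot training options (sketch)")
    # -u and -s appear in the README; the long names are hypothetical.
    parser.add_argument("-u", "--unseen", type=int, nargs="+", required=True,
                        help="class numbers (1 to 40) held out as unseen classes")
    parser.add_argument("-s", "--semantic", nargs="+", required=True,
                        help="name(s) of the label embedding(s) to use")
    args = parser.parse_args(argv)
    invalid = [c for c in args.unseen if not 1 <= c <= 40]
    if invalid:
        parser.error(f"unseen classes must be between 1 and 40, got {invalid}")
    return args


def split_classes(unseen, num_classes=40):
    """Return (seen, unseen) class numbers, both sorted, for 1-based class ids."""
    unseen_set = set(unseen)
    seen = [c for c in range(1, num_classes + 1) if c not in unseen_set]
    return seen, sorted(unseen_set)
```

With the arguments of the example command, `split_classes` would yield 35 seen classes and the five unseen ones.
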
### Multiple Labels
Here the file "NTU_RelationNet_random_multi_label.py" is used for training. The input arguments are the same as for the single label approach, but now more than one label embedding can be used, for example:
```bash
python NTU_RelationNet_random_multi_label.py -u 2 9 11 18 38 -s sentence_40_mean_ver1_norm sentence_40_mean_ver2_norm sentence_40_mean_ver5_norm
```
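
The file name suggests that one of the given label embeddings is chosen at random during training. The sketch below illustrates that idea under this assumption; the function `pick_label_embeddings` and the per-sample uniform draw are hypothetical and may not match what the script actually does.

```python
import random


def pick_label_embeddings(class_ids, embedding_sets, rng=None):
    """For every class id in a batch, take its embedding from a randomly chosen set.

    class_ids: iterable of class numbers, one per training sample
    embedding_sets: list of dicts, each mapping class number -> embedding vector
    rng: optional random.Random instance for reproducible draws
    """
    if rng is None:
        rng = random.Random()
    return [rng.choice(embedding_sets)[class_id] for class_id in class_ids]
```

Drawing a different description embedding for each sample acts as augmentation on the semantic side, since the same class is paired with varying label vectors.
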
## Additional files
As usual, more experiments were performed than those presented in the paper. The corresponding files are collected here.
### Siamese Networks
The folder **siamese-triplet** contains all files for the experiments that use a Siamese network to cluster the visual features from the ST-GCN.
## Results
**Relation Net**
| Augmentation | ZSL | Seen | Unseen | h |
|---|---|---|---|---|
| Baseline | $0.4739$ | $0.8116$ | $0.1067$ | $0.1877$ |
| Descriptive Labels | $0.5186$ | $0.8104$ | $0.1503$ | $0.2495$ |
| Multiple Descriptive Labels | $\textbf{0.6558}$ | $0.8283$ | $\textbf{0.2182}$ | $\textbf{0.3417}$ |
| Automatic Augmentation | $0.5865$ | $\textbf{0.8290}$ | $0.1856$ | $0.3003$ |

| Augmentation | top-1 $\pm$ std | top-5 $\pm$ std |
|---|---|---|
| Baseline | $0.1067 \pm 0.0246$ | $0.5428 \pm 0.0840$ |
| Descriptive Labels | $0.1503 \pm 0.0553$ | $0.6460 \pm 0.1250$ |
| Multiple Descriptive Labels | $\textbf{0.2182} \pm 0.0580$ | $\textbf{0.8580} \pm 0.0657$ |
| Automatic Augmentation | $0.1856 \pm 0.0499$ | $0.8272 \pm 0.0476$ |
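
The README does not define the column h. In generalized zero-shot learning it commonly denotes the harmonic mean of the seen and unseen accuracies, and the sketch below assumes that reading. The function name `harmonic_mean` is illustrative.

```python
def harmonic_mean(seen_acc, unseen_acc):
    """Harmonic mean of seen and unseen accuracy; 0.0 if both are zero."""
    total = seen_acc + unseen_acc
    return 0.0 if total == 0 else 2 * seen_acc * unseen_acc / total


baseline_h = harmonic_mean(0.8116, 0.1067)
print(f"baseline h from averaged accuracies: {baseline_h:.4f}")
```

For the baseline row this gives roughly 0.189, slightly above the reported 0.1877. That would be consistent with h being computed per split and then averaged, but this is a guess.
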