Commit c2f3bc53 authored by tim.scherr

code version 2.0

parent 7599ed6c
@@ -12,7 +12,6 @@ __pycache__/
*__pycache__
*.idea
*.pdf
*.pth
# image data
# KIT-Sch-GE 2021 Segmentation
Distance-transform-prediction-based segmentation method used for our submission to the 6th edition of the [ISBI Cell Tracking Challenge](http://celltrackingchallenge.net/) 2021 ([Team KIT-Sch-GE](http://celltrackingchallenge.net/participants/KIT-Sch-GE/)).
![Segmentation Overview Image](documentation/segmentation_overview.png)
## Prerequisites
* [Anaconda Distribution](https://www.anaconda.com/products/individual)
* A CUDA capable GPU
* Minimum / recommended RAM: 16 GiB / 32 GiB
* Minimum / recommended VRAM: 12 GiB / 24 GiB
## Installation
Clone the repository:
@@ -21,43 +23,183 @@ conda env create -f requirements.yml
Activate the virtual environment kit_sch-ge-2021_cell_segmentation_ve:
```
conda activate kit_sch-ge-2021_cell_segmentation_ve
```
## Download Models
The models of the Cell Tracking Challenge submission KIT-Sch-GE (2) can be downloaded with
```
python download_models.py
```
These models were trained on a mixture of gold truth (GT) and silver truth (ST) [annotations](http://celltrackingchallenge.net/annotations/) for a specific cell type, which is encoded in the model name. Currently, the following models are available:
- BF-C2DL-HSC_GT+ST_model,
- BF-C2DL-MuSC_GT+ST_model,
- DIC-C2DH-HeLa_GT+ST_model,
- Fluo-C2DL-MSC_GT+ST_model,
- Fluo-C3DH-A549_GT+ST_model,
- Fluo-C3DH-H157_GT+ST_model,
- Fluo-C3DL-MDA231_GT+ST_model,
- Fluo-N2DH-GOWT1_GT+ST_model,
- Fluo-N2DL-HeLa_GT+ST_model,
- Fluo-N3DH-CE_GT+ST_model,
- Fluo-N3DH-CHO_GT+ST_model,
- Fluo-N3DH-SIM+_GT_model (trained also on Fluo-N2DH-SIM+ data),
- PhC-C2DH-U373_GT+ST_model,
- PhC-C2DL-PSC_GT+ST_model.
These models are saved into *./models/kit-sch-ge/*. They can be used for retraining or for comparison on Cell Tracking Challenge data, and they may generalize directly to other data if the domain gap is not too large.
## Download Cell Tracking Challenge Data & Evaluation Software
Download all Cell Tracking Challenge Data (without Fluo-N3DL-DRO, Fluo-N3DL-TRIC, Fluo-N3DL-TRIF) with
```
python download_data.py
```
About 40 GiB of free disk space is needed. The training datasets with annotations are saved into *./training_data/* and the challenge data into *./challenge_data/*. In addition, the [evaluation software](http://celltrackingchallenge.net/evaluation-methodology/) will be downloaded.
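After the download, the working directory should look roughly like this (one subfolder per cell type; only a few are shown):
```
./training_data/BF-C2DL-HSC/
./training_data/DIC-C2DH-HeLa/
./training_data/.../
./challenge_data/BF-C2DL-HSC/
./challenge_data/.../
./evaluation_software/
```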
## Training
New models can be trained with:
```
python train.py --cell_type 'cell_type' --mode 'mode'
```
The needed training data will be created automatically (this may take some time for some ST datasets).
Models can be retrained with:
```
python train.py --cell_type 'cell_type' --mode 'mode' --retrain 'model_name'
```
Trained models are saved into *./models/all/*.
*train_kit-sch-ge.sh* is a bash script for reproducing the training and evaluation of our whole submission (takes some time!).
### Training Data
For training, 320px-by-320px crops are generated. For Cell Tracking Challenge GT data, the detection GT located in the *TRA* folder is used to check whether all cells in a crop are annotated. Only high-quality crops are used (with some exceptions if too few crops are available). For the mixture of GT and ST, the amount of ST annotations is limited. The final training sets, including the distance transforms required for training, are saved into *./training_data/train_sets/*.
If you want to train models on your own data, or apply trained models to your own data, you need to convert the data into the Cell Tracking Challenge format and add it to *./training_data/* (preferably with annotated masks in the GT folders *SEG* and *TRA*). Using the parameter *cell_type=name_of_your_folder* during training will then create a training set with 320px-by-320px crops and train models on this set.
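For orientation, a dataset in Cell Tracking Challenge format looks roughly as follows (the dataset name *My-Data-2D* is only a placeholder; file naming follows the Cell Tracking Challenge convention):
```
./training_data/My-Data-2D/
├── 01/          (raw images t000.tif, t001.tif, ...)
├── 01_GT/
│   ├── SEG/     (annotated masks man_seg000.tif, ...)
│   └── TRA/     (detection/tracking masks man_track000.tif, ...)
├── 02/
└── 02_GT/
    ├── SEG/
    └── TRA/
```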
### Parameters
Defaults are written in bold.
- --act_fun / -a: activation function (**'relu'**, 'leakyrelu', 'elu', 'mish').
- --batch_size / -bs: batch size (**8**).
- --cell_type / -ct: cell type to train on. 'all' will train a model on preselected Cell Tracking Challenge datasets. Multiple cell types can be passed.
- --filters / -f: number of kernels (**64 1024**). After each pooling step, the number is doubled in the encoder until the maximum is reached.
- --iterations / -i: number of models trained (**1**).
- --loss / -l: loss function ('l1', 'l2', **'smooth_l1'**).
- --mode / -m: type of training data / training mode (**'GT'**, 'ST', 'GT+ST').
- --multi_gpu / -mgpu: use multiple GPUs if available (**True**).
- --norm_method / -nm: normalization layer type (**'bn'**, 'gn', 'in').
- --optimizer / -o: optimizer (**'adam'**, '[ranger](https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer)').
- --pool_method / -pm: Pooling method ('max' (maximum pooling), **'conv'** (convolution with stride 2)).
- --pre_train / -pt: auto-encoder pre-training (only for GT and single cell type).
- --retrain / -r: model to retrain.
- --split / -s: data used for train/val split ('kit-sch-ge' (exact reproduction of sets), '01' (use only 01 set for training data creation), '02', **'01+02'**).
### Recommendations and Remarks
- Use a batch size of 4 or 8. We use a batch size of 8 on 2 GPUs (effectively 4 per GPU).
- Use the default settings, but also try the Ranger optimizer with the mish activation function (-a 'mish' -o 'ranger') instead of Adam and ReLU (-a 'relu' -o 'adam').
- Auto-encoder pre-training does not seem to help much, even if only a few GT annotations are available.
- Use retraining with care. Only a single parameter group is used, which may lead to large changes in the first filters and render the subsequently learned filters useless. A more sophisticated retraining, e.g., retraining only the decoders or using multiple learning rates for multiple parameter groups, may be added in future releases; a rough sketch of the parameter-group idea is shown below.
- The auto-encoder pre-training is always made on both subsets regardless of the *split* parameter.
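The following is only a hypothetical sketch of the multi-learning-rate idea mentioned above. It is not part of this repository; the model, its submodules, and the learning rates are made up for illustration:
```
import torch
import torch.nn as nn


class DummyNet(nn.Module):
    """ Hypothetical stand-in for a U-Net-like model with encoder and decoder submodules. """

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(1, 64, kernel_size=3, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(nn.Conv2d(64, 1, kernel_size=3, padding=1))

    def forward(self, x):
        return self.decoder(self.encoder(x))


net = DummyNet()

# Two parameter groups with different learning rates: the pre-trained encoder is only
# fine-tuned gently, while the decoder is allowed to adapt more strongly.
optimizer = torch.optim.Adam([
    {'params': net.encoder.parameters(), 'lr': 1e-5},
    {'params': net.decoder.parameters(), 'lr': 1e-4},
])
```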
### Examples
Train a model on a training set created from STs of the subset '01' of the dataset Fluo-N2DL-HeLa:
```
python train.py --cell_type 'Fluo-N2DL-HeLa' --mode 'ST' --split '01'
```
Train 2 models with Ranger and mish on a training dataset made from BF-C2DL-HSC and BF-C2DL-MuSC:
```
python train.py --cell_type 'BF-C2DL-HSC' 'BF-C2DL-MuSC' --mode 'ST' --act_fun 'mish' --optimizer 'ranger' --iterations 2
```
Retrain a model:
```
python train.py --cell_type 'Fluo-N2DL-HeLa' --mode 'ST' --retrain 'models/kit-sch-ge/Fluo-N2DL-HeLa_GT+ST_model'
```
## Evaluation
Trained models can be evaluated on the training datasets with (you may need to make the evaluation software executable once):
```
python eval.py --cell_type 'cell_type' --mode 'mode' --models 'model_prefix'
```
The best model (OP_CSB measure for GT & GT+ST, SEG measure calculated on ST for ST) for the selected cell_type and mode will be copied into *./models/best/*. In the corresponding .json files, the best thresholds and the applied scaling factor can be found (and also some other information). The (raw) results of all evaluated models can be found in *./training_data/cell_type/01_RES_model_name_th_seed_th_cell* and *./training_data/cell_type/02_RES_model_name_th_seed_th_cell*.
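For reference, OP_CSB is the mean of the CTC detection (DET) and segmentation (SEG) measures, as computed in *eval.py* (the scores below are placeholders):
```
det_measure, seg_measure = 0.95, 0.80      # placeholder scores
op_csb = (det_measure + seg_measure) / 2   # OP_CSB = 0.875
```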
*eval_kit-sch-ge.sh* is a bash script for the training and evaluation of our whole submission (takes some time!).
### Parameters
- --apply_clahe / -acl: CLAHE pre-processing.
- --artifact_correction / -ac: Motion-based artifact correction post-processing (only for 2D and dense data).
- --batch_size / -bs: batch size (**8**).
- --fuse_z_seeds / -fzs: Fuse seeds in axial direction (only for 3D).
- --mode / -m: type of training data / evaluation mode (**'GT'**, 'ST').
- --models: Models to evaluate (name prefix).
- --multi_gpu / -mgpu: use multiple GPUs if available (**True**).
- --n_splitting / -ns: Threshold of detected cells above which the splitting post-processing is applied (**40**, only 3D).
- --save_raw_pred / -srp: save some raw/distance predictions.
- --scale / -sc: Scale factor (**0**; 0 means that the scale is loaded from the corresponding training set .json file).
- --subset / -s: Subset for evaluation ('01' (use only 01 set), '02', **'01+02'**).
- --th_cell / -tc: Threshold(s) for adjusting cell size (**0.07**).
- --th_seed / -ts: Threshold(s) for seed extraction (**0.45**).
### Recommendations and Remarks
- Use a lower batch size for large image sizes or 3D data depending on your VRAM.
- If you want to evaluate on your own data, the dataset name / cell type should include '2D' for 2D data and '3D' for 3D data.
- All models in *./models/all/* whose names begin with the --models prefix will be evaluated, and the best model will be selected and copied to *./models/best/*.
- For evaluations with more than one cell type, some cell types are excluded from the best-model selection (since they are quite different, and the idea is to find a model that works better for the remaining cell types).
- A list with metrics for each subset and cell type of each model can be found after the evaluation at *./models/best/* (see the sketch of its structure below).
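The metrics file (*metrics_&lt;models-prefix&gt;_models_on_&lt;trainset&gt;.json*) is nested by model, cell type, subset, and thresholds, roughly as sketched below (names and values are placeholders; see how *eval.py* assembles *metric_scores*):
```
metric_scores = {
    '<model_name>': {
        '<cell_type>': {
            '01': {                  # subset
                '0.45': {            # th_seed
                    '0.07': {        # th_cell
                        'DET': 0.90, 'SEG': 0.80, 'OP_CSB': 0.85,
                        'SO': 0.0, 'FPV': 0.0, 'FNV': 0.0,
                    },
                },
            },
        },
    },
}
```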
### Examples
Evaluate all models which begin with 'BF-C2DL-HSC_GT+ST' (in ./models/all) for multiple thresholds:
```
python eval.py --cell_type 'BF-C2DL-HSC' --models 'BF-C2DL-HSC_GT+ST' --mode 'GT' --artifact_correction --th_cell 0.07 0.09 --th_seed 0.35 0.45
```
Evaluate the models on multiple cell types and select the model which performs best:
```
python eval.py --cell_type 'BF-C2DL-HSC' 'BF-C2DL-MuSC' --models 'BF-C2DL-HSC_GT+ST' --mode 'GT' --artifact_correction
```
## Inference
For inference, select a model and run:
```
python infer.py --cell_type 'cell_type' --model 'model'
```
The results can be found in *./challenge_data/cell_type*.
*inference_kit-sch-ge.sh* is a bash script to reproduce our results.
### Parameters
- --apply_clahe / -acl: CLAHE pre-processing.
- --artifact_correction / -ac: Motion-based artifact correction post-processing (only for 2D and dense data).
- --batch_size / -bs: batch size (**8**).
- --fuse_z_seeds / -fzs: Fuse seeds in axial direction (only for 3D).
- --model: Model to use.
- --multi_gpu / -mgpu: use multiple GPUs if available (**True**).
- --n_splitting: Threshold of detected cells above which the splitting post-processing is applied (**40**, only 3D).
- --save_raw_pred / -srp: save some raw/distance predictions.
- --scale / -sc: Scale factor (**0**; 0 means that the scale is loaded from the corresponding training set .json file).
- --split / -s: Subset for evaluation ('01' (use only 01 set), '02', **'01+02'**).
- --th_cell / -tc: Threshold for adjusting cell size (**0.07**).
- --th_seed / -ts: Threshold for seed extraction (**0.45**).
### Recommendations and Remarks
- Use a lower batch size for large image sizes or 3D data depending on your VRAM.
- If you want to process your own data, the dataset name / cell type should include '2D' for 2D data and '3D' for 3D data.
- As for the training datasets, your own data needs to be in the Cell Tracking Challenge format and placed in *./challenge_data/* (no ground truth is needed this time); see the sketch below.
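A minimal sketch of the expected layout (*My-Data-2D* is again only a placeholder):
```
./challenge_data/My-Data-2D/
├── 01/   (raw images t000.tif, t001.tif, ...)
└── 02/
```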
### Examples
Process multiple datasets with the same model and save some raw predictions:
```
python infer.py --cell_type 'BF-C2DL-HSC' 'BF-C2DL-MuSC' --model 'best/BF-C2DL-HSC_GT_01+02_model' --save_raw_pred
```
## Releases
### 1.0
This release contains our original code for our Cell Tracking Challenge contribution.
### 2.0
This release improves the usability of our code, e.g., for training and retraining. In addition, some subtle changes have been made in the training data creation. However, the original training sets can still be used via the parameter *split*.
## Publications
T. Scherr, K. Löffler, M. Böhland, and R. Mikut (2020). Cell Segmentation and Tracking using CNN-Based Distance Predictions and a Graph-Based Matching Strategy. PLoS ONE 15(12). DOI: [10.1371/journal.pone.0243219](https://doi.org/10.1371/journal.pone.0243219).
T. Scherr, K. Löffler, O. Neumann, and R. Mikut (2021). On Improving an Already Competitive Segmentation Algorithm for the Cell Tracking Challenge - Lessons Learned. bioRxiv. DOI: [10.1101/2021.06.26.450019](https://doi.org/10.1101/2021.06.26.450019).
## License
This project is licensed under the MIT License - see the [LICENSE.md](LICENSE.md) file for details.
{
  "methods":
  [
    [["DU", "conv", "relu", "bn", [64, 1024]], "distance", "adam", "smooth_l1", null],
    [["DU", "conv", "mish", "bn", [64, 1024]], "distance", "ranger", "smooth_l1", null],
    [["DU", "conv", "mish", "bn", [64, 1024]], "distance", "ranger", "smooth_l1", true]
  ],
  "batch_size": 8,
  "batch_size_auto": 2,
  "iterations": 1,
  "iterations_GT_single_celltype": 2
}
Directory for the data for inference.
Cell Tracking Challenge data:
http://celltrackingchallenge.net/2d-datasets/
http://celltrackingchallenge.net/3d-datasets/
import os
import requests
import zipfile
from pathlib import Path


def download_data(url, target):
    local_filename = target / url.split('/')[-1]
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    return local_filename


if __name__ == "__main__":

    traindata_path = Path.cwd() / 'training_data'
    challengedata_path = Path.cwd() / 'challenge_data'
    evalsoftware_path = Path.cwd() / 'evaluation_software'

    trainingdata_url = 'http://data.celltrackingchallenge.net/training-datasets/'
    challengedata_url = 'http://data.celltrackingchallenge.net/challenge-datasets/'
    evalsoftware_url = 'http://public.celltrackingchallenge.net/software/EvaluationSoftware.zip'

    cell_types = ["BF-C2DL-HSC", "BF-C2DL-MuSC", "DIC-C2DH-HeLa", "Fluo-C2DL-Huh7", "Fluo-C2DL-MSC", "Fluo-C3DH-A549",
                  "Fluo-C3DH-H157", "Fluo-C3DL-MDA231", "Fluo-N2DH-GOWT1", "Fluo-N2DL-HeLa", "Fluo-N3DH-CE",
                  "Fluo-N3DH-CHO", "PhC-C2DH-U373", "PhC-C2DL-PSC", "Fluo-C3DH-A549-SIM", "Fluo-N2DH-SIM+",
                  "Fluo-N3DH-SIM+"]

    for cell_type in cell_types:

        # Download training set
        if not (traindata_path / cell_type).is_dir():
            print('Downloading {} training set ...'.format(cell_type))
            download_data(url="{}{}.zip".format(trainingdata_url, cell_type), target=traindata_path)

            # Unzip training set
            print('Unzip {} training set ...'.format(cell_type))
            with zipfile.ZipFile(traindata_path / "{}.zip".format(cell_type), 'r') as z:
                z.extractall('training_data')

            # Remove zip
            os.remove(traindata_path / "{}.zip".format(cell_type))

        # Download challenge set
        if not (challengedata_path / cell_type).is_dir():
            print('Downloading {} challenge set ...'.format(cell_type))
            download_data(url="{}{}.zip".format(challengedata_url, cell_type), target=challengedata_path)

            # Unzip challenge set
            print('Unzip {} challenge set ...'.format(cell_type))
            with zipfile.ZipFile(challengedata_path / "{}.zip".format(cell_type), 'r') as z:
                z.extractall('challenge_data')

            # Remove zip
            os.remove(challengedata_path / "{}.zip".format(cell_type))

    # Download evaluation software
    if len(list(evalsoftware_path.glob('*'))) == 0:
        print('Downloading evaluation software ...')
        download_data(url=evalsoftware_url, target=evalsoftware_path)

        # Unzip evaluation software
        print('Unzip evaluation software ...')
        with zipfile.ZipFile(evalsoftware_path / evalsoftware_url.split('/')[-1], 'r') as z:
            z.extractall('evaluation_software')

        # Remove zip
        os.remove(evalsoftware_path / evalsoftware_url.split('/')[-1])
import os
import requests
import zipfile
from pathlib import Path
from shutil import move, rmtree


def download_data(url, target):
    local_filename = target / url.split('/')[-1]
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    return local_filename


if __name__ == "__main__":

    model_path = Path.cwd() / 'models' / 'kit-sch-ge'
    model_url = 'http://public.celltrackingchallenge.net/participants/KIT-Sch-GE%20(2).zip'

    # Download models
    if len(list(model_path.glob('*'))) == 0:
        print('Downloading models ...')
        download_data(url=model_url, target=model_path)

        # Unzip models
        print('Unzip models ...')
        with zipfile.ZipFile(model_path / model_url.split('/')[-1], 'r') as z:
            z.extractall('models/kit-sch-ge')

        # Remove zip
        os.remove(model_path / model_url.split('/')[-1])

        # Move models
        for file_name in (model_path / 'KIT-Sch-GE (2)' / 'models').glob('*'):
            move(str(file_name), str(model_path))

        # Delete software (not needed here)
        rmtree(str(model_path / 'KIT-Sch-GE (2)'))
import argparse
import numpy as np
import random
import torch
import warnings

from pathlib import Path

from segmentation.inference.inference import inference_2d_ctc, inference_3d_ctc
from segmentation.training.create_training_sets import get_file, write_file
from segmentation.utils.metrics import count_det_errors, ctc_metrics
from segmentation.utils import utils

warnings.filterwarnings("ignore", category=UserWarning)


class EvalArgs(object):
    """ Class with post-processing parameters.

    """

    def __init__(self, th_cell, th_seed, n_splitting, apply_clahe, scale, cell_type, save_raw_pred,
                 artifact_correction, fuse_z_seeds):
        """

        :param th_cell: Mask / cell size threshold.
            :type th_cell: float
        :param th_seed: Seed / marker threshold.
            :type th_seed: float
        :param n_splitting: Number of detected cells above which to apply additional splitting (only for 3D).
            :type n_splitting: int
        :param apply_clahe: Apply contrast limited adaptive histogram equalization (CLAHE).
            :type apply_clahe: bool
        :param scale: Scale factor for downsampling.
            :type scale: float
        :param cell_type: Cell type.
            :type cell_type: str
        :param save_raw_pred: Save (some) raw predictions.
            :type save_raw_pred: bool
        :param artifact_correction: Apply artifact correction post-processing.
            :type artifact_correction: bool
        :param fuse_z_seeds: Fuse seeds in z-direction / axial direction.
            :type fuse_z_seeds: bool
        """
        self.th_cell = th_cell
        self.th_seed = th_seed
        self.n_splitting = n_splitting
        self.apply_clahe = apply_clahe
        self.scale = scale
        self.cell_type = cell_type
        self.save_raw_pred = save_raw_pred
        self.artifact_correction = artifact_correction
        self.fuse_z_seeds = fuse_z_seeds
def main():

    random.seed()
    np.random.seed()

    # Get arguments
    parser = argparse.ArgumentParser(description='KIT-Sch-GE 2021 Cell Segmentation - Evaluation')
    parser.add_argument('--apply_clahe', '-acl', default=False, action='store_true', help='CLAHE pre-processing')
    parser.add_argument('--artifact_correction', '-ac', default=False, action='store_true', help='Artifact correction')
    parser.add_argument('--batch_size', '-bs', default=8, type=int, help='Batch size')
    parser.add_argument('--cell_type', '-ct', nargs='+', required=True, help='Cell type(s)')
    parser.add_argument('--fuse_z_seeds', '-fzs', default=False, action='store_true', help='Fuse seeds in axial direction')
    parser.add_argument('--mode', '-m', default='GT', type=str, help='Ground truth type / evaluation mode')
    parser.add_argument('--models', required=True, type=str, help='Models to evaluate (prefix)')
    parser.add_argument('--multi_gpu', '-mgpu', default=True, action='store_true', help='Use multiple GPUs')
    parser.add_argument('--n_splitting', '-ns', default=40, type=int, help='Cell amount threshold to apply splitting post-processing (3D)')
    parser.add_argument('--save_raw_pred', '-srp', default=False, action='store_true', help='Save some raw predictions')
    parser.add_argument('--scale', '-sc', default=0, type=float, help='Scale factor (0: get from trainset info.json)')
    parser.add_argument('--subset', '-s', default='01+02', type=str, help='Subset to evaluate on')
    parser.add_argument('--th_cell', '-tc', default=0.07, nargs='+', help='Threshold for adjusting cell size')
    parser.add_argument('--th_seed', '-ts', default=0.45, nargs='+', help='Threshold for seeds')
    args = parser.parse_args()

    # Paths
    path_data = Path.cwd() / 'training_data'
    path_models = Path.cwd() / 'models' / 'all'
    path_best_models = Path.cwd() / 'models' / 'best'
    path_ctc_metric = Path.cwd() / 'evaluation_software'

    # Set device for using CPU or GPU
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    if str(device) == 'cuda':
        torch.backends.cudnn.benchmark = True
    if args.multi_gpu:
        num_gpus = torch.cuda.device_count()
    else:
        num_gpus = 1

    # Check if the datasets exist in the training_data folder
    if len(args.cell_type) > 1:
        es = 0
        for cell_type in args.cell_type:
            if not (path_data / cell_type).exists():
                print('No data for cell type "{}" found in {}'.format(cell_type, path_data))
                es = 1
        if es == 1:
            return
        trainset_name = hash(tuple(sorted(args.cell_type)))
    else:
        if not (args.cell_type[0] == 'all') and not (path_data / args.cell_type[0]).exists():
            print('No data for cell type "{}" found in {}'.format(args.cell_type[0], path_data))
            return
        trainset_name = args.cell_type[0]

    # Get cell types / datasets to evaluate
    cell_type_list = args.cell_type
    if args.cell_type[0] == 'all':
        # Use cell types included in the primary track
        cell_type_list = ["BF-C2DL-HSC", "BF-C2DL-MuSC", "DIC-C2DH-HeLa", "Fluo-C2DL-MSC", "Fluo-C3DH-A549",
                          "Fluo-C3DH-H157", "Fluo-C3DL-MDA231", "Fluo-N2DH-GOWT1", "Fluo-N2DL-HeLa", "Fluo-N3DH-CE",
                          "Fluo-N3DH-CHO", "PhC-C2DH-U373", "PhC-C2DL-PSC"]

    # Check if evaluation metric is available
    if not path_ctc_metric.is_dir():
        raise Exception('No evaluation software found. Run the script download_data.py')

    # Get models and cell types to evaluate
    models = sorted(path_models.glob("{}*.pth".format(args.models)))
    if len(models) == 0:
        raise Exception('No models to evaluate found.')

    if not isinstance(args.th_seed, list):
        args.th_seed = [args.th_seed]
    if not isinstance(args.th_cell, list):
        args.th_cell = [args.th_cell]
    # Go through model list and evaluate for stated cell_types
    metric_scores = {}
    for model in models:

        metric_scores[model.stem] = {}

        for ct in cell_type_list:

            metric_scores[model.stem][ct] = {}

            train_sets = [args.subset]
            if args.subset in ['kit-sch-ge', '01+02']:
                train_sets = ['01', '02']

            for train_set in train_sets:

                metric_scores[model.stem][ct][train_set] = {}

                # Get scale from training dataset info if not stated otherwise
                scale_factor = args.scale
                if args.scale == 0:
                    scale_factor = get_file(path_data / model.stem.split('_model')[0] / "info.json")['scale']

                # Go through thresholds
                for th_seed in args.th_seed:

                    metric_scores[model.stem][ct][train_set][str(th_seed)] = {}

                    for th_cell in args.th_cell:

                        metric_scores[model.stem][ct][train_set][str(th_seed)][str(th_cell)] = {}

                        print('Evaluate {} on {}_{}: th_seed: {}, th_cell: {}'.format(model.stem, ct, train_set,
                                                                                      th_seed, th_cell))

                        path_seg_results = path_data / ct / "{}_RES_{}_{}_{}".format(train_set, model.stem,
                                                                                     th_seed, th_cell)
                        path_seg_results.mkdir(exist_ok=True)

                        # Check if results already exist
                        if (path_seg_results / "SEG_log.txt").exists():
                            if args.mode == 'ST':  # ST only evaluated with SEG metric
                                det_measure, so, fnv, fpv = 0, np.nan, np.nan, np.nan
                            else:
                                det_measure, so, fnv, fpv = count_det_errors(path_seg_results / "DET_log.txt")
                            seg_measure = utils.get_seg_score(path_seg_results / "SEG_log.txt")
                            op_csb = (det_measure + seg_measure) / 2
                            metric_scores[model.stem][ct][train_set][str(th_seed)][str(th_cell)]['DET'] = det_measure
                            metric_scores[model.stem][ct][train_set][str(th_seed)][str(th_cell)]['SEG'] = seg_measure
                            metric_scores[model.stem][ct][train_set][str(th_seed)][str(th_cell)]['OP_CSB'] = op_csb
                            metric_scores[model.stem][ct][train_set][str(th_seed)][str(th_cell)]['SO'] = so
                            metric_scores[model.stem][ct][train_set][str(th_seed)][str(th_cell)]['FPV'] = fpv
                            metric_scores[model.stem][ct][train_set][str(th_seed)][str(th_cell)]['FNV'] = fnv
                            continue

                        # Get post-processing settings
                        eval_args = EvalArgs(th_cell=float(th_cell), th_seed=float(th_seed),
                                             n_splitting=args.n_splitting, apply_clahe=args.apply_clahe,
                                             scale=scale_factor, cell_type=ct,
                                             save_raw_pred=args.save_raw_pred,
                                             artifact_correction=args.artifact_correction,
                                             fuse_z_seeds=args.fuse_z_seeds)

                        if '2D' in ct:
                            inference_2d_ctc(model=model,
                                             data_path=path_data / ct / train_set,
                                             result_path=path_seg_results,
                                             device=device,
                                             batchsize=args.batch_size,
                                             args=eval_args,
                                             num_gpus=num_gpus)
                        else:
                            inference_3d_ctc(model=model,
                                             data_path=path_data / ct / train_set,
                                             result_path=path_seg_results,
                                             device=device,
                                             batchsize=args.batch_size,
                                             args=eval_args,
                                             num_gpus=num_gpus)

                        seg_measure, det_measure = ctc_metrics(path_data=path_data / ct,
                                                               path_results=path_seg_results,
                                                               path_software=path_ctc_metric,
                                                               subset=train_set,
                                                               mode=args.mode)

                        if args.mode == 'ST':
                            so, fnv, fpv = np.nan, np.nan, np.nan
                        else:
                            _, so, fnv, fpv = count_det_errors(path_seg_results / "DET_log.txt")
                        op_csb = (det_measure + seg_measure) / 2

                        metric_scores[model.stem][ct][train_set][str(th_seed)][str(th_cell)]['DET'] = det_measure
                        metric_scores[model.stem][ct][train_set][str(th_seed)][str(th_cell)]['SEG'] = seg_measure
                        metric_scores[model.stem][ct][train_set][str(th_seed)][str(th_cell)]['OP_CSB'] = op_csb
                        metric_scores[model.stem][ct][train_set][str(th_seed)][str(th_cell)]['SO'] = so
                        metric_scores[model.stem][ct][train_set][str(th_seed)][str(th_cell)]['FPV'] = fpv
                        metric_scores[model.stem][ct][train_set][str(th_seed)][str(th_cell)]['FNV'] = fnv

    # Save evaluation metric scores
    write_file(metric_scores, path_best_models / "metrics_{}_models_on_{}.json".format(args.models, trainset_name))

    # Get best model and copy to ./models/best_model
    best_op_csb, best_th_cell, best_th_seed, best_model = utils.get_best_model(metric_scores=metric_scores,
                                                                               mode="all" if len(args.cell_type) > 1 else "single",
                                                                               subset=args.subset,
                                                                               th_cells=args.th_cell,
                                                                               th_seeds=args.th_seed)

    best_settings = {'th_cell': best_th_cell,
                     'th_seed': best_th_seed,
                     'scale_factor': scale_factor,
                     'OP_CSB': best_op_csb}

    utils.copy_best_model(path_models=path_models,
                          path_best_models=path_best_models,
                          best_model=best_model,
                          best_settings=best_settings)