Commit da7cebcc authored by Sophia

Read me updated

parent af23fa11
......
## Special Stuff
#### Content of Jupyter Notebooks
###### 1 Preprocessing
Methods for the preprocessing steps
**Collection_Detection**
- reverse_data (restores the correct chronological order)
- dec_data (decodes the data: removes letters and converts the values to integers)
- remove_outliers (removes temperature outliers)
- write_preprocessed_data (writes the data frame into a .csv file)
- detect_collection (detects trash collections when sudden changes in height occur)
- get_collection_data (summarizes the data since the last collection)
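A minimal sketch of the first four preprocessing steps follows. It rests on several assumptions, none of them taken from the notebook:

- each reading is a dict with the hypothetical keys timestamp, height and temperature;
- encoded values look like "H40" or "T-2";
- the raw export lists the newest reading first;
- plausible temperatures lie between the hypothetical bounds TEMP_MIN and TEMP_MAX.

The notebook's real formats and limits may differ.

```python
import csv
import io
import re

# Hypothetical plausible temperature range in degrees Celsius (assumption, not from the notebook).
TEMP_MIN, TEMP_MAX = -30, 60


def reverse_data(rows):
    """Restore chronological order, assuming the raw export is newest-first."""
    return rows[::-1]


def dec_data(rows, fields=("height", "temperature")):
    """Remove letters from encoded values such as 'H40' and convert them to int."""
    decoded = []
    for row in rows:
        new_row = dict(row)
        for field in fields:
            new_row[field] = int(re.sub(r"[A-Za-z]", "", row[field]))
        decoded.append(new_row)
    return decoded


def remove_outliers(rows):
    """Drop readings whose temperature lies outside the assumed range."""
    return [row for row in rows if TEMP_MIN <= row["temperature"] <= TEMP_MAX]


def write_preprocessed_data(rows, stream):
    """Write the rows as CSV to an open text stream (e.g. a file in data/preprocessed)."""
    if not rows:
        return
    writer = csv.DictWriter(stream, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)


raw = [
    {"timestamp": "2021-03-03 12:00", "height": "H40", "temperature": "T18"},
    {"timestamp": "2021-03-02 12:00", "height": "H35", "temperature": "T99"},
    {"timestamp": "2021-03-01 12:00", "height": "H30", "temperature": "T-2"},
]
clean = remove_outliers(dec_data(reverse_data(raw)))
buffer = io.StringIO()
write_preprocessed_data(clean, buffer)
print(buffer.getvalue())
```

In a notebook the stream would typically be a file opened with `newline=""`, as recommended for the csv module.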
Notebook that detects collections based on sudden changes in the fill height of the container.
A collection is detected when the pre_height difference between two time periods reaches a threshold.
The data recorded before each detected collection is saved in a separate data frame, and the collection data is written into one file in data/preprocessed.
The preprocessed data is also written into a new file in data/preprocessed.
*Example of a collection detection:*
![image-20210705112346995](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/Sophia/notebooks/pictures/image-20210705112346995.png)
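The threshold rule can be sketched as follows. This assumes that an emptying lowers the measured fill height. A collection is therefore flagged when the height drops by at least a hypothetical THRESHOLD between two consecutive readings. The returned keys are illustrative and are not the notebook's actual columns.

```python
# Hypothetical threshold (assumption): a drop in fill height of at least this
# many units between two consecutive readings is treated as a collection.
THRESHOLD = 20


def detect_collection(heights, threshold=THRESHOLD):
    """Return the indices i at which heights[i - 1] - heights[i] >= threshold."""
    return [
        i for i in range(1, len(heights))
        if heights[i - 1] - heights[i] >= threshold
    ]


def get_collection_data(timestamps, heights, threshold=THRESHOLD):
    """Summarize the readings between consecutive collections.

    Readings after the last detected collection are not summarized yet,
    because that period has not ended with an emptying.
    """
    summaries = []
    start = 0
    for i in detect_collection(heights, threshold):
        summaries.append({
            "period_start": timestamps[start],
            "collected_at": timestamps[i],
            "pre_height": heights[i - 1],  # last reading before the drop
            "n_readings": i - start,
        })
        start = i
    return summaries


hours = list(range(9))
heights = [10, 30, 55, 80, 5, 25, 60, 70, 8]
print(detect_collection(heights))
for summary in get_collection_data(hours, heights):
    print(summary)
```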
**Training_Data**
This notebook excludes outlier containers from the modeling data. The thresholds are specified and explained in the notebook, and the resulting data is saved in data/modeling/train.
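The sketch below illustrates how outlier containers could be excluded. It derives three per-container features from collection dates given as day numbers: the number of collections, the mean emptying interval and the mean pre height. It then filters the containers with two hypothetical thresholds, MIN_COLLECTIONS and MAX_MEAN_INTERVAL_DAYS. The thresholds actually used are the ones explained in the notebook.

```python
from statistics import mean

# Hypothetical thresholds (assumption); the notebook specifies its own values.
MIN_COLLECTIONS = 3          # too few collections give an unreliable mean interval
MAX_MEAN_INTERVAL_DAYS = 30  # containers emptied this rarely are treated as outliers


def container_features(collection_days, pre_heights):
    """Derive per-container features from collection days and pre heights."""
    intervals = [b - a for a, b in zip(collection_days, collection_days[1:])]
    return {
        "n_collections": len(collection_days),
        "mean_interval": mean(intervals) if intervals else None,
        "mean_pre_height": mean(pre_heights),
    }


def build_training_data(containers):
    """Keep only containers that pass both outlier thresholds."""
    train = {}
    for container_id, (days, pre_heights) in containers.items():
        features = container_features(days, pre_heights)
        if features["n_collections"] < MIN_COLLECTIONS:
            continue
        if features["mean_interval"] > MAX_MEAN_INTERVAL_DAYS:
            continue
        train[container_id] = features
    return train


containers = {
    "A": ([0, 7, 14, 21], [60, 70, 65, 75]),
    "B": ([0, 50, 100], [90, 95, 85]),  # emptied too rarely
    "C": ([0, 5], [40, 50]),            # too few collections
}
print(build_training_data(containers))
```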
###### 2 Data Exploration
**Interactive Exploration**
Interactive notebook to explore the preprocessed data of the containers with different filter options:
![image-20210705104754696](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/Sophia/notebooks/pictures/image-20210705104754696.png)
###### 3 Modelling
**Modell_Clustering**
First Model: 2-dimensional clustering based on the pre height before emptying and the mean emptying interval.
![image-20210705112027118](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/Sophia/notebooks/pictures/image-202107151.png)
Second Model: 3-dimensional clustering based on the pre height before emptying, the mean emptying interval and the amount of emptying.
![image-20210705112027118](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/Sophia/notebooks/pictures/image-202107152.png)
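The README does not name the clustering algorithm. Assuming k-means, a dependency-free sketch for both models could look like the code below. The features are min-max scaled so that heights and intervals are comparable. The same function handles 2-dimensional and 3-dimensional feature tuples. The deterministic farthest-point initialisation is a simplification chosen here. In a notebook, scikit-learn's KMeans would be the usual choice.

```python
import math


def min_max_scale(points):
    """Scale every feature to [0, 1] so that no feature dominates the distance."""
    lows = [min(axis) for axis in zip(*points)]
    highs = [max(axis) for axis in zip(*points)]
    return [
        tuple(
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(point, lows, highs)
        )
        for point in points
    ]


def kmeans(points, k, max_iter=100):
    """Plain k-means (Lloyd's algorithm) with farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(point, centers[j]))
            for point in points
        ]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the previous center if a cluster ends up empty
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centers


# Illustrative features per container: (mean pre height, mean interval in days)
features_2d = [(20, 3), (25, 4), (22, 3.5), (80, 10), (85, 12), (78, 11)]
labels_2d, _ = kmeans(min_max_scale(features_2d), k=2)

# Second model: add the amount of emptying as a third (illustrative) feature.
features_3d = [(20, 3, 50), (25, 4, 48), (22, 3.5, 52), (80, 10, 10), (85, 12, 12), (78, 11, 11)]
labels_3d, _ = kmeans(min_max_scale(features_3d), k=2)
print(labels_2d, labels_3d)
```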
**Regression**
###### 4 Visualization
**Correlation**
Pearson correlation between the features height, collection, temperature, weather, holiday and lockdown.
![image-20210705112027118](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/Sophia/notebooks/pictures/image-202107153.png)
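For reference, the Pearson coefficient of two features is their covariance divided by the product of their standard deviations. The sketch below computes it for every pair of features. It assumes that binary features such as holiday or lockdown are encoded as 0/1, and that all feature lists are aligned by timestamp.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    # The 1/n factors of covariance and standard deviations cancel,
    # so plain sums of products and squares are sufficient here.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant feature")
    return cov / (spread_x * spread_y)


def correlation_matrix(features):
    """Pairwise Pearson coefficients for a dict of aligned feature lists."""
    names = list(features)
    return {(a, b): pearson(features[a], features[b]) for a in names for b in names}


features = {
    "height": [10, 20, 30, 40],
    "temperature": [5, 7, 9, 11],
    "holiday": [0, 1, 0, 1],  # binary feature encoded as 0/1
}
for (a, b), r in correlation_matrix(features).items():
    print(f"{a:>11} vs {b:<11} r = {r:+.3f}")
```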
**Lockdown**
Shows the effect of the lockdown on the pre height before a collection and the emptying intervals.
![image-20210705112027118](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/Sophia/notebooks/pictures/image-202107154.png)
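One way to quantify the lockdown effect is to compare mean pre heights and mean emptying intervals inside and outside the lockdown period. The sketch assumes chronologically sorted (date, pre_height) pairs, one per collection. It also assumes a hypothetical lockdown window given by LOCKDOWN_START and LOCKDOWN_END. Intervals that span the boundary between the two periods are ignored.

```python
from datetime import date
from statistics import mean

# Hypothetical lockdown window (assumption); use the dates from the notebook instead.
LOCKDOWN_START = date(2020, 12, 16)
LOCKDOWN_END = date(2021, 3, 7)


def compare_lockdown(collections, start=LOCKDOWN_START, end=LOCKDOWN_END):
    """Mean pre height and emptying interval inside vs. outside the lockdown.

    collections: chronologically sorted (collection_date, pre_height) pairs.
    An interval is only counted if both of its collections fall into the same period.
    """
    groups = {
        "lockdown": {"pre_height": [], "interval_days": []},
        "normal": {"pre_height": [], "interval_days": []},
    }
    previous_day, previous_period = None, None
    for day, pre_height in collections:
        period = "lockdown" if start <= day <= end else "normal"
        groups[period]["pre_height"].append(pre_height)
        if previous_day is not None and previous_period == period:
            groups[period]["interval_days"].append((day - previous_day).days)
        previous_day, previous_period = day, period
    return {
        period: {name: mean(values) if values else None for name, values in data.items()}
        for period, data in groups.items()
    }


collections = [
    (date(2020, 11, 2), 50), (date(2020, 11, 9), 54),
    (date(2021, 1, 4), 70), (date(2021, 1, 8), 74),
]
print(compare_lockdown(collections))
```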
**Optimal Collection**
As an example, the optimal collections are predicted for three different containers.
*Example for one of these containers:*
![image-20210705112027118](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/Sophia/notebooks/pictures/image-202107155.png)
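The README does not describe how the optimal collections are predicted. Purely as an illustration, one simple approach is to fit a least-squares line to the fill heights since the last collection. The fitted line is then extrapolated to find when the container reaches a hypothetical TARGET_HEIGHT. This is not necessarily the method used in the notebook.

```python
# Hypothetical target fill height at which a collection should be scheduled (assumption).
TARGET_HEIGHT = 80


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct time points")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def predict_collection_time(hours, heights, target=TARGET_HEIGHT):
    """Extrapolate when the fill height reaches the target.

    hours: time since the last collection; heights: fill heights at those times.
    Returns None if the fitted fill rate is not positive.
    """
    slope, intercept = fit_line(hours, heights)
    if slope <= 0:
        return None
    return (target - intercept) / slope


hours = [0, 24, 48, 72]
heights = [10, 20, 30, 40]
print(predict_collection_time(hours, heights))  # hours after the last collection
```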
**General**
Notebook for further visualization of the available data.
Structure:
......
- Features of raw data and decoded data
- Container overview on map
![image-20210705111842890](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/Sophia/notebooks/pictures/image-20210705111842890.png)
7.2 Data Quality
- Container Height
......
- Emptying intervals
![image-20210705112027118](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/Sophia/notebooks/pictures/image-202107158.png)
- Mean pre_height of the containers before emptying
![image-20210705112027118](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/Sophia/notebooks/pictures/image-202107157.png)
7.4 Derived Insights with Clusters
- Emptying intervals
- Mean pre_height of the containers before emptying
![image-20210705112548071](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/Sophia/notebooks/pictures/image-20210705112548071.png)
- Boxplot of pre_height
![image-20210705114521409](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/Sophia/notebooks/pictures/image-20210705114521410.png)
The boxplot visualization gives interesting insights into the pre height of the containers before they are emptied. The boxplot of the second clustering model shows that in most clusters the pre height is on average very low, which indicates considerable potential for finding better emptying intervals.
![image-20210705112027118](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/Sophia/notebooks/pictures/image-202107156.png)
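The statistics behind such a boxplot can be computed per cluster with the standard library. This sketch uses linearly interpolated quartiles (`statistics.quantiles` with `method="inclusive"`). For the whiskers it applies the 1.5 × IQR rule, which is matplotlib's default whisker setting. The sample pre heights are made up for illustration.

```python
from statistics import quantiles


def box_stats(values):
    """Quartiles, whiskers and outliers as drawn in a standard boxplot."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),   # smallest value within the fences
        "whisker_high": max(inside),  # largest value within the fences
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Made-up pre heights per cluster, for illustration only.
pre_heights_by_cluster = {
    0: [10, 12, 15, 18, 20, 95],
    1: [40, 45, 50, 55, 60],
}
for cluster, values in pre_heights_by_cluster.items():
    print(cluster, box_stats(values))
```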
......
1 Clone Repository
```
git clone https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template.git
```
2 Go to folder
......