Commit 665e492f authored by tills
parents 4a701c92 26d70cd1
......@@ -36,61 +36,91 @@
## Special Stuff
#### Content of Jupyter Notebooks
###### 1 data_preprocessing
###### 1 Preprocessing
Methods for the preprocessing steps
**Collection_Detection**
- reverse_data (correct historical order)
- dec_data (decodes the data, removes letters and transforms into integer)
- remove_outliers (temperature outliers are removed)
- write_preprocessed_data (write dataframe into .csv file)
- detect_collection (detects trash collection when sudden changes in height occur)
- get_collection_data (summarize data since last collection)
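The decoding and outlier steps listed above could look roughly like the following sketch. The function bodies, the `temperature` key and the temperature bounds are assumptions made for illustration; the notebook's actual implementation is not shown in this diff.

```python
import re
from typing import Dict, List


def dec_value(raw: str) -> int:
    """Strip all letters from a raw reading and convert the rest to an integer."""
    digits = re.sub(r"[A-Za-z]", "", raw)
    return int(digits)


def remove_outliers(rows: List[Dict[str, float]],
                    low: float = -30.0,
                    high: float = 60.0) -> List[Dict[str, float]]:
    """Keep only rows whose temperature lies within [low, high] (hypothetical bounds)."""
    return [row for row in rows if low <= row["temperature"] <= high]


readings = [
    {"height": dec_value("h0123"), "temperature": 21.5},
    {"height": dec_value("h0087"), "temperature": 250.0},  # implausible temperature
]
print(remove_outliers(readings))
```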
File to detect collections based on sudden changes in the fill height of the container.
The preprocessed data and collection data are written into a new file in data/preprocessed.
A collection is detected when the pre_height difference between two time periods reaches a threshold.
The data recorded before each detected collection are then saved in a separate data frame and written into one file in data\preprocessed.
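The threshold rule described above can be sketched as follows. This is a minimal illustration that assumes the height value grows as the container fills, so a collection shows up as a sudden drop; the function names, the sample readings and the threshold of 30 are hypothetical and not taken from the notebook.

```python
from typing import List, Sequence


def detect_collections(heights: Sequence[float], threshold: float) -> List[int]:
    """Return the indices at which the height drops by at least `threshold`."""
    collections = []
    for i in range(1, len(heights)):
        drop = heights[i - 1] - heights[i]
        if drop >= threshold:
            collections.append(i)
    return collections


def split_by_collection(heights: Sequence[float],
                        collection_idx: Sequence[int]) -> List[List[float]]:
    """Group the readings into the segments that precede each detected collection."""
    segments, start = [], 0
    for idx in collection_idx:
        segments.append(list(heights[start:idx]))
        start = idx
    return segments


heights = [10, 25, 40, 62, 8, 20, 35, 55, 70, 5, 12]
idx = detect_collections(heights, threshold=30)
print(idx)                                # [4, 9]
print(split_by_collection(heights, idx))  # [[10, 25, 40, 62], [8, 20, 35, 55, 70]]
```

The last reading of each segment is the fill height just before the container was emptied.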
*Example of a collection detection:*
###### 2 exploring_data
![image-20210705112346995](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/master/notebooks/pictures/image-20210705112346995.png)
Notebook to explore the preprocessed data of the containers with different filter options:
**Training_Data**
![image-20210705104754696](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/Sophia/notebooks/pictures/image-20210705104754696.png)
In this file, the outlier containers are excluded for modeling purposes. The thresholds are specified and explained in the notebook and the resulting data is saved in \data\modeling\train.
###### 3 data_visualization_one node
###### 2 Data Exploration
Visualization of the data of one container node (based on lecture):
**Interactive Exploration**
![image-20210705112452614](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/Sophia/notebooks/pictures/image-20210705112452614.png)
Interactive notebook to explore the preprocessed data of the containers with different filter options:
###### 4 detect_collections
![image-20210705104754696](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/master/notebooks/pictures/image-20210705104754696.png)
Notebook to detect collections based on the height difference.
###### 3 Modelling
A collection is detected when the pre_height difference between two time periods reaches a threshold.
The collections and relevant data are collected and summarized in an extra data file.
**Modell_Clustering**
First model: two-dimensional clustering on the pre_height before emptying and the mean emptying interval.
![image-20210705112027118](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/master/notebooks/pictures/image-202107151.PNG)
Second model: three-dimensional clustering on the pre_height before emptying, the mean emptying interval and the number of emptyings.
![image-20210705112027118](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/master/notebooks/pictures/image-202107152.PNG)
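As a rough sketch of how the per-container features for the two models could be derived from the detected collections: the input layout (container id, collection timestamp, pre_height) and the function name are assumptions, and the clustering algorithm itself is not named here, so only the feature construction is shown.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def container_features(collections):
    """Map container id -> (mean pre_height, mean emptying interval in days, number of emptyings).

    `collections` is an iterable of (container_id, timestamp, pre_height) tuples.
    Containers with fewer than two collections are skipped because no interval exists.
    """
    by_container = defaultdict(list)
    for cid, ts, pre_height in collections:
        by_container[cid].append((ts, pre_height))

    features = {}
    for cid, rows in by_container.items():
        rows.sort()  # chronological order
        times = [ts for ts, _ in rows]
        intervals = [(b - a).total_seconds() / 86400 for a, b in zip(times, times[1:])]
        if not intervals:
            continue
        heights = [h for _, h in rows]
        features[cid] = (mean(heights), mean(intervals), len(rows))
    return features


# Hypothetical sample data; "A" and "B" are placeholder ids.
sample = [
    ("A", datetime(2021, 1, 1), 40.0),
    ("A", datetime(2021, 1, 8), 50.0),
    ("A", datetime(2021, 1, 22), 60.0),
    ("B", datetime(2021, 1, 5), 30.0),
]
print(container_features(sample))
```

The first two tuple entries correspond to the inputs of the first model; the second model adds the third entry.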
**Regression**
###### 4 Visualization
Example of a collection detection:
**Correlation**
![image-20210705112346995](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/Sophia/notebooks/pictures/image-20210705112346995.png)
Pearson Correlation of features (height, collection, temperature, weather, holiday, lockdown)
###### 5 modell_clustering
![image-20210705112027118](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/master/notebooks/pictures/image-202107153.PNG)
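For reference, the Pearson coefficient between two features can be computed as in the sketch below. The sample values are made up for illustration, and the notebook may well rely on a library routine instead of a hand-written function.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical values for three of the listed features.
features = {
    "height": [10.0, 20.0, 30.0, 40.0],
    "temperature": [1.0, 2.0, 3.0, 4.0],
    "lockdown": [0.0, 0.0, 1.0, 1.0],
}
for a, b in combinations(features, 2):
    print(f"{a} vs {b}: {pearson(features[a], features[b]):.3f}")
```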
Clustering of the containers based on mean emptying intervals and pre_height before emptying
![image-20210705112027118](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/Sophia/notebooks/pictures/image-20210705112027118.png)
###### 6 container_map
Visualization of the containers on a map
![image-20210705111842890](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/Sophia/notebooks/pictures/image-20210705111842890.png)
**Lockdown**
Shows the effect of the lockdown on the pre height before a collection and the emptying intervals.
![image-20210705112027118](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/master/notebooks/pictures/image-202107154.PNG)
**Optimal Collection**
Optimal collections are predicted, by way of example, for three different containers.
*This is the example of one container:*
![image-20210705112027118](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/master/notebooks/pictures/image-202107155.PNG)
**General**
###### 7 data_visualization
Notebook for the visualization of available data.
Notebook for the further visualization of available data.
Structure:
......@@ -100,6 +130,8 @@ Structure:
- Features of raw data and decoded data
- Container overview on map
![image-20210705111842890](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/master/notebooks/pictures/image-20210705111842890.png)
7.2 Data Quality
- Container Height
......@@ -110,24 +142,25 @@ Structure:
- Emptying intervals
![image-20210705114505018](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/Sophia/notebooks/pictures/image-20210705114505018.png)
![image-20210705112027118](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/master/notebooks/pictures/image-202107158.PNG)
- Mean pre_height of container before emptying
![image-20210705114521409](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/Sophia/notebooks/pictures/image-20210705114521409.png)
![image-20210705112027118](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/master/notebooks/pictures/image-202107157.PNG)
7.4 Derived Insights with Clusters
- Emptying intervals
- Mean pre_height of container before emptying
![image-20210705112548071](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/Sophia/notebooks/pictures/image-20210705112548071.png)
- Boxplot of pre_height
![image-20210705114521409](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/Sophia/notebooks/pictures/image-20210705114521410.png)
7.4 Derived Insights with Clusters
The boxplot visualization gives insights into the pre_height of containers before they are emptied. In the boxplot of the second clustering model, the pre_height is on average very low in most clusters, which suggests considerable potential for finding better emptying intervals.
![image-20210705112027118](https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template/-/raw/master/notebooks/pictures/image-202107156.PNG)
......@@ -136,7 +169,7 @@ Structure:
1 Clone Repository
```
git clone https://git.scc.kit.edu/urkid/bda-analytics-challenge-template.git
git clone https://git.scc.kit.edu/ufesk/bda-analytics-challenge-template.git
```
2 Go to folder
......
,Container_id,Cluster
0,70B3D500700016DA,0
1,70B3D500700016DF,0
2,70B3D500700016E0,0
3,70B3D500700016E5,0
4,70B3D500700016E6,0
5,70B3D500700016E7,0
6,70B3D500700016EB,-1
7,70B3D500700016EE,0
8,70B3D500700016F1,0
9,70B3D500700016F2,-1
10,70B3D500700016F4,1
11,70B3D500700016F6,0
12,70B3D500700016F7,0
13,70B3D500700016FA,0
14,70B3D500700016FC,0
15,70B3D50070001701,0
16,70B3D50070001704,-1
17,70B3D50070001706,0
18,70B3D50070001708,0
19,70B3D50070001709,0
20,70B3D5007000170F,0
21,70B3D50070001710,0
22,70B3D50070001712,-1
23,70B3D50070001713,0
24,70B3D50070001714,0
25,70B3D50070001715,-1
26,70B3D50070001716,0
27,70B3D5007000171A,0
28,70B3D5007000171E,0
29,70B3D5007000171F,0
30,70B3D50070001722,0
31,70B3D50070001724,0
32,70B3D50070001725,0
33,70B3D50070001726,0
34,70B3D50070001727,-1
35,70B3D5007000172B,-1
36,70B3D5007000172C,-1
37,70B3D5007000172D,-1
38,70B3D5007000172E,-1
39,70B3D50070001730,0
40,70B3D50070001733,0
41,70B3D50070001734,0
42,70B3D50070001736,0
43,70B3D50070001737,0
44,70B3D50070001738,0
45,70B3D50070001739,0
46,70B3D5007000173C,0
47,70B3D5007000173E,0
48,70B3D50070001740,1
49,70B3D50070001742,0
50,70B3D50070001743,0
51,70B3D50070001747,0
52,70B3D5007000174D,0
53,70B3D5007000174F,0
54,70B3D50070001750,0
55,70B3D50070001759,0
56,70B3D5007000175A,0
57,70B3D5007000175E,0
58,70B3D50070001764,0
59,70B3D50070001766,0
60,70B3D50070001770,0
61,70B3D50070001772,0
62,70B3D50070001774,0
63,70B3D50070001777,0
64,70B3D50070001779,-1
65,70B3D5007000177C,0
66,70B3D50070001781,0
67,70B3D50070001782,0
68,70B3D50070001786,1
69,70B3D50070001787,0
70,70B3D50070001788,0
71,70B3D50070001789,0
,Container_id,Cluster
0,70B3D500700016DA,-1
1,70B3D500700016DF,0
2,70B3D500700016E0,-1
3,70B3D500700016E5,-1
4,70B3D500700016E6,1
5,70B3D500700016E7,-1
6,70B3D500700016EB,-1
7,70B3D500700016EE,-1
8,70B3D500700016F1,2
9,70B3D500700016F2,-1
10,70B3D500700016F4,-1
11,70B3D500700016F6,-1
12,70B3D500700016F7,-1
13,70B3D500700016FA,-1
14,70B3D500700016FC,-1
15,70B3D50070001701,-1
16,70B3D50070001704,-1
17,70B3D50070001706,-1
18,70B3D50070001708,-1
19,70B3D50070001709,0
20,70B3D5007000170F,-1
21,70B3D50070001710,3
22,70B3D50070001712,-1
23,70B3D50070001713,-1
24,70B3D50070001714,-1
25,70B3D50070001715,-1
26,70B3D50070001716,4
27,70B3D5007000171A,-1
28,70B3D5007000171E,1
29,70B3D5007000171F,-1
30,70B3D50070001722,-1
31,70B3D50070001724,2
32,70B3D50070001725,2
33,70B3D50070001726,2
34,70B3D50070001727,-1
35,70B3D5007000172B,-1
36,70B3D5007000172C,-1
37,70B3D5007000172D,-1
38,70B3D5007000172E,-1
39,70B3D50070001730,2
40,70B3D50070001733,-1
41,70B3D50070001734,0
42,70B3D50070001736,1
43,70B3D50070001737,4
44,70B3D50070001738,3
45,70B3D50070001739,-1
46,70B3D5007000173C,3
47,70B3D5007000173E,1
48,70B3D50070001740,-1
49,70B3D50070001742,2
50,70B3D50070001743,-1
51,70B3D50070001747,4
52,70B3D5007000174D,-1
53,70B3D5007000174F,-1
54,70B3D50070001750,1
55,70B3D50070001759,-1
56,70B3D5007000175A,-1
57,70B3D5007000175E,1
58,70B3D50070001764,-1
59,70B3D50070001766,-1
60,70B3D50070001770,-1
61,70B3D50070001772,-1
62,70B3D50070001774,-1
63,70B3D50070001777,-1
64,70B3D50070001779,-1
65,70B3D5007000177C,-1
66,70B3D50070001781,-1
67,70B3D50070001782,-1
68,70B3D50070001786,-1
69,70B3D50070001787,2
70,70B3D50070001788,-1
71,70B3D50070001789,-1