Merged
83 commits
53dbe88
Merge branch 'main-upd' into develop
nbirillo May 17, 2021
80ea546
Added project template for Java
GirZ0n May 18, 2021
db30732
Added DatasetMarker
GirZ0n May 18, 2021
8c85e36
Code refactoring
GirZ0n May 18, 2021
b1fa21b
Added some words
GirZ0n May 18, 2021
5929f46
Small code refactoring
GirZ0n May 19, 2021
1a34b50
Added new requirements
GirZ0n May 19, 2021
ecbcdf4
Added ID to ColumnName
GirZ0n May 19, 2021
92f9a8a
Added README.md
GirZ0n May 20, 2021
3ef6a42
Added default value for --chunk-size
GirZ0n May 20, 2021
3f028bd
Merge remote-tracking branch 'origin/qodana' into qodana
GirZ0n May 20, 2021
7cd79e0
parse qodana output
nbirillo May 20, 2021
5c4b85a
Merge remote-tracking branch 'origin/develop' into develop
nbirillo May 20, 2021
ec6b477
Merge branch 'develop' into qodana
nbirillo May 20, 2021
a8b80c0
Update README.md
GirZ0n May 20, 2021
75cac7b
Change qodana scipt output
nbirillo May 20, 2021
dd9d502
Merge remote-tracking branch 'origin/develop' into qodana
GirZ0n May 20, 2021
6915254
Merge remote-tracking branch 'origin/qodana' into qodana
nbirillo May 20, 2021
f0c098b
Merge branch 'qodana' into fix/qodana-output
nbirillo May 20, 2021
c428b78
Fix a bug with qodana
nbirillo May 21, 2021
8489c9d
Fix a bug with path to the gradle project
nbirillo May 21, 2021
371f985
Fixed PR issues
GirZ0n May 22, 2021
0dab1b7
Fix/qodana output (#33)
nbirillo May 22, 2021
f9b418d
Added is_java function
GirZ0n May 23, 2021
96c0518
1) Added copy_directory and copy_file functions;
GirZ0n May 23, 2021
38a936a
Removed python_on_whales dependency
GirZ0n May 23, 2021
235e60f
Fixed some PR issues
GirZ0n May 23, 2021
5faa46c
Merge branch 'fix/qodana-output' into qodana
GirZ0n May 23, 2021
32a8010
Moved project_templates to resource folders
GirZ0n May 24, 2021
eced020
Fixed some PR issues
GirZ0n May 24, 2021
4eb0c32
typo fix
GirZ0n May 24, 2021
fc11416
Fixed flake8 issues
GirZ0n May 25, 2021
132b269
Fixed _get_inspections_files function
GirZ0n May 29, 2021
e26161e
Added option not to process subfolders in get_all_file_system_items f…
GirZ0n May 29, 2021
074b5fc
Add penalty statistics (#37)
nbirillo May 31, 2021
8d12c86
Add Qodana handlers (#34)
nbirillo May 31, 2021
b70e63c
Move java template to another resource folder
GirZ0n Jun 1, 2021
bec4ed6
Change template folder
GirZ0n Jun 1, 2021
7d0f7ce
Remove docker package
GirZ0n Jun 1, 2021
1d113d4
Added plotly requirements
GirZ0n Jun 3, 2021
9f5f3da
Added a script for graphing
GirZ0n Jun 3, 2021
a2f6f6c
Added common
GirZ0n Jun 3, 2021
0d36a13
Fix flake8 issues
GirZ0n Jun 3, 2021
01758e5
Add new words
GirZ0n Jun 3, 2021
8f4ab33
Added ability to select output file extension
GirZ0n Jun 3, 2021
1580242
Added 'values' method to Extension
GirZ0n Jun 3, 2021
ab224dc
Merge branch 'develop' into graph-plotter
GirZ0n Jun 3, 2021
73cbd4a
Merge branch 'develop' into graph-plotter
nbirillo Jun 4, 2021
35e0f72
Fix merge conflicts
nbirillo Jun 4, 2021
348470a
Fixed PR issues
GirZ0n Jun 4, 2021
db0cc67
Fixed double quotes
GirZ0n Jun 4, 2021
4ee00e1
Added image extensions
GirZ0n Jun 4, 2021
81977ab
Added 'consts'
GirZ0n Jun 4, 2021
005975b
Added plotly_consts.py
GirZ0n Jun 4, 2021
ed2b9fc
Added parse_yaml function
GirZ0n Jun 5, 2021
c096d4e
Added pyyaml
GirZ0n Jun 5, 2021
75ba546
Removed DEFAULT enum type from MARGIN
GirZ0n Jun 5, 2021
151ce8b
Added update_layout function
GirZ0n Jun 5, 2021
c94b382
Fixed PR issues
GirZ0n Jun 5, 2021
fdfb2c3
Added plotters.py
GirZ0n Jun 5, 2021
1e620f9
Remove TODO
GirZ0n Jun 5, 2021
2980087
Renamed get_supported_image_extensions to get_image_extensions
GirZ0n Jun 5, 2021
f747cce
Code refactoring
GirZ0n Jun 5, 2021
6a9ac78
Fixed arguments order
GirZ0n Jun 5, 2021
ed4c313
Added color support
GirZ0n Jun 5, 2021
c35aef8
Added trailing comma
GirZ0n Jun 5, 2021
36476c7
Added new words
GirZ0n Jun 5, 2021
8a4fb88
Added README.md
GirZ0n Jun 5, 2021
3fae17e
Replaced qualitative sequences with simple colors
GirZ0n Jun 6, 2021
1c09630
Replaced unnecessary words
GirZ0n Jun 6, 2021
5a2086c
Fixed incorrect color setting
GirZ0n Jun 6, 2021
386a9c7
Added examples of graphs
GirZ0n Jun 6, 2021
872a2f2
Moved to another folder
GirZ0n Jun 6, 2021
c75d8d8
Update README.md
GirZ0n Jun 6, 2021
c27b97d
Update README.md
GirZ0n Jun 6, 2021
d6fb2cd
Update README.md
GirZ0n Jun 6, 2021
7269f3c
Update README.md
GirZ0n Jun 6, 2021
898396c
Update README.md
GirZ0n Jun 6, 2021
63de9fc
Added default values
GirZ0n Jun 6, 2021
15c0b8b
Small fix
GirZ0n Jun 6, 2021
40e41e6
Added example of config
GirZ0n Jun 6, 2021
6c03041
Small fix
GirZ0n Jun 6, 2021
b744510
Fixed config example
GirZ0n Jun 6, 2021
3 changes: 3 additions & 0 deletions requirements-evaluation.txt
@@ -2,3 +2,6 @@ openpyxl==3.0.7
pandas==1.2.3
pandarallel
numpy~=1.20.2
plotly
kaleido
pyyaml
85 changes: 85 additions & 0 deletions src/python/evaluation/plots/README.md
@@ -0,0 +1,85 @@
# Hyperstyle evaluation: plots
This module allows you to visualize the data obtained with the [inspectors](../inspectors) module.

## [diffs_plotter.py](diffs_plotter.py)
This script allows you to visualize a dataset obtained with [diffs_between_df.py](../inspectors/diffs_between_df.py).

The script can build the following charts:
* number of unique issues by category ([Example](#number-of-unique-issues-by-category))
* number of issues by category ([Example](#number-of-issues-by-category))
* number of unique penalty issues by category ([Example](#number-of-unique-penalty-issues-by-category))
* number of penalty issues by category ([Example](#number-of-penalty-issues-by-category))
* median penalty influence by category ([Example](#median-influence-on-penalty-by-category))
* distribution of penalty influence by category ([Example](#distribution-of-influence-on-penalty-by-category))

### Usage
Run [diffs_plotter.py](diffs_plotter.py) from the command line with the following arguments.

**Required arguments**:
1. `diffs_file_path` — path to a file with serialized diffs that were found by [diffs_between_df.py](../inspectors/diffs_between_df.py).
2. `save_dir` — directory where the plotted charts will be saved.
3. `config_path` — path to the YAML file that describes the charts to be plotted. A description of the config and an example are provided in [this section](#config).


**Optional arguments**:

Argument | Description
--- | ---
**‑‑file‑extension** | Allows you to select the extension of output files. Available extensions: `.png`, `.jpg`, `.jpeg`, `.webp`, `.svg`, `.pdf`, `.eps`, `.json`. Default is `.svg`.
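
The script itself is not part of the text shown here, so the following is only a minimal sketch of how the arguments described above could be declared with `argparse`. The function name `build_parser` and the placeholder file names are hypothetical; the real `diffs_plotter.py` may be organized differently.

```python
import argparse
from pathlib import Path

# Extensions listed in the table above.
SUPPORTED_EXTENSIONS = ['.png', '.jpg', '.jpeg', '.webp', '.svg', '.pdf', '.eps', '.json']


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented arguments."""
    parser = argparse.ArgumentParser(description='Plot charts from serialized diffs.')
    parser.add_argument('diffs_file_path', type=Path, help='File with serialized diffs.')
    parser.add_argument('save_dir', type=Path, help='Directory where the charts are saved.')
    parser.add_argument('config_path', type=Path, help='YAML file describing the charts.')
    parser.add_argument(
        '--file-extension',
        choices=SUPPORTED_EXTENSIONS,
        default='.svg',
        help='Extension of the output files.',
    )
    return parser


# Placeholder paths, used only to show the argument order.
args = build_parser().parse_args(['diffs_file', 'plots', 'config.yaml', '--file-extension', '.png'])
print(args.save_dir, args.file_extension)
```

Using `choices` makes the parser reject an unsupported extension before any chart is built.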

### Config
The configuration file is a dictionary in YAML format: each key is the name of a chart you want to build, and its value holds the parameters for that chart.

**Possible chart names**:
* `unique_issues_by_category` to plot the number of unique issues by category
* `issues_by_category` to plot the number of issues by category
* `unique_penalty_issues_by_category` to plot the number of unique penalty issues by category
* `penalty_issues_by_category` to plot the number of penalty issues by category
* `median_penalty_influence_by_category` to plot the median penalty influence by category
* `penalty_influence_distribution` to plot the distribution of penalty influence by category

**Possible parameters**:

Parameter | Description
---|---
**x_axis_name** | Name of the x-axis. The default value depends on the type of chart.
**y_axis_name** | Name of the y-axis. The default value depends on the type of chart.
**limit** | A value that allows you to filter the data before displaying them. <br><br> For charts `unique_issues_by_category`, `issues_by_category`, `unique_penalty_issues_by_category` and `penalty_issues_by_category`, only categories whose number of issues is greater than or equal to the limit are shown. <br><br> For chart `median_penalty_influence_by_category`, only categories whose median value is greater than or equal to the limit are shown. <br><br> For chart `penalty_influence_distribution`, only categories whose number of values is greater than or equal to the limit are shown. <br><br> The default value depends on the type of chart.
**margin** | Defines the outer margin on all four sides of the chart. The available values are specified in the Enum class `MARGIN` from [plots const file](./common/plotly_consts.py). If not specified, the default value provided by Plotly is used.
**sort_order** | Defines the sorting order of the chart. The available values are specified in the Enum class `SORT_ORDER` from [plots const file](./common/plotly_consts.py). If not specified, the default value provided by Plotly is used.
**color** | Defines the color of the chart. The available values are specified in the Enum class `COLOR` from [plots const file](./common/plotly_consts.py). If not specified, the default value provided by Plotly is used.
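
The three meanings of `limit` can be compared on toy data. This is a sketch only: the category names and numbers are made up for illustration, and the plain dictionaries stand in for the statistics objects used by the plotters.

```python
from statistics import median

# Toy data, not taken from a real run.
issue_counts = {'CODE_STYLE': 42, 'BEST_PRACTICES': 7, 'COMPLEXITY': 15}
penalty_influence = {'CODE_STYLE': [10, 20, 60], 'BEST_PRACTICES': [5, 5], 'COMPLEXITY': [40]}

# Count charts: keep a category if its number of issues reaches the limit.
count_limit = 10
by_count = {category: count for category, count in issue_counts.items() if count >= count_limit}

# Median chart: keep a category if the median of its influence values reaches the limit.
median_limit = 10
by_median = {
    category: median(values)
    for category, values in penalty_influence.items()
    if median(values) >= median_limit
}

# Distribution chart: keep a category if it has at least `limit` influence values.
size_limit = 2
by_size = {category: values for category, values in penalty_influence.items() if len(values) >= size_limit}

print(sorted(by_count), sorted(by_median), sorted(by_size))
```

The same number therefore filters on different quantities depending on the chart, which is why the default is chosen per chart type.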

#### Example of config
```yaml
unique_issues_by_category:
  margin: "ZERO"
  limit: 10
  sort_order: "total descending"
  color: "RED"
unique_penalty_issues_by_category:
  limit: 30
  sort_order: "category ascending"
median_penalty_influence_by_category:
penalty_influence_distribution:
```
```

The result will be four graphs (`unique_issues_by_category`, `unique_penalty_issues_by_category`, `median_penalty_influence_by_category`, `penalty_influence_distribution`) with the corresponding parameters.
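
A config like this has to be turned into plotter arguments at some point. The sketch below shows one possible way to do that, assuming `margin` and `color` are looked up by Enum member name. The helper `resolve_params` is hypothetical, and the two Enum classes are cut-down copies of the ones in `plotly_consts.py`. The dictionary stands in for what a YAML loader returns, where a chart key with no body (such as `median_penalty_influence_by_category:`) is loaded as `None`.

```python
from enum import Enum
from typing import Any, Dict, Optional


class MARGIN(Enum):
    ZERO = {'l': 0, 'r': 0, 'b': 0, 't': 0}


class COLOR(Enum):
    RED = 'rgb(214, 39, 40)'


# Stand-in for the loaded config: chart name -> parameters, or None for an empty body.
parsed_config = {
    'unique_issues_by_category': {'margin': 'ZERO', 'limit': 10, 'color': 'RED'},
    'median_penalty_influence_by_category': None,
}


def resolve_params(raw: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: replace Enum names with Enum members, keep the rest as is."""
    params = dict(raw or {})  # a chart without parameters falls back to all defaults
    if 'margin' in params:
        params['margin'] = MARGIN[params['margin'].upper()]
    if 'color' in params:
        params['color'] = COLOR[params['color'].upper()]
    return params


for chart_name, raw_params in parsed_config.items():
    print(chart_name, resolve_params(raw_params))
```

Treating `None` as an empty dictionary is what lets a chart be requested with nothing but its name.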

### Examples

#### Number of unique issues by category
<img src="./examples/unique_issues_by_category.png" width="500">

#### Number of issues by category
<img src="./examples/issues_by_category.png" width="500">

#### Number of unique penalty issues by category
<img src="./examples/unique_penalty_issues_by_category.png" width="500">

#### Number of penalty issues by category
<img src="./examples/penalty_issues_by_category.png" width="500">

#### Median influence on penalty by category
<img src="./examples/median_penalty_influence_by_category.png" width="500">

#### Distribution of influence on penalty by category
<img src="./examples/penalty_influence_distribution.png" width="500">
Empty file.
Empty file.
26 changes: 26 additions & 0 deletions src/python/evaluation/plots/common/plotly_consts.py
@@ -0,0 +1,26 @@
from enum import Enum


class MARGIN(Enum):
    ZERO = {'l': 0, 'r': 0, 'b': 0, 't': 0}


class SORT_ORDER(Enum):  # noqa: N801
    CATEGORY_ASCENDING = 'category ascending'
    CATEGORY_DESCENDING = 'category descending'
    TOTAL_ASCENDING = 'total ascending'
    TOTAL_DESCENDING = 'total descending'


class COLOR(Enum):
    # Colors from px.colors.DEFAULT_PLOTLY_COLORS
    BLUE = "rgb(31, 119, 180)"
    ORANGE = "rgb(255, 127, 14)"
    GREEN = "rgb(44, 160, 44)"
    RED = "rgb(214, 39, 40)"
    PURPLE = "rgb(148, 103, 189)"
    BROWN = "rgb(140, 86, 75)"
    PINK = "rgb(227, 119, 194)"
    GRAY = "rgb(127, 127, 127)"
    YELLOW = "rgb(188, 189, 34)"
    CYAN = "rgb(23, 190, 207)"
131 changes: 131 additions & 0 deletions src/python/evaluation/plots/common/plotters.py
@@ -0,0 +1,131 @@
from statistics import median
from typing import Any, Callable, Dict, Optional

import pandas as pd
import plotly.graph_objects as go
from src.python.evaluation.inspectors.common.statistics import IssuesStatistics, PenaltyInfluenceStatistics
from src.python.evaluation.plots.common import plotly_consts
from src.python.evaluation.plots.common.utils import create_bar_plot, create_box_plot
from src.python.review.inspectors.issue import IssueType


def _get_dataframe_from_dict(
    data_dict: Dict[Any, Any],
    key_name: str,
    value_name: str,
    key_mapper: Callable = lambda x: x,
    value_mapper: Callable = lambda y: y,
):
    """
    Converts 'data_dict' to a dataframe consisting of two columns: 'key_name', 'value_name'.
    'key_name' contains all keys of 'data_dict', 'value_name' contains all corresponding
    values of 'data_dict'. With the functions 'key_mapper' and 'value_mapper' you can
    additionally convert keys and values respectively.
    """
    converted_dict = {
        key_name: list(map(key_mapper, data_dict.keys())),
        value_name: list(map(value_mapper, data_dict.values())),
    }

    return pd.DataFrame.from_dict(converted_dict)


def _extract_stats_from_issues_statistics(
    statistics: IssuesStatistics, limit: int, only_unique: bool,
) -> Dict[IssueType, int]:
    categorized_statistics = statistics.get_short_categorized_statistics()

    # If you want to get only unique issues, you should use position 0 of the tuple, otherwise 1.
    position = int(not only_unique)

    return {
        issue_type: stat[position] for issue_type, stat in categorized_statistics.items() if stat[position] >= limit
    }
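
The `int(not only_unique)` index selection in `_extract_stats_from_issues_statistics` can be tried in isolation. This is a sketch under the assumption, taken from the comment in the function, that each category maps to a tuple with the unique count at position 0 and the total count at position 1. The function name `extract` and the toy numbers are made up.

```python
from typing import Dict, Tuple

# Toy stand-in for the categorized statistics: category -> (unique issues, all issues).
categorized = {'CODE_STYLE': (3, 12), 'COMPLEXITY': (1, 1), 'ERROR_PRONE': (2, 9)}


def extract(stats: Dict[str, Tuple[int, int]], limit: int, only_unique: bool) -> Dict[str, int]:
    # only_unique=True -> int(False) == 0 (unique count); only_unique=False -> int(True) == 1 (total count)
    position = int(not only_unique)
    return {category: stat[position] for category, stat in stats.items() if stat[position] >= limit}


print(extract(categorized, limit=2, only_unique=True))
print(extract(categorized, limit=5, only_unique=False))
```

Note that the limit is compared against the selected count, so the same limit can keep a category in one chart and drop it in the other.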


def get_unique_issues_by_category(
    statistics: IssuesStatistics,
    x_axis_name: str = 'Categories',
    y_axis_name: str = 'Number of unique issues',
    limit: int = 0,
    margin: Optional[plotly_consts.MARGIN] = None,
    sort_order: Optional[plotly_consts.SORT_ORDER] = None,
    color: Optional[plotly_consts.COLOR] = None,
) -> go.Figure:
    filtered_stats = _extract_stats_from_issues_statistics(statistics, limit, only_unique=True)

    df = _get_dataframe_from_dict(
        filtered_stats,
        key_name=x_axis_name,
        value_name=y_axis_name,
        key_mapper=lambda issue_type: issue_type.name,
    )

    return create_bar_plot(df, x_axis_name, y_axis_name, margin, sort_order, color)


def get_issues_by_category(
    statistics: IssuesStatistics,
    x_axis_name: str = 'Categories',
    y_axis_name: str = 'Number of issues',
    limit: int = 0,
    margin: Optional[plotly_consts.MARGIN] = None,
    sort_order: Optional[plotly_consts.SORT_ORDER] = None,
    color: Optional[plotly_consts.COLOR] = None,
) -> go.Figure:
    filtered_stats = _extract_stats_from_issues_statistics(statistics, limit, only_unique=False)

    df = _get_dataframe_from_dict(
        filtered_stats,
        key_name=x_axis_name,
        value_name=y_axis_name,
        key_mapper=lambda issue_type: issue_type.name,
    )

    return create_bar_plot(df, x_axis_name, y_axis_name, margin, sort_order, color)


def get_median_penalty_influence_by_category(
    statistics: PenaltyInfluenceStatistics,
    x_axis_name: str = 'Categories',
    y_axis_name: str = 'Penalty influence (%)',
    limit: int = 0,
    margin: Optional[plotly_consts.MARGIN] = None,
    sort_order: Optional[plotly_consts.SORT_ORDER] = None,
    color: Optional[plotly_consts.COLOR] = None,
) -> go.Figure:
    stat = statistics.stat
    filtered_stats = {issue_type: influence for issue_type, influence in stat.items() if median(influence) >= limit}

    df = _get_dataframe_from_dict(
        filtered_stats,
        key_name=x_axis_name,
        value_name=y_axis_name,
        key_mapper=lambda issue_type: issue_type.name,
        value_mapper=lambda influence: median(influence),
    )

    return create_bar_plot(df, x_axis_name, y_axis_name, margin, sort_order, color)


def get_penalty_influence_distribution(
    statistics: PenaltyInfluenceStatistics,
    x_axis_name: str = 'Categories',
    y_axis_name: str = 'Penalty influence (%)',
    limit: int = 0,
    margin: Optional[plotly_consts.MARGIN] = None,
    sort_order: Optional[plotly_consts.SORT_ORDER] = None,
    color: Optional[plotly_consts.COLOR] = None,
) -> go.Figure:
    stat = statistics.stat
    filtered_stats = {issue_type: influence for issue_type, influence in stat.items() if len(influence) >= limit}

    df = _get_dataframe_from_dict(
        filtered_stats,
        key_name=x_axis_name,
        value_name=y_axis_name,
        key_mapper=lambda issue_type: issue_type.name,
    )
    # One row per influence value, so that the box plot sees every observation.
    df = df.explode(y_axis_name)

    return create_box_plot(df, x_axis_name, y_axis_name, margin, sort_order, color)
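
The `df.explode(y_axis_name)` step in `get_penalty_influence_distribution` reshapes one row per category (holding a list of values) into one row per value, which is the long format a box plot needs. The sketch below emulates that reshaping in plain Python instead of calling pandas. The function name `explode_rows` and the toy data are hypothetical.

```python
from typing import Dict, List, Tuple


def explode_rows(stats: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Plain-Python emulation: repeat the category once for each value in its list."""
    return [(category, value) for category, values in stats.items() for value in values]


# Toy data, not taken from a real run.
influence = {'CODE_STYLE': [10, 20], 'COMPLEXITY': [40]}
rows = explode_rows(influence)
print(rows)
```

One difference from pandas: `DataFrame.explode` keeps a row with a missing value for an empty list, while this sketch simply produces no row for it.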
76 changes: 76 additions & 0 deletions src/python/evaluation/plots/common/utils.py
@@ -0,0 +1,76 @@
import os
from pathlib import Path
from typing import List, Optional

import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from src.python.evaluation.plots.common import plotly_consts
from src.python.review.common.file_system import Extension


def get_supported_extensions() -> List[str]:
    extensions = Extension.get_image_extensions()
    extensions.append(Extension.JSON)
    return [extension.value for extension in extensions]


def create_bar_plot(
    df: pd.DataFrame,
    x_axis: str,
    y_axis: str,
    margin: Optional[plotly_consts.MARGIN] = None,
    sort_order: Optional[plotly_consts.SORT_ORDER] = None,
    color: Optional[plotly_consts.COLOR] = None,
) -> go.Figure:
    fig = px.bar(df, x=x_axis, y=y_axis, text=y_axis)
    update_figure(fig, margin, sort_order, color)
    return fig


def create_box_plot(
    df: pd.DataFrame,
    x_axis: str,
    y_axis: str,
    margin: Optional[plotly_consts.MARGIN] = None,
    sort_order: Optional[plotly_consts.SORT_ORDER] = None,
    color: Optional[plotly_consts.COLOR] = None,
) -> go.Figure:
    fig = px.box(df, x=x_axis, y=y_axis)
    update_figure(fig, margin, sort_order, color)
    return fig


def update_figure(
    fig: go.Figure,
    margin: Optional[plotly_consts.MARGIN] = None,
    sort_order: Optional[plotly_consts.SORT_ORDER] = None,
    color: Optional[plotly_consts.COLOR] = None,
) -> None:
    new_layout = {}

    if margin is not None:
        new_layout["margin"] = margin.value

    if sort_order is not None:
        new_layout["xaxis"] = {"categoryorder": sort_order.value}

    fig.update_layout(**new_layout)

    new_trace = {}

    if color is not None:
        new_trace["marker"] = {"color": color.value}

    fig.update_traces(**new_trace)
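
`update_figure` only adds a key when the matching argument is given, so an omitted option leaves the Plotly default untouched. The sketch below isolates that dictionary-building logic without importing Plotly. The function name `build_updates` is hypothetical, and it takes the raw values (what `margin.value`, `sort_order.value` and `color.value` would hold) rather than the Enum members.

```python
from typing import Any, Dict, Optional, Tuple


def build_updates(
    margin: Optional[Dict[str, int]] = None,
    category_order: Optional[str] = None,
    color: Optional[str] = None,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return the keyword arguments for the layout update and for the trace update."""
    layout: Dict[str, Any] = {}
    if margin is not None:
        layout['margin'] = margin
    if category_order is not None:
        layout['xaxis'] = {'categoryorder': category_order}

    trace: Dict[str, Any] = {}
    if color is not None:
        trace['marker'] = {'color': color}

    return layout, trace


print(build_updates())
print(build_updates(category_order='total descending', color='rgb(214, 39, 40)'))
```

With no options set, both dictionaries are empty and the two update calls receive no keyword arguments.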


def save_plot(
    fig: go.Figure,
    dir_path: Path,
    plot_name: str = "result_plot",
    extension: Extension = Extension.SVG,
) -> None:
    os.makedirs(dir_path, exist_ok=True)
    file = dir_path / f"{plot_name}{extension.value}"
    fig.write_image(str(file))
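
The path handling in `save_plot` can be checked without rendering a figure. In this sketch the `Extension` Enum is a cut-down, hypothetical stand-in for the one imported from `file_system`, and `build_plot_path` is a made-up helper that stops just before the image would be written.

```python
import tempfile
from enum import Enum
from pathlib import Path


class Extension(Enum):
    # Hypothetical stand-in: the value carries the leading dot, as the f-string in save_plot implies.
    SVG = '.svg'
    PNG = '.png'


def build_plot_path(dir_path: Path, plot_name: str = 'result_plot', extension: Extension = Extension.SVG) -> Path:
    # Create the target directory (and any parents) if it does not exist yet.
    dir_path.mkdir(parents=True, exist_ok=True)
    return dir_path / f'{plot_name}{extension.value}'


with tempfile.TemporaryDirectory() as tmp_dir:
    path = build_plot_path(Path(tmp_dir) / 'plots', 'issues_by_category', Extension.PNG)
    directory_created = path.parent.is_dir()
    default_name = build_plot_path(Path(tmp_dir) / 'plots').name

print(path.name, default_name, directory_created)
```

Because the extension value already includes the dot, the file name is a plain concatenation of the plot name and the extension.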