Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions _bookdown.yml
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ rmd_files: [
'course_cheatsheet.Rmd',
'webscraping_table_xpath.Rmd',
'basic_visualization_cheatsheet.Rmd',
'ggplot2_cheatsheet.Rmd',

# Part: Tutorials
'visualizing_geographic_time_series_with_messy_country_name.Rmd', # contains PART header
Expand Down
129 changes: 129 additions & 0 deletions ggplot2_cheatsheet.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,129 @@
# ggplot2_cheatsheet

Lyu Peng

```{r, include=FALSE}
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
```

```{r}
library(tidyverse)
library(ggplot2)
```


## Overview
As a statistic tool, r is more common used in statistical learning, however, for most people, they are not statistic experts so that they do not have strong ability and motivation to use r. i will provide some details to r using and explain how to help people who know python to learn r is an important topic, especially in statistical graphing.

## Introduce the rule graphing:

First we have to determine which type of graphs we want, there are several majority plots that we usually draw, scatterplot, histogram, barplot, boxplot, heatmap, etc... for r, people usually use package "ggplot2" to get their graphs, but first of all, we need to do data preprocessing.

### Scatter plot

```{r}
ggplot(mtcars, aes(mpg, disp)) + geom_point()
```

### Bar plot

```{r}
ggplot(quakes, aes(x = depth)) + geom_bar()
```

### Histogram

```{r}
ggplot(data = mpg) +
geom_histogram(aes(x = hwy))
```

### Boxplot

```{r}
ggplot(mpg) +
geom_boxplot(aes(x = class, y = hwy))
```

There are two parts we need to care about, which are `ggplot()` and `geom_point()`, `aes()` shows aesthetics, which should include data for x axis, and data for y axis. by adjusting the words after "geom_" with "bar","line","point","boxplot","histogram". etc...

### Adding rules

If want to create labels or title or theme for your graph, we can by using "+" to combine our command, which is a common rule for r graphing, for example, with `labs()` and `theme_classic()`.
```{r}
ggplot(mpg) +
geom_point(aes(x = displ, y = hwy, color = class)) +
labs(title = "mpg",
subtitle = "disply vs hwy",
x = "x",
y = "y",
color = "Class") +
theme_classic(16)
```

### Ordering

If want to show the data in some orders, there are some functions that we can show x or y with some order in graphing, use `fct_reorder()`, `fct_inorder()`, `fct_relevel()`, `fct_infreq()`, `fct_rev()`, etc...
```{r}
ggplot(mpg) +
geom_point(aes(x = fct_infreq(fl), y = hwy, color = class)) +
labs(title = "mpg",
subtitle = "fl vs hwy",
x = "x",
y = "y",
color = "Class") +
theme_classic(16)
```

## Facet

If want to generate multiply graphs by some criterias, in other word, facet by multiple variables, people can use `facet_warp()` and `facet_grid()` to make several graphs in some criterias, for which the description of them is not well documented.

### Facet_grid

```{r}
p <- ggplot(mpg, aes(displ, cty)) + geom_point()
p + facet_grid(rows = vars(drv))
p + facet_grid(cols = vars(cyl))
p + facet_grid(vars(drv), vars(cyl))
```

We see `facet_grid()` can generate columns and rows seperately or simultaneously, which facet in a 2d grid of panels.

### Facet_warp()
```{r}
p <- ggplot(mpg, aes(displ, cty)) + geom_point()
p + facet_wrap(vars(drv))
p + facet_wrap(vars(cyl))
# You can facet by multiple variables
ggplot(mpg, aes(displ, cyl)) +
geom_point() +
facet_wrap(vars(cyl, drv))
# it only generate 9 labels
```
From above, we know `facet_warp()` can return less labels, even though they have same parameters, but does not loss any information. we can say facet_warp() and facet_grid() are quite similar functions, which can represent the same data, but with different labels or panels.

## python

Python use "numpy","pandas","sklearn" for datapreprocessing.

### package and function for graphing

The predominant package for graphing called "matplotlib.pyplot", which includes many functions to help graphing.
There are two options that we can show a graph:
~ `plt.show()`
~ `%matplotlib inline`

For goal to graph a line plot,people can use build-in function.
~`plot()`
~`bar()`
~`hist()`
~`scatter()`
~`pie()`
To setup xlab and ylab, python can use
~ `plt.xlabels()`
~ `plt.ylabels()`