From ff86499235d47a307bfe80f473113035c94213af Mon Sep 17 00:00:00 2001
From: Danai Kafetzaki
Date: Mon, 23 Mar 2020 22:17:14 +0100
Subject: [PATCH] Add files via upload

---
 pages/vega-in-r/1_setting-things-up.md    |   13 +
 pages/vega-in-r/2_simple-barchart.md      |  175 ++++
 pages/vega-in-r/3_changing-data.md        |  257 +++++
 pages/vega-in-r/4_simple-interaction.md   |  192 ++++
 pages/vega-in-r/5_field-transform.md      | 1080 +++++++++++++++++++++
 pages/vega-in-r/6_data-transformations.md |  194 ++++
 6 files changed, 1911 insertions(+)
 create mode 100644 pages/vega-in-r/1_setting-things-up.md
 create mode 100644 pages/vega-in-r/2_simple-barchart.md
 create mode 100644 pages/vega-in-r/3_changing-data.md
 create mode 100644 pages/vega-in-r/4_simple-interaction.md
 create mode 100644 pages/vega-in-r/5_field-transform.md
 create mode 100644 pages/vega-in-r/6_data-transformations.md

diff --git a/pages/vega-in-r/1_setting-things-up.md b/pages/vega-in-r/1_setting-things-up.md
new file mode 100644
index 0000000..86ecad2
--- /dev/null
+++ b/pages/vega-in-r/1_setting-things-up.md
@@ -0,0 +1,13 @@
+---
+title: Setting things up
+keywords: vega-in-r
+sidebar: vega-in-r_sidebar
+permalink: /vega-in-r-a-simple-barchart.html
+folder: vega-in-r
+series: vega-in-r-series
+weight: 1
+---
+
+Setting things up - to be updated...
+
+{% include custom/series_vega-in-r_next.html %}
diff --git a/pages/vega-in-r/2_simple-barchart.md b/pages/vega-in-r/2_simple-barchart.md
new file mode 100644
index 0000000..82787e0
--- /dev/null
+++ b/pages/vega-in-r/2_simple-barchart.md
@@ -0,0 +1,175 @@
+---
+title: A simple barchart
+keywords: vega-in-r
+sidebar: vega-in-r_sidebar
+permalink: /vega-in-r-a-simple-barchart.html
+folder: vega-in-r
+series: vega-in-r-series
+weight: 2
+---
+Here is a very simple barchart defined with altair in R.
+
+
+The dataset of the chart is:
+
+```R
+Var1 = c("a", "b", "c", "d", "e")
+Var2 = c(11, 19, 22, 8, 14)
+Var3 = c("type1", "type1", "type2", "type1", "type2")
+dataset = data.frame(Var1, Var2, Var3)
+```
+
+and below is the code used to generate it:
+
+```R
+library(altair)  # attaches the `alt` object; needs the Python altair package, accessed through reticulate
+
+chart_1 =
+  alt$Chart(dataset)$
+  mark_bar()$
+  encode(
+    x = "Var1:O",
+    y = "Var2:Q"
+    # color = "Var3:N"
+  )$properties(
+    height = 200,
+    width = 400
+  )
+```
+
+What does the syntax look like in altair for R? It is similar to Python Altair. The major difference is that the operator `$` is used to access attributes and methods, instead of `.`.
+We use the object `alt` to access the Altair API, and the basic building block is `alt$Chart()`.
+
+- The `data` to be visualised is passed to `alt$Chart()`.
+- The `mark` is specified after `mark_`, for example `mark_bar()`.
+- The `encoding` determines the mapping between the channels and the data. Here, `x` and `y` are the position channels. The field type is specified after the field name, separated by a colon: `O` stands for ordinal, `Q` for quantitative and `N` for nominal.
+- The height and width of the plot are specified inside `properties()`.
+
+
+{:.exercise}
+**Exercise** - Make yourself comfortable with the syntax of this basic altair chart. Use the color channel for `Var3` to make the chart below. Then, change the mark and add a variable for size.
+
+
+{:.exercise}
+**Exercise** - Change the height and width of the panel and remake the plot above.
+
+
+{% include custom/series_vega-in-r_next.html %}
diff --git a/pages/vega-in-r/3_changing-data.md b/pages/vega-in-r/3_changing-data.md
new file mode 100644
index 0000000..63e04bc
--- /dev/null
+++ b/pages/vega-in-r/3_changing-data.md
@@ -0,0 +1,257 @@
+---
+title: A case study
+keywords: vega-in-r
+sidebar: vega-in-r_sidebar
+permalink: /vega-in-r-a-case-study.html
+folder: vega-in-r
+series: vega-in-r-series
+weight: 3
+---
+Let's now use a more realistic example and visualize the natural disasters dataset [https://ourworldindata.org/natural-disasters.html](https://ourworldindata.org/natural-disasters.html).
+This dataset is included in the vega_datasets package [https://github.com/vega/vega-datasets.html](https://github.com/vega/vega-datasets.html).
+
+You can import the datasets using the altair package:
+
+```R
+vega_data = altair::import_vega_data()
+```
+
+To see the list of the available datasets, use:
+
+```R
+vega_data$list_datasets()
+```
+
+Then select the one you want to visualise:
+
+```R
+data_source = vega_data$disasters()
+```
+
+Alternatively, you may read the data from a URL using:
+
+```R
+data_source = read.csv(url("https://raw.githubusercontent.com/vega/vega-datasets/master/data/disasters.csv"))
+```
+
+or load the data from a local file using standard R code.
+
+After importing the data, we can take a first look using standard R code:
+
+```R
+str(data_source)
+summary(data_source)
+head(data_source); tail(data_source)
+```
+
+We can now make a plot similar to the one at [https://altair-viz.github.io/gallery/natural_disasters.html](https://altair-viz.github.io/gallery/natural_disasters.html).
+For now, we filter the data in R and use the subset to make the plot with altair. In the data transformations section, we will see how to do the filtering inside the altair specification.
+
+
+
+Below is the code to make this plot.
+
+```R
+data_source_subset = subset(data_source, data_source$Entity != "All natural disasters")
+
+chart_disasters =
+  alt$Chart(data_source_subset)$
+  mark_circle(
+    opacity = 0.8,
+    stroke = 'black',
+    strokeWidth = 1
+  )$
+  encode(
+    x = "Year:O",
+    y = "Entity:N",
+    color = "Entity:N",
+    size = "Deaths:Q"
+  )$properties(
+    height = 200,
+    width = 400
+  )
+```
+
+Properties that are the same for every circle (here `opacity`, `stroke` and `strokeWidth`) are set as arguments of `mark_circle()`. Properties that depend on the data are set inside `encode()`.
+Using the mark type `rect` and mapping `Deaths` to the `color` and `opacity` channels, we can make a heatmap.
+
+```R
+chart_disasters =
+  alt$Chart(data_source_subset)$
+  mark_rect()$
+  encode(
+    x = "Year:O",
+    y = "Entity:N",
+    color = "Deaths:Q",
+    opacity = "Deaths:Q"
+  )$properties(
+    height = 200,
+    width = 400
+  )
+```
+
+
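+The heatmap shows one value of `Deaths` for each combination of `Year` and `Entity`. As a quick cross-check, base R `xtabs()` can build the same table in R. This is a minimal sketch on a small made-up data frame (`toy`, hypothetical values) that uses the same column names as the disasters data. Note that `xtabs()` fills year-entity combinations without a record with 0, whereas the heatmap leaves them empty.
+
+```R
+# Hypothetical toy data with the same columns as the disasters dataset
+toy = data.frame(
+  Entity = c("Drought", "Drought", "Flood", "Flood"),
+  Year   = c(1900, 1901, 1900, 1901),
+  Deaths = c(10, 0, 5, 7)
+)
+
+# Rows are entities, columns are years, cells hold the summed deaths
+death_table = xtabs(Deaths ~ Entity + Year, data = toy)
+print(death_table)
+```
+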
+
+
+Next, using the code below, we can make a time series plot of deaths from all natural disasters from 1900 until 2017.
+
+```R
+data_source_subset = subset(data_source, data_source$Entity == "All natural disasters")
+
+chart_disasters = alt$Chart(data_source_subset)$
+  mark_line()$
+  encode(
+    x = 'Year:Q',
+    y = 'Deaths:Q',
+    tooltip = c("Year", "Deaths")
+  )$properties(
+    height = 300,
+    width = 600
+  )
+```
+
+
+
+{:.exercise}
+**Exercise** - Use the `color` channel to make a time series plot per Entity.
+
+{:.exercise}
+**Exercise** - Change the field types. What is the result?
+
+{:.exercise}
+**Exercise** - Make a barchart for total deaths per Entity. Hint: You may do the calculation in R or try a calculation inside encoding.
+
+
+{% include custom/series_vega-in-r_next.html %}
diff --git a/pages/vega-in-r/4_simple-interaction.md b/pages/vega-in-r/4_simple-interaction.md
new file mode 100644
index 0000000..21935a5
--- /dev/null
+++ b/pages/vega-in-r/4_simple-interaction.md
@@ -0,0 +1,192 @@
+---
+title: Simple Interaction
+keywords: vega-in-r
+sidebar: vega-in-r_sidebar
+permalink: /vega-in-r-simple-interaction.html
+folder: vega-in-r
+series: vega-in-r-series
+weight: 4
+---
+
+One of the main advantages of the altair package is that it supports interactive graphics, and the code required to add a simple interaction is relatively short.
+
+# Tooltip
+
+A tooltip can be added to the plot with the `tooltip` channel inside `encode()`. To display one variable in the tooltip, we can use:
+
+```R
+...
+tooltip = "Variable_1"
+...
+```
+
+and for more than one variable, we can use the R function `c()` as illustrated below:
+
+```R
+...
+tooltip = c("Variable_1", "Variable_2")
+...
+```
+
+Note that if you pass the data to the chart as a URL, you need to specify the field types (for example `"Deaths:Q"`), because Altair cannot infer them from data it has not loaded.
+
+
+
+
+{:.exercise}
+**Exercise** - Add a tooltip to the heatmap we created in the previous section, to get the graph illustrated above.
+
+
+# Zooming and Panning
+
+We illustrate two ways of making a graph zoomable and pannable. The first one is to append the `interactive()` method to the chart, as illustrated below:
+
+```R
+$interactive()
+```
+
+The second option is to define the selection outside the chart code and then pass it to `add_selection()` in the chart code. Here, the selection is an interval selection bound to the scales:
+
+```R
+selection = alt$selection_interval(bind = 'scales')
+
+chart = alt$Chart(data_source_subset)$
+  .....
+  $add_selection(
+    selection
+  )
+```
+
+
+
+
+{:.exercise}
+**Exercise** - Make the time series plot of all natural disasters interactive, to get the graph illustrated above. Use both ways of making it zoomable and pannable.
+
+{:.exercise}
+**Exercise** - Go through the other selection types supported in altair. [https://altair-viz.github.io/user_guide/generated/api/altair.selection_interval.html#altair.selection_interval.html](https://altair-viz.github.io/user_guide/generated/api/altair.selection_interval.html#altair.selection_interval)
+
+
+{% include custom/series_vega-in-r_next.html %}
diff --git a/pages/vega-in-r/5_field-transform.md b/pages/vega-in-r/5_field-transform.md
new file mode 100644
index 0000000..a8332ad
--- /dev/null
+++ b/pages/vega-in-r/5_field-transform.md
@@ -0,0 +1,1080 @@
+---
+title: Field Transform
+keywords: vega-in-r
+sidebar: vega-in-r_sidebar
+permalink: /vega-in-r-field-transform.html
+folder: vega-in-r
+series: vega-in-r-series
+weight: 5
+---
+
+Since we are working in R, we can modify the data outside the plot specification and then use the modified dataset inside the plot encoding.
+However, with the altair package, calculations inside the plot specification can sometimes be easier. In this section, we discuss field transforms that can be done inside the encoding.
+As we have seen from the beginning of this tutorial, the `encoding` determines the mapping between the channels and the data. We have already used encoding channels such as the position channels `x` and `y` and mark property channels, for instance, `color` and `opacity`.
+To bin a quantitative field, we only need to add `bin = TRUE` to its `x` position channel. The plot then uses the binned version of the field.
+Below is the code to produce a barchart of the sum of deaths versus the binned years.
+
+```R
+data_source_subset = subset(data_source, data_source$Entity == "All natural disasters")
+
+chart_disasters = alt$Chart(data_source_subset)$
+  mark_bar()$
+  encode(
+    alt$X("Year:Q", bin = TRUE),
+    y = 'sum(Deaths):Q',
+    tooltip = 'sum(Deaths):Q'
+  )
+```
+
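+Since we are working in R, a comparable aggregation can also be prepared before plotting. The sketch below uses base R `cut()` and `aggregate()` on made-up yearly values (`yearly` is hypothetical). The breaks are chosen by hand, so they will generally differ from the "nice" bin boundaries that Altair chooses with `bin = TRUE`.
+
+```R
+# Hypothetical yearly totals standing in for the "All natural disasters" subset
+yearly = data.frame(Year = 1900:1909, Deaths = c(5, 3, 8, 1, 0, 4, 6, 2, 7, 9))
+
+# Assign each year to a 5-year interval closed on the left: [1900,1905) and [1905,1910)
+# dig.lab = 4 keeps the labels as plain four-digit years
+yearly$Year_bin = cut(yearly$Year, breaks = seq(1900, 1910, by = 5), right = FALSE, dig.lab = 4)
+
+# Sum the deaths within each interval, like y = 'sum(Deaths):Q'
+binned = aggregate(Deaths ~ Year_bin, data = yearly, FUN = sum)
+print(binned)
+```
+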
+
+
+{:.exercise}
+**Exercise** - Check the documentation of the binning parameters [https://altair-viz.github.io/user_guide/generated/core/altair.BinParams.html](https://altair-viz.github.io/user_guide/generated/core/altair.BinParams.html) and increase the value of the maximum number of bins.
+
+
+
+{:.exercise}
+**Exercise** - Using `data_source_subset = subset(data_source, data_source$Entity != "All natural disasters")`, make a line plot that shows the deaths from all natural disasters versus time.
+
+{:.exercise}
+**Exercise** - Using `data_source_subset = subset(data_source, data_source$Entity != "All natural disasters")`, make a heatmap that shows the count of disasters per year, like the one below.
+
+
+
+
+Another transformation we can specify inside the encoding is the scale, which maps the original field domain to the range we specify.
+For instance, we can display a quantitative field on a log scale.
+
+```R
+chart_disasters = alt$Chart("https://raw.githubusercontent.com/vega/vega-datasets/master/data/disasters.csv")$
+  mark_bar()$
+  encode(
+    x = 'Entity:N',
+    alt$Y('sum(Deaths):Q', scale = alt$Scale(type = 'log'))
+  )$properties(
+    height = 300,
+    width = 600
+  )
+```
+
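+A log scale is useful here because the per-Entity totals differ by orders of magnitude. The minimal sketch below uses hypothetical numbers in a made-up `toy` data frame. It computes in base R the totals that the bars represent and checks that they are all strictly positive, since a log scale has no position for zero or negative values.
+
+```R
+# Hypothetical toy data; the real dataset has more entities and years
+toy = data.frame(
+  Entity = c("Drought", "Drought", "Flood", "Flood", "Wildfire"),
+  Deaths = c(1000, 500, 20, 30, 2)
+)
+
+# Total deaths per Entity, as in x = 'Entity:N' and y = 'sum(Deaths):Q'
+totals = aggregate(Deaths ~ Entity, data = toy, FUN = sum)
+
+# Every total must be > 0 to be drawn on a log scale
+stopifnot(all(totals$Deaths > 0))
+totals$log10_Deaths = log10(totals$Deaths)
+print(totals)
+```
+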
+
+
+Fortunately, not every type of disaster occurred in every year from 1900 to 2017. Did you notice that no natural disaster is registered in 1904?
+Let's enrich the dataset in R with a variable that marks the missing years.
+
+```R
+data_source = read.csv(url("https://raw.githubusercontent.com/vega/vega-datasets/master/data/disasters.csv"))  # original data
+Year = seq(1900, 2017, 1)                                     # complete vector of years (118 values)
+Entity = sort(rep(unique(data_source$Entity), length(Year)))  # each entity repeated once per year
+data_mod = cbind.data.frame(Year, Entity)                     # complete set of year-entity combinations
+data_source_modified = merge(data_source, data_mod, by = c("Year", "Entity"), all = TRUE)  # full outer join with the original data
+data_source_modified$Missing = ifelse(is.na(data_source_modified$Deaths), "1", "0")        # "1" = year absent from the original data, "0" = present
+data_source_modified[is.na(data_source_modified$Deaths), "Deaths"] = 0                     # replace NA with zero
+str(data_source_modified)                                     # look at the new data structure
+rm(Year, Entity, data_mod)                                    # remove objects that are not needed
+```
+
+Now we can plot the full time series and specify a custom color scale for the presence or absence of the year in the data.
+The domain of the data is `0` and `1`, and the custom range consists of the two colors of our preference.
+
+```R
+domain_color = c("0", "1")
+range_color = c('black', 'red')
+
+data_source_subset = subset(data_source_modified, data_source_modified$Entity == "All natural disasters")
+
+chart_disasters = alt$Chart(data_source_subset)$
+  mark_circle(
+    opacity = 0.8,
+    size = 50
+  )$
+  encode(
+    x = 'Year:O',
+    y = 'Deaths:Q',
+    color = alt$Color('Missing', scale = alt$Scale(domain = domain_color, range = range_color)),
+    tooltip = c("Year", "Deaths")
+  )$properties(
+    height = 300,
+    width = 600
+  )$
+  interactive()
+```
+
+
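+Before enriching the data, it can help to list which years have no record at all for a given Entity. The helper `missing_years()` below is hypothetical and was written only for this sketch. It is shown on a small made-up data frame. On the real data, `missing_years(data_source, "All natural disasters")` should include 1904 if that year is indeed absent.
+
+```R
+# Hypothetical helper: years in `all_years` without any record for `entity`
+missing_years = function(data, entity, all_years = 1900:2017) {
+  present = unique(data$Year[data$Entity == entity])
+  setdiff(all_years, present)
+}
+
+# Hypothetical toy data
+toy = data.frame(
+  Entity = c("Flood", "Flood", "Drought"),
+  Year   = c(1900, 1902, 1901),
+  Deaths = c(3, 4, 10)
+)
+
+missing_years(toy, "Flood", all_years = 1900:1903)
+```
+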
+
+
+
+{% include custom/series_vega-in-r_next.html %}
diff --git a/pages/vega-in-r/6_data-transformations.md b/pages/vega-in-r/6_data-transformations.md
new file mode 100644
index 0000000..e775d46
--- /dev/null
+++ b/pages/vega-in-r/6_data-transformations.md
@@ -0,0 +1,194 @@
+---
+title: Data Transformations
+keywords: vega-in-r
+sidebar: vega-in-r_sidebar
+permalink: /vega-in-r-data-transformations.html
+folder: vega-in-r
+series: vega-in-r-series
+weight: 6
+---
+
+As mentioned in the altair documentation [https://altair-viz.github.io/user_guide/transform/index.html](https://altair-viz.github.io/user_guide/transform/index.html), in most cases it is suggested to perform transformations outside the chart definition, which in our case means in R. Of course, data transforms inside the chart can also be useful in some cases.
+So far, we have been filtering the data in R and then using the modified data in the chart specification. Now, we use `transform_filter()` to subset the data inside the chart. We make the line chart we have seen in a previous section using the code below:
+
+```R
+chart_disasters = alt$Chart("https://raw.githubusercontent.com/vega/vega-datasets/master/data/disasters.csv")$
+  mark_line()$
+  encode(
+    x = 'Year:Q',
+    y = 'Deaths:Q',
+    tooltip = c("Year:Q", "Deaths:Q")  # types are needed because the data come from a URL
+  )$properties(
+    height = 300,
+    width = 600
+  )$transform_filter(
+    alt$FieldEqualPredicate(field = "Entity", equal = "All natural disasters")
+  )
+```
+
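+For comparison, you can take the same subsets in R before plotting, which is the approach the altair documentation recommends in most cases. Below is a minimal base R sketch on hypothetical values. `==` keeps a single Entity, like the `FieldEqualPredicate` in the chart, and `%in%` keeps several Entities at once.
+
+```R
+# Hypothetical toy data with the same columns as the disasters dataset
+toy = data.frame(
+  Entity = c("All natural disasters", "Earthquake", "Volcanic activity", "Flood"),
+  Year   = c(1900, 1900, 1900, 1900),
+  Deaths = c(100, 40, 10, 50)
+)
+
+# Keep one Entity (compare alt$FieldEqualPredicate)
+all_disasters = subset(toy, Entity == "All natural disasters")
+
+# Keep several Entities at once
+geophysical = subset(toy, Entity %in% c("Earthquake", "Volcanic activity"))
+print(geophysical)
+```
+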
+
+
+{:.exercise}
+**Exercise** - Use the filter transform to obtain the data related to volcanic activity and earthquakes, and make an area chart like the one below. Hint: Go through the documentation for Field Predicates at [https://altair-viz.github.io/user_guide/transform/filter.html#user-guide-filter-transform](https://altair-viz.github.io/user_guide/transform/filter.html#user-guide-filter-transform).
+
+
+We now also use `transform_window()` to compute and plot the cumulative sum of deaths over all available years. The filter is applied before the window transform, so the running sum only includes the "All natural disasters" rows, sorted by year.
+
+```R
+chart_disasters = alt$Chart("https://raw.githubusercontent.com/vega/vega-datasets/master/data/disasters.csv")$
+  transform_filter(
+    alt$FieldEqualPredicate(field = "Entity", equal = "All natural disasters")
+  )$
+  transform_window(
+    cumulative_deaths = 'sum(Deaths)',
+    sort = list(list(field = "Year"))
+  )$
+  mark_area()$
+  encode(
+    x = 'Year:O',
+    y = 'cumulative_deaths:Q',
+    tooltip = c("Year:Q", "cumulative_deaths:Q")
+  )$properties(
+    height = 300,
+    width = 600
+  )
+```
+
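+In Vega-Lite the default window frame covers all preceding rows up to the current one, so `sum(Deaths)` inside `transform_window()` gives a running total. The same running total can be sketched in base R with `cumsum()`, again on made-up values. As in the chart, the result depends on the row order, so the rows are sorted by year first.
+
+```R
+# Hypothetical yearly deaths, deliberately out of order
+toy = data.frame(Year = c(1902, 1900, 1901), Deaths = c(5, 10, 20))
+
+# Sort by Year, then accumulate the deaths up to and including each year
+toy = toy[order(toy$Year), ]
+toy$cumulative_deaths = cumsum(toy$Deaths)
+print(toy)
+```
+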
+
+
+
+{% include custom/series_vega-in-r_next.html %}