From bcfbfebb83cb209eb6ea00561ebf4ca7a5d31b59 Mon Sep 17 00:00:00 2001
From: odelmarcelle
Date: Fri, 26 Jun 2020 15:42:53 +0200
Subject: [PATCH] Update bins of sentences

---
 docs/articles/isa.html                              | 177 +++++++++---------
 .../figure-html/unnamed-chunk-28-1.png              | Bin 0 -> 25632 bytes
 vignettes/isa.Rmd                                   | 108 ++++++++---
 3 files changed, 172 insertions(+), 113 deletions(-)
 create mode 100644 docs/articles/isa_files/figure-html/unnamed-chunk-28-1.png

diff --git a/docs/articles/isa.html b/docs/articles/isa.html
index cb14f9b..681d4b3 100644
--- a/docs/articles/isa.html
+++ b/docs/articles/isa.html
@@ -162,7 +162,7 @@

The sentometrics package introduces simple functions to quickly compute the sentiment of texts within a corpus. This easy-to-use approach does not preclude more advanced analysis, and the sentometrics functions remain a solid choice for cutting-edge research. This tutorial shows how to go beyond the basic sentometrics settings in order to analyse the intratextual sentiment structure of texts.

-Intratextual Sentiment Structure

+Intratextual sentiment structure

Does the position of positive and negative words within a text matter? That’s the question investigated by Boudt & Thewissen, 2019 in their research on the sentiment implied by CEO letters. Based on a large dataset of letters, they analyze how sentiment-bearing words are positioned within the text. They find that CEOs tend to emphasize sentiment at the beginning and the end of their letter, in the hope of leaving a positive impression on the reader.

Their results confirm generally accepted linguistic theories saying that readers remember best the first (primacy effect) and the last (recency effect) portions of a text, and that the end of the text contributes the most to the reader’s final feeling.

One can wonder whether other types of texts follow a similar structure. Indeed, the world is full of different text media, from Twitter posts to news articles, and most of them are written less carefully than CEO letters. Let’s investigate one of these together with the help of the sentometrics package!

@@ -173,7 +173,7 @@

As part of this tutorial, you will learn how to:

@@ -194,13 +194,13 @@

## 
##  -1   1 
## 605 344

The variable s indicates whether the news is more positive or negative, based on an expert’s opinion. We are going to try to predict this value at the end of the tutorial.

-

We can already prepare a sento_corpus and a sento_lexicon for our future sentiment computation. For the sento_corpus, we will also create a dummyFeature filled with 1’s. Since sentiment computations are multiplied by the features of a sento_corpus, we want this dummy feature to observe the whole corpus’s sentiments. This dummyFeature is created by default whenever there’s no feature at the creation of the sento_corpus.

+

We can already prepare a sento_corpus and a sento_lexicon for our future sentiment computation. For the sento_corpus, we will also create a dummyFeature filled with 1’s. Since sentiment computations are multiplied by the features of a sento_corpus, we want this dummy feature so as to capture the sentiment of the whole corpus. This dummyFeature is created by default whenever no features are provided at the creation of the sento_corpus.

Finally, we remove the feature s from the sento_corpus, as we do not need it for sentiment computation.

usnews2Sento <- sento_corpus(usnews2) # note that the feature 's' is automatically re-scaled from {-1;1} to {0;1}
 usnews2Sento <- add_features(usnews2Sento, data.frame(dummyFeature = rep(1, length(usnews2Sento))))
 
docvars(usnews2Sento, "s") <- NULL # removing the feature
-

We will use a single lexicon for this analysis, the combined Jockers & Rinker lexicon, obtained from the lexicon package. However, we will prepare a second and different version of this lexicon where the sentiments assigned to words are all positive, regardless of their original signs. This second lexicon will be useful to better detect the sentiment intensity conveyed.

+

We will use a single lexicon for this analysis, the combined Jockers & Rinker lexicon, obtained from the lexicon package. However, we will prepare a second, different version of this lexicon where the sentiment values assigned to words are all positive, regardless of their original signs. This second lexicon will be useful to better detect the sentiment intensity conveyed.

We use the data.table operator [] to create the second lexicon in a very efficient way. Most sentometrics objects are based on data.table, which allows us to perform complex data transformations. If this is the first time you are seeing the data.table way of using [], we recommend having a look at its Introduction vignette and enjoying this powerful tool!

lex <- lexicon::hash_sentiment_jockers_rinker
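The rest of this code chunk is elided by the diff. As a sketch (assumed, not the vignette’s exact code), the second lexicon can be built with data.table’s [] and both lexicons bundled with sento_lexicons(); lexAbsolute is a hypothetical intermediate name, while baseLex and absoluteLex are taken from the sentiment column names appearing in the outputs below.

```r
# Sketch: absolute-value version of the lexicon via data.table's `[]`.
# `lexAbsolute` is a hypothetical name; baseLex/absoluteLex match the
# column names of the sentiment outputs shown later in this tutorial.
lexAbsolute <- lex[, .(x, y = abs(y))]  # same words, all scores made positive
sentoLexicon <- sento_lexicons(list(baseLex = lex, absoluteLex = lexAbsolute))
```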
 
@@ -230,10 +230,10 @@ 

A review of sentiment computation with sentometrics

compute_sentiment() is at the base of sentiment analysis with sentometrics. That’s also the function we are going to use to analyse intratextual sentiment. This requires, however, playing with the function’s most advanced features. Before doing that, let us review the different computation settings to really understand what’s going on.

-
+

-Default computation - from words to document sentiments

-

When using the default settings (i.e., only specifying the how argument), the sentiment for each word within a text will be determined according to the provided lexicons. These word sentiments are then aggregated using the method defined by the how argument, aggregating up to the document level to form a sentiment value for the document.

+Default computation - from words to document sentiment

+

When using the default settings (i.e., only specifying the how argument), the sentiment of each word within a text will be determined according to the provided lexicons. These word sentiment values are then aggregated using the method defined by the how argument, aggregating up to the document level to form a sentiment value for the document.

sentiment <- compute_sentiment(usnews2Sento, sentoLexicon, how = "proportional")
 head(sentiment)
##           id       date word_count baseLex--dummyFeature absoluteLex--dummyFeature
@@ -243,11 +243,11 @@ 

## 4: 830981681 1972-01-28        158           0.025316456                0.09493671
## 5: 830981684 1973-02-15        174          -0.004022989                0.03160920
## 6: 830981702 1973-05-31        227           0.009251101                0.06784141

-

In this case, the how = "proportional" simply sum words’ sentiments then divide it by the number of words in a document. The different settings for how can be accessed using the get_hows() function. We are going to present the use of a more complex setting at the end of this tutorial.

+

In this case, how = "proportional" simply sums the word sentiment values and then divides the total by the number of words in the document. The different settings for how can be accessed using the get_hows() function. We are going to present the use of a more complex setting at the end of this tutorial.

-
+

-Setting do.sentence = TRUE - from words to sentences sentiments

+Setting do.sentence = TRUE - from words to sentence sentiment

A drastic change in the behaviour of compute_sentiment() can be induced by specifying do.sentence = TRUE in the function call. If TRUE, compute_sentiment() will no longer return a sentiment value for each document, but one for each sentence. Sentiment values within each sentence are still computed using the method provided in the how argument, but the function stops there.

sentiment <- compute_sentiment(usnews2Sento, sentoLexicon, how = "proportional", do.sentence = TRUE)
 head(sentiment)
@@ -258,13 +258,13 @@

## 4: 830981632           4 1971-01-12         33  0.01666667 0.04696970
## 5: 830981632           5 1971-01-12         16 -0.04687500 0.07812500
## 6: 830981632           6 1971-01-12         24  0.04166667 0.06250000
-

The new column sentence_id in the output is used to identify the sentences of a single document. This result can be used as-is for analysis at the sentence level, or sentences sentiments can be aggregated to obtain documents sentiments, as in the default setting. One way to aggregate sentences sentiments up to documents sentiments is to use the aggregate() method of sentometrics.

+

The new column sentence_id in the output is used to identify the sentences of a single document. This result can be used as-is for analysis at the sentence level, or sentence sentiment can be aggregated to obtain document sentiment, as in the default setting. One way to aggregate sentence sentiment up to document sentiment is to use the aggregate() method of sentometrics.

Trick with bins in a list, do.sentence and tokens

-

Analyzing the sentiment of individual sentences is already a nice approach to observe intra-document sentiment, but sometimes it is better to define a custom container for which sentiments are going to be computed. This is the approach used by Boudt & Thewissen, 2019, where they define bins, equal-sized containers of texts. The idea is to divide a document into equal-sized portion and to analyze each of them independently. Let’s say we decide to split a document of 200 words into 10 bins. To do so, we are going to store the first 20 words in the first bin, the words 21 to 40 in the second bin, and so on… This way, each bin will account for 10% of the text. By repeating the procedure for all texts of a corpus, we can easily compare specific text portions (e.g., the first 10%) between multiples documents.

+

Analyzing the sentiment of individual sentences is already a nice approach to observe intra-document sentiment, but sometimes it is better to define a custom container for which sentiment is going to be computed. This is the approach used by Boudt & Thewissen, 2019, where they define bins, equal-sized containers of text. The idea is to divide a document into equal-sized portions and to analyse each of them independently. Let’s say we decide to split a document of 200 words into 10 bins. To do so, we are going to store the first 20 words in the first bin, the words 21 to 40 in the second bin, and so on… This way, each bin will account for 10% of the text. By repeating the procedure for all texts of a corpus, we can easily compare specific text portions (e.g., the first 10%) between multiple documents.

Let’s split our documents into sets of bins. The first step is to obtain a character vector of tokens for each document. This is done easily with the tokens function from the quanteda package (remember that sentometrics objects are also based on quanteda, leaving us free to use most functions from that package).

usnews2Toks <- tokens(usnews2Sento, remove_punct = TRUE)
 usnews2Toks <- tokens_tolower(usnews2Toks)  # changing all letters to lowercase is optional but recommended
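The bin construction itself falls in an elided part of this hunk. A sketch of how usnews2Bins (used later as the tokens argument of compute_sentiment()) could be built, assuming parallel::splitIndices as in the new section at the end of this diff:

```r
# Sketch (assumed construction): split each document's tokens into 10
# consecutive, roughly equal-sized bins of words.
nBins <- 10
usnews2Bins <- lapply(usnews2Toks, function(toks) {
  lapply(parallel::splitIndices(length(toks), nBins), function(idx) toks[idx])
})
```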
@@ -331,13 +331,13 @@

-Exposing Intratextual Sentiment Structure with bins
+Exposing intratextual sentiment structure with bins

-

In their analysis of CEO letters, Boudt & Thewissen, 2019 identified an intratextual sentiment structure: CEOs would deliberately emphasize sentiments at the beginning and end of the letter, and pay attention to leave out a positive message and the end. Our dataset of news articles is radically different from these letters so we don’t expect to find a similar structure. However, based on our knowledge of news, we can formulate a hypothesis: news articles tend to use strong sentiments in their headlines to attract readers’ eyes. Let’s investigate this using our bins!

+

In their analysis of CEO letters, Boudt & Thewissen, 2019 identified an intratextual sentiment structure: CEOs would deliberately emphasize sentiment at the beginning and end of the letter, and take care to leave a positive message at the end. Our dataset of news articles is radically different from these letters, so we don’t expect to find a similar structure. However, based on our knowledge of news, we can formulate a hypothesis: news articles tend to use strong sentiment in their headlines to attract readers’ eyes. Let’s investigate this using our bins!

Absolute sentiment

-

We expect that the first bin in each article presents on average more sentiment than in the rest of the text. Since news can either be positive or negative, it will easier to identify sentiment intensity using the absolute value lexicon prepared earlier. This way, we avoid the cancelling effect between positive and negative sentiments. Simply plotting the mean sentiment values for each bin across documents can give us some insight on the intratextual structure. Once again, we rely on data.table’s [] operator to easily group sentiment values per sentence_id (remember, these represent the bin number!). In addition to this, a boxplot can be useful to ensure that the mean sentiments are not driven by extreme outliers.

+

We expect that the first bin in each article presents on average more sentiment than the rest of the text. Since news can be either positive or negative, it will be easier to identify sentiment intensity using the absolute value lexicon prepared earlier. This way, we avoid the cancelling effect between positive and negative sentiment. Simply plotting the mean sentiment values for each bin across documents can give us some insight on the intratextual structure. Once again, we rely on data.table’s [] operator to easily group sentiment values per sentence_id (remember, these represent the bin number!). In addition to this, a boxplot can be useful to ensure that the mean sentiment values are not driven by extreme outliers.

par(mfrow = c(1, 2))
 
 plot(sentiment[, .(s = mean(`absoluteLex--dummyFeature`)), by = sentence_id], type = "l",
@@ -351,14 +351,14 @@ 

Herfindahl-Hirschman Index

-

Another way to study the intratextual sentiment structure is to compute the Herfindahl-Hirschman Index across all documents. This is a popular index of concentration, mainly used in measuring competition between firms on a given market. A value close to 0 indicates large dispersion between bins while a value of 1 indicated that all sentiments are found in a single bin. The formula to compute the index of a single document is:

+

Another way to study the intratextual sentiment structure is to compute the Herfindahl-Hirschman Index across all documents. This is a popular index of concentration, mainly used to measure competition between firms on a given market. A value close to 0 indicates large dispersion between bins, while a value of 1 indicates that all sentiment is found in a single bin. The formula to compute the index of a single document is:

\[H = \sum_{b=1}^{B} s_b^2\] where \(b\) indexes the bins and \(s_b\) is the proportion of the document’s sentiment found in bin \(b\).
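As a quick numerical check of the two extremes, with 10 bins:

```r
# Sentiment spread evenly over 10 bins versus fully concentrated in one bin.
s_even <- rep(1/10, 10)
sum(s_even^2)  # minimal concentration for 10 bins: 0.1
s_conc <- c(1, rep(0, 9))
sum(s_conc^2)  # maximal concentration: 1
```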

Using data.table, we can easily compute the index for the whole set of documents.

herfindahl <- sentiment[, .(s = `absoluteLex--dummyFeature`/sum(`absoluteLex--dummyFeature`)), by = id]
 herfindahl <- herfindahl[, .(h = sum(s^2)), by = id]
 mean(herfindahl$h)
## [1] 0.1445487
-

A result that shows there is concentration toward some bins! Note that this result is heavily dependent on the number of bins considered. Only index values computed with the same number of bins should be compared. Let’s show the index’s value if sentiments were uniformly positioned within the text:

+

This result shows that there is concentration toward some bins! Note that this result is heavily dependent on the number of bins considered. Only index values computed with the same number of bins should be compared. Let’s show the index’s value if sentiment were uniformly positioned within the text:

x <- data.table(id = sentiment$id, s = rep(1, nrow(sentiment)))
 
 herfindahl <- x[, .(s = s/sum(s)), by = id]
@@ -370,7 +370,7 @@ 

Computing sentiment with different weights

-

The sentometrics comes with a lot of different weightings methods to compute sentiment and aggregate them into document sentiments or even time series. These weightings methods can be accessed with the functions get_hows.

+

The sentometrics package comes with a lot of different weighting methods to compute sentiment and aggregate it into document sentiment or even time series. These weighting methods can be accessed with the function get_hows().

## $words
 ## [1] "counts"                 "proportional"           "proportionalPol"       
@@ -383,8 +383,8 @@ 

## 
## $time
## [1] "equal_weight" "almon"        "beta"         "linear"       "exponential"  "own"

-

So far, we’ve been using the proportional method from the $words set. The $words set contains the valid options for the hows argument of compute_sentiment(). The other two sets are used within the aggregate() function, to respectively aggregate sentences sentiment into documents or document sentiments into time series.

-

With our earlier computation of sentiments using do.sentences = TRUE, we computed sentiments for sentences and bins. Now, for our next application, we need to aggregate these sentences and bins sentiments into documents sentiments. One option is to aggregate() using one of the methods shown above. Note the use of do.full = FALSE to stop the aggregation at the document level (otherwise, it would directly aggregate up to a time series).

+

So far, we’ve been using the proportional method from the $words set. The $words set contains the valid options for the how argument of compute_sentiment(). The other two sets are used within the aggregate() function, to aggregate sentence sentiment into document sentiment and document sentiment into time series, respectively.

+

With our earlier computation of sentiment using do.sentence = TRUE, we computed sentiment for sentences and bins. Now, for our next application, we need to aggregate these sentence and bin sentiment values into document sentiment. One option is to aggregate() using one of the methods shown above. Note the use of do.full = FALSE to stop the aggregation at the document level (otherwise, it would directly aggregate up to a time series).

docsSentiment <- aggregate(sentiment, ctr_agg(howDocs = "equal_weight"), do.full = FALSE)
 
 lapply(list(sentiment = sentiment, docsSentiment = docsSentiment), head)
@@ -449,14 +449,14 @@

Application to news sentiment prediction

Let’s now put all of this in a concrete example. We’ve been using a modified dataset usnews2 since the beginning because we wanted to have a variable identifying whether the document is positive or negative. Our goal is now to try to predict this value.

-

To do so, we will consider 4 different approaches, in the form of four different weighting methods. We will study which weighting is the best to predict document’s sentiments. The four weighting methods will be:

+

To do so, we will consider four different approaches, in the form of four different weighting methods. We will study which weighting is the best at predicting the documents’ sentiment. The four weighting methods will be:

  • The default weighting based on word frequencies, regardless of the position.
  • A U-shaped weighting of words, where words at the beginning or end of the text are given more weight.
  • -
  • A sentence-weighting, where word sentiments are proportionally weighted up to a sentence sentiment level, then sentences are aggregated with an equal weighting to obtain the document sentiment.
  • -
  • The bin based approach, where word sentiments are proportionally weighted up to a bin sentiment level, then bins are aggregated with our custom weights: the first bin given half the weight and the other bins sharing the rest.
  • +
  • A sentence weighting, where word sentiment values are proportionally weighted up to a sentence sentiment level, then sentences are aggregated with equal weights to obtain the document sentiment.
  • +
  • The bin-based approach, where word sentiment values are proportionally weighted up to a bin sentiment level, then bins are aggregated with our custom weights: the first bin is given half the weight and the other bins share the rest.
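The custom bin weights from the last item can be written down directly (a sketch; binWeights is a hypothetical name, assuming the 10 bins used earlier):

```r
# First bin gets half the total weight; the other 9 bins share the rest equally.
nBins <- 10
binWeights <- c(0.5, rep(0.5 / (nBins - 1), nBins - 1))
sum(binWeights)  # the weights sum to 1
```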
-

The U-shaped weighting is something we haven’t seen before. This is a weighting method for words as per get_words() that gives more weight to the beginning and end of a text. Its exact formulation can be found at the end of the Sentometrics vignette, along with the other available weighting. This weighting scheme can be visualized as follows:

+

The U-shaped weighting is something we haven’t seen before. This is a weighting method for words, as we can learn from get_hows(). This scheme gives more weight to the beginning and end of a text. Its exact formulation can be found at the end of the Sentometrics vignette, along with the other available weightings. This weighting scheme can be visualized as follows:

Qd <- 200 # number of words in the documents
 i <- 1:Qd
 
@@ -465,7 +465,7 @@ 

plot(ushape, type = 'l', ylab = "Weight", xlab = "Word position index", main = "U-shaped weight scheme")
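The computation of ushape falls in an elided part of this diff. Purely for illustration (an assumed stand-in, not the package’s exact formula, which is given in the Sentometrics vignette), a normalized quadratic centered on the text’s midpoint reproduces the U shape:

```r
Qd <- 200                       # number of words in the document
i <- 1:Qd                       # word position indices
ushape <- (i - (Qd + 1) / 2)^2  # quadratic distance from the text's midpoint
ushape <- ushape / sum(ushape)  # normalize so the weights sum to one
```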

-

Let’s compute sentiments with the four different weighting schemes. We will store the results in a list, sentimentValues.

+

Let’s compute sentiment with the four different weighting schemes. We will store the results in a list, sentimentValues.

sentimentValues <- list()
 
 sentimentValues$default <- compute_sentiment(usnews2Sento, sentoLexicon, how = "proportional")
@@ -474,30 +474,18 @@ 

sentimentValues$bins <- compute_sentiment(usnews2Sento, sentoLexicon, tokens = usnews2Bins, how = "proportional", do.sentence = TRUE)

-lapply(sentimentValues, head, n = 3)

+lapply(sentimentValues[c(1,3)], head, n = 3)

## $default
 ##           id       date word_count baseLex--dummyFeature absoluteLex--dummyFeature
 ## 1: 830981632 1971-01-12        192          -0.010156250                0.10130208
 ## 2: 830981642 1971-08-04        243           0.036831276                0.08539095
 ## 3: 830981666 1971-08-24        326           0.007515337                0.03849693
 ## 
-## $uShaped
-##           id       date word_count baseLex--dummyFeature absoluteLex--dummyFeature
-## 1: 830981632 1971-01-12        192         -0.0345837756                0.13075232
-## 2: 830981642 1971-08-04        243          0.0264033780                0.09045472
-## 3: 830981666 1971-08-24        326          0.0006524499                0.02734571
-## 
 ## $sentences
 ##           id sentence_id       date word_count baseLex--dummyFeature absoluteLex--dummyFeature
 ## 1: 830981632           1 1971-01-12         28           -0.09285714                0.12142857
 ## 2: 830981632           2 1971-01-12         37            0.01081081                0.01081081
-## 3: 830981632           3 1971-01-12          6           -0.01666667                0.15000000
-## 
-## $bins
-##           id sentence_id       date word_count baseLex--dummyFeature absoluteLex--dummyFeature
-## 1: 830981632           1 1971-01-12         20           -0.11250000                0.11250000
-## 2: 830981632           2 1971-01-12         19           -0.01842105                0.06052632
-## 3: 830981632           3 1971-01-12         19            0.02105263                0.02105263
+## 3: 830981632           3 1971-01-12          6           -0.01666667                0.15000000

Before going further, we need to aggregate the last two results into a document-level sentiment measure. We are going to aggregate sentences using the aggregate() function, while we will repeat the same operation as before to compute the bin aggregation with the custom weights.

sentimentValues$sentences <- aggregate(sentimentValues$sentences, ctr_agg(howDocs = "equal_weight"), do.full = FALSE)
 
@@ -517,7 +505,7 @@ 

## 1: 830981632 1971-01-12        194          -0.004965374                0.09939751
## 2: 830981642 1971-08-04        243           0.035614035                0.08657895
## 3: 830981666 1971-08-24        336           0.006670419                0.03841824

-

Finally, what remains to do is test our results against the variable s from usnews2. Since we know the number of positive and negative news in s, we can quickly and in a naive way measure the accuracy by ordering the documents by sentiment values.

+

Finally, what remains to do is to test our results against the variable s from usnews2. Since we know the number of positive and negative news items in s, we can quickly, if naively, measure the accuracy by ordering the documents by their sentiment values.

table(usnews2$s)
## 
 ##  -1   1 
@@ -525,75 +513,96 @@ 

Thus, we classify the 605 documents with the lowest sentiment in each measure as negative, and the remaining documents as positive.

Let’s start by adding the s variable to the existing measures by merging each of them with usnews2. The use of lapply allows us to perform the operation on all measures at once.

sentimentValues <- lapply(sentimentValues, function(x) merge.data.frame(x, usnews2[, c("id","s")]))
-lapply(sentimentValues, head, n = 3)
-
## $default
-##          id       date word_count baseLex--dummyFeature absoluteLex--dummyFeature  s
+
+head(sentimentValues$default)

+
##          id       date word_count baseLex--dummyFeature absoluteLex--dummyFeature  s
 ## 1 830981632 1971-01-12        192          -0.010156250                0.10130208 -1
 ## 2 830981642 1971-08-04        243           0.036831276                0.08539095 -1
 ## 3 830981666 1971-08-24        326           0.007515337                0.03849693  1
-## 
-## $uShaped
-##          id       date word_count baseLex--dummyFeature absoluteLex--dummyFeature  s
-## 1 830981632 1971-01-12        192         -0.0345837756                0.13075232 -1
-## 2 830981642 1971-08-04        243          0.0264033780                0.09045472 -1
-## 3 830981666 1971-08-24        326          0.0006524499                0.02734571  1
-## 
-## $sentences
-##          id       date word_count baseLex--dummyFeature absoluteLex--dummyFeature  s
-## 1 830981632 1971-01-12        202          -0.016162828                0.11392764 -1
-## 2 830981642 1971-08-04        251           0.049212232                0.09111569 -1
-## 3 830981666 1971-08-24        349           0.009439859                0.04566347  1
-## 
-## $bins
-##          id       date word_count baseLex--dummyFeature absoluteLex--dummyFeature  s
-## 1 830981632 1971-01-12        194          -0.004965374                0.09939751 -1
-## 2 830981642 1971-08-04        243           0.035614035                0.08657895 -1
-## 3 830981666 1971-08-24        336           0.006670419                0.03841824  1
+## 4 830981681 1972-01-28        158           0.025316456                0.09493671 -1
+## 5 830981684 1973-02-15        174          -0.004022989                0.03160920 -1
+## 6 830981702 1973-05-31        227           0.009251101                0.06784141  1

Since we used merge.data.frame, we need to convert the objects back to data.table and then we can order each of these tables.

sentimentValues <- lapply(sentimentValues, as.data.table) # converting back to data.table
 
 sentimentValues <- lapply(sentimentValues, function(x) x[order(`baseLex--dummyFeature`)]) # order based on the baseLex sentiment values
 
-lapply(sentimentValues, head, n = 3)
-
## $default
-##           id       date word_count baseLex--dummyFeature absoluteLex--dummyFeature  s
+head(sentimentValues$default)
+
##           id       date word_count baseLex--dummyFeature absoluteLex--dummyFeature  s
 ## 1: 830981961 1976-02-20        123           -0.06707317                0.10691057 -1
 ## 2: 842616972 2011-11-23        206           -0.05412621                0.12305825 -1
 ## 3: 842616769 2010-11-20        186           -0.05322581                0.08978495 -1
-## 
-## $uShaped
-##           id       date word_count baseLex--dummyFeature absoluteLex--dummyFeature  s
-## 1: 842613535 1991-05-02        213           -0.07650705                0.10549752  1
-## 2: 830981961 1976-02-20        123           -0.07607828                0.09651269 -1
-## 3: 842615597 2003-11-24        202           -0.07382476                0.10713842 -1
-## 
-## $sentences
-##           id       date word_count baseLex--dummyFeature absoluteLex--dummyFeature  s
-## 1: 830981961 1976-02-20        125           -0.07707298                0.12142702 -1
-## 2: 830984376 1987-12-17        225           -0.06662902                0.10809238 -1
-## 3: 842614104 1994-02-18        205           -0.06074249                0.09806308 -1
-## 
-## $bins
-##           id       date word_count baseLex--dummyFeature absoluteLex--dummyFeature  s
-## 1: 830981961 1976-02-20        127           -0.07186235                0.10620783 -1
-## 2: 842616769 2010-11-20        186           -0.05663281                0.08476454 -1
-## 3: 842616972 2011-11-23        208           -0.04818296                0.11309524 -1
+## 4: 842614104 1994-02-18        195           -0.05256410                0.08384615 -1
+## 5: 830984835 1988-11-20        175           -0.04942857                0.07228571  1
+## 6: 842617451 2014-12-17        159           -0.04874214                0.08270440 -1

Finally, we compute the accuracy by counting the number of times the value of s is -1 in the first 605 documents and the number of times the value is 1 in the last 344 documents. We obtain a balanced accuracy measure by averaging the true negative rate and the true positive rate.

index <- table(usnews2$s)[[1]]
 
 rates <- cbind(trueNegativeRate = sapply(sentimentValues, function(x){sum(x[1:index, s == -1]) / sum(x[, s == -1])}),
                truePositiveRate = sapply(sentimentValues, function(x){sum(x[(1 + index):nrow(x), s == 1]) / sum(x[, s == 1])}))
 
-cbind(rates, balancedAccuracy = (rates[,1] + rates[,2]) / 2 )
+cbind(rates, balancedAccuracy = (rates[, 1] + rates[, 2]) / 2 )
##           trueNegativeRate truePositiveRate balancedAccuracy
 ## default          0.7256198        0.5174419        0.6215308
 ## uShaped          0.7289256        0.5232558        0.6260907
 ## sentences        0.7256198        0.5174419        0.6215308
 ## bins             0.7272727        0.5203488        0.6238108

In this case, the U-shaped weighting performs best, but we can already see the improvement brought by our custom weights in comparison with the default settings. In a supervised learning setting, it can be useful to optimize a custom weighting scheme on a training dataset. An example of such a model can be found in the paper by Boudt & Thewissen, 2019, where bin weights are optimized to predict firm performance.

-

That’s the end of this tutorial. Want to go further? Have a try creating weird bins! They actually don’t have to be of equal size, their specification is up to anyone. Also, keep in mind that we have only covered news articles in this tutorial, which is not representative of all type of texts, feel free to investigate how sentiments are positioned within different types of documents.

+
+

+Hierarchical aggregation - bins of sentences

+

As we learned throughout this tutorial, we can always define more complex methods to compute and aggregate sentiment. The reason why we use different aggregation levels such as bins or sentences is that looking at individual words does not capture the semantic structure of the text. The most appropriate way to compute sentiment is arguably through sentences, as a sentence usually conveys a single statement.

+

Earlier, we implemented the bin approach by creating equal-sized containers of words. Each bin then contained a similar number of words. This naive split had the effect of cutting some sentences between two bins. From a semantic point of view, this is not desirable. Hence, we’re going to define here a new bin approach that respects sentence integrity: bins of sentences.

+

This approach is similar to the previous one, but instead of dividing the texts into equal-sized containers of words, we are going to divide them into equal-sized containers of sentences. This means that each bin will contain approximately the same number of sentences.

+

To implement it, we will need to play a bit with data.table operations to aggregate from sentences to bins of sentences. The first step is to compute sentence sentiment using compute_sentiment(). Then, we’re going to add a column to the resulting sentiment object. This additional column will contain information about the future bin in which each sentence will be aggregated. This is a mapping from sentences to bins of sentences.

+

The following operation creating bin_id is slightly complex. The best way to understand it is to follow the logic from the innermost part of the script up to the final apply(). The innermost function here is splitIndices, which is used to split the sentence_id values of each document into equal-sized vectors. At the second level, the sapply() function determines to which split vector each sentence_id belongs and returns a boolean vector for each. Finally, the last apply() calls the function which() on each of these vectors, resulting in the correct bin indices.

+
sentiment <- compute_sentiment(usnews2Sento, sentoLexicon, how = "proportional", do.sentence = TRUE)
+nBins <- 5
+
+sentiment <- sentiment[, cbind(bin_id = apply(
+                                 sapply(parallel::splitIndices(max(sentence_id), nBins),
+                                        '%in%', x = sentence_id),
+                                 which,
+                                 MARGIN = 1
+                                 ),
+                               .SD), by = id]
+
+sentiment[id == 830981632, 1:6]
+
##           id bin_id sentence_id       date word_count baseLex--dummyFeature
+## 1: 830981632      1           1 1971-01-12         28           -0.09285714
+## 2: 830981632      1           2 1971-01-12         37            0.01081081
+## 3: 830981632      2           3 1971-01-12          6           -0.01666667
+## 4: 830981632      2           4 1971-01-12         33            0.01666667
+## 5: 830981632      3           5 1971-01-12         16           -0.04687500
+## 6: 830981632      4           6 1971-01-12         24            0.04166667
+## 7: 830981632      4           7 1971-01-12         24            0.07708333
+## 8: 830981632      5           8 1971-01-12         17           -0.18529412
+## 9: 830981632      5           9 1971-01-12         17            0.05000000
+

With this result, we can now use the new column bin_id for grouping. We cannot use the sentometrics functions here, as they are not built to take a bin_id column into account. Instead, we use a data.table operation similar to the one we used to compute the bins aggregation with custom weights. This time, however, we simply use the mean() function, so that each bin of sentences receives the average sentiment value of its constituent sentences.

+
sentiment <- sentiment[, c(word_count = sum(word_count), sentence_count = length(sentence_id), lapply(.SD, mean)),
+                                             by = .(id, date, bin_id),
+                                             .SDcols = tail(names(sentiment), -5)]
+head(sentiment[, 1:6])
+
##           id       date bin_id word_count sentence_count baseLex--dummyFeature
+## 1: 830981632 1971-01-12      1         65              2           -0.04102317
+## 2: 830981632 1971-01-12      2         39              2            0.00000000
+## 3: 830981632 1971-01-12      3         16              1           -0.04687500
+## 4: 830981632 1971-01-12      4         48              2            0.05937500
+## 5: 830981632 1971-01-12      5         34              2           -0.06764706
+## 6: 830981642 1971-08-04      1         60              2            0.03981481
+
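As a quick sanity check on the aggregation (using only the values printed in the two tables above), the bin 1 value of document 830981632 is the plain average of its two sentence sentiments:

```r
# Sentences 1 and 2 of document 830981632 form bin 1; their mean matches
# the aggregated value -0.04102317 reported above
mean(c(-0.09285714, 0.01081081))
```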

Finally, we can re-create the graphs used for our initial analysis of the intratextual sentiment structure, this time using bins of sentences. In this case, there is not much difference from the previous analysis. However, using bins of sentences paves the way to more complex and semantically accurate analyses.

+
par(mfrow = c(1, 2))
+
+plot(sentiment[, .(s = mean(`absoluteLex--dummyFeature`)), by = bin_id], type = "l",
+     ylab = "Mean absolute sentiment", xlab = "Bin of sentences")
+
+boxplot(sentiment$`absoluteLex--dummyFeature` ~ sentiment$bin_id, ylab = "Absolute sentiment", xlab = "Bin of sentences",
+        outline = FALSE, range = 0.5)
+

+

That’s the end of this tutorial. Want to go further? Try creating other kinds of bins! They don’t actually have to be of equal size; their specification is entirely up to you. Also, keep in mind that we have only covered news articles in this tutorial, which are not representative of all types of texts. Feel free to investigate how sentiment is positioned within other types of documents.
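As a starting point for such experiments, here is a sketch of one unequal scheme (a hypothetical helper, not part of sentometrics): the first sentence, say a headline, gets its own bin, and the remaining sentences are split into two equal bins:

```r
# Hypothetical unequal binning: bin 1 = first sentence only,
# bins 2 and 3 share the remaining sentences equally
headlineBins <- function(nSentences) {
  if (nSentences < 3) return(rep(1L, nSentences))  # too short to split
  rest <- seq_len(nSentences - 1)
  restBins <- apply(sapply(parallel::splitIndices(nSentences - 1, 2),
                           '%in%', x = rest), 1, which)
  c(1L, 1L + restBins)
}
headlineBins(9)
# 1 2 2 2 2 3 3 3 3
```

The resulting vector could replace bin_id in the grouping step above, giving the headline its own sentiment value.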

+

Acknowledgements

diff --git a/docs/articles/isa_files/figure-html/unnamed-chunk-28-1.png b/docs/articles/isa_files/figure-html/unnamed-chunk-28-1.png
new file mode 100644
index 0000000000000000000000000000000000000000..bfe331059948452f7b0283051c59594331c42e67
GIT binary patch
(binary PNG data, 25632 bytes, omitted)

diff --git a/vignettes/isa.Rmd b/vignettes/isa.Rmd
index bbc756e..9f787b4 100644
--- a/vignettes/isa.Rmd
+++ b/vignettes/isa.Rmd
@@ -17,7 +17,7 @@ Tutorial contributed by [Olivier Delmarcelle](mailto:delmarcelle.olivier@gmail.c
The **`sentometrics`** package introduces simple functions to quickly compute the sentiment of texts within a corpus.
This easy-to-use approach does not prevent more advanced analysis, and the **`sentometrics`** functions remain a solid choice for cutting-edge research. This tutorial will present how to go beyond the basic **`sentometrics`** settings in order to analyse the intratextual sentiment structure of texts. -### Intratextual Sentiment Structure +### Intratextual sentiment structure Does the position of positive and negative words within a text matter? That's a question investigated by [Boudt & Thewissen, 2019](https://doi.org/10.1111/fima.12219) during their research regarding sentiment implied by CEO letters. Based on a large dataset of letters, they analyze how sentiment-bearing words are positioned within the text. They find that CEOs tend to emphasize sentiment at the beginning and the end of their letter, in the hopes of leaving a positive impression to the reader. @@ -30,7 +30,7 @@ One can wonder whether other types of texts follow a similar structure? Indeed, As part of this tutorial, you will learn how to: * Decompose your texts into *bins* (equal-sized containers of words) or sentences. -* Compute sentiments with a variety of weighting schemes. +* Compute sentiment with a variety of weighting schemes. * Create and use your own weighting scheme for a classification task. ## Preparation @@ -55,7 +55,7 @@ table(usnews2$s) The variable `s` indicates whether the news is more positive or negative, based on an expert's opinion. We are going to try to predict this value at the end of the tutorial. We can already prepare a `sento_corpus` and a `sento_lexicon` for our future sentiment computation. -For the `sento_corpus`, we will also create a `dummyFeature` filled with 1's. Since sentiment computations are multiplied by the features of a `sento_corpus`, we want this dummy feature to observe the whole corpus's sentiments. This `dummyFeature` is created by default whenever there's no feature at the creation of the `sento_corpus`. 
+For the `sento_corpus`, we will also create a `dummyFeature` filled with 1's. Since sentiment computations are multiplied by the features of a `sento_corpus`, we want this dummy feature to observe the whole corpus's sentiment. This `dummyFeature` is created by default whenever there's no feature at the creation of the `sento_corpus`. Finally, we remove the feature `s` from the `sento_corpus`, as we do not need it for sentiment computation. @@ -66,7 +66,7 @@ usnews2Sento <- add_features(usnews2Sento, data.frame(dummyFeature = rep(1, leng docvars(usnews2Sento, "s") <- NULL # R-removing the feature ``` -We will use a single lexicon for this analysis, the combined Jockers & Rinker lexicon, obtained from the **`lexicon`** package. However, we will prepare a second and different version of this lexicon where the sentiments assigned to words are all positive, regardless of their original signs. This second lexicon will be useful to better detect the sentiment intensity conveyed. +We will use a single lexicon for this analysis, the combined Jockers & Rinker lexicon, obtained from the **`lexicon`** package. However, we will prepare a second and different version of this lexicon where the sentiment assigned to words are all positive, regardless of their original signs. This second lexicon will be useful to better detect the sentiment intensity conveyed. We used the `data.table` operator `[]` to create the second lexicon in a very efficient way. Most **`sentometrics`** objects are based on `data.table` and this allows to perform complex data transformations. If this is the first time you are seeing the `data.table` way of using `[]`, we recommend you to have a look at their [Introduction vignette](https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html) and enjoy this powerful tool! @@ -82,18 +82,18 @@ lapply(sentoLexicon, head) `compute_sentiment()` is at the base of sentiment analysis with **`sentometrics`**. 
That's also the function we are going to use to analyse intratextual sentiment. This requires, however, to play with the most advanced features of the function. Before doing that, let us review the different computation settings to really understand what's going on. -### Default computation - from words to document sentiments +### Default computation - from words to document sentiment -When using the default settings (i.e., only specifying the `how` argument), the sentiment for each word within a text will be determined according to the provided lexicons. These word sentiments are then aggregated using the method defined by the `how` argument, aggregating up to the document level to form a sentiment value for the document. +When using the default settings (i.e., only specifying the `how` argument), the sentiment for each word within a text will be determined according to the provided lexicons. These word sentiment are then aggregated using the method defined by the `how` argument, aggregating up to the document level to form a sentiment value for the document. ```{r} sentiment <- compute_sentiment(usnews2Sento, sentoLexicon, how = "proportional") head(sentiment) ``` -In this case, the `how = "proportional"` simply sum words' sentiments then divide it by the number of words in a document. The different settings for `how` can be accessed using the `get_hows()` function. We are going to present the use of a more complex setting at the end of this tutorial. +In this case, the `how = "proportional"` simply sum words' sentiment then divide it by the number of words in a document. The different settings for `how` can be accessed using the `get_hows()` function. We are going to present the use of a more complex setting at the end of this tutorial. 
-### Setting `do.sentence = TRUE` - from words to sentences sentiments +### Setting `do.sentence = TRUE` - from words to sentences sentiment A drastic change in the behaviour of `compute_sentiment()` can be induced by specifying `do.sentence = TRUE` in the function call. If true, the output of `compute_sentiment` will no longer return a sentiment value for each document, but each sentence. Sentiment values within each sentence are still computed using the method provided in the `how` argument, but the function stops there. @@ -102,11 +102,11 @@ sentiment <- compute_sentiment(usnews2Sento, sentoLexicon, how = "proportional", head(sentiment) ``` -The new column `sentence_id` in the output is used to identify the sentences of a single document. This result can be used as-is for analysis at the sentence level, or sentences sentiments can be aggregated to obtain documents sentiments, as in the default setting. One way to aggregate sentences sentiments up to documents sentiments is to use the `aggregate()` method of **`sentometrics`**. +The new column `sentence_id` in the output is used to identify the sentences of a single document. This result can be used as-is for analysis at the sentence level, or sentences sentiment can be aggregated to obtain documents sentiment, as in the default setting. One way to aggregate sentences sentiment up to documents sentiment is to use the `aggregate()` method of **`sentometrics`**. ### Trick with *bins* in a list, `do.sentence` and `tokens` -Analyzing the sentiment of individual sentences is already a nice approach to observe intra-document sentiment, but sometimes it is better to define a custom container for which sentiments are going to be computed. This is the approach used by [Boudt & Thewissen, 2019](https://doi.org/10.1111/fima.12219), where they define *bins*, equal-sized containers of texts. The idea is to divide a document into equal-sized portion and to analyze each of them independently. 
Let's say we decide to split a document of 200 words into 10 *bins*. To do so, we are going to store the first 20 words in the first *bin*, the words 21 to 40 in the second *bin*, and so on... This way, each *bin* will account for 10% of the text. By repeating the procedure for all texts of a corpus, we can easily compare specific text portions (e.g., the first 10%) between multiples documents. +Analyzing the sentiment of individual sentences is already a nice approach to observe intra-document sentiment, but sometimes it is better to define a custom container for which sentiment are going to be computed. This is the approach used by [Boudt & Thewissen, 2019](https://doi.org/10.1111/fima.12219), where they define *bins*, equal-sized containers of texts. The idea is to divide a document into equal-sized portion and to analyse each of them independently. Let's say we decide to split a document of 200 words into 10 *bins*. To do so, we are going to store the first 20 words in the first *bin*, the words 21 to 40 in the second *bin*, and so on... This way, each *bin* will account for 10% of the text. By repeating the procedure for all texts of a corpus, we can easily compare specific text portions (e.g., the first 10%) between multiples documents. Let's split our documents into sets of *bins*. The first step is to obtain a vector of characters for each document. This is done easily with the `tokens` function from the **`quanteda`** (remember that **`sentometrics`** objects are also based on **`quanteda`**, letting us free to use most functions from this package). @@ -154,13 +154,13 @@ head(sentiment) In this case, the `sentence_id` simply refers to the number of the *bin*. Let's now see what we can do with the *bins* we just computed. 
-## Exposing Intratextual Sentiment Structure with *bins* +## Exposing intratextual sentiment structure with *bins* -In their analysis of CEO letters, [Boudt & Thewissen, 2019](https://doi.org/10.1111/fima.12219) identified an intratextual sentiment structure: CEOs would deliberately emphasize sentiments at the beginning and end of the letter, and pay attention to leave out a positive message and the end. Our dataset of news articles is radically different from these letters so we don't expect to find a similar structure. However, based on our knowledge of news, we can formulate a hypothesis: news articles tend to use strong sentiments in their headlines to attract readers' eyes. Let's investigate this using our *bins*! +In their analysis of CEO letters, [Boudt & Thewissen, 2019](https://doi.org/10.1111/fima.12219) identified an intratextual sentiment structure: CEOs would deliberately emphasize sentiment at the beginning and end of the letter, and pay attention to leave out a positive message and the end. Our dataset of news articles is radically different from these letters so we don't expect to find a similar structure. However, based on our knowledge of news, we can formulate a hypothesis: news articles tend to use strong sentiment in their headlines to attract readers' eyes. Let's investigate this using our *bins*! ### Absolute sentiment -We expect that the first *bin* in each article presents on average more sentiment than in the rest of the text. Since news can either be positive or negative, it will easier to identify sentiment intensity using the absolute value lexicon prepared earlier. This way, we avoid the cancelling effect between positive and negative sentiments. Simply plotting the mean sentiment values for each *bin* across documents can give us some insight on the intratextual structure. Once again, we rely on `data.table`'s `[]` operator to easily group sentiment values per `sentence_id` (remember, these represent the *bin* number!). 
In addition to this, a boxplot can be useful to ensure that the mean sentiments are not driven by extreme outliers. +We expect that the first *bin* in each article presents on average more sentiment than in the rest of the text. Since news can either be positive or negative, it will easier to identify sentiment intensity using the absolute value lexicon prepared earlier. This way, we avoid the cancelling effect between positive and negative sentiment. Simply plotting the mean sentiment values for each *bin* across documents can give us some insight on the intratextual structure. Once again, we rely on `data.table`'s `[]` operator to easily group sentiment values per `sentence_id` (remember, these represent the *bin* number!). In addition to this, a boxplot can be useful to ensure that the mean sentiment are not driven by extreme outliers. ```{r,fig.width = 12, fig.height = 5} par(mfrow = c(1, 2)) @@ -176,7 +176,7 @@ We can see that the first two *bins* of articles tend to show a larger absolute ### Herfindahl-Hirschman Index -Another way to study the intratextual sentiment structure is to compute the Herfindahl-Hirschman Index across all documents. This is a popular index of concentration, mainly used in measuring competition between firms on a given market. A value close to 0 indicates large dispersion between *bins* while a value of 1 indicated that all sentiments are found in a single *bin*. The formula to compute the index of a single document is: +Another way to study the intratextual sentiment structure is to compute the Herfindahl-Hirschman Index across all documents. This is a popular index of concentration, mainly used in measuring competition between firms on a given market. A value close to 0 indicates large dispersion between *bins* while a value of 1 indicated that all sentiment are found in a single *bin*. 
The formula to compute the index for a single document is: $$H = \sum_{b=1}^{B} s_b^2$$ where $b$ indexes the $B$ *bins* and $s_b$ is the proportion of the document's sentiment found in *bin* $b$. @@ -189,7 +189,7 @@ herfindahl <- herfindahl[, .(h = sum(s^2)), by = id] mean(herfindahl$h) ``` -A result that shows there is concentration toward some *bins*! Note that this result is heavily dependent on the number of *bins* considered. Only index values computed with the same number of *bins* should be compared. Let's show the index's value if sentiments were uniformly positioned within the text: +A result that shows there is concentration toward some *bins*! Note that this result is heavily dependent on the number of *bins* considered; only index values computed with the same number of *bins* should be compared. Let's show the index's value if sentiment were uniformly positioned within the text: ```{r} x <- data.table(id = sentiment$id, s = rep(1, nrow(sentiment))) @@ -201,15 +201,15 @@ mean(herfindahl$h) ## Computing sentiment with different weights -The **`sentometrics`** comes with a lot of different weightings methods to compute sentiment and aggregate them into document sentiments or even time series. These weightings methods can be accessed with the functions `get_hows`. +The **`sentometrics`** package comes with many different weighting methods to compute sentiment and aggregate it into document sentiment or even time series. These weighting methods can be accessed with the function `get_hows()`. ```{r} get_hows() ``` -So far, we've been using the `proportional` method from the `$words` set.
The `$words` set contains the valid options for the `hows` argument of `compute_sentiment()`. The other two sets are used within the `aggregate()` function, to aggregate, respectively, sentence sentiment into documents and document sentiment into time series. -With our earlier computation of sentiments using `do.sentences = TRUE`, we computed sentiments for sentences and *bins*. Now, for our next application, we need to aggregate these sentences and *bins* sentiments into documents sentiments. One option is to `aggregate()` using one of the methods shown above. Note the use of `do.full = FALSE` to stop the aggregation at the document level (otherwise, it would directly aggregate up to a time series). +With our earlier computation of sentiment using `do.sentence = TRUE`, we computed sentiment for sentences and *bins*. Now, for our next application, we need to aggregate this sentence and *bin* sentiment into document sentiment. One option is to `aggregate()` using one of the methods shown above. Note the use of `do.full = FALSE` to stop the aggregation at the document level (otherwise, it would directly aggregate up to a time series). ```{r message=FALSE} docsSentiment <- aggregate(sentiment, ctr_agg(howDocs = "equal_weight"), do.full = FALSE) @@ -223,6 +223,7 @@ But as we have seen, some *bins* are more likely to present strong sentiment val This is exactly the situation where we would like to test a specific weighting scheme! Say that instead of giving 10% importance to each *bin* in the document sentiment computation, we would give only about 5% importance to the first one and share the rest between the remaining *bins*. Sadly, **`sentometrics`** does not directly provide us with the tool for this kind of computation, so we will need to create our own weighting scheme and aggregate by hand. Luckily, the use of `data.table` makes these customisations painless.
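To make the equal-weight document aggregation concrete, here is a minimal sketch using plain `data.table` on a toy table (the `id` and `s` columns and their values are made up for illustration; they are not the tutorial's objects). It mirrors what `howDocs = "equal_weight"` does: every sentence of a document contributes equally to the document's sentiment.

```{r}
library(data.table)

# Hypothetical toy data: two documents with sentence-level sentiment
toy <- data.table(id = c("d1", "d1", "d2"), s = c(0.2, -0.4, 0.1))

# Equal-weight aggregation: a simple mean per document
toy[, .(docSentiment = mean(s)), by = id]
```

This is only a conceptual sketch; in practice `aggregate()` handles all lexicon and feature columns for us.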
First, we define our customized weights for *bins*: + ```{r} w <- rep(1 / (nBins - 0.5), nBins) w[1] <- w[1] * 0.5 @@ -261,15 +262,15 @@ class(docsSentiment) Let's now put all of this in a concrete example. We've been using a modified dataset `usnews2` since the beginning because we wanted to have a variable identifying whether the document is positive or negative. Our goal is now to try to predict this value. -To do so, we will consider 4 different approaches, in the form of four different weighting methods. We will study which weighting is the best to predict document's sentiments. +To do so, we will consider four different approaches, in the form of four different weighting methods. We will study which weighting is best at predicting the documents' sentiment. The four weighting methods will be: * The default weighting based on word frequencies, regardless of position. * A U-shaped weighting of words, where words at the beginning or end of the text are given more weight. -* A sentence-weighting, where word sentiments are proportionally weighted up to a sentence sentiment level, then sentences are aggregated with an equal weighting to obtain the document sentiment. -* The *bin* based approach, where word sentiments are proportionally weighted up to a *bin* sentiment level, then *bins* are aggregated with our custom weights: the first *bin* given half the weight and the other *bins* sharing the rest. +* A sentence weighting, where word sentiment is proportionally weighted up to a sentence sentiment level, then sentences are aggregated with equal weights to obtain the document sentiment. +* The *bin*-based approach, where word sentiment is proportionally weighted up to a *bin* sentiment level, then *bins* are aggregated with our custom weights: the first *bin* is given half the weight and the other *bins* share the rest. -The U-shaped weighting is something we haven't seen before.
This is a weighting method for words as per `get_words()` that gives more weight to the beginning and end of a text. Its exact formulation can be found at the end of the [Sentometrics vignette](https://doi.org/10.2139/ssrn.3067734), along with the other available weighting. This weighting scheme can be visualized as follows: +The U-shaped weighting is something we haven't seen before. This is a weighting method for words, as we can learn from `get_hows()`. This scheme gives more weight to the beginning and end of a text. Its exact formulation can be found at the end of the [Sentometrics vignette](https://doi.org/10.2139/ssrn.3067734), along with the other available weightings. This weighting scheme can be visualized as follows: ```{r} Qd <- 200 # number of words in the documents @@ -281,7 +282,7 @@ ushape <- ushape/sum(ushape) plot(ushape, type = 'l', ylab = "Weight", xlab = "Word position index", main = "U-shaped weight scheme") ``` -Let's compute sentiments with the four different weighting schemes. We will store the results in a list, `sentimentValues`. +Let's compute sentiment with the four different weighting schemes. We will store the results in a list, `sentimentValues`. ```{r} sentimentValues <- list() @@ -292,7 +293,7 @@ sentimentValues$sentences <- compute_sentiment(usnews2Sento, sentoLexicon, how = sentimentValues$bins <- compute_sentiment(usnews2Sento, sentoLexicon, tokens = usnews2Bins, how = "proportional", do.sentence = TRUE) -lapply(sentimentValues, head, n = 3) +lapply(sentimentValues[c(1, 3)], head, n = 3) ``` Before going further, we need to aggregate the last two results to a document-level sentiment measure. We will aggregate sentences using the `aggregate()` function, while we repeat the same operation as before to compute the *bins* aggregation with the custom weights.
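The custom *bins* aggregation described earlier boils down to a per-document weighted mean. Here is a minimal sketch with made-up values (the `bin_id` and `s` columns below are toy data, not the tutorial's objects), using five *bins* where the first *bin* carries half the weight of the others:

```{r}
library(data.table)

# Toy data: one document with five bin-level sentiment values
toy <- data.table(id = "d1", bin_id = 1:5, s = c(0.5, 0.1, 0.0, -0.2, 0.1))

# Custom weights: first bin gets half the weight of the others; weights sum to 1
w <- rep(1 / (5 - 0.5), 5)
w[1] <- w[1] * 0.5

# Weighted mean per document: each bin's sentiment times its weight
toy[, .(docSentiment = sum(s * w[bin_id])), by = id]
```

Down-weighting the first *bin* dampens the strong headline sentiment relative to a plain mean, which is exactly the behaviour we motivated above.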
@@ -307,7 +308,7 @@ sentimentValues$bins <- sentimentValues$bins[, c(word_count = sum(word_count), l lapply(sentimentValues[3:4], head, n = 3) ``` -Finally, what remains to do is test our results against the variable `s` from `usnews2`. Since we know the number of positive and negative news in `s`, we can quickly and in a naive way measure the accuracy by ordering the documents by sentiment values. +Finally, all that remains is to test our results against the variable `s` from `usnews2`. Since we know the number of positive and negative news items in `s`, we can quickly and naively measure the accuracy by ordering the documents by their sentiment values. ```{r} table(usnews2$s) @@ -319,7 +320,8 @@ Let's start by adding the `s` variable to the existing measures by merging each ```{r} sentimentValues <- lapply(sentimentValues, function(x) merge.data.frame(x, usnews2[, c("id","s")])) -lapply(sentimentValues, head, n = 3) + +head(sentimentValues$default) ``` Since we used `merge.data.frame`, we need to convert the objects back to `data.table`, and then we can order each of these tables. @@ -329,7 +331,7 @@ sentimentValues <- lapply(sentimentValues, as.data.table) # converting back to d sentimentValues <- lapply(sentimentValues, function(x) x[order(`baseLex--dummyFeature`)]) # order based on the baseLex sentiment values -lapply(sentimentValues, head, n = 3) +head(sentimentValues$default) ``` We then compute the accuracy by counting the number of times the value of `s` is -1 in the first 605 documents and the number of times the value is 1 in the last 344 documents. We obtain a balanced accuracy measure by combining the true negative rate and the true positive rate.
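Balanced accuracy is simply the average of the true negative rate and the true positive rate, which stops the larger class from dominating the score. A quick sketch with made-up counts (the numbers below are illustrative only, not the tutorial's results):

```{r}
# Hypothetical counts, for illustration only
trueNegativeRate <- 500 / 605  # negatives correctly placed in the first block
truePositiveRate <- 240 / 344  # positives correctly placed in the last block

# Balanced accuracy: unweighted mean of the two class-specific rates
balancedAccuracy <- (trueNegativeRate + truePositiveRate) / 2
balancedAccuracy
```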
@@ -340,12 +342,60 @@ index <- table(usnews2$s)[[1]] rates <- cbind(trueNegativeRate = sapply(sentimentValues, function(x){sum(x[1:index, s == -1]) / sum(x[, s == -1])}), truePositiveRate = sapply(sentimentValues, function(x){sum(x[(1 + index):nrow(x), s == 1]) / sum(x[, s == 1])})) -cbind(rates, balancedAccuracy = (rates[,1] + rates[,2]) / 2 ) +cbind(rates, balancedAccuracy = (rates[, 1] + rates[, 2]) / 2) ``` In this case, the U-shaped weighting performs best, but we can already see the improvement brought by our custom weights in comparison with the default settings. In a supervised learning setting, it can be useful to optimize a custom weighting scheme on a training dataset. An example of such a model can be found in the paper of [Boudt & Thewissen, 2019](https://doi.org/10.1111/fima.12219), where *bins* weights are optimized to predict firm performance. -That's the end of this tutorial. Want to go further? Have a try creating weird *bins*! They actually don't have to be of equal size, their specification is up to anyone. Also, keep in mind that we have only covered news articles in this tutorial, which is not representative of all type of texts, feel free to investigate how sentiments are positioned within different types of documents. +## Hierarchical aggregation - *bins* of sentences + +As we have learned throughout this tutorial, we can always define more complex methods to compute and aggregate sentiment. The reason we use different aggregation levels such as *bins* or sentences is that looking at words alone does not capture the semantic structure of the text. The most appropriate way to compute sentiment is arguably through sentences, as a sentence usually conveys a single statement. + +Earlier, we implemented the *bins* approach by creating equal-sized containers of words. Each *bin* then contained a similar number of words. This naive split had the effect of cutting some sentences between two *bins*. From a semantic point of view, this is not desirable.
Hence, we're going to define here a new *bins* approach that respects sentence integrity: *bins* of sentences. + +This approach is similar to the previous one, but instead of dividing the texts into equal-sized containers of words, we are going to divide them into equal-sized containers of sentences. This means that each *bin* will contain approximately the same number of sentences. + +To implement it, we will need to play a bit with `data.table` operations to aggregate from sentences to *bins* of sentences. The first step is to compute sentence sentiment using `compute_sentiment()`. Then, we're going to add a column to the resulting sentiment object. This additional column will indicate the future *bin* into which each sentence will be aggregated. This is a mapping from sentences to *bins* of sentences. + +The following operation creating `bin_id` is slightly complex. The best way to understand it is to follow the logic from the innermost part of the script up to the final `apply()`. The innermost function here is `splitIndices`, which is used to split the `sentence_id` values of each document into roughly equal-sized vectors. At the second level, the `sapply()` function determines to which split vector each `sentence_id` belongs and returns a boolean vector for each split. Finally, the last `apply()` calls the function `which()` on each of these vectors, resulting in the correct *bin* indices. + +```{r} +sentiment <- compute_sentiment(usnews2Sento, sentoLexicon, how = "proportional", do.sentence = TRUE) +nBins <- 5 + +sentiment <- sentiment[, cbind(bin_id = apply( + sapply(parallel::splitIndices(max(sentence_id), nBins), + '%in%', x = sentence_id), + which, + MARGIN = 1 + ), + .SD), by = id] + +sentiment[id == 830981632, 1:6] +``` + +With this result, we can now use the new column `bin_id` for grouping. We cannot use the **`sentometrics`** functions here, as they are not built to take a `bin_id` column into account.
Instead, we use a `data.table` operation similar to what we did to compute the *bins* aggregation with custom weights. This time, however, we will simply use the `mean()` function, meaning that each *bin* of sentences will contain the average sentiment value of its constituent sentences. + +```{r} +sentiment <- sentiment[, c(word_count = sum(word_count), sentence_count = length(sentence_id), lapply(.SD, mean)), + by = .(id, date, bin_id), + .SDcols = tail(names(sentiment), -5)] +head(sentiment[, 1:6]) +``` + +Finally, we can re-create the graphs used for our initial analysis of the intratextual sentiment structure, but using *bins* of sentences. In this case, there is not much difference from the previous analysis. However, using *bins* of sentences paves the way for more complex and semantically accurate analyses. + +```{r,fig.width = 12, fig.height = 5} +par(mfrow = c(1, 2)) + +plot(sentiment[, .(s = mean(`absoluteLex--dummyFeature`)), by = bin_id], type = "l", + ylab = "Mean absolute sentiment", xlab = "Bin of sentences") + +boxplot(sentiment$`absoluteLex--dummyFeature` ~ sentiment$bin_id, ylab = "Absolute sentiment", xlab = "Bin of sentences", + outline = FALSE, range = 0.5) +``` + +That's the end of this tutorial. Want to go further? Try creating even weirder *bins*! They don't actually have to be of equal size; their specification is entirely up to you. Also, keep in mind that we have only covered news articles in this tutorial, which are not representative of all types of texts, so feel free to investigate how sentiment is positioned within different types of documents. ## Acknowledgements