Replacer

Replacer is an online tool whose purpose is to provide a straightforward interface to help fix the most common orthography or style errors in Wikipedia.

The tool was created to address the need to manually review certain corrections that are difficult to automate due to the existence of false positives.

Use Replacer responsibly. Take your time and review each replacement carefully in its context. There are many false positives; when in doubt, choose the option to save without changes.

It's available at https://replacer.toolforge.org

The tool is designed to work on mobile phones, tablets, and larger screens. It may not work fully in older browsers such as Internet Explorer.

This README is based on the tool page in Spanish Wikipedia: https://es.wikipedia.org/wiki/Usuario:Benjavalero/Replacer

Technical details can be found in https://github.com/benjavalero/replacer/blob/master/technical-design.md

Frequently Asked Questions

  • What is the purpose of this tool? To help Wikipedia users, especially active contributors, perform certain relatively large-scale corrections that cannot be automated due to the presence of false positives and therefore require human review.

  • Who can use the tool? Anyone with an auto-confirmed user account. Edits are made in your name.

  • Where do the replacements come from? Most come from lists of simple and composed terms, which are reviewable and maintainable by the community. Other more elaborate replacements, such as date fixes, are generated by the tool itself based on the manual of style, and errors should be reported on the discussion page.

  • Why do I need special permissions to perform a custom replacement? Custom replacements were originally designed as a functionality to address uncommon or anecdotal corrections that do not warrant inclusion in the general listings, as they would create more noise than value. However, this feature was later abused for replacements with high match counts that remained outside the general listings and therefore outside community oversight for maintenance. To address this misuse, the tool was initially restricted to only allow custom replacements with a low number of matches. However, there was no community consensus on an appropriate threshold, which led to additional complaints in Wikipedia discussion spaces. Ultimately, to prevent abuse while maintaining functionality for legitimate cases, custom replacements are now available only to users with special permissions (verifiers, reversers, and bots). Regular users can still view the list of pages containing matches for a custom replacement and review them one by one.

  • How to prevent vandalism? The tool requests responsible use. Unfortunately, the author cannot be responsible for or review each edit. In any case, after years of use, incorrect edits are normally anecdotal. The tool provides sufficient context in most cases to determine whether or not to make a replacement and to distinguish it from a false positive, and there is always the option not to replace anything in case of doubt.

  • How to detect vandalism? Experienced users may sometimes find replacements that "sound like vandalism". In these cases, it is recommended to open and edit the page directly in the traditional way, including reverting what is appropriate.

  • Why am I seeing so many edits in my watchlist? The tool complies with the policy allowing a maximum of 5 edits per minute with no daily limit. All edits from the tool are tagged so they can be filtered to not appear in watchlists. Likewise, edits made by users with bot permissions will be tagged accordingly.

  • How to avoid cosmetic edits? There are style replacements, such as ordinal and coordinate corrections, that can only be performed by users with bot permissions or by regular users while reviewing (with or without changes) other replacements.

  • Couldn't the use be limited in some way? Of course, as long as that limitation complies with Wikipedia policies applicable to semi-automated tools.

Users

The first time you use the tool, you need to log in with an existing Wikipedia user account.

All edits made in Replacer are performed in the name of this user and therefore count as their contributions. Given the importance of the changes performed by this tool, users are asked to be at least auto-confirmed.

The edit summary, visible in the page history, includes a reference to the tool and to the replacements performed.

It is recommended not to perform a large number of edits in a short time to make it easier for Wikipedia users who monitor recent changes. The tool may occasionally limit the number of edits to comply with the bot policy.
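
The FAQ above mentions a limit of 5 edits per minute. As a rough illustration only, such a limit could be enforced with a sliding window like the one below. The class name `EditRateLimiter` and its method are hypothetical and are not taken from Replacer's source code.

```python
from collections import deque


class EditRateLimiter:
    """Sliding-window limiter: at most `max_edits` within any `window_seconds` span.

    Hypothetical sketch; the defaults mirror the 5 edits per minute in the FAQ.
    """

    def __init__(self, max_edits=5, window_seconds=60.0):
        self.max_edits = max_edits
        self.window_seconds = window_seconds
        self._timestamps = deque()  # times of the edits still inside the window

    def try_edit(self, now):
        """Record an edit at time `now` (in seconds) if the limit allows it."""
        # Forget edits that have fallen out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_edits:
            return False
        self._timestamps.append(now)
        return True
```

A caller would pass a monotonic clock reading, for example `limiter.try_edit(time.monotonic())`, and postpone the save when the method returns `False`.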

Languages

Once logged in, the upper menu (on the right) displays the username and the language associated with the Wikipedia to work with, by default the Spanish Wikipedia (es).

Currently, it is also possible to work with the Galician Wikipedia (gl).

Do not confuse this language with the one used in the application's interface texts: buttons, menus, etc. The application is not yet internationalized, so the interface is displayed in Spanish regardless of the Wikipedia chosen.

Edit Pages

The tool contains several sections which help to find pages with replacements:

  • Random Page. It opens the editor with a random page and all its potential replacements.
  • Replacement List. It shows a list of all the replacement types and the approximate number of pages containing replacements of each type. Each replacement type has a link that opens the editor with a random page and the occurrences of the chosen replacement.
  • Custom Replacement. It lets you enter a replacement that does not exist in the tool. It opens the editor with a random page and the occurrences of the chosen replacement.

The purpose of the Custom Replacement section is to make it easier to fix corrections with few occurrences, which therefore do not appear in the general listings. The tool only allows users with special permissions, such as verifiers, reversers or bots, to review custom replacements. If a replacement has many occurrences, it is recommended to add it to the general listings so that it can be reviewed by the community. Within a few hours it will be indexed and available in the application.

Once Replacer finds a page to be reviewed, the title of the page is displayed at the top.

Next to the title is a link that allows you to edit the page in Wikipedia in the traditional way. If all the replacements to be reviewed are in the same section, the link opens that specific section for editing. Next to this link is another one that gives access to the edit history of the page.

Below, the potential replacements are displayed independently. Each replacement shows first a snippet of the text surrounding it, highlighting the replacement in red. Beneath the text, the different replacement options are listed, including the option for the original text which is selected by default.

In addition, a button allows you to edit the fragment of text around the replacement in case the given options are not sufficient.

Review for vandalism. Some orthographic or style errors may be symptoms of vandalism. In these cases, instead of editing the isolated replacement, it is recommended to edit the context from Replacer or to completely undo the vandal edit from the page history in Wikipedia.

After reviewing all the replacements, there are the following buttons:

  • The button Save changes will be active if there is any replacement to be performed. In this case there is also a badge with the number of replacements. After clicking this button, the selected replacements will be applied immediately in Wikipedia.
  • The button Mark as reviewed (without changes) will always be active, even if there are replacements selected. On clicking this button, no modification will be performed in Wikipedia, and Replacer will mark the page internally as reviewed.
  • The button Review Later will temporarily skip the review of the page, offering it again later.
  • When reviewing replacements of a specific type (from the list or a custom replacement), if the page contains replacements of other types, an additional button will be shown: Show all replacements, which allows you to review all potential replacements detected on the page.

After you click any of the review buttons, Replacer automatically finds the next page to be reviewed.

A page that has been edited or reviewed without changes will not be offered for review again until it has new relevant edits.

If none of the offered options is the right one, it is recommended to edit the page manually using the link next to the title, and then save the page in Replacer without changes.

Note that orthography or style errors are often signs of vandalism or of a poor translation of the page or section. In this case, it is again recommended to edit the page manually and take the most appropriate action: revert the vandalism, rewrite the page or section, or add a warning template such as Copyedit.

Replacement Types

The replacement types are grouped in the following categories:

  • Orthography. Potentially misspelled words.

  • Composed. Expressions with more than one word that are potentially misspelled, or terms that do not fit in the Orthography category.

  • Style

    • Coordinates with incorrect format, for instance when quotation marks are used instead of the appropriate symbols. (Bot-only)
    • Date for dates with wrong format, for instance:
      • Dates with the month in uppercase: 2 de Septiembre de 2019
      • Dates with the day starting with zero: 02 de septiembre de 2019
      • Dates with the year containing a dot: 2 de septiembre de 2.019
      • Dates with missing prepositions: 2 de septiembre 2019
      • Dates with wrong order: Mayo 3, 2020
      • Note: In Spanish, the educated form «septiembre» is recommended over «setiembre». Nevertheless, this replacement is only offered together with another fix in the same date. For instance, the date «2 de setiembre de 2019» is not offered for review.
      • Dates in format of month and year are also offered to review when preceded by certain connectors: Desde Septiembre de 2019
      • See Manual of style for more details.
    • Degrees with incorrect format, for instance when the ordinal symbol is used instead of the degree symbol.
    • ó with accent for unnecessary uses of conjunction "ó" containing a diacritic (only in Spanish).
    • Ordinal for Spanish ordinals with wrong format, for instance 1er. (Bot-only)
    • Century with no small caps helps to replace centuries with the appropriate template which displays the century Roman number with small caps.
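
To make the date cases above concrete, here is a minimal sketch of how such dates could be detected and normalized with regular expressions. It is an illustration under stated assumptions, not the tool's actual implementation: the names `DMY`, `MDY` and `suggest_date_fix` are hypothetical, and the sketch covers only complete day-month-year dates, not the month-and-year form preceded by connectors.

```python
import re

MONTHS = ("enero|febrero|marzo|abril|mayo|junio|julio|agosto|"
          "septiembre|setiembre|octubre|noviembre|diciembre")

# Day-month-year, tolerating uppercase months, a leading zero in the day,
# a missing "de" before the year and a dot inside the year.
DMY = re.compile(
    r"(?P<day>\d{1,2})\s+(?:de\s+)?(?P<month>" + MONTHS + r")\s+(?:de\s+)?(?P<year>\d\.?\d{3})",
    re.IGNORECASE,
)
# Wrong order, e.g. "Mayo 3, 2020".
MDY = re.compile(
    r"(?P<month>" + MONTHS + r")\s+(?P<day>\d{1,2}),\s*(?P<year>\d{4})",
    re.IGNORECASE,
)


def suggest_date_fix(text):
    """Return the normalized date, or None if there is nothing to offer."""
    original = text.strip()
    match = DMY.fullmatch(original) or MDY.fullmatch(original)
    if match is None:
        return None
    day = str(int(match["day"]))           # drops a leading zero
    month = match["month"].lower()         # months are written in lowercase
    year = match["year"].replace(".", "")  # drops the dot in the year
    normalized = f"{day} de {month} de {year}"
    if normalized == original:
        # Nothing else to fix: «setiembre» alone is not offered for review.
        return None
    if month == "setiembre":
        normalized = f"{day} de septiembre de {year}"
    return normalized
```

With this sketch, `suggest_date_fix("Mayo 3, 2020")` yields `3 de mayo de 2020`, while `suggest_date_fix("2 de setiembre de 2019")` yields `None`, matching the note about «setiembre».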

False positives

There are certain parts of a page's content where most false positives usually appear. At the risk of leaving some replacements unfixed, the tool ignores all the replacements contained in these parts:

  • Some XML tags and all the content within, even other tags, e.g. <code>An <span>example</span>.</code>
  • Template names, e.g. Bandera in {{Bandera|España}}
  • Some complete templates, even with nested templates, e.g. {{Cite|A cite}}
  • Template parameters, e.g. param in {{Template|param=value}}. For some specific parameters, we include in the result also the value, which is usually a taxonomy, a Commons category, etc. We also include the value if it seems like a file or a domain.
  • Text in italics and bold, e.g. ''cursive''
  • Quoted text, e.g. «In Paris», "In Paris" or “In Paris”
  • URLs, e.g. https://www.google.es
  • XML tags, e.g. <span> or <br />
  • XML comments, e.g. <!-- A comment -->
  • Categories, e.g. [[Categoría:España]]
  • Links with suffix, e.g. [[brasil]]eño
  • The first part of aliased links, e.g. brasil in [[brasil|Brasil]]
  • Inter-language links, e.g. [[pt:Title]]
  • Filenames, e.g. xx.jpg in [[File:xx.jpg]]
  • Known expressions which are (almost) always false positives, extracted from the list in: https://es.wikipedia.org/wiki/Usuario:Benjavalero/FalsePositives (a Galician version also exists). The list is refreshed every hour.
  • Some proper nouns which can also be common nouns. If they are preceded or followed by a word with uppercase then they are ignored. For instance, Julio in Julio Verne, or Domingo in Plácido Domingo.
  • Words in uppercase that are correct according to the punctuation context, e.g. Enero in {{Cite|date=Enero de 2020}}. The following punctuation contexts are considered:
    • After dot
    • Parameter values
    • Unordered and ordered list items
    • After an HTML tag like a reference or a table cell
    • Wiki-table cells
    • Starting a paragraph
    • Starting a header
  • Words in the page title
  • Table-related styles, i.e. lines starting with {| or |-
  • Some complete sections, e.g. Bibliografía
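
The idea behind the list above can be pictured as computing "immutable" ranges of the wikitext and discarding any candidate replacement that falls inside one of them. The following is a minimal sketch covering only four of the listed cases (comments, URLs, quoted text and categories). The patterns and the function names `immutable_ranges` and `find_replacements` are assumptions made for illustration and do not reflect Replacer's actual code.

```python
import re

# Hypothetical patterns for a few of the ignored parts listed above.
IMMUTABLE_PATTERNS = [
    re.compile(r"<!--.*?-->", re.DOTALL),       # XML comments
    re.compile(r"https?://[^\s\]|}<]+"),        # URLs
    re.compile(r"«[^»]*»|\"[^\"]*\"|“[^”]*”"),  # quoted text
    re.compile(r"\[\[Categoría:[^\]]*\]\]"),    # categories
]


def immutable_ranges(text):
    """Return (start, end) spans of the text where replacements are ignored."""
    ranges = []
    for pattern in IMMUTABLE_PATTERNS:
        ranges.extend(match.span() for match in pattern.finditer(text))
    return ranges


def find_replacements(text, misspelling):
    """Return start offsets of `misspelling` lying outside every immutable range."""
    ranges = immutable_ranges(text)
    hits = []
    for match in re.finditer(r"\b" + re.escape(misspelling) + r"\b", text):
        inside = any(start <= match.start() and match.end() <= end
                     for start, end in ranges)
        if not inside:
            hits.append(match.start())
    return hits
```

For example, in `Nació en brasil. «Viva brasil»` only the first occurrence of `brasil` would be reported, since the second one sits inside quoted text.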

Cosmetic Changes

When the tool makes changes to a Wikipedia page, it also performs some cosmetic changes which usually have no effect on page visualization but improve the internal maintenance of the wikitext (see R9.1 in Wikipedia: Bot policy):

  • Links with the same link and alias, e.g. [[Coronavirus|coronavirus]] ⟹ [[coronavirus]]
  • Namespace links where the namespace is in lowercase, e.g. [[archivo:x.jpg]] ⟹ [[Archivo:x.jpg]]
  • Namespace links where the namespace is not translated, e.g. [[File:x.jpg]] ⟹ [[Archivo:x.jpg]]
  • Template DEFAULTSORT including special characters, e.g. {{ DEFAULTSORT : AES_Andes_2 }} ⟹ {{DEFAULTSORT:AES Andes 2}}
  • Categories containing unnecessary spaces, e.g. [[Categoría: Animal]] ⟹ [[Categoría:Animal]]
  • Unicode white-spaces, e.g. \u2002
  • Templates containing the redundant template prefix, e.g. {{plantilla:DGRG}} ⟹ {{DGRG}}
  • Tags with no content, e.g. <div style="text-align: right; font-size: 85%;"></div>
  • List items ending with a break, e.g. * x <br> ⟹ * x
  • Headlines with the complete text in bold, e.g. == '''Asia''' ==
  • Headlines ending with a colon, e.g. == Asia: ==
  • Unnecessary small tag in sup or ref tags, e.g. <sup><small>2</small></sup> ⟹ <sup>2</sup>
  • Double small tags which make the text too tiny and less accessible, e.g. <small><small>Text</small></small> ⟹ <small>Text</small>
  • External links with double HTTP, e.g. https://https://www.linkedin.com ⟹ https://www.linkedin.com
  • Break with incorrect syntax, e.g. </br> ⟹ <br />

Some specific replacements with proven safety are also considered cosmetic changes, even though their type in general cannot be automated and should not be applied by a bot. For instance, some cases of centuries with no small caps.

If appropriate, these changes are communicated to the wikiproject Check Wikipedia to update its counters.
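
A few of the cosmetic changes listed above can be sketched as plain text substitutions. This is a simplified illustration only: the function name `apply_cosmetics` and the regular expressions are assumptions, and real wikitext has many edge cases (nesting, namespaces, templates) that this sketch does not handle.

```python
import re


def _collapse_link(match):
    """[[Coronavirus|coronavirus]] -> [[coronavirus]] when only the first letter's case differs."""
    target, alias = match.group(1), match.group(2)
    same = target[1:] == alias[1:] and target[:1].lower() == alias[:1].lower()
    return f"[[{alias}]]" if same else match.group(0)


def apply_cosmetics(wikitext):
    text = wikitext
    # Links with the same link and alias.
    text = re.sub(r"\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]", _collapse_link, text)
    # Break with incorrect syntax.
    text = text.replace("</br>", "<br />")
    # External links with double HTTP.
    text = re.sub(r"https?://(https?://)", r"\1", text)
    # Double small tags.
    text = re.sub(r"<small>\s*<small>(.*?)</small>\s*</small>", r"<small>\1</small>", text)
    # Categories containing unnecessary spaces after the colon.
    text = re.sub(r"\[\[Categoría:\s+", "[[Categoría:", text)
    return text
```

Each rule is independent of the others, so in a sketch like this new rules can be appended without affecting the existing ones.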

Indexation

Replacer indexes all Wikipedia pages weekly to detect new replacements, as page content may change and replacement lists may be updated. The indexation uses the monthly dumps from https://dumps.wikimedia.org/backup-index.html. However, not all pages are indexed:

  • Only the Article and Annex namespaces are indexed. For example, user and discussion pages are ignored.
  • Redirect pages are ignored.
  • The tool also ignores pages containing certain templates, for instance those tagged to be deleted or containing many issues.
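
The three exclusion rules above amount to a simple filter applied to each page read from the dump. The sketch below illustrates that filter under several assumptions: the `Page` class is hypothetical, the namespace ids (0 for articles, 104 for Annex on Spanish Wikipedia) are assumed, and the template names in `IGNORED_TEMPLATES` are placeholders, since the README does not list the actual ones.

```python
from dataclasses import dataclass, field

# Assumed namespace ids: 0 is the main (Article) namespace in MediaWiki;
# 104 is assumed here for Annex ("Anexo") on Spanish Wikipedia.
INDEXABLE_NAMESPACES = {0, 104}

# Placeholder template names; the real list is not given in this README.
IGNORED_TEMPLATES = {"destruir", "problemas artículo"}


@dataclass
class Page:
    title: str
    namespace: int
    is_redirect: bool = False
    templates: set = field(default_factory=set)


def is_indexable(page):
    """Apply the three exclusion rules: namespace, redirect, ignored templates."""
    if page.namespace not in INDEXABLE_NAMESPACES:
        return False
    if page.is_redirect:
        return False
    used = {name.lower() for name in page.templates}
    return not (used & IGNORED_TEMPLATES)
```

For instance, `is_indexable(Page("España", 0))` is true, whereas a user page or a redirect is skipped.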

Source Code and Suggestions

The tool has been developed (and is maintained) by Benjavalero. The source code of the tool is available on GitHub.

Any suggestion or correction is welcome, both on the discussion page and on the GitHub issue tracker.

Project developed with IntelliJ IDEA, thanks to JetBrains' open-source support.
