Instructions and Examples:


The tool automatically aligns parallel texts written in the same language. Its purpose is to display varying degrees of textual variation based on syntactic alignment.

It performs automatic, syntax-based alignment of different versions of a text. The approach is based on a modified version of the Needleman-Wunsch algorithm (for more information, see the Bibliography).
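For readers unfamiliar with Needleman-Wunsch, the sketch below shows the textbook (unmodified) version applied to word tokens rather than characters. It is not the tool's modified variant: the scores (MATCH, MISMATCH, GAP) and the function name needleman_wunsch are illustrative assumptions. It fills a table with the best score for every pair of prefixes, then traces back from the last cell to recover one optimal alignment, using None to mark a gap.

```python
from typing import List, Optional, Tuple

# Illustrative scoring scheme (an assumption, not the tool's actual values).
MATCH, MISMATCH, GAP = 1, -1, -1


def sub(x: str, y: str) -> int:
    """Score for placing two tokens in the same column."""
    return MATCH if x == y else MISMATCH


def needleman_wunsch(a: List[str], b: List[str]) -> List[Tuple[Optional[str], Optional[str]]]:
    """Globally align two token lists; None marks a gap."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + GAP,   # gap in b
                              score[i][j - 1] + GAP)   # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Tuple[Optional[str], Optional[str]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs
```

For example, aligning "in the beginning God created" with "in the beginning the Lord created" yields [('in', 'in'), ('the', 'the'), ('beginning', 'beginning'), (None, 'the'), ('God', 'Lord'), ('created', 'created')].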

It also provides additional refinement criteria, which the user can select according to the degree of similarity between the texts and the purpose of the alignment:

  • Ignore non-alphabetical characters: ignores anything that is not an alphabetical character, such as punctuation and numbers.
  • Case sensitive: detects variation between words that differ in letter case.
  • Ignore diacritics: ignores any type of diacritical mark, including punctuation.
  • Levenshtein distance: allows more tolerance when aligning similar words, based on a revised version of the Levenshtein algorithm (for more information, see the Bibliography).
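The criteria above can be illustrated as a normalization step applied to each token before two tokens are compared. This is a minimal sketch, not the tool's implementation: the helper names (normalize, levenshtein, tokens_match) and the max_distance threshold are hypothetical, and levenshtein is the standard edit distance rather than the revised version the tool uses.

```python
import unicodedata


def normalize(token: str, ignore_nonalpha: bool = False,
              case_sensitive: bool = True, ignore_diacritics: bool = False) -> str:
    """Apply the selected refinement criteria to one token (hypothetical helper)."""
    if ignore_diacritics:
        # Decompose characters (é -> e + combining acute) and drop the combining marks.
        token = "".join(ch for ch in unicodedata.normalize("NFD", token)
                        if not unicodedata.combining(ch))
    if ignore_nonalpha:
        # Keep letters, plus any combining marks that were not removed above.
        token = "".join(ch for ch in token if ch.isalpha() or unicodedata.combining(ch))
    if not case_sensitive:
        token = token.casefold()
    return unicodedata.normalize("NFC", token)


def levenshtein(s: str, t: str) -> int:
    """Standard edit distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (cs != ct)))  # substitution (0 if equal)
        prev = curr
    return prev[-1]


def tokens_match(a: str, b: str, max_distance: int = 0, **criteria) -> bool:
    """Tokens match if their normalized forms are at most max_distance edits apart."""
    return levenshtein(normalize(a, **criteria), normalize(b, **criteria)) <= max_distance
```

With max_distance=1, "colour" and "color" match; with the default threshold of 0, two tokens must be identical after normalization.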

The website currently supports two alignment workflows:



Enter your text:

  1. This section aligns single sentences. Copy and paste the sentences to align as plain text. Put the versions of each sentence on consecutive lines (one line break between them), and separate each group of sentences to be aligned with one empty line.

    Text 1 Sentence 1
    Text 2 Sentence 1
    Text 3 Sentence 1
    
    Text 1 Sentence 2
    Text 2 Sentence 2
    Text 3 Sentence 2
    
    Text 1 Sentence 3
    Text 2 Sentence 3
    Text 3 Sentence 3

    Example: two different versions of the Book of Genesis

  2. If necessary, select additional alignment criteria using the checkboxes below the text: Ignore non-alphabetical characters, Case sensitive, Ignore diacritics, Levenshtein distance metric.

  3. Click “Align” for each group of sentences that you want to align. If something goes wrong and you want to start again, click “Reset”.
  4. After you click “Align”, the tool displays the aligned sentences. The texts are aligned automatically, and the selected criteria are highlighted in green at the top; unselected criteria appear grey.
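The input layout described in step 1, which is also required for uploaded files, can be parsed with a few lines of code. This sketch assumes that a group ends at any empty or whitespace-only line and that leading and trailing spaces are not significant; parse_groups is a hypothetical name.

```python
from typing import List


def parse_groups(text: str) -> List[List[str]]:
    """Split input text into groups of sentence versions.

    Versions of the same sentence sit on consecutive lines; groups are
    separated by one (or more) empty lines.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    for line in text.splitlines():  # handles \n, \r\n and \r line endings
        if line.strip():
            current.append(line.strip())
        elif current:  # an empty line closes the current group
            groups.append(current)
            current = []
    if current:  # the last group may not be followed by an empty line
        groups.append(current)
    return groups
```

Each inner list holds the versions of one sentence, in the order they were entered; for example, "A1\nB1\n\nA2\nB2\n" becomes [['A1', 'B1'], ['A2', 'B2']].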


File upload:

  1. Choose the file to upload.
    Important: the file can be either .txt or .csv, but it must be structured as follows: the versions of each sentence go on consecutive lines (one line break between them), and one empty line separates each group of sentences to align.

    Text 1 Sentence 1
    Text 2 Sentence 1
    Text 3 Sentence 1
    
    Text 1 Sentence 2
    Text 2 Sentence 2
    Text 3 Sentence 2
    
    Text 1 Sentence 3
    Text 2 Sentence 3
    Text 3 Sentence 3
  2. Before uploading, select the desired criteria by checking the corresponding boxes: Ignore non-alphabetical characters, Case sensitive, Ignore diacritics, Levenshtein distance metric. You can also go back and reset if needed.
  3. Click the “Upload” button. The texts are aligned automatically, with the selected alignment criteria highlighted in green at the top.


Color-Key:

The degree of alignment is displayed in different shades according to the type of match (see Examples 1-3) and to the additional refinement criteria selected (see Examples 4 and 5). A color key is displayed below the aligned text. When there is no complete match, the longest common substring is also displayed.
  • Example 1: complete match. Completely aligned tokens are displayed in deep green.


  • Example 2: unaligned tokens are shown in red. The longest common substring is displayed at the bottom.


  • Example 3: tokens aligned according to case are displayed in light green.


  • Example 4: additional criteria applied, highlighted in light green at the top. Tokens aligned because of an additional criterion (here, Levenshtein distance) are highlighted in blue-green.


  • Example 5: texts aligned with non-alphabetical characters removed and case sensitivity ignored. Aligned tokens are shown in light green.
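The longest common substring shown when there is no complete match can be computed with the classic dynamic-programming method sketched below. Whether the tool compares characters within tokens or within whole sentences is an assumption left open here, so the sketch (with the hypothetical name longest_common_substring) works on any two strings and returns the first longest match when there are ties.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous run of characters shared by a and b.

    If several runs have the same maximal length, the one ending earliest in a
    is returned; an empty string means the inputs share no character.
    """
    best_len, best_end = 0, 0
    # prev[j] = length of the common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1  # extend the common suffix by one character
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]
```

For instance, the spelling variants "beginning" and "begynnyng" share "beg".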


Note: the display for multiple sentences currently works differently. Matching tokens are displayed in green and unaligned tokens in red; tokens that match in only some of the given sentences are highlighted in light green.

The field below the aligned sentences shows the degree of matching. Hovering the cursor over an individual square shows the name of the corresponding token.
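The three-way coloring for multiple sentences can be sketched per aligned position, that is, one column of tokens taken across all sentences. This assumes the sentences have already been aligned into equal-length columns with None marking a gap; the function names and the COLORS mapping are hypothetical, not taken from the tool.

```python
from collections import Counter
from typing import List, Optional, Sequence

# Hypothetical mapping of match levels to the colors described in the note.
COLORS = {"full": "green", "partial": "light green", "none": "red"}


def classify_column(column: Sequence[Optional[str]]) -> str:
    """Classify one aligned position across all sentences (None = gap)."""
    tokens = [t for t in column if t is not None]
    if not tokens:
        return "none"
    most_common_count = Counter(tokens).most_common(1)[0][1]
    if most_common_count == len(column):
        return "full"     # every sentence has the same token here
    if most_common_count >= 2:
        return "partial"  # the token is shared by only some of the sentences
    return "none"         # no two sentences agree at this position


def color_row(columns: List[Sequence[Optional[str]]]) -> List[str]:
    """Map each aligned column to its display color."""
    return [COLORS[classify_column(c)] for c in columns]
```

For three sentences aligned as [("in", "in", "in"), ("God", "Lord", "God"), ("created", "made", None)], color_row returns ['green', 'light green', 'red'].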