
Queries

Alessio Campanelli edited this page Mar 30, 2026 · 1 revision

Fulgor supports two types of queries: pseudoalignment and k-mer conservation.

Pseudoalignment

Given a query string Q, a pseudoalignment query returns the set of colors (i.e., the references) that contain all of the k-mers of Q. If the threshold parameter -r tau is set, the query instead returns the colors that contain at least tau percent of the k-mers of Q.
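The definition above can be illustrated with a toy sketch. It is not how Fulgor computes the answer: it works on plain Python strings, ignores reverse complements, and the names kmers, pseudoalign and tau_percent are hypothetical. It also assumes the threshold is expressed as a percentage, following the wording above; check the tool's help for the actual scale of -r.

```python
def kmers(seq, k):
    """Return the k-mers of seq, one per position (duplicates are kept)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def pseudoalign(query, references, k, tau_percent=100.0):
    """Return the sorted colors whose k-mer set contains at least
    tau_percent of the k-mers of query (all of them by default)."""
    query_kmers = kmers(query, k)
    if not query_kmers:
        return []
    colors = []
    for color, ref in enumerate(references):
        ref_kmers = set(kmers(ref, k))
        hits = sum(1 for km in query_kmers if km in ref_kmers)
        # Integer-friendly form of: hits / len(query_kmers) >= tau_percent / 100
        if 100.0 * hits >= tau_percent * len(query_kmers):
            colors.append(color)
    return colors


# Toy data: three references, colors 0, 1 and 2.
references = ["ACGTACGT", "ACGTTTTT", "GGGGGGGG"]
full = pseudoalign("ACGTAC", references, k=4)
partial = pseudoalign("ACGTAC", references, k=4, tau_percent=30.0)
```

Here the query has three 4-mers (ACGT, CGTA, GTAC). Reference 0 contains all of them, reference 1 contains only ACGT, and reference 2 contains none, so lowering the threshold to 30 adds color 1 to the result.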

Output format

ASCII

Warning

ASCII output can be very large. If your disk space is limited, consider using --format binary or --format compressed to reduce the size of the output.

The result of a single pseudoalignment query is a line containing the following tab-separated values:

query_id num_colors color_0 color_1 ... color_n

where colors are sorted in increasing order. query_id is the index of the query inside the .fastq file provided with the parameter -q. To retrieve its name, run the following bash command, replacing query_id and query_filename with the actual values (the command assumes four-line FASTQ records): awk -v n=query_id 'NR == (n-1)*4+1' query_filename
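For scripted lookups, the same line arithmetic can be written in Python. This is a sketch that mirrors the awk command exactly: it assumes four-line FASTQ records and the same query_id numbering that the awk expression implies. The function name query_name is hypothetical.

```python
import itertools


def query_name(fastq_path, query_id):
    """Return the header line of a query, mirroring
    awk 'NR == (n-1)*4+1' with n = query_id."""
    if query_id < 1:
        raise ValueError("query_id must be >= 1 for this line formula")
    header_index = (query_id - 1) * 4  # 0-based index of the header line
    with open(fastq_path) as handle:
        line = next(itertools.islice(handle, header_index, header_index + 1), None)
    if line is None:
        raise IndexError(f"query {query_id} is past the end of {fastq_path}")
    return line.rstrip("\n")
```

The returned string is the raw header line, including the leading @ character.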

Example

21      1       0
949     3       0       3       7
203     1       0
953     2       0       8
42      0

This means that:

  • the k-mers (all, or at least tau) of query 21 are found only in reference 0. The same is true for query 203.
  • the k-mers (all, or at least tau) of query 949 are found in references 0, 3, 7.
  • the k-mers (all, or at least tau) of query 953 are found in references 0, 8.
  • the k-mers (all, or at least tau) of query 42 were not found in any reference.
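A minimal sketch of reading this format in Python, using the example lines above. The function name parse_ascii_line is hypothetical; the parser splits on any whitespace so that it accepts both tabs and the aligned spacing shown in the example.

```python
def parse_ascii_line(line):
    """Parse one result line into (query_id, colors)."""
    fields = line.split()
    query_id = int(fields[0])
    num_colors = int(fields[1])
    colors = [int(value) for value in fields[2:]]
    # num_colors is redundant with the list length, so use it as a sanity check.
    if len(colors) != num_colors:
        raise ValueError(f"query {query_id}: expected {num_colors} colors, got {len(colors)}")
    return query_id, colors


example = "21\t1\t0\n949\t3\t0\t3\t7\n203\t1\t0\n953\t2\t0\t8\n42\t0\n"
results = dict(parse_ascii_line(line) for line in example.splitlines())
```

After running this, results maps each query_id to its list of colors, with an empty list for query 42.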

Binary

Binary output follows the same structure as the ASCII output. Every value is encoded as an unsigned 32-bit integer, which removes the need for tabs and line breaks.

Example

The previous example would be encoded as follows, in hexadecimal representation.

00000015 00000001 00000000 000003b5 00000003 00000000 00000003 00000007
000000cb 00000001 00000000 000003b9 00000002 00000000 00000008 0000002a
00000000

Note that spaces and line breaks are used only to visually separate the values.
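A sketch of decoding this layout with Python's struct module. The byte order is not stated on this page, so little-endian is an assumption here (pass ">" to try big-endian), and the function name parse_binary is hypothetical.

```python
import struct


def parse_binary(data, byteorder="<"):
    """Decode a stream of unsigned 32-bit integers laid out as
    query_id, num_colors, color_0, ..., repeated for each query.
    byteorder "<" (little-endian) is an assumption, not documented."""
    if len(data) % 4 != 0:
        raise ValueError("length is not a multiple of 4 bytes")
    values = struct.unpack(f"{byteorder}{len(data) // 4}I", data)
    results = []
    pos = 0
    while pos < len(values):
        if pos + 2 > len(values):
            raise ValueError("truncated record header")
        query_id, num_colors = values[pos], values[pos + 1]
        pos += 2
        colors = list(values[pos:pos + num_colors])
        if len(colors) != num_colors:
            raise ValueError(f"query {query_id}: truncated color list")
        pos += num_colors
        results.append((query_id, colors))
    return results


# The 17 values of the hexadecimal example, packed under the same assumption.
example_values = [0x15, 1, 0, 0x3B5, 3, 0, 3, 7,
                  0xCB, 1, 0, 0x3B9, 2, 0, 8, 0x2A, 0]
example_bytes = struct.pack(f"<{len(example_values)}I", *example_values)
decoded = parse_binary(example_bytes)
```

The decoded list contains the same five records as the ASCII example, in the same order.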

Compressed

The compressed format starts with two unsigned 32-bit values, sparse_threshold and very_dense_threshold, which are later used to decompress the color lists. In particular:

  • if the number of colors in a result is strictly less than sparse_threshold, the result is written as the Elias-delta encoding of the gaps between consecutive values;
  • if the number of colors in a result is greater than or equal to very_dense_threshold, the complement of the result set (all colors that are not part of the result) is written as the Elias-delta encoding of the gaps between consecutive values;
  • otherwise, the color set is represented as a binary string of length N (the total number of colors in the index), where the i-th bit is set to 1 if and only if color i is part of the result.
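The three cases can be sketched as follows. This shows only the logical choice of representation, not Fulgor's actual bit layout: the function names are hypothetical, bitmaps are shown as strings of 0 and 1 characters, and elias_delta is the textbook code for integers greater than or equal to 1. Since a gap can be 0 (for example when the first color is 0), the real format must map gaps to positive integers in some way that is not described on this page.

```python
def gaps(sorted_values):
    """Differences between consecutive values; the first gap is the first value."""
    out, prev = [], 0
    for value in sorted_values:
        out.append(value - prev)
        prev = value
    return out


def elias_delta(x):
    """Textbook Elias-delta code of an integer x >= 1, as a bit string."""
    if x < 1:
        raise ValueError("Elias-delta is defined for integers >= 1")
    length_bits = bin(x.bit_length())[2:]   # binary of floor(log2 x) + 1
    prefix = "0" * (len(length_bits) - 1)   # unary length of length_bits
    return prefix + length_bits + bin(x)[3:]  # x without its leading 1 bit


def choose_representation(colors, num_total_colors,
                          sparse_threshold, very_dense_threshold):
    """Return (kind, payload) following the three cases listed above."""
    present = set(colors)
    if len(colors) < sparse_threshold:
        return "gaps", gaps(sorted(colors))
    if len(colors) >= very_dense_threshold:
        complement = [c for c in range(num_total_colors) if c not in present]
        return "complement_gaps", gaps(complement)
    bitmap = "".join("1" if i in present else "0" for i in range(num_total_colors))
    return "bitmap", bitmap


# Hypothetical index with 10 colors and thresholds 3 and 8.
sparse = choose_representation([0, 8], 10, 3, 8)
middle = choose_representation([0, 3, 7, 9], 10, 3, 8)
dense = choose_representation(list(range(9)), 10, 3, 8)
```

With these toy thresholds, a result of two colors is stored as gaps, a result of four colors as a bitmap, and a result of nine colors as the gaps of its one-element complement.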

TODO example

k-mer conservation

TODO

Output format

TODO
