ID/DIV plots don't show sequences for certain edge cases #15

@ressy

Hi Chaim,

I've found some edge cases where no tile is drawn in the ID/DIV plots even though there's a sequence there.

If I use this example id-div.tab file:

sequence_id v_gene    germ_div mab 
1           IGHV1-AFS 0        80  
2           IGHV1-AFS 5        85  
3           IGHV1-AFS 10       90  
4           IGHV1-AFS 15       95  
5           IGHV1-AFS 20       100

I then get tiles drawn for sequences 2 and 4, but not 1, 3, or 5:

[image: id-div]

Sequences 1 and 5 are apparently just getting removed by the limits on ggplot's X/Y scales. The docs say:

Note that setting limits on positional scales will remove data outside of the limits. If the purpose is to zoom, use the limit argument in the coordinate system (see coord_cartesian()).

(I guess they define the limits as an open interval but just don't say so?) If I nudge the limits a little or switch to an equivalent coord_cartesian in plot_all, it then shows those two points:

[image: id-div2]
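One possible reading of the clipping, offered as an assumption rather than a confirmed diagnosis: if the tiles are drawn with something like geom_tile, a tile centered exactly on a limit has edges that extend past it, and the scale's default out-of-bounds handling censors those edges. The sketch below uses a hypothetical 2x2 tile per sequence (a stand-in, not the package's actual plotting code) to compare scale limits with an equivalent coord_cartesian call.

```r
library(ggplot2)

# The five example sequences from the id-div.tab file above.
d <- data.frame(germ_div = c(0, 5, 10, 15, 20),
                mab      = c(80, 85, 90, 95, 100))

# Hypothetical stand-in for the real plot: one 2x2 tile per sequence.
base <- ggplot(d, aes(germ_div, mab)) + geom_tile(width = 2, height = 2)

# Scale limits censor values outside [0, 20] x [80, 100], including the
# outer edges of tiles centered exactly on those limits.
p_scale <- base +
  scale_x_continuous(limits = c(0, 20)) +
  scale_y_continuous(limits = c(80, 100))

# coord_cartesian only zooms the view, so no data is discarded.
p_coord <- base + coord_cartesian(xlim = c(0, 20), ylim = c(80, 100))

# Count tiles whose corner coordinates all survive the plot build.
n_complete <- function(p) {
  ld <- layer_data(p)
  sum(complete.cases(ld[, c("xmin", "xmax", "ymin", "ymax")]))
}
n_scale <- n_complete(p_scale)
n_coord <- n_complete(p_coord)
```

Comparing n_scale with n_coord shows how many tiles each approach keeps. Nudging the scale limits outward by at least half a tile width should have the same effect as switching to coord_cartesian in this toy setup.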

Sequence 3 is missing because of a weirder issue. I think the MASS::kde2d call in the plot function splits the contribution of that one point evenly between two bins in X and two bins in Y, so it gets smeared across four bins. The color scaling in scale_fill_gradientn then misses it, since the maximum value in that region is apparently just a tad too low to count as one sequence's worth of density. If I tweak the .../2 on this line I can make the tile get drawn, but I'm not sure about the details of these variables:

b     <- (sum(g$z) / length(data$germ_div))/2

If I instead do .../5 it shows up:

[image: id-div3]

...but that throws off the definition of what counts as one sequence, so I'm not sure what the right approach would be.

Since I'm not familiar with these kernel density estimates, here's a quick demo of what I'm trying to describe, with one observation on a 10x10 grid positioned either "just wrong" (split evenly across four bins) or "just right" (centered on one bin):

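# "Just wrong": the point sits midway between grid points, so it is split evenly across four bins
g <- MASS::kde2d(0, 0, h = 1, n = 10, lims = c(-1, 1, -1, 1))
pheatmap::pheatmap(g$z, cluster_cols = FALSE, cluster_rows = FALSE)
# "Just right": the point sits on a grid point, so it is centered on one bin
g <- MASS::kde2d(1/9, 1/9, h = 1, n = 10, lims = c(-1, 1, -1, 1))
pheatmap::pheatmap(g$z, cluster_cols = FALSE, cluster_rows = FALSE)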

[image: example1]

[image: example2]
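The same comparison can be made numerically instead of by eye. This is a minimal sketch using the demo's settings; peak_info, off_grid, and on_grid are names made up for illustration, and it does not try to reproduce how the plot function uses b.

```r
library(MASS)

# Same settings as the demo above: one observation, 10x10 grid on [-1, 1].
peak_info <- function(x, y) {
  g <- kde2d(x, y, h = 1, n = 10, lims = c(-1, 1, -1, 1))
  peak <- max(g$z)
  list(
    peak = peak,
    # Number of cells tied (within floating-point tolerance) for the peak.
    n_peak_cells = sum(abs(g$z - peak) < 1e-8 * peak),
    # Density summed over all grid cells.
    total = sum(g$z)
  )
}

off_grid <- peak_info(0, 0)       # midway between grid points on both axes
on_grid  <- peak_info(1/9, 1/9)   # lands on a grid point

# How much lower the tallest cell is when the point is split across bins.
peak_ratio <- off_grid$peak / on_grid$peak
```

In this toy setup the off-grid point's peak is shared by four cells and is lower than the on-grid peak, while the summed density is nearly the same in both cases. That is consistent with the idea that a fixed per-cell cutoff could miss a single sequence depending on where it falls relative to the grid.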

In practical terms this is less of a problem than that clipping issue above, though, since the individual point is still there and can be selected in the interactive plot (it gets drawn when the plot adds the individual sequences for the selected island). In contrast I don't think I can access sequences 1 and 5 during island selection unless the limits of the plot area are expanded a little.
