Drawing of the distribution curve #3

@ericmelse

Dear Ben,
Using violinplot, I observe an unwanted extension of the distribution curve beyond the minimum and the maximum of the measurement values. It is not possible to 'chop' that part off the distribution line unless you are able (as a user) to delete the data points used to draw the line beyond the real data values. That is most likely not something you want me to do.
So, I will try to explain my point with the following example, for which I use my own data: the annual salary of 30,000 working professionals in The Netherlands. The data were log-transformed, centered, and bounded between 0 and 1 for this example. GitHub will not allow me to upload a .dta file, so I will email it to you.

The code to draw the distribution using tw kdensity is:
tw kdensity bd_zs_lnMsalYer21 , n(200) ysc(off) xsc(noex) ylab(none) ytit("") xtit("") legend(off) name(kden, replace)
which results in:
[Image: Distribution_line_manual_kden]

The code to draw the distribution using violinplot is:
violinplot bd_zs_lnMsalYer21 , n(200) left nowhisk nobox nomed xtit("") ysc(off) xsc(noex) name(violin, replace)
which results in:
[Image: Distribution_line_violinplot]

My 'problem' is that the distribution curve now extends beyond the minimum and the maximum of the measurement values.
I suppose that this is due to the way the density estimate is computed (but notice the alternative that joy_plot offers below).
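
To illustrate what I suspect is going on, here is a minimal sketch that uses only tw kdensity, whose range() option sets the values at which the density is evaluated (by default, from the minimum to the maximum of the variable). The 10% padding and the graph names kden_tight and kden_wide are arbitrary choices for illustration. I have not checked how violinplot sets its evaluation grid internally, so this only mimics the effect and is not a description of your code.

```stata
* Assumes the data set containing bd_zs_lnMsalYer21 is already in memory.
summarize bd_zs_lnMsalYer21, meanonly
local lo = r(min)
local hi = r(max)

* Arbitrary 10% padding on each side, for illustration only
local pad = 0.1 * (`hi' - `lo')
local wlo = `lo' - `pad'
local whi = `hi' + `pad'

* Grid restricted to the observed range (what tw kdensity does by default)
tw kdensity bd_zs_lnMsalYer21 , n(200) range(`lo' `hi') name(kden_tight, replace)

* Grid padded beyond the observed range: the curve is drawn past min and max
tw kdensity bd_zs_lnMsalYer21 , n(200) range(`wlo' `whi') name(kden_wide, replace)
```

If the second graph resembles the violinplot output, then the difference is only in the evaluation grid and not in the density estimate itself.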

The code to draw the distribution using joy_plot is:
joy_plot bd_zs_lnMsalYer21 , ysc(off) xsc(noex) fc(none) ytit("") xtit("") legend(off) cap("") name(joy, replace)
which results in:
[Image: Distribution_line_joy_plot]
Fernando's joy_plot draws a 'closed' object in which the extremes are gradually thinned. I suppose that signals that there are fewer and fewer data points towards the ends of the distribution, which is comparable to your 'rags'. I am not asking you to do something similar, although it might work very well when drawing split-violin plots, as the upper and lower ends of the distribution would be 'sharp'. You would have to read his ado file to see how he implemented this (you can certainly write to him too).
For now, though, what I would most like is for violinplot not to draw the distribution line beyond the minimum and the maximum of the measurement values.
