[feature request] Allow setnames() to skip column names that don't exist #3030

@M-YD

There have been a number of similar posts, but none seems to address directly the change I am proposing, which I think would be a very useful update.

The problem is that I use setnames() as part of a large and complicated function (500 lines) that I have built. Being presented with an error after waiting quite some time for it to execute is unhelpful, and also annoying, because it means the time I spent waiting for it to complete was wasted.

What would have really helped in this scenario is an if/else-type condition that checks each old column name and skips it if it doesn't exist.

Something along the lines of:

```r
for (i in seq_along(old)) {
  if (!(old[i] %in% names(DT))) next # Skip old names that do not exist in DT's column names
  setnames(DT, old[i], new[i])       # Otherwise rename the column as usual
}
```

The reason I would check with a ! (NOT) is that setnames() already checks whether the names exist; if they do, it carries on as normal, because that is exactly what it is designed for. Adding a check for names that do exist would therefore be redundant; the useful check is for names that don't.

The trouble with the current setnames() is that it assumes I know precisely which columns exist at any given point.

I don't.

I have a good idea, but I am not always right, because there are dozens of possible scenarios depending on how the data I am extracting were gathered.

Roughly 90-95% of the time, the columns I am working with are the same ones, but on the odd occasion (the other 5-10%) they are not. Suddenly something breaks unexpectedly, and I am left hacking together yet another version of this function to handle a single one-off case, which is neither something I want to be doing nor good practice.

All of this could be avoided with a simple check inside setnames() for whether each column exists.
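The check itself is cheap. As a minimal base-R sketch, assuming a plain data.frame and purely illustrative names (df, old and new are hypothetical), %in% gives one TRUE/FALSE per old name, and that vector can be used to keep only the old/new pairs whose old name exists:

```r
# Hypothetical data and rename mapping, for illustration only
df  <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))
old <- c("id", "date", "value")   # "date" is not a column of df
new <- c("ID", "Date", "Value")

present <- old %in% names(df)     # one logical entry per old name
print(old[!present])              # names a "skip" option would pass over
# [1] "date"

# Keep only the pairs whose old name exists, so old and new stay aligned
old_kept <- old[present]
new_kept <- new[present]
```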

Referring to columns by number might work in some cases, but I think that is beside the point: different people need to refer to columns in different ways, and names work best in some scenarios while numbers work best in others.

In my case, names work best because the columns aren't always in the same order, so using column numbers could rename the wrong columns and break my data frame, and I might not find out until the end.

Also, in my case, a missing column isn't a problem, and I am happy to proceed to the next name in the list. I'm sure some people would want a warning when this happens, and I agree that it is a helpful feature, which is why I'm not advocating removing it at all. Perhaps a condensed warning could appear as each name is skipped, or a summary could be shown at the end.
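The "skip, but summarise at the end" behaviour can be approximated today with a small wrapper. This is a minimal sketch, not part of data.table: setnames_present() and its warn argument are hypothetical names, and it assumes old and new are character vectors of equal length. It calls data.table's setnames() only on the pairs whose old name exists and emits a single warning listing everything it skipped.

```r
library(data.table)

# Hypothetical helper (not part of data.table): rename only the columns in
# `old` that exist in `x`, and report all skipped names in one warning.
setnames_present <- function(x, old, new, warn = TRUE) {
  stopifnot(is.character(old), is.character(new), length(old) == length(new))
  present <- old %in% names(x)
  if (warn && !all(present)) {
    # One condensed summary instead of an error on the first missing name
    warning("setnames_present(): skipped ", sum(!present), " absent column(s): ",
            paste(old[!present], collapse = ", "), call. = FALSE)
  }
  if (any(present)) {
    setnames(x, old[present], new[present])  # renames by reference
  }
  invisible(x)
}

DT <- data.table(id = 1:3, value = c(2.5, 3.1, 4.8))
setnames_present(DT, c("id", "date", "value"), c("ID", "Date", "Value"))
# emits one warning naming "date"; the names of DT are now "ID" and "Value"
names(DT)
```

Passing warn = FALSE gives the silent behaviour described above for cases where a missing column is expected.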

An if/else-type condition like this would most certainly make things easier for me, at least, and I'm sure several others would agree.


Update:

@HughParsonage suggested an additional argument, which is precisely what I had in mind.

Something along the lines of:

```r
setnames(DT, old, new, skip_absent = FALSE)
```

The new argument, skip_absent, would default to FALSE, so setnames() would work exactly as it does now for everyone unless skip_absent = TRUE is explicitly passed at the time of calling.
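One possible shape for the proposed argument, as a minimal sketch: setnames2() is a hypothetical wrapper, not data.table's code, and it assumes old is a character vector of column names (numeric positions are not handled). With the default skip_absent = FALSE it simply forwards to setnames(), so existing calls keep raising an error for absent names; with skip_absent = TRUE it first drops the pairs whose old name is absent.

```r
library(data.table)

# Hypothetical sketch of the proposed flag; `setnames2` is an illustrative
# name, not data.table's implementation. Assumes `old` holds column names.
setnames2 <- function(x, old, new, skip_absent = FALSE) {
  if (skip_absent) {
    keep <- old %in% names(x)   # drop pairs whose old name is absent
    old <- old[keep]
    new <- new[keep]
    if (length(old) == 0L) return(invisible(x))  # nothing left to rename
  }
  # With skip_absent = FALSE this is a plain setnames() call, so the
  # current behaviour (an error for absent names) is unchanged.
  setnames(x, old, new)
}

DT <- data.table(a = 1, b = 2)
setnames2(DT, c("a", "zz"), c("A", "ZZ"), skip_absent = TRUE)
names(DT)   # "A" "b"
# setnames2(DT, "zz", "ZZ")  # default: errors, just as setnames() does
```

Filtering old and new with the same logical vector keeps each remaining old/new pair aligned.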
