cgzones
left a comment
I like the general idea.
Would it make sense to try to merge exe and cmdline? E.g. for firefox I am getting:
/usr/lib/firefox/[green]firefox[green-end]|[purple]Web Content[purple_end]|/usr/lib/firefox/firefox -contentproc ...
In the above example firefox has modified its comm to 'Web Content'. Keeping the format 'exe|comm|cmdline' makes merging exe and cmdline impossible. On the other hand, with a format of 'exe|cmdline|comm' and a long cmdline (whether merged with exe, or modified and not merged), a modified (not merged) comm gets pushed to the far right and maybe even off the screen. As we want exe to always be the first column, the current solution is a judicious compromise. UPDATE: A new setting (on by default), "Try to strip exe from cmdline, in merged Command", addresses this suggestion by stripping the redundant exe from cmdline and displaying:
I would like to thank @gnarendran for his patch. I've been using the patch for more than a year and I really like it.
Thanks @marxin for the screenshot. There it is reasonable to expect the relative pathname in cmdline to be merged with exe, i.e.,
/home/marxin/Programming/skia/out/Debug/dm│out/Debug/dm -V -W dm_output
to be presented as
/home/marxin/Programming/skia/out/Debug/dm -V -W dm_output
The latest commit does that.
Would it be possible to have exe, comm and cmdline be three columns (with the exe column maybe just a basename) accompanied by an option for the cmdline column to (optionally) use the merging?
@BenBE Firstly, regarding displaying only exe's basename, I want to point out the main motivation behind this pull request is to know the process's executable path (from the standpoint of security), as any process is able to arbitrarily modify (and obfuscate) its own comm and cmdline. But exe can't be obfuscated. So showing exe's full path upon demand (like when 'p' key is toggled), is a necessity. Regarding the columns:
Given that both exe and cmdline are of variable length (a pathname can be anywhere from 1 to 4095, and even just a basename from 1 to 255), when they are set as separate columns, one column will push the other off the screen. In fact, for a pleasing view there can be only one variable-length column and it has to be the last (like the Command column of current htop). This is one reason for incorporating all of exe, comm and cmdline in the last column. The other reason is that for more than 90% of processes the three get merged anyway, with comm shown in a separate color, so no information is lost and screen space is used optimally. For example, please see the screenshot at the end: there htop had the CAP_SYS_PTRACE capability, and was run with 'Show program path' ('p' key) disabled.
I figured as much regarding the limit of the tree view. My intention with that suggestion was to have e.g. an abbreviated exe column (with e.g. first 20 chars of basename) followed by the merged exe/comm/cmdline column. This would allow to easily sort by executable regardless of the actual command line. Apart from this I like the merged view as is; was more like some additional thoughts to provide exe and comm as additional columns (abbreviated) so these are easier to spot on the screen. |
Thanks for the suggestions. Actually this pull request sorts by the first field displayed in the Command column - as exe is the first to be displayed, sort by exe is what this pull request does. If 'Show program path' (toggled by 'p' key) is set it displays/sorts by full path of exe, and if not it displays/sorts by the basename of exe. Update: Also relevant: |
@BenBE, @cgzones, @marxin, @Explorer09: I learnt that some users redirect htop output and script it, and they might be surprised by the new merged Command column. So for backwards compatibility, I have made the Command column display cmdline by default (as it is now), and merge only upon the new setting "Merge exe, comm and cmdline in Command" (toggled by the key 'm'). While at it, I also added Exe (abbreviated basename) and Comm columns as suggested by @BenBE. The pull request description has been updated with complete details.
Looks quite ok.
@cgzones As I had done a sync (fetch upstream etc.) followed by my last commit, git log now shows commits from master in between my commits. Is there a recommended way to squash my commits alone? Update: git rebase etc. seemed hard in this case, so this is what I did locally: a) made a patch htop-dev-pull-42-patch.txt of my commits, b) git reset --hard to revert all my commits, c) git pull htop-dev, d) applied the patch, e) git commit, f) git push --force. BTW, thanks for the review.
Yes this is intentional. We only do a simple suffix/prefix match to merge. Handling ./htop etc., would require proper canonicalization (like handling also .///htop ../../projects/htop/./htop etc.,) - a bit out of scope. |
c9fad7d to
5c85d0a
The push has updates to man page documenting the 'm' key and Exe, Comm columns, and coloring for thread/process basename in Exe column with kthread identification - just for completion. |
BenBE
left a comment
Patch LGTM.
Wondering if one could simplify overall affairs if you cached comm, exe and cmdline in the process struct and, for exe and cmdline, also stored the basename offset. With this information, merging cmdline/exe is basically 3 string compares: a) does exe match argv[0], b) does exe+exebase match argv[0], and c) does exe+exebase match argv[0]+argv0base. Once that's found you're left with matching comm against exe+exebase. Unless I'm missing something here.
The multitude of loops in the merging routines is kinda confusing to read.
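To make the three comparisons concrete, here is a minimal sketch. It assumes argv[0] has already been isolated as a NUL-terminated string, which is not always possible in practice; the names `basename_offset` and `argv0_matches_exe` are hypothetical and are not taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Offset of the character after the last '/' in path (0 if there is none). */
static size_t basename_offset(const char *path) {
   const char *slash = strrchr(path, '/');
   return slash ? (size_t)(slash - path) + 1 : 0;
}

/* Hypothetical helper: true if argv0 names the same program as exe under
 * one of the three comparisons a), b) or c). */
static bool argv0_matches_exe(const char *exe, const char *argv0) {
   const char *exeBase = exe + basename_offset(exe);
   const char *argv0Base = argv0 + basename_offset(argv0);

   return strcmp(exe, argv0) == 0          /* a) full exe path equals argv[0] */
       || strcmp(exeBase, argv0) == 0      /* b) exe basename equals argv[0] */
       || strcmp(exeBase, argv0Base) == 0; /* c) both basenames are equal */
}
```

With exe set to /usr/bin/htop, the arguments /usr/bin/htop, htop and ./htop all match, while top does not. Comparison c) is deliberately loose: it accepts any argv[0] whose basename equals that of exe, whatever directory precedes it.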
if ((fd = open(filename, O_RDONLY)) != -1 &&
    (amtRead = xread(fd, command, sizeof(command) - 1)) > 0) {
   close(fd);
   command[amtRead - 1] = 0;
Is the last char read always one we want to discard? Like \n?
- Regarding caching: LinuxProcessList_readCmdlineFile() indeed stores, in structure LinuxProcess: procComm, procExe, procExeBasenameOffset, procCmdlineBasenameOffset (argv0base in your comment); and in contained structure Process: cmdline (in a member called comm - this is the preexisting name, though confusing in the present context). So is your question why not move the procComm etc. from LinuxProcess, to the Process structure? That is because the new members are Platform (Linux) dependent. But if the question is why we re-read the members, it is because comm, cmdline can be changed anytime by the process, and even exe is changed by kernel when the process execve's (even while the pid remains the same.)
- The processing of cmdline is actually tricky. Normally processes store the arguments there separated by NUL (which are translated to '\n' when forming the cmdline string). But there are processes like chrome that store the entire argument vector separated by spaces (fun fact: in the upstream htop, if you set highlight basename, chrome's entire cmdline is highlighted!). Still other processes modify cmdline arbitrarily. In other words, the end of argv[0] is not readily/reliably obtainable from cmdline, so procCmdlineBasenameOffset in that argv[0] is also unreliable. I have left an example in the comments of matchCmdlinePrefixWithExeSuffix() ("LinuxProcess.c:252") on how htop could get procCmdlineBasenameOffset wrong.
- Regarding merging in matchCmdlinePrefixWithExeSuffix(): Case a) is simple enough: it applies when cmdline[0] == '/' (absolute path, since exe always begins with '/'). b) is just a special case of c) with procCmdlineBasenameOffset == 0. c) is a special case of d) that follows: due to the unreliability of procCmdlineBasenameOffset, upon a match failure we have to search the cmdline backwards to see if there is another potential procCmdlineBasenameOffset; this is the outer iteration. d) This is the general case we wish to handle: a cmdline prefix (not just the basename) is a suffix of exe. For example, with exe == '/A1/B2/C3/E4' and cmdline == 'C3/E4 ARG1 ARG2' we want the merge to succeed. In this case we match E4 (procCmdlineBasenameOffset) and reverse match up to C3, up to a valid relative path in exe (i.e. cmdline == '3/E4' should not match). This is the inner iteration, and it covers c) as well.
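The matching rule in cases a) to d) can be sketched as follows. This is a simplified illustration, not the routine from the patch: it walks forward over the '/' boundaries of exe instead of searching cmdline backwards from procCmdlineBasenameOffset, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if c may legitimately follow the matched part of cmdline. */
static bool is_arg_delimiter(char c) {
   return c == '\0' || c == '\n' || c == ' ';
}

/* Returns the length of the leading part of cmdline that can be merged with
 * exe, or 0 if nothing matches. exe is assumed to start with '/'. */
static size_t match_cmdline_prefix_with_exe_suffix(const char *cmdline, const char *exe) {
   size_t exeLen = strlen(exe);

   /* Case a): an absolute path in cmdline must equal the whole of exe. */
   if (cmdline[0] == '/') {
      if (strncmp(cmdline, exe, exeLen) != 0) {
         return 0;
      }
      return is_arg_delimiter(cmdline[exeLen]) ? exeLen : 0;
   }

   /* Cases b) to d): try every suffix of exe that begins right after a '/',
    * longest first, so that a partial component such as "3/E4" never matches. */
   for (size_t start = 1; start < exeLen; start++) {
      if (exe[start - 1] != '/') {
         continue;
      }
      size_t len = exeLen - start;
      if (strncmp(cmdline, exe + start, len) == 0 && is_arg_delimiter(cmdline[len])) {
         return len;
      }
   }
   return 0;
}
```

For exe == '/A1/B2/C3/E4', the cmdline 'C3/E4 ARG1 ARG2' yields a match of length 5, 'E4' yields 2, and '3/E4' yields 0 because the match would begin in the middle of a path component.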
Is the last char read always one we want to discard? Like
\n?
No, this is just me attempting to be safe. It is true that when reading /proc/[pid]/comm we can expect that comm will already be NUL terminated.
- LinuxProcess_writeCommand() where comm is matched first with exe's basename and then located in cmdline (if findCommInCmdline is set), is a bit complex mainly because of the various combination of cases to be considered: coloring the three components while showing all three of them, merging two of them or merging all of them; showing or not showing full path, highlighting basename etc.,
With caching of those three values I mostly referred to "per refresh cycle". Didn't also take your case d) (cmd: C3/E4 args…) into account. My questions were based on a much more focussed view on primarily highlighting the basename and trying to match based on that.
Possibly related is the display of /snap/chromium/1298/usr/lib/chromium-browser/chrome being completely marked as "basename" without that patch, although for some other chrome processes skipping of the path AND proper highlighting of the basename works correctly. Have to test again with this patch, if the issue persists.
Possibly related is the display of /snap/chromium/1298/usr/lib/chromium-browser/chrome being completely marked as "basename" without that patch, although for some other chrome processes skipping of the path AND proper highlighting of the basename works correctly. Have to test again with this patch, if the issue persists.
But do note that without the setting 'Merge exe, comm and cmdline in Command' (toggled by 'm' key), even this pull-request falls back to old behavior of htop of displaying cmdline - then you would see the issue. With the setting (and with exe readable) of course this issue will not be seen as then basename is that of the exe, and reliably known.
Maybe have another PR for this after this is merged, that improves on extracting the correct basename in these cases. Based on some simple heuristic: If cmdline is argv[0] only assume separation by spaces instead. The exe is then assumed to be the first part before a space; from which basename is extracted.
But identifying the basename on just space delimiter will mishandle cases where a directory name has spaces, like
cmdline == '/A1/B 2/E3 ARG1 ARG2'. Just saying there is no heuristic that can always correctly identify the basename based upon cmdline alone, given that argv[0] is not always delimited by NUL in /proc/[pid]/cmdline and in fact can be modified arbitrarily. I do think the old basename identification heuristic in htop is reasonable (best effort possible) if it has only cmdline to work with. This is one of the reasons for merging with the reliable exe (the other, main, reason being getting to know the path of the exe reliably).
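The failure mode described above can be reproduced with a small sketch of the space-only heuristic. The function `naive_basename` is hypothetical and exists purely to show where splitting on the first space goes wrong.

```c
#include <stddef.h>
#include <string.h>

/* Naive heuristic: argv[0] ends at the first space, and the basename starts
 * after the last '/' before that point. The span is returned via *start and
 * *len. */
static void naive_basename(const char *cmdline, size_t *start, size_t *len) {
   size_t end = strcspn(cmdline, " ");
   size_t begin = 0;

   for (size_t i = 0; i < end; i++) {
      if (cmdline[i] == '/') {
         begin = i + 1;
      }
   }
   *start = begin;
   *len = end - begin;
}
```

For '/usr/bin/chrome --type=renderer' this selects 'chrome' (offset 9, length 6), which is the desired result. For '/A1/B 2/E3 ARG1 ARG2' it selects 'B' (offset 4, length 1) instead of 'E3', because the space inside the directory name is taken as the end of argv[0].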
As it stands currently, half of the processes started from chrome show chrome marked properly as their basename and the others do not, which is kinda irritating.
Are you referring to the upstream htop that uses only cmdline for the Command column (same as the unmerged Command that uses only cmdline in this pull request)? Yes, this is a known limitation of using only cmdline that I had already described. But the merged Command (toggle using 'm') in this pull request does fix this issue properly by using exe for the basename; as I said, this is one of the motivations of this pull request. Please see chrome in the screenshot of this pull request below, with the setting "Merge exe, comm and cmdline in Command" ('m' toggle). However, please clarify if you mean some other issue with this pull request itself.
Or is the confusion due to the setting ("Merge exe, comm and cmdline in Command") being false by default, and you suggest it to be true by default?
Or maybe htop wasn't given the capability (sudo setcap 'cap_sys_ptrace=ep' htop) to read the exe of other users' processes etc., and is falling back to cmdline (unmerged Command) for those processes?
5c85d0a to
c60a3d6
In the light of pull request #67, I have refactored LinuxProcess_writeCommand() into LinuxProcess_makeCommandStr() (which makes the merged command string, devoid of newlines and ready to be displayed) and LinuxProcess_writeCommand(), which colorizes suitably. LinuxProcess_getCommandStr() returns the displayed string, the whole (all fields) of which may be displayed/searched/sorted/filtered, whether or not 'Show merged Command' is set. Also added color differentiation between comm of process and thread.
c60a3d6 to
d9dd53a
When compared with upstream htop-dev, the previous commit had a subtle change: When a process being tracked by htop became a zombie, that commit discarded the invalid exe/cmdline of the zombie, treating it just like a zombie that was already present when htop started (which is to only display its comm obtained from the stat file). But the upstream htop continues to So in the latest commit, the handling of 'zombie under watch' is brought in line with upstream htop. Also made optimizations like not regenerating a Command string unless its state, consisting of the relevant settings (showProgramPath, showMergedCommand, findCommInCmdline) and the process's cmdline, comm and exe, has changed.
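The "regenerate only when the state changed" optimization can be pictured roughly as below. This is a sketch under assumptions: the struct and function names are hypothetical, and it presumes the code that reads cmdline, comm and exe sets a flag when any of them differs from the previous scan.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the settings that influence the Command string. */
typedef struct {
   bool showProgramPath;
   bool showMergedCommand;
   bool findCommInCmdline;
} CommandSettings;

/* Hypothetical per-process cache state. */
typedef struct {
   CommandSettings usedSettings; /* settings the cached string was built with */
   bool inputsChanged;           /* set by the reader when cmdline, comm or exe changed */
   bool valid;                   /* false until the string has been built once */
   unsigned int rebuilds;        /* counts regenerations, for demonstration only */
} CommandCache;

static bool command_cache_is_stale(const CommandCache *cache, const CommandSettings *now) {
   return !cache->valid
       || cache->inputsChanged
       || cache->usedSettings.showProgramPath != now->showProgramPath
       || cache->usedSettings.showMergedCommand != now->showMergedCommand
       || cache->usedSettings.findCommInCmdline != now->findCommInCmdline;
}

/* Regenerates the string only if needed; returns true if it was rebuilt. */
static bool command_cache_refresh(CommandCache *cache, const CommandSettings *now) {
   if (!command_cache_is_stale(cache, now)) {
      return false;
   }
   /* ... the expensive string construction would go here ... */
   cache->usedSettings = *now;
   cache->inputsChanged = false;
   cache->valid = true;
   cache->rebuilds++;
   return true;
}
```

Calling `command_cache_refresh` on every refresh cycle then costs only a few comparisons for processes whose settings and inputs are unchanged.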
This pull request introduces 2 alerts when merging d9dd53a into 5ea13e7 - view on LGTM.com new alerts:
BenBE
left a comment
2 potential buffer overflow issues, some code style issues.
linux/LinuxProcessList.c
process->basenameOffset = 0;
if (!process->comm)
   process->basenameOffset = 0;
else
Please use braces for this if-statement as this is hard to read. Cf. discussion in #158.
Ok, will change.
for (int i = 0; i < amtRead; i++) {
if (command[i] == '\0' || command[i] == '\n') {
if (tokenEnd == 0) {
/* newline used as delimiter - when forming the mergedCommand, newline is
 * converted to space by LinuxProcess_makeCommandStr */
if (command[i] == '\0')
command[i] = '\n';

if (command[i] == '\n') {
if (tokenEnd == 0)
tokenEnd = i;
}
command[i] = ' ';
} else {
/* htop considers the next character after the last / that is before
 * basenameOffset, as the start of the basename in cmdline - see
 * Process_writeCommand */
if (!tokenEnd && command[i] == '/')
tokenStart = i + 1;
lastChar = i;
}
Please use braces, even where they are optional. Readability of the source will be better.
ok, will change.
linux/LinuxProcess.c
while (*++token == '\n')
   ;
Avoid the empty loop body. Better do { token++; } while('\n' == *token);.
Also the missing buffer length guarding against accidental reads behind the end of the buffer somewhat irritates me (Overall missing in this routine).
Standard idiom in string processing, but still will address to conform to htop coding style.
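Both review points, the do/while form and the missing length guard, can be combined as in the sketch below. The helper `skip_newlines` is hypothetical; it works with an index and an explicit buffer length rather than the bare pointer used in the patch.

```c
#include <stddef.h>

/* Advances past pos, then past any run of '\n', without ever reading at or
 * beyond buf[len]. Returns the new index, or len if the buffer is exhausted. */
static size_t skip_newlines(const char *buf, size_t len, size_t pos) {
   if (pos >= len) {
      return len;
   }
   do {
      pos++;
   } while (pos < len && buf[pos] == '\n');
   return pos;
}
```

Like the original `while (*++token == '\n')`, this increments before testing, so the character at the starting position is always skipped; the difference is that the scan stops at the end of the buffer even when no terminator follows the newlines.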
linux/LinuxProcess.c
if (cmdline[0] == '/') {
   matchLen = exeBaseLen + exeBaseOffset;
   if (strncmp(cmdline, exe, matchLen) == 0 &&
       ((delim = cmdline[matchLen]) == 0 || delim == '\n' || delim == ' '))
The indentation here is 3 spaces just like other code in htop. If the reference is to the conditional's indent it is just standard vim indentation - seems no change is required here.
do {
   /* match basename */
   matchLen = exeBaseLen + cmdlineBaseOffset;
   if (cmdlineBaseOffset < exeBaseOffset &&
Could <= work here too for the offset comparison?
No, <= is not needed: since cmdline[0] == '/' was already taken care of, cmdlineBaseOffset must be strictly < exeBaseOffset (as exe is guaranteed to start with '/').
k, was just unsure about the rationale.
linux/LinuxProcess.c
 */
void LinuxProcess_makeCommandStr(Process* this) {
   LinuxProcess *lp = (LinuxProcess *)this;
   bool showMergedCommand = this->settings->showMergedCommand, showProgramPath = this->settings->showProgramPath,
Avoid declaring multiple variables in one line …
Ok, will change.
linux/LinuxProcess.c
   return;
}

int commStart = 0, commEnd = 0, exeBasenameOffset = lp->procExeBasenameOffset,
Avoid multiple variables in one line.
Ok, will change.
linux/LinuxProcess.c
for (int i = 0; i < 32; i++)
if (indent & (1U << i))
maxIndent = i+1;
for (int i = 0; i < maxIndent - 1; i++) {
int written;
if (indent & (1 << i))
written = snprintf(buf, n, "%s ", CRT_treeStr[TREE_STR_VERT]);
else
written = snprintf(buf, n, " ");
buf += written;
n -= written;
}
Indentation issue. Also use braces, even where optional.
As mentioned in the comment, this code is from preexisting Process_writeField. However, will change.
linux/LinuxProcess.c
   written = snprintf(buf, n, "%s ", CRT_treeStr[TREE_STR_VERT]);
else
   written = snprintf(buf, n, " ");
Use checked snprintf (currently xSnprintf in XAlloc.h; moved by #161 to String_snprintf in StringUtils.h) to avoid possible buffer overflow noted by LGTM checker.
As mentioned in the comment, this code is from preexisting Process_writeField. However, will change if that is the coding standard.
UPDATE: You are right that the new LGTM doesn't like snprintf. So I used stpcpy instead, just to avoid the warning.
UPDATE2: The confusion arose because the Process_writeField used as the basis for the tree formatting above predated the snprintf (LGTM) fixes. So I was wondering how code already in the upstream was flagged. But I now see that Process_writeField has also been updated with correct snprintf usage. Still, I will stick with the rewrite with stpcpy as it is simpler.
I'm fine either way as long as the pitfalls are avoided.
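For reference, the pitfall in question is that snprintf returns the length the output would have had, so an unchecked `buf += written; n -= written;` can move past the end of the buffer after a truncation. A minimal checked variant might look like this; `append_checked` is a hypothetical helper and is not htop's xSnprintf, whose behavior is not reproduced here.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Appends formatted text at *buf. On success, advances *buf and shrinks *n
 * by the number of characters written. On an encoding error or truncation,
 * returns false and leaves both *buf and *n unchanged. */
static bool append_checked(char **buf, size_t *n, const char *fmt, ...) {
   va_list ap;

   va_start(ap, fmt);
   int written = vsnprintf(*buf, *n, fmt, ap);
   va_end(ap);

   if (written < 0 || (size_t)written >= *n) {
      return false;
   }
   *buf += written;
   *n -= (size_t)written;
   return true;
}
```

A caller would typically stop emitting further indentation segments as soon as the helper returns false, rather than continuing with a cursor that no longer points inside the buffer.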
The /proc/[pid]/exe shows as '(deleted)' whenever the executable of the process currently running gets deleted/replaced. We could manually see this with
With non-merged Command (only cmdline is displayed), this pull request has the identical behavior as the old htop as per the code. I also verified it visually, by running old htop and this pull's htop (non-merged setting) in GNU screens and toggling between the screens. This is true whether or not show full path setting is on. Please do let me know if what you see differs. On the other hand, I agree that the old behavior itself, of highlighting the whole cmdline starting from last / in cases such as chrome is not satisfactory - but I do not know how it can be improved when only using cmdline - for example, assuming space as basename delimiter would break cases where basename legitimately has a space in it, and someone else might think this pull-req broke their case. So I suggest leaving the old behavior for non-merged Command as it is, and if any improvement could be thought of there, taking it up as a separate PR.
The thing is, most of the nice colors were already taken (cyan for basename highlighting). Magenta was the only somewhat decent color left. And it had to be different from the non-merged Command to show that merging has taken place. The desire to keep the non-merged case completely backwards compatible leaves only this choice, I think.
Thanks a lot for the examples. I installed keybase just to see what is happening. Here is the differing case, with redundant information removed: Old htop: highlight starts from the first '/opt'; New htop: highlight starts after the last /, at 'preload-main.bundle.js' (this is not visible in your image, but if we move to the right we see it). In this case a bug in the old Process_writeCommand() manifests: the old code intends to search for the last slash just like the new htop, but when it encounters a ':' in v8_snapshot_data:100 it suddenly discards the search result, resulting in the highlight starting from the first /opt.
I am surely open to changing the colors as long as everyone will be fine with it. I can make a new commit if you could please suggest the choices (Ref: CRT.c) for PROCESS_COMM, PROCESS_THREAD_COMM, PROCESS_BASENAME, PROCESS_THREAD_BASENAME, for the color schemes COLORSCHEME_DEFAULT, COLORSCHEME_MONOCHROME, COLORSCHEME_BLACKONWHITE, COLORSCHEME_LIGHTTERMINAL, COLORSCHEME_MIDNIGHT, COLORSCHEME_BLACKNIGHT. Edit: But please do try out the color choices before suggesting, as not all combinations may be pleasing to the eye.
May I please ping this again..
There's still unresolved bugs in this (highlighting), see comment #42 (comment) . Help fix em? |
Actually that comment explains that no new "bug" has been introduced and the difference we see between the two images is in fact a fix/improvement over the pre-existing real bug (highlighting the entire cmdline when a ":" is present in the cmdline). However to reiterate the pre-existing limitation in old htop of highlighting from last '/' when working only with non-delimited cmdline is still present (in the non-merged view of this pull-req) and I don't think it will be appropriate to try to introduce new heuristics for that in this pull request, as that will break backwards compatibility in many cases. Best that it should be attempted, if at all, as a separate PR I think. Regards. |
No, it's fine to fix this as it becomes more relevant with your merged command lines. This PR changes 600 lines, 5 more won't matter.
The point is that it is impossible to come up with good heuristics for non-merged cmdlines that don't have delimiters (like in the case of chrome and keybase), and that is the reason for the old htop's behavior of highlighting from the last slash. It was not fixed all these years not due to oversight, but because it is a limitation that is practically impossible to overcome satisfactorily (because practically any character other than / can form a valid basename). Having said that, let us not lose sight of the fact that in this pull request the user can simply use the merged Command ('m' toggle) to get perfect highlighting. It is only in the case of non-merged cmdline (which is provided for backwards compatibility) that this pull request highlights from the last slash like the old htop.
It is fine as long as you do not merge, not optimal, but consistent. But now look at the screenshots above again, some |
That is not true; in fact all are highlighted from the last slash, consistently. It is just that in the case you mentioned the last slash is not seen in the image (it is far to the right). This is the way it is meant to be: in this case old htop highlighted the whole cmdline (which is incorrect) due to the bug (as I explained) that it abruptly abandons the last-slash search whenever it finds a ':'. Anyway, here is a suggestion: if someone wishes to rewrite the heuristics in Process_writeCommand() of old htop to everyone's satisfaction, well and good. We can wait for that commit and then emulate those heuristics in this pull request. It is just that I don't wish to rewrite the heuristics myself, as it can be quite controversial (even I won't be satisfied with it, frankly!). My main motivation for this pull request is the display of exe/comm/cmdline from a security standpoint, and non-merged cmdline is only an afterthought for backwards compatibility. Best regards.
Furthermore one quite easy heuristic would be checking for NUL-bytes in the buffer (all the problematic cases lack them) and delimit by space. If you're really worried you could even try to stat the path and move to the next delimiter (e.g. space) if not found. Due to the executable name you usually already have a few hints to what you're likely to end up highlighting: The only ever NUL byte that appears is the very last one read in cmdline, thus, after ignoring that one you simply assume you either got a program with no arguments (cmdline == &exe), or you split by any char Would be glad to receive real-world counter-examples for this heuristic, except the obvious one, when &exe contains spaces (which can be filtered by trying to stat that path and checking if what you found is a file). NB: Example for running explorer.exe in wine (One that is currently broken in the release version): |
Todo list for less confusing display:
@BenBE @fasterit : Reply to the TODO list:
For these reasons I don't think I should be touching the non-merged cmdline heuristics (apart from fixing the ':' bug in Process_writeCommand() that is already done).
To sum up: Pending agreement on 1. (heuristics preservation), I can work on documenting (3.) and making the EXE and COMM uppercase (4.). Otherwise we can let this pull request be, and let someone else carry this forward or use it when needed, and no harm is done. I can understand that tastes differ when it comes to presentation and it is difficult to get uniform agreement. Thanks for all your inputs. |
Note how my proposed heuristic addresses these 1% of cases and can work completely without having the EXE and COMM information available; just by working with the CMDLINE.
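A compact way to picture the proposed CMDLINE-only heuristic is sketched below. It is an illustration under assumptions, not the actual implementation: the function names are hypothetical, and the existence check is passed in as a callback so that the example does not depend on the file system (a real caller might use something like access(path, F_OK) == 0).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if buf[0..len) has a NUL before its final byte, i.e. the arguments
 * still appear to be NUL-separated and no heuristic is needed. */
static bool has_interior_nul(const char *buf, size_t len) {
   for (size_t i = 0; i + 1 < len; i++) {
      if (buf[i] == '\0') {
         return true;
      }
   }
   return false;
}

typedef bool (*PathExistsFn)(const char *path);

/* Stand-in for a real existence check, so the sketch can be tried without
 * touching the file system. */
static bool fake_exists(const char *path) {
   return strcmp(path, "/A1/B 2/E3") == 0;
}

/* Guesses where argv[0] ends in a space-flattened cmdline. Returns the index
 * of the separating space, or the string length if there are no arguments.
 * cmdline is modified temporarily and restored before returning. */
static size_t guess_argv0_end(char *cmdline, PathExistsFn exists) {
   size_t len = strlen(cmdline);
   size_t firstSpace = len;

   if (exists(cmdline)) {
      return len; /* the whole string is a path: program without arguments */
   }
   for (size_t i = 0; i < len; i++) {
      if (cmdline[i] != ' ') {
         continue;
      }
      if (firstSpace == len) {
         firstSpace = i;
      }
      cmdline[i] = '\0';
      bool found = exists(cmdline);
      cmdline[i] = ' ';
      if (found) {
         return i; /* shortest space-delimited prefix that exists */
      }
   }
   return firstSpace; /* nothing validated: fall back to the first space */
}
```

With the stand-in check, '/A1/B 2/E3 ARG1 ARG2' resolves to index 10, so the space inside the directory name is handled, while 'chrome --type=renderer' falls back to the first space at index 6.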
Man page is awesome, but the active mode should also be reflected in the header "Command" -> "Command (merged)"
Think of this from the user perspective, magenta is close to red, it is an alerting color. So to summarize:
This has been discussed: any heuristic that improves highlighting in the old cmdline for the 1% will break some other 1% (for example directories and basenames with spaces) and so is controversial. If someone still feels strongly about it, it is best to do the heuristics in the old htop (Process_writeCommand()) while this pull request waits, and then we can emulate those heuristics in this pull request. This was also suggested before, and modifying the old heuristics for highlighting non-merged cmdline is outside the scope of this pull request.
I think two points are being missed here:
As I have already said, I am also not a fan of magenta, but out of 8 colors only this was available and decent. Regarding this, please see my request for alternatives at the end of the comment #42 (comment). Agreed that changing the title to "Command (merged)" could be done; it might help the user if he sees a screen full of non-merged cmdlines (due to lack of permission).
Okay, based on your branch I implemented my heuristic (on Linux only for now). Also note, I couldn't place Thus take that patch as a PoC only for now. Okay, here is my initial patch:
diff --git a/linux/LinuxProcessList.c b/linux/LinuxProcessList.c
index 4b6e4b4..d2ad999 100644
--- a/linux/LinuxProcessList.c
+++ b/linux/LinuxProcessList.c
@@ -838,6 +838,36 @@ static void setCommand(Process* process, const char* command, int len) {
process->commLen = len;
}
+static int Compat_faccessat(int dirfd, const char* pathname, int mode, int flags) {
+ int ret;
+
+#ifdef HAVE_FACCESSAT
+ // Implementation note: AT_SYMLINK_NOFOLLOW unsupported on FreeBSD, fallback to lstat in that case
+
+ errno = 0;
+
+ ret = faccessat(dirfd, pathname, mode, flags);
+ if (!ret || errno != EINVAL)
+ return ret;
+#endif
+
+ // Error out on unsupported configurations
+ if (dirfd != AT_FDCWD || mode != F_OK) {
+ errno = EINVAL;
+ return -1;
+ }
+
+ // Fallback to stat(2)/lstat(2) depending on flags
+ struct stat statinfo;
+ if(flags) {
+ ret = lstat(pathname, &statinfo);
+ } else {
+ ret = stat(pathname, &statinfo);
+ }
+
+ return ret;
+}
+
static bool LinuxProcessList_readCmdlineFile(Process* process, const char* dirname, const char* name) {
LinuxProcess *lp = (LinuxProcess *)process;
char filename[MAX_NAME+1];
@@ -849,9 +879,7 @@ static bool LinuxProcessList_readCmdlineFile(Process* process, const char* dirna
char command[4096+1]; // max cmdline length on Linux
int amtRead = xread(fd, command, sizeof(command) - 1);
close(fd);
- int tokenEnd = 0;
- int tokenStart = 0;
- int lastChar = 0;
+
if (amtRead == 0) {
if (process->state == 'Z') {
process->basenameOffset = 0;
@@ -862,11 +890,24 @@ static bool LinuxProcessList_readCmdlineFile(Process* process, const char* dirna
} else if (amtRead < 0) {
return false;
}
+
+ int tokenEnd = 0;
+ int tokenStart = 0;
+ int lastChar = 0;
+ bool argSepNUL = false;
+ bool argSepSpace = false;
+
for (int i = 0; i < amtRead; i++) {
/* newline used as delimiter - when forming the mergedCommand, newline is
* converted to space by LinuxProcess_makeCommandStr */
if (command[i] == '\0') {
command[i] = '\n';
+ } else {
+ /* Record some information for the argument parsing heuristic below. */
+ if (tokenEnd)
+ argSepNUL = true;
+ if (command[i] <= ' ')
+ argSepSpace = true;
}
if (command[i] == '\n') {
@@ -883,10 +924,83 @@ static bool LinuxProcessList_readCmdlineFile(Process* process, const char* dirna
lastChar = i;
}
}
+
+ command[lastChar + 1] = '\0';
+
+ if (!argSepNUL && argSepSpace) {
+ /* Argument parsing heuristic.
+ *
+ * This heuristic is used for processes that rewrite their command line.
+ * Normally the command line is split by using NUL bytes between each argument.
+ * But some programs like chrome flatten this using spaces.
+ *
+ * This heuristic tries its best to undo this loss of information.
+ * To achieve this, we treat every character <= 32 as argument separators
+ * (i.e. all of ASCII control sequences and space).
+ * We then search for the basename of the cmdline in the first argument we found that way.
+ * As path names may contain we try to cross-validate if the path we got that way exists.
+ */
+
+ tokenStart = tokenEnd = 0;
+
+ // From initial scan we know there's at least one space.
+ // Check if that's part of a filename for an existing file.
+ if (Compat_faccessat(AT_FDCWD, command, F_OK, AT_SYMLINK_NOFOLLOW) != 0) {
+ // If we reach here the path does not exist.
+ // Thus begin searching for the part of it that actually is.
+
+ int tokenArg0Start = 0;
+
+ for (int i = 0; i <= lastChar; i++) {
+ /* Any ASCII control or space used as delimiter */
+ char tmpCommandChar = command[i];
+
+ if (command[i] <= ' ') {
+ if (!tokenEnd) {
+ command[i] = '\0';
+
+ bool found = Compat_faccessat(AT_FDCWD, command, F_OK, AT_SYMLINK_NOFOLLOW) == 0;
+
+ // Restore if this wasn't it
+ command[i] = found ? '\n' : tmpCommandChar;
+
+ if (found)
+ tokenEnd = i;
+ if (!tokenArg0Start)
+ tokenArg0Start = tokenStart;
+ } else {
+ // Split on every further separator, regardless of path correctness
+ command[i] = '\n';
+ }
+ } else if (!tokenEnd) {
+ if (command[i] == '/') {
+ tokenStart = i + 1;
+ } else if (command[i] == '\\' &&(!tokenStart || command[tokenStart - 1] == '\\')) {
+ tokenStart = i + 1;
+ }
+ }
+ }
+
+ if (!tokenEnd) {
+ tokenStart = tokenArg0Start;
+
+ // No token delimiter found, forcibly split
+ for (int i = 0; i <= lastChar; i++) {
+ if (command[i] <= ' ') {
+ command[i] = '\n';
+ if (!tokenEnd) {
+ tokenEnd = i;
+ }
+ }
+ }
+ }
+ }
+ }
+
if (tokenEnd == 0) {
- tokenEnd = amtRead;
+ tokenEnd = lastChar + 1;
}
- command[lastChar + 1] = '\0';
+
lp->mergedCommand.maxLen = lastChar + 1; /* accommodate cmdline */
if (!process->comm || strcmp(command, process->comm)) {
process->basenameOffset = tokenEnd; |
I understand that you intend to supplement (rather than replace) the old htop cmdline heuristics (highlight after last slash) with the new heuristics (using stat()).
As the non-merged cmdline is not important to me personally, I have no objection to either the old 99% heuristics or the new 99.5% heuristics, though any breakage might concern other users. A request: from this point onwards, could you (or someone else) please carry this pull request forward? It seems I have done all I could for the merged Command and will not be able to come back to this anytime soon. Once the new owner's pull request is merged, I will close this one. |
I'm not quite happy to take over the PR, but will see what I can do.
I'm fully aware of this shortcoming of the heuristic, but even then it is a great improvement over the previous implementation, because it gets the argument splitting at least partially right and thus only a small fraction of the actual command line is highlighted. So even when the wrong part of the path is highlighted, you'd still get only one small segment highlighted (not the whole rest of the command line). There is still a bit of improvement possible: you could track the exact error returned and thus gain one more level of spaces, even when the directory that the file resides in isn't readable to the user, as long as there's no further space after that part (i.e. your example). Doable by tracking
As this heuristic addresses all known real-world cases (Chrome, Keybase) and covers most other potential cases well enough, I'm quite confident that we're unlikely to see many more real-world cases where it fails visibly. Even in your above examples you'd have to have BOTH a command line rewritten to use spaces (only few processes actually do) AND have the executable name contain spaces (even rarer) AND have the
Important to me is that the fallback behaviour in case of failure is sensible. And with this new heuristic it very much is: path splitting by spaces with best effort to resolve the actual path.
Okay, that said, I'll try to rebase this PR onto current master next, so we can address the last issue mentioned (too much magenta). |
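To make the fallback behaviour described above concrete, here is a minimal sketch of the prefix-probing idea in C. It is not the code from the diff: `guessArg0Length`, `PathExistsFn` and `demoExists` are hypothetical names, and the existence check is passed in as a callback (the diff calls `Compat_faccessat` directly) so the logic can be tried without touching the filesystem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: returns true if `path` names an existing file. */
typedef bool (*PathExistsFn)(const char* path);

/* Stand-in predicate for demonstration only: pretends exactly one file exists. */
static bool demoExists(const char* path) {
   return strcmp(path, "/opt/My App/app") == 0;
}

/*
 * Guesses how many leading bytes of a space-flattened command line
 * belong to argv[0]. Every byte <= ' ' is treated as a possible
 * separator; the first prefix that names an existing path wins.
 * If no prefix exists, fall back to splitting at the first separator.
 */
static size_t guessArg0Length(const char* cmdline, PathExistsFn exists) {
   assert(cmdline != NULL && exists != NULL);

   size_t len = strlen(cmdline);
   if (exists(cmdline))
      return len; /* the whole string is a path, nothing to split */

   char buf[4096];
   if (len >= sizeof(buf))
      len = sizeof(buf) - 1;
   memcpy(buf, cmdline, len);
   buf[len] = '\0';

   size_t firstSep = len;
   for (size_t i = 0; i < len; i++) {
      if ((unsigned char)buf[i] > ' ')
         continue;

      if (firstSep == len)
         firstSep = i;

      /* Temporarily terminate the string here and probe the prefix. */
      char saved = buf[i];
      buf[i] = '\0';
      bool found = exists(buf);
      buf[i] = saved;

      if (found)
         return i;
   }

   return firstSep; /* forced split: best effort */
}
```

With the stand-in predicate, `guessArg0Length("/opt/My App/app --flag x", demoExists)` keeps the space inside `My App` as part of the path, while a command line with no matching prefix is simply split at its first separator.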
|
Agree with what you have said.
The example was indeed contrived to show a corner-case, but one small clarification (though it doesn't make the example more likely): Here I was referring to actual cmdline=="/A/B C/D" (not rewritten one) - i.e. the executable exe=="/A/B C/D" actually exists and is running, but htop is unable to stat() due to dir "/A/B C" not being readable.
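One way to tell this corner case apart from a plainly missing file is to look at `errno` after the failed check. The following is only a sketch under assumptions: it uses plain POSIX `access()` rather than htop's `Compat_faccessat` wrapper, and `PathProbe` and `probePath` are made-up names.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

typedef enum {
   PATH_EXISTS,    /* the check succeeded */
   PATH_MISSING,   /* some component definitely does not exist */
   PATH_UNDECIDED  /* e.g. EACCES: a directory on the way could not be searched */
} PathProbe;

/* Classifies a candidate path by the error the existence check reports. */
static PathProbe probePath(const char* path) {
   assert(path != NULL);

   if (access(path, F_OK) == 0)
      return PATH_EXISTS;

   switch (errno) {
      case ENOENT:
      case ENOTDIR:
         return PATH_MISSING;
      default:
         /* The path may well exist, we just cannot tell. */
         return PATH_UNDECIDED;
   }
}
```

A caller could treat `PATH_UNDECIDED` like a match when no further separator follows, which would cover the "/A/B C/D" example.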
Super, much appreciated. One parting note: Changes suggested by @fasterit (header changes like uppercasing EXE, COMM, appending "(merged)" to "Command") that are in my local repo, but which I could not get around to rebasing and committing: Best regards. |
I'll take a look at them later. Shouldn't be too hard to integrate them. Had to resolve like 4 minor conflicts when rebasing, thus most patches for your branch should work after the rebase as-is. |
|
@gnarendran Integrated your patches in my branch. Now IIRC only the magenta issue should be left. |
|
Implemented by #305. Many thanks for your initial patch. |





Addresses #40; Rebased from hishamhm/htop#809; Originally discussed in hishamhm/htop#801
With inputs from @Explorer09, @BenBE, @marxin
In this pull request, two new optional columns are added:
i) "Comm" - displays /proc/[pid]/comm if readable. It may not be readable for zombies.
ii) "Exe" - displays up to 15 characters of the basename of /proc/[pid]/exe if readable. htop is able to read the /proc/[pid]/exe of ALL processes only when htop is either run with root privileges (done with care) or given the capability CAP_SYS_PTRACE (sudo setcap 'cap_sys_ptrace=ep' /usr/bin/htop).
A new option is added: "Merge exe, comm and cmdline in Command", toggled by the key 'm'.
If this option is unset (default for backwards compatibility), Command column displays cmdline as usual. If this option is set, exe, comm and cmdline are merged in the Command column, which is to be interpreted as follows:
i) If no token is colorized, it implies htop didn't have permission to read /proc/[pid]/exe of the process, and htop has fallen back to displaying only cmdline.
ii) If a token (which may have embedded spaces) is colorized, up to 15 bytes of it is understood to be comm. This implies htop was able to read the process' /proc/[pid]/exe and /proc/[pid]/comm. During display, htop first tries to find/merge comm in exe's basename; if that fails and the new option 'Try to find comm in cmdline' is set (default), htop tries to find/merge comm in cmdline (this may misidentify a string in cmdline in very rare cases, say if comm or cmdline had been unsuitably modified).
a) If comm was not merged into either exe or cmdline, three fields are displayed (with "│" as the field separator), the first being exe, the second comm, and the last cmdline.
b) If comm was merged into exe or cmdline, but exe could not be merged with cmdline, two fields are displayed, the first being exe and the last cmdline.
c) Otherwise, exe and cmdline are merged into a single field.
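The display rules above can be sketched roughly as follows. This is a simplified illustration, not the code in this pull request: `mergeCommand`, `commMatchesName` and `baseName` are hypothetical helpers, comm is searched in cmdline with a plain `strstr`, and exe counts as merged only when cmdline literally starts with it (the relative-path case is ignored).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define FIELD_SEP "│"
#define COMM_MAX 15 /* comm holds at most 15 bytes */

static const char* baseName(const char* path) {
   const char* slash = strrchr(path, '/');
   return slash ? slash + 1 : path;
}

/* A full-length comm may be a truncated name, so compare it as a prefix. */
static bool commMatchesName(const char* name, const char* comm) {
   size_t n = strlen(comm);
   if (n == 0)
      return false;
   if (n == COMM_MAX)
      return strncmp(name, comm, n) == 0;
   return strcmp(name, comm) == 0;
}

/* Writes the merged Command into `out`; returns the number of fields shown. */
static int mergeCommand(const char* exe, const char* comm, const char* cmdline,
                        char* out, size_t outLen) {
   assert(cmdline != NULL && out != NULL && outLen > 0);

   /* Case i): exe (or comm) unreadable, fall back to cmdline only. */
   if (!exe || !comm) {
      snprintf(out, outLen, "%s", cmdline);
      return 1;
   }

   bool commMerged = commMatchesName(baseName(exe), comm)
                  || (comm[0] != '\0' && strstr(cmdline, comm) != NULL);

   size_t exeLen = strlen(exe);
   bool exeMerged = strncmp(cmdline, exe, exeLen) == 0
                 && (cmdline[exeLen] == ' ' || cmdline[exeLen] == '\0');

   if (!commMerged) {            /* a) exe│comm│cmdline */
      snprintf(out, outLen, "%s" FIELD_SEP "%s" FIELD_SEP "%s", exe, comm, cmdline);
      return 3;
   }
   if (!exeMerged) {             /* b) exe│cmdline */
      snprintf(out, outLen, "%s" FIELD_SEP "%s", exe, cmdline);
      return 2;
   }
   snprintf(out, outLen, "%s", cmdline); /* c) single merged field */
   return 1;
}
```

For instance, exe `/usr/bin/htop`, comm `htop` and cmdline `htop -d 10` fall under case b), while a process that renamed its comm to something found in neither exe nor cmdline falls under case a).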
htop currently always sorts/filters Command based on cmdline, irrespective of whether the full path or the basename is displayed, but it is desirable to sort/filter based on what is displayed. After this change, Command is sorted/filtered based on what is displayed, whether it is exe or cmdline, full path or basename.
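For reference, the two /proc entries behind the new Comm and Exe columns can be read roughly like this on Linux. This is a standalone sketch, not the code in this pull request; `readComm` and `readExe` are hypothetical names. For processes owned by other users, `readExe` is expected to fail unless the privileges mentioned above are present.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads /proc/<pid>/comm into buf, without the trailing newline. */
static bool readComm(pid_t pid, char* buf, size_t bufLen) {
   assert(buf != NULL && bufLen > 1);

   char path[64];
   snprintf(path, sizeof(path), "/proc/%ld/comm", (long)pid);

   FILE* f = fopen(path, "r");
   if (!f)
      return false;

   bool ok = fgets(buf, (int)bufLen, f) != NULL;
   fclose(f);

   if (ok)
      buf[strcspn(buf, "\n")] = '\0';
   return ok;
}

/* Resolves the /proc/<pid>/exe symlink into buf. */
static bool readExe(pid_t pid, char* buf, size_t bufLen) {
   assert(buf != NULL && bufLen > 1);

   char path[64];
   snprintf(path, sizeof(path), "/proc/%ld/exe", (long)pid);

   /* readlink() does not NUL-terminate, so leave room and do it here. */
   ssize_t n = readlink(path, buf, bufLen - 1);
   if (n < 0)
      return false;

   buf[n] = '\0';
   return true;
}
```

Calling both with `getpid()` always works, since a process may inspect its own entries.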
Implementation notes: