Skip to content

Conversation

@raulgarciamsft
Copy link
Contributor

Cast between semantically different string types: char* from/to wchar_t*
NOTE: Please let me know if you want to use a different CWE than CWE-704

raulgarciamsft and others added 2 commits October 1, 2018 10:25
Cast between semantically different string types: char* from/to wchar_t*
NOTE: Please let me know if you want to use a different CWE than CWE-704
@rdmarsh2
Copy link
Contributor

rdmarsh2 commented Oct 1, 2018

I ran this on both Linux and Windows builds of ChakraCore. It finds 0 results on the Linux build and 59 on the Windows build, so I suspect we'll want changes so that it produces results on ChakraCore on LGTM.com.

@geoffw0 I believe you're the most familiar with how string types are handled in ChakraCore on Linux, so I've assigned this PR to you.

@raulgarciamsft
Copy link
Contributor Author

Thanks.
Please let me know if there are any instructions I should follow to access ChakraCode for Linux on lgtm (feel free to send the instructions to me privately via email).

@rdmarsh2
Copy link
Contributor

rdmarsh2 commented Oct 1, 2018

Unfortunately, the ChakraCore build on LGTM.com is currently broken. There should be a fix for it in the next couple days, but I can send you the snapshot I've been testing with in the meantime.

@raulgarciamsft
Copy link
Contributor Author

That will work. Thanks a lot.

@raulgarciamsft
Copy link
Contributor Author

I tried building a snapshot for ChakraCore (https://github.com/microsoft/ChakraCore) for Windows, and I haven’t been able to find any instance of code that matches this new rule on it, nor on the Linux Snapshot that Robert shared with me.
Do you have any hints on what files/lines should I look at?
Thanks

@jbj
Copy link
Contributor

jbj commented Oct 2, 2018

I ran this query on 43 projects and was surprised that it found results on only two projects. On the Suspicious pointer scaling query we've had to assume that all casts to a char type are fine because people tend to convert to char * and then do a memcpy-like operation.

The results of this query also fall in this category. In fish-shell, the char pointers are used to copy wide chars into a block of memory returned from malloc where data of various types is manually laid out. In pwsafe, such a cast is used to take the SHA1 hash of a wide-char string. It seems reasonable to me that a SHA1 function takes a char * argument, and it also seems reasonable to want to call that function with an array of elements where there is no padding.

The @description and qhelp suggest that the primary use case for this query is to catch conversions from char * to wchar_t *. What do you say to ignoring conversions that go the other way? I ran that version of the query and found that it gave fewer and better results. The results on pwsafe look dodgy: I can't see how the memory allocated to an unsigned char[] will be suitably aligned for wchar_t *. Maybe the qhelp should specifically mention alignment.

@@ -0,0 +1,3 @@
LPWSTR pSrc;

pSrc = (LPWSTR)"a"; No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please state this example in terms of wchar_t, which is a standard type.

<qhelp>

<overview>
<p>This rule indicates a potentially incorrect cast from/to an ANSI string (<code>char *</code>) to/from a Unicode string (<code>wchar_t *</code>).</p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's accurate to say that char is ANSI and wchar_t is Unicode. That's what they're called in Windows APIs, but other platforms use char for both 7-bit ASCII, 8-bit legacy charsets, and UTF-8 Unicode.

I suggest replacing "ANSI string" with "byte string" and "Unicode string" with "wide-character string". Also below.

</overview>

<recommendation>
<p>Do not explicitly casting ANSI strings to/from Unicode strings.</p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"casting" -> "cast"

</overview>

<recommendation>
<p>Do not explicitly casting ANSI strings to/from Unicode strings.</p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we should give a recommendation that can make the alert go away and keep the code working. What is the typical remedy for this sort of error? For string literals we can recommend prepending L. For other cases, should we recommend calling an appropriate (platform-dependent) conversion function?

</recommendation>

<example>
<p>In the following example, an ANSI string literal (<code>"a"</code>) is casted as a Unicode string.</p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"casted as" -> "cast to"

@raulgarciamsft
Copy link
Contributor Author

Thanks a lot. I will work on all your suggestions and submit them ASAP.

@@ -0,0 +1,29 @@
/**
* @name Cast between semantically different string types: char* from/to wchar_t*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like a shorter @name for consistency with our existing queries. I propose "Cast from char* to wchar_t*".

@@ -0,0 +1,29 @@
/**
* @name Cast between semantically different string types: char* from/to wchar_t*
* @description This rule indicates a potentially incorrect cast from/to an ANSI string (char *) to/from a Unicode string (wchar_t *).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The @description is out of date with the recent changes. For consistency with our other queries, I suggest rephrasing it as follows: "Casting a byte string to a wide-character string is likely to yield a string that is incorrectly terminated or aligned. This can lead to undefined behavior, including buffer overruns."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixes ready.

Copy link
Contributor

@jbj jbj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the query and the fixes. I'll merge this now, and I look forward to seeing the results on lgtm.com. The next deployment should happen in a week or two.

@jbj jbj merged commit 4720c5a into github:master Oct 4, 2018
@geoffw0
Copy link
Contributor

geoffw0 commented Oct 8, 2018

Sorry I didn't get around to looking at this last week. The new query looks good, and we really appreciate the tests and qhelp alongside.

geoffw0 added a commit to geoffw0/ql that referenced this pull request Oct 8, 2018
@raulgarciamsft raulgarciamsft deleted the users/raulga/c6276 branch October 4, 2019 16:58
aibaars pushed a commit that referenced this pull request Oct 14, 2021
Enable dependabot on the Rust projects
smowton pushed a commit to smowton/codeql that referenced this pull request Apr 16, 2022
MathiasVP added a commit to MathiasVP/ql that referenced this pull request Aug 10, 2025
PS: Add simple type-based sanitizer to SQL injection query
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants