This repository was archived by the owner on Jun 26, 2020. It is now read-only.

Binary function names #91

Merged
stoklund merged 9 commits into bytecodealliance:master from zummenix:feature/#47-binary-function-names
Jun 10, 2017

Conversation

zummenix (Contributor) commented Jun 4, 2017

Eventually will close #47

I hope I'm moving in the right direction.

stoklund (Contributor) left a comment

Thanks for working on this!

All your changes to the file tests and unit tests look good.

The FunctionName implementation should be able to print itself, using the %nnn syntax when possible, and falling back to the #xxxx syntax otherwise.

It would be good to add a test containing Unicode alphabetic characters, just to make sure they are not printed with %nnn.

  fn basic() {
      let mut f = Function::new();
-     assert_eq!(f.to_string(), "function \"\"() {\n}\n");
+     assert_eq!(f.to_string(), "function %() {\n}\n");
Contributor

Yep. I think this is an OK syntax for the empty function name.

///
/// Caller should validate that the string contains only
/// ASCII alphanumerical characters and `_`.
pub fn from_string(s: &str) -> FunctionName {
Contributor

I think with_str would be a better name for this constructor. The from_string name implies a conversion constructor which would be expected to consume a String. See the Rust naming conventions.

I don't think the comment is correct. The caller should be allowed to pass any string, and it will be printed in #xxx form if it is not pure ASCII.

f.write_str(&self.0)
}
f.write_char('%')?;
f.write_str(&self.0)
Contributor

Perhaps easier: write!(f, "%{}", self.0).

I would like the Display implementation to still detect whether the function name is pure ASCII so it can be printed with the % syntax. If it can't, it should be printed in hexadecimal instead with #xxxx. The legal characters in the %nnn notation are ASCII alphanumeric characters and _.

fn formatting_string() {
assert_eq!(FunctionName::from_string("").to_string(), "%");
assert_eq!(FunctionName::from_string("x").to_string(), "%x");
assert_eq!(FunctionName::from_string(" ").to_string(), "% ");
Contributor

The first two are correct; the third one should print as #20.

stoklund (Contributor) commented Jun 5, 2017

BTW, some of this may be easier if you switch the internal representation from a String to a Vec<u8>. Then you won't have to worry about invalid UTF-8 etc.

It's still useful for the parser to have a with_str constructor that takes a UTF-8 &str like the one you added. See str::as_bytes().

stoklund closed this Jun 5, 2017

stoklund (Contributor) commented Jun 5, 2017

Sorry, didn't mean to close this!

stoklund reopened this Jun 5, 2017

zummenix (Contributor Author) commented Jun 5, 2017

Thank you for the review!

Initially I thought that FunctionName would be an enum with variants like Name and Hex, since the lexer recognises %nnnn and #xxxx and either sequence is valid for creating a FunctionName.

So my next actions are:

  • FunctionName will accept any bytes (two constructors, new and with_str).
  • For Token::Name, we'll convert with str::as_bytes().
  • For Token::HexSequence, we'll convert from hex to bytes.
  • If FunctionName can Display itself as %nnnn it will; if not, it'll fall back to #xxxx.

Is that correct?

stoklund (Contributor) commented Jun 5, 2017

Yes, you got it right
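
The plan agreed above can be sketched roughly as follows. This is only an illustration under assumptions, not the code that was merged: the FunctionName struct here is a stand-in, the helper is_plain_name is hypothetical, and the two constructors follow the names new and with_str mentioned in the list.

```rust
use std::fmt;

/// Stand-in for the type under review (assumption: a newtype over raw bytes).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FunctionName(Vec<u8>);

impl FunctionName {
    /// Assumed constructor taking arbitrary bytes.
    pub fn new(bytes: Vec<u8>) -> FunctionName {
        FunctionName(bytes)
    }

    /// Assumed constructor taking a UTF-8 string and storing its bytes.
    pub fn with_str(s: &str) -> FunctionName {
        FunctionName(s.as_bytes().to_vec())
    }
}

/// Hypothetical helper: true if every byte is an ASCII letter, digit, or `_`.
fn is_plain_name(bytes: &[u8]) -> bool {
    bytes.iter().all(|&b| b.is_ascii_alphanumeric() || b == b'_')
}

impl fmt::Display for FunctionName {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if is_plain_name(&self.0) {
            // Every byte is ASCII here, so casting each one to char is lossless.
            f.write_str("%")?;
            for &b in &self.0 {
                write!(f, "{}", b as char)?;
            }
        } else {
            // Fall back to two lowercase hex digits per byte.
            f.write_str("#")?;
            for &b in &self.0 {
                write!(f, "{:02x}", b)?;
            }
        }
        Ok(())
    }
}
```

Because the check works on bytes rather than chars, a name such as "é" is not treated as alphanumeric and goes through the hexadecimal branch, which is the behaviour the review asks for with Unicode alphabetic characters.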

zummenix (Contributor Author) commented Jun 6, 2017

@stoklund, it would be helpful to have a dependency or module that can convert to and from hex. I'm hesitant to copy code from other crates.

fn is_id_continue(c: char) -> bool {
c.is_ascii() && (c == '_' || c.is_alphanumeric())
/// Creates a new function name from a sequence of bytes.
pub fn new(bytes: Vec<u8>) -> FunctionName {
Contributor

I think you should just take a bytes: &[u8] slice here. We don't want to force the caller to allocate a Vec.

Contributor Author

Let's leave one generic constructor: pub fn new<T>(v: T) -> FunctionName where T: Into<Vec<u8>>?

Contributor

Do you mean in addition to a &[u8] constructor? That would be OK.

Contributor Author

No, I mean only one constructor. That way we can create a FunctionName using either &str or &[u8].

Contributor

I see what you mean. As long as we can construct from both an &str and an &[u8], that is fine.
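
A minimal sketch of the single generic constructor proposed in this thread, assuming a newtype over Vec<u8>. The signature of new is the one quoted above; the struct definition and the as_bytes accessor are hypothetical and only there to make the example self-contained.

```rust
/// Stand-in for the type under review (assumption: a newtype over raw bytes).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FunctionName(Vec<u8>);

impl FunctionName {
    /// One constructor for anything convertible into a byte vector,
    /// which covers &str, String, &[u8] and Vec<u8>.
    pub fn new<T>(v: T) -> FunctionName
    where
        T: Into<Vec<u8>>,
    {
        FunctionName(v.into())
    }

    /// Hypothetical accessor, added only so the stored bytes can be inspected.
    pub fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}
```

With this shape, FunctionName::new("foo") and FunctionName::new(&bytes[..]) both compile, and a caller that already owns a Vec<u8> can hand it over without an extra copy.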

f.write_char('#')?;
for i in self.0.iter().map(|&b| b as usize) {
f.write_char(HEX_CHARS[i >> 4] as char)?;
f.write_char(HEX_CHARS[i & 0xf] as char)?;
Contributor

You can print a hexadecimal byte like this:

write!(f, "{:02x}", b)?

Contributor Author

Thanks, didn't think of it.

stoklund (Contributor) commented Jun 6, 2017

For parsing the hexadecimal string, you can use u8::from_str_radix() after you split the string into two-character parts.
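
One way to follow this suggestion, as a rough sketch: the helper name parse_hex_bytes is hypothetical, and it assumes the leading # has already been stripped by the caller. The explicit hex-digit check is there because u8::from_str_radix also accepts a leading + sign, which should not count as a hex digit.

```rust
/// Hypothetical helper: decode a string of hex digits (no leading `#`)
/// into bytes. Returns None on odd length or any non-hex character.
fn parse_hex_bytes(hex: &str) -> Option<Vec<u8>> {
    // Reject anything that is not a hex digit up front; this also
    // guarantees the input is ASCII, so 2-byte chunks are valid UTF-8.
    if hex.len() % 2 != 0 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    hex.as_bytes()
        .chunks(2)
        .map(|pair| {
            let digits = std::str::from_utf8(pair).ok()?;
            u8::from_str_radix(digits, 16).ok()
        })
        .collect()
}
```

Collecting an iterator of Option<u8> into Option<Vec<u8>> stops at the first failure, so no partial result leaks out.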

zummenix changed the title from "[WIP] Binary function names" to "Binary function names" Jun 10, 2017
stoklund (Contributor) left a comment

This looks great. Thanks!

stoklund merged commit 8b484b1 into bytecodealliance:master Jun 10, 2017
zummenix deleted the feature/#47-binary-function-names branch June 11, 2017 04:28

Development

Successfully merging this pull request may close these issues.

Binary function names

2 participants