
Update high-memory bootloader and build for 128k and 64k boards#23

Closed
dmsc wants to merge 3 commits into devanlai:master from dmsc:dmsc-hiboot

Conversation

@dmsc
Contributor

@dmsc dmsc commented Apr 19, 2020

Hi!

This is the bootloader that I'm using now on my bluepill boards (I have three boards to test). It has also been tested on one stm32-mini and one olimexino-stm32 (Maple compatible).

The three commits are:

  • I rebased your high-memory code, fixing a bug in the DFU upload (the original code was overwriting RAM when patching the data to send).

  • Added targets for 128kB and 64kB boards, as the high-memory bootloader needs to reside at the top of flash. If your bluepill reports 64kB of flash, you can try flashing the 128kB image regardless (using "st-flash --flash=0x20000"), as most chips internally have 128kB of flash. This works on all my bluepills.

  • Use small versions of the three libc functions called: memcpy, strlen and strncpy. This reduces the bootloader size considerably.

Posting here to see if you are interested.

devanlai and others added 3 commits April 19, 2020 16:13
The bootloader now lives at the high end of flash, allowing a non-bootloader-aware application to run without modifications.

The bootloader modifies the application's vector table to point to
the bootloader's reset stub. The actual reset vector is stored in
unused entries in the application's vector table.

Based on ideas from:
http://stm32duino.com/viewtopic.php?t=687

This is based on code from commit cf21dc3, fixing the DFU upload and rebasing on the current master branch.

Co-authored-by: Daniel Serpell <daniel.serpell@gmail.com>
@devanlai
Owner

Hi @dmsc,
I'm glad that you've found the high-memory code useful and that you're able to make improvements to it. For me, the high-memory version of the bootloader was mostly a side experiment to see if it was possible to make a flash-resident bootloader that doesn't require address changes to use. I'm also mostly happy with the low-memory bootloader as it is now, though I always welcome bugfixes and improvements.

I see three paths forward for the high-memory variant of the bootloader:

  1. Merge your enhancements as a new highboot2 branch and cut a release for people who would like a new set of precompiled binaries for the high-memory bootloader. This would not involve much work for either of us, but also wouldn't make your improvements very visible.
  2. Add build targets and conditional compilation checks to facilitate building both the low-memory and high-memory versions of the bootloader and incorporate your bug fixes / enhancements. This would involve a bit of refactoring and testing, but should make both versions equally accessible to everyone.
  3. If you make a separate repository with your enhanced version of the high-memory bootloader, I'd be happy to add a link pointing people interested in an updated, high-memory version of the bootloader to your repository. If you create the repo without using the fork button (e.g. by going through the "import existing repository" flow), it will be treated as a standalone repo, and GitHub features like code indexing will work better.

@dmsc
Contributor Author

dmsc commented Apr 21, 2020

Hi!

> I'm glad that you've found the high-memory code useful and that you're able to make improvements to it. For me, the high-memory version of the bootloader was mostly a side experiment to see if it was possible to make a flash-resident bootloader that doesn't require address changes to use. I'm also mostly happy with the low-memory bootloader as it is now, though I always welcome bugfixes and improvements.

I liked the idea of the high-memory bootloader because I can upload the same binary either with my st-link-v2 clone, allowing faster development, debugging and testing, or via the bootloader once the code is mostly stable, so I don't need two USB devices connected to my PC.

IMHO, the idea of storing the original stack and reset vectors in the unused slots is clever, and it makes this code almost transparent, so it only has upsides compared to the normal bootloader. Also, the bootloader size can be reduced further without breaking backward compatibility; it just allows bigger firmware to be used.

> I see three paths forward for the high-memory variant of the bootloader:
>
>   1. Merge your enhancements as a new highboot2 branch and cut a release for people who would like a new set of precompiled binaries for the high-memory bootloader. This would not involve much work for either of us, but also wouldn't make your improvements very visible.
>   2. Add build targets and conditional compilation checks to facilitate building both the low-memory and high-memory versions of the bootloader and incorporate your bug fixes / enhancements. This would involve a bit of refactoring and testing, but should make both versions equally accessible to everyone.
>   3. If you make a separate repository with your enhanced version of the high-memory bootloader, I'd be happy to add a link pointing people interested in an updated, high-memory version of the bootloader to your repository. If you create the repo without using the fork button (e.g. by going through the "import existing repository" flow), it will be treated as a standalone repo, and GitHub features like code indexing will work better.

All three paths work for me. If you want to do option 2, I can modify my code to make conditional compilation of the changed parts.

Also, what do you think of commit 5f160ec? You can cherry-pick it if you want, or I could send it as a new pull request.

Thanks!

@devanlai
Owner

> I liked the idea of the high-memory bootloader because I can upload the same binary either with my st-link-v2 clone, allowing faster development, debugging and testing, or via the bootloader once the code is mostly stable, so I don't need two USB devices connected to my PC.

> IMHO, the idea of storing the original stack and reset vectors in the unused slots is clever, and it makes this code almost transparent, so it only has upsides compared to the normal bootloader. Also, the bootloader size can be reduced further without breaking backward compatibility; it just allows bigger firmware to be used.

For me, the main reasons to prefer a conventional bootloader are:

  • No need to differentiate between different flash sizes: a conventional low-memory bootloader always lives at the lowest address, whereas you need a different bin/hex file to set up a high-memory bootloader for each flash size you want to support. (This is possible to work around with a self-updating bootstrapper, but it adds complexity, especially if you want to bundle both the bootloader and an application image together.)
  • The high-memory bootloader has a brief window of danger when rewriting the mixed application/bootloader vector table, where it can brick itself if it is reset or power is lost in the middle of erasing / writing the vector table.

These are not unreasonable tradeoffs to make, but they're important enough that I wouldn't consider the high-memory bootloader a strict improvement.

> All three paths work for me. If you want to do option 2, I can modify my code to make conditional compilation of the changed parts.

I think option 2 would be the best approach if you're up for it.

> Also, what do you think of commit 5f160ec? You can cherry-pick it if you want, or I could send it as a new pull request.

I'm surprised that replacing the standard (probably newlib-nano) implementations of those functions saves so much space. I might take a look at the linker map and asm output later to see what's going on there, but otherwise I'm fine with the size-optimized versions you've added.

@dmsc
Contributor Author

dmsc commented Apr 22, 2020

Hi!

> For me, the main reasons to prefer a conventional bootloader are:
>
>   • No need to differentiate between different flash sizes: a conventional low-memory bootloader always lives at the lowest address, whereas you need a different bin/hex file to set up a high-memory bootloader for each flash size you want to support. (This is possible to work around with a self-updating bootstrapper, but it adds complexity, especially if you want to bundle both the bootloader and an application image together.)
>   • The high-memory bootloader has a brief window of danger when rewriting the mixed application/bootloader vector table, where it can brick itself if it is reset or power is lost in the middle of erasing / writing the vector table.

Yes, this should be documented.

> These are not unreasonable tradeoffs to make, but they're important enough that I wouldn't consider the high-memory bootloader a strict improvement.

>> All three paths work for me. If you want to do option 2, I can modify my code to make conditional compilation of the changed parts.

> I think option 2 would be the best approach if you're up for it.

Ok, when I have more free time I will work on it.

>> Also, what do you think of commit 5f160ec? You can cherry-pick it if you want, or I could send it as a new pull request.

> I'm surprised that replacing the standard (probably newlib-nano) implementations of those functions saves so much space. I might take a look at the linker map and asm output later to see what's going on there, but otherwise I'm fine with the size-optimized versions you've added.

Ah, but I just realized that you are not using newlib-nano; the Makefile does not add the -specs=nano.specs flag.

The newlib memcpy tries to copy 64 bytes at a time, so it has a lot of code to test alignment and handle odd sizes. You can see it with "arm-none-eabi-objdump -S src/dapboot.elf": 236 bytes for memcpy!

```
0800fc2c <memcpy>:
 800fc2c:       4684            mov     ip, r0
 800fc2e:       ea41 0300       orr.w   r3, r1, r0
 800fc32:       f013 0303       ands.w  r3, r3, #3
 800fc36:       d149            bne.n   800fccc <memcpy+0xa0>
 800fc38:       3a40            subs    r2, #64 ; 0x40
 800fc3a:       d323            bcc.n   800fc84 <memcpy+0x58>
 800fc3c:       680b            ldr     r3, [r1, #0]
 800fc3e:       6003            str     r3, [r0, #0]
 800fc40:       684b            ldr     r3, [r1, #4]
 800fc42:       6043            str     r3, [r0, #4]
[...]
 800fd10:       788b            ldrb    r3, [r1, #2]
 800fd12:       7083            strb    r3, [r0, #2]
 800fd14:       4660            mov     r0, ip
 800fd16:       4770            bx      lr
```

If you use newlib-nano, its memcpy is the same size as mine, strlen is 2 bytes bigger, and strncpy is 80 bytes bigger. So, you gain 82 bytes.

@devanlai
Owner

> Ok, when I have more free time I will work on it.

Great, I'm looking forward to it 👍

> Ah, but I just realized, you are not using newlib-nano, the Makefile does not add the -specs=nano.specs flag. [...] If you use newlib-nano, current memcpy is same size as mine, strlen is 2 bytes bigger and strncpy is 80 bytes bigger. So, you gain 82 bytes.

Ah, I didn't realize that I wasn't requesting newlib nano (the perils of copying and pasting the Makefile and not paying close attention to the options being inherited). Personally, if using newlib-nano instead gets to within 82 bytes of the manually replaced version, I would use that instead unless the firmware was right on the cusp of not fitting into the next smallest number of whole flash sectors.
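
For reference, opting into newlib-nano is typically a one-line change of roughly this form (a hypothetical snippet, not taken from this repository's Makefile; the exact variable name depends on how the link step is wired up):

```make
# Link against newlib-nano's size-optimized libc (memcpy, strlen, ...)
LDFLAGS += -specs=nano.specs
```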

@devanlai
Owner

devanlai commented May 4, 2020

Superseded by #24.
