faster CRC32 implementation#5023

Merged
UplinkCoder merged 1 commit into dlang:master from MartinNowak:faster_crc
Jan 10, 2017

@MartinNowak
Member

  • use slicing by 8 algorithm with bigger precomputed tables
  • roughly 4x faster
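
To illustrate the idea behind the bullets above, here is a minimal Python sketch of slicing-by-8 for the IEEE CRC-32. It is not the D code from this PR: the names `gen_tables` and `crc32_slicing8` are made up for the example, and Python will not show the speedup. It only demonstrates how eight 256-entry tables let the main loop consume 8 input bytes per iteration.

```python
import zlib

POLY = 0xEDB88320  # reflected form of the IEEE 802.3 CRC-32 polynomial


def gen_tables():
    """Build the 8 lookup tables (256 entries each) used by slicing-by-8."""
    tables = [[0] * 256 for _ in range(8)]
    # Table 0 is the classic byte-at-a-time table.
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ (POLY if crc & 1 else 0)
        tables[0][i] = crc
    # Table t extends table t-1 by one additional zero byte.
    for i in range(256):
        for t in range(1, 8):
            prev = tables[t - 1][i]
            tables[t][i] = (prev >> 8) ^ tables[0][prev & 0xFF]
    return tables


TABLES = gen_tables()


def crc32_slicing8(data: bytes) -> int:
    """Hypothetical helper: CRC-32 (IEEE) computed 8 bytes per iteration."""
    T = TABLES
    crc = 0xFFFFFFFF
    i, n = 0, len(data)
    while n - i >= 8:
        # Assemble two 32-bit words from single bytes (little-endian order).
        one = (data[i] | data[i + 1] << 8 | data[i + 2] << 16 | data[i + 3] << 24) ^ crc
        two = data[i + 4] | data[i + 5] << 8 | data[i + 6] << 16 | data[i + 7] << 24
        crc = (T[7][one & 0xFF] ^ T[6][(one >> 8) & 0xFF]
               ^ T[5][(one >> 16) & 0xFF] ^ T[4][one >> 24]
               ^ T[3][two & 0xFF] ^ T[2][(two >> 8) & 0xFF]
               ^ T[1][(two >> 16) & 0xFF] ^ T[0][two >> 24])
        i += 8
    # Remaining 0..7 bytes fall back to the byte-at-a-time loop.
    while i < n:
        crc = (crc >> 8) ^ T[0][(crc ^ data[i]) & 0xFF]
        i += 1
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    sample = bytes(range(256)) * 3 + b"tail"
    print(crc32_slicing8(sample) == zlib.crc32(sample))
```

The eight lookups per iteration are independent of each other, which is where a compiled implementation gains over the one-byte-at-a-time loop.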

@WalterWaldron
Contributor

WalterWaldron commented Jan 7, 2017

This suggests that this implementation is endian sensitive.
The link contains the endian aware modification and mentions that reordering the lookups could yield further performance improvement.

@MartinNowak
Member Author

Nope, the implementation also works on big endian, because it assembles the uints from byte-wise reads instead of relying on unaligned hardware loads. I still reordered the operations, which indeed provided a noticeable speedup.
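
A small Python sketch of the point about byte-wise reads, under the assumption that the words are assembled with shifts in little-endian order. `load_le32_bytewise` and `native_load` are hypothetical names for this example, and `native_load` merely simulates what a raw hardware load would return on a machine of the given byte order.

```python
def load_le32_bytewise(buf: bytes, offset: int = 0) -> int:
    """Assemble a 32-bit word from four single-byte reads.

    The shifts fix the byte order explicitly, so the result does not
    depend on the endianness of the machine running the code.
    """
    return (buf[offset]
            | buf[offset + 1] << 8
            | buf[offset + 2] << 16
            | buf[offset + 3] << 24)


def native_load(buf: bytes, byteorder: str) -> int:
    """Simulate a raw 32-bit load on a 'little' or 'big' endian machine."""
    return int.from_bytes(buf[:4], byteorder)


DATA = bytes([0x78, 0x56, 0x34, 0x12])

if __name__ == "__main__":
    print(hex(load_le32_bytewise(DATA)))     # same value on any host
    print(hex(native_load(DATA, "little")))  # matches the byte-wise result
    print(hex(native_load(DATA, "big")))     # differs: bytes are swapped
```

Only the raw load changes with the host's byte order, which is why replacing the byte-wise assembly with a direct load is safe on little-endian targets only.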

@WalterWaldron
Contributor

Ok, I hadn't digested that hasUnalignedReads was only to force that optimization in DMD rather than to provide a per-architecture switch.

LGTM on the basis of matching other slicing by 8 implementations.

@MartinNowak
Member Author

MartinNowak commented Jan 8, 2017

After your comment, I was actually a bit unsure whether genTables is endian-correct, so I built a gdc cross-compiler and tested on my MIPS router. I also renamed the enum to make clear that this can only be done on LE architectures.

@DmitryOlshansky
Member

LGTM, also love the CTFE construction of the table.

@DmitryOlshansky DmitryOlshansky left a comment

LGTM

@UplinkCoder
Member

Auto-merge toggled on

@UplinkCoder UplinkCoder merged commit c0b6660 into dlang:master Jan 10, 2017
@MartinNowak MartinNowak deleted the faster_crc branch January 10, 2017 11:57
@bgaff

bgaff commented Jan 11, 2017

Dumb question here, but why can't the SSE4 crc32 mnemonics be used when available? I apologize if the answer is obvious.

@kubo39
Contributor

kubo39 commented Jan 11, 2017

@bgaff The SSE4.2 crc32 instruction is for the Castagnoli polynomial (CRC-32C), not the IEEE one.
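
To make the difference concrete, here is a minimal bitwise sketch in Python that runs the same CRC loop with both polynomials (in their reflected forms). `crc32_bitwise` is a hypothetical helper written for this example, not part of any library. The two results for the same input differ, so a hardware CRC-32C cannot stand in for the IEEE CRC-32.

```python
IEEE_REFLECTED = 0xEDB88320        # CRC-32 (IEEE 802.3), used by zlib, PNG, etc.
CASTAGNOLI_REFLECTED = 0x82F63B78  # CRC-32C, the polynomial of the SSE4.2 instruction


def crc32_bitwise(data: bytes, reflected_poly: int) -> int:
    """Bit-at-a-time reflected CRC-32 with init and final XOR of 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (reflected_poly if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    msg = b"123456789"  # standard check string for CRC catalogues
    print(hex(crc32_bitwise(msg, IEEE_REFLECTED)))        # 0xcbf43926
    print(hex(crc32_bitwise(msg, CASTAGNOLI_REFLECTED)))  # 0xe3069283
```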
