fastxor
-------

[GoDoc](https://godoc.org/github.com/lukechampine/fastxor)
[Go Report Card](https://goreportcard.com/report/github.com/lukechampine/fastxor)

```
go get github.com/lukechampine/fastxor
```

Is there a gaping hole in your heart that can only be filled by xor'ing byte streams at 20GB/s? If so, you've come to the right place.

`fastxor` is exactly what it sounds like: a package that xors bytes as fast as your CPU can. For best results, use a CPU that supports a SIMD instruction set such as SSE or AVX. On other architectures, performance is much less impressive, but still faster than a naive byte-wise loop.

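As a rough illustration of why even a non-SIMD path can beat a byte-wise loop, here is a minimal pure-Go sketch that xors one 64-bit word at a time. It is not `fastxor`'s code: `xorBytesNaive`, `xorBytesWords`, and `minLen` are hypothetical helpers, and the sketch assumes the convention of stopping at the end of the shortest slice and returning the number of bytes xored.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// minLen returns the length of the shortest of the three slices.
func minLen(dst, a, b []byte) int {
	n := len(dst)
	if len(a) < n {
		n = len(a)
	}
	if len(b) < n {
		n = len(b)
	}
	return n
}

// xorBytesNaive is the naive loop: one XOR per byte.
func xorBytesNaive(dst, a, b []byte) int {
	n := minLen(dst, a, b)
	for i := 0; i < n; i++ {
		dst[i] = a[i] ^ b[i]
	}
	return n
}

// xorBytesWords xors 8 bytes per iteration by treating them as a uint64,
// then handles the remaining 0-7 tail bytes one at a time.
func xorBytesWords(dst, a, b []byte) int {
	n := minLen(dst, a, b)
	i := 0
	for ; i+8 <= n; i += 8 {
		// Byte order is irrelevant for XOR as long as load and store agree.
		w := binary.LittleEndian.Uint64(a[i:]) ^ binary.LittleEndian.Uint64(b[i:])
		binary.LittleEndian.PutUint64(dst[i:], w)
	}
	for ; i < n; i++ {
		dst[i] = a[i] ^ b[i]
	}
	return n
}

func main() {
	a := []byte("The quick brown fox") // 19 bytes
	b := []byte("jumps over the lazy") // 19 bytes
	d1 := make([]byte, len(a))
	d2 := make([]byte, len(a))
	n1 := xorBytesNaive(d1, a, b)
	n2 := xorBytesWords(d2, a, b)
	fmt.Println(n1, n2, bytes.Equal(d1, d2))
}
```

Both inputs are 19 bytes, so the word-wise version does two 8-byte XORs plus a 3-byte tail; the program prints `19 19 true`.
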
I wrote this package to try my hand at writing Go assembly, so please scrutinize my code and let me know how I could make it faster or cleaner!

# Benchmarks

```
AVX:

BenchmarkBytes/16-4    200000000     8.72 ns/op   1835.82 MB/s
BenchmarkBytes/1024-4   50000000     38.1 ns/op  26850.41 MB/s
BenchmarkBytes/65k-4      500000     2738 ns/op  23930.93 MB/s

SSE:

BenchmarkBytes/16-4    200000000     8.63 ns/op   1852.98 MB/s
BenchmarkBytes/1024-4   50000000     39.4 ns/op  25993.00 MB/s
BenchmarkBytes/65k-4      500000     2733 ns/op  23975.08 MB/s

Word-wise:

BenchmarkBytes/16-4    100000000     10.5 ns/op   1521.66 MB/s
BenchmarkBytes/1024-4   10000000      125 ns/op   8163.59 MB/s
BenchmarkBytes/65k-4      200000     6895 ns/op   9504.62 MB/s

Byte-wise:

BenchmarkBytes/16-4    100000000     17.3 ns/op    925.16 MB/s
BenchmarkBytes/1024-4    2000000      841 ns/op   1216.31 MB/s
BenchmarkBytes/65k-4       30000    54100 ns/op   1211.38 MB/s
```

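The benchmark numbers above are standard Go benchmark output: the `-4` suffix is the GOMAXPROCS value, the second column is the iteration count, and the MB/s column appears only when the benchmark calls `b.SetBytes`. Go computes MB/s as bytes processed divided by elapsed time with 1 MB = 10^6 bytes, so 1024 bytes at about 38.1 ns/op is roughly 26,900 MB/s (the reported 26850.41 differs slightly because the printed ns/op is rounded). The sketch below is one way to reproduce this kind of measurement outside `go test` with `testing.Benchmark`. The names `xorBytes` and `benchXor` are hypothetical, and `xorBytes` is just a byte loop standing in for whatever implementation you want to measure; the 65536-byte size is an inference from the "65k" row's figures.

```go
package main

import (
	"fmt"
	"testing"
)

// xorBytes is a hypothetical stand-in for the function being measured:
// a plain byte-wise loop that xors up to the length of the shortest slice.
func xorBytes(dst, a, b []byte) int {
	n := len(dst)
	if len(a) < n {
		n = len(a)
	}
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		dst[i] = a[i] ^ b[i]
	}
	return n
}

// benchXor returns a benchmark body for n-byte slices. b.SetBytes(n)
// tells the framework how many bytes one iteration processes, which is
// what makes it report MB/s next to ns/op.
func benchXor(n int) func(*testing.B) {
	return func(b *testing.B) {
		dst, x, y := make([]byte, n), make([]byte, n), make([]byte, n)
		b.SetBytes(int64(n))
		b.ResetTimer() // exclude the allocations above from the timing
		for i := 0; i < b.N; i++ {
			xorBytes(dst, x, y)
		}
	}
}

func main() {
	// 16 B, 1 KiB, and 64 KiB; 65536 bytes matches the "65k" row's MB/s figures.
	for _, n := range []int{16, 1024, 65536} {
		r := testing.Benchmark(benchXor(n))
		fmt.Printf("%5d bytes: %s\n", n, r.String())
	}
}
```

Each `testing.Benchmark` call runs for about a second by default, so expect the program to take a few seconds; the exact figures depend entirely on your machine.
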
Conclusions: `fastxor` is roughly 2 to 22 times faster than a naive `for` loop, depending on slice size. AVX and SSE performance is roughly equivalent, which makes me suspect that I may be doing something wrong. Lastly, for very small slices, the cost of the function call eats up most of the benefit of AVX/SSE (the Go compiler never inlines handwritten assembly). If you need to xor exactly 16 bytes (common in block ciphers), the specialized `Block` function outperforms the more generic `Bytes`:

```
BenchmarkBlock-4    500000000     3.69 ns/op   4337.88 MB/s
```
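One plausible reason a fixed-size routine is cheaper is that it skips the length bookkeeping and tail handling a generic function needs. The pure-Go sketch below xors exactly 16 bytes as two 64-bit words. It is only an illustration, not `fastxor`'s `Block`: the names `xorBlock` and `blockSize` are hypothetical, and it assumes every slice holds at least 16 bytes.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// blockSize is 16 bytes, e.g. one AES block.
const blockSize = 16

// xorBlock stores a[:16] ^ b[:16] into dst[:16] using two 64-bit XORs.
// There is no loop and no length bookkeeping; any slice shorter than
// 16 bytes makes it panic with an index-out-of-range error.
func xorBlock(dst, a, b []byte) {
	// Fail fast on short input; this also acts as a bounds-check hint.
	_, _, _ = dst[blockSize-1], a[blockSize-1], b[blockSize-1]
	// All 16 input bytes are loaded before any store, so dst may overlap a or b.
	lo := binary.LittleEndian.Uint64(a[0:]) ^ binary.LittleEndian.Uint64(b[0:])
	hi := binary.LittleEndian.Uint64(a[8:]) ^ binary.LittleEndian.Uint64(b[8:])
	binary.LittleEndian.PutUint64(dst[0:], lo)
	binary.LittleEndian.PutUint64(dst[8:], hi)
}

func main() {
	key := []byte("0123456789abcdef")
	block := []byte("YELLOW SUBMARINE")
	out := make([]byte, blockSize)
	xorBlock(out, key, block)
	xorBlock(out, out, key)  // xoring with the same key again undoes the first xor
	fmt.Println(string(out)) // prints YELLOW SUBMARINE
}
```
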