## PR Type
What kind of change does this PR introduce?
<!-- Please uncomment one or more that apply to this PR. -->
- Optimization
<!-- - Bugfix -->
<!-- - Feature -->
<!-- - Code style update (formatting) -->
<!-- - Refactoring (no functional changes, no api changes) -->
<!-- - Build or CI related changes -->
<!-- - Documentation content changes -->
<!-- - Sample app changes -->
<!-- - Other... Please describe: -->
## What is the current behavior?
<!-- Please describe the current behavior that you are modifying, or link to a relevant issue. -->
Some `for` loops have suboptimal codegen, because the indexing is recomputed at each iteration.
For instance, as a very simple test, this method simply sets all items in an input `Span<int>` to `0`:
```csharp
public static void M1(Span<int> span)
{
ref int r0 = ref MemoryMarshal.GetReference(span);
int length = span.Length;
for (int i = 0; i < length; i++)
{
Unsafe.Add(ref r0, i) = 0;
}
}
```
```asm
C.M1(System.Span`1<Int32>)
L0000: mov rax, [rcx]
L0003: mov edx, [rcx+8]
L0006: xor ecx, ecx
L0008: test edx, edx
L000a: jle short L001c
L000c: movsxd r8, ecx
L000f: xor r9d, r9d
L0012: mov [rax+r8*4], r9d
L0016: inc ecx
L0018: cmp ecx, edx
L001a: jl short L000c
L001c: ret
```
Here the loop starts at `L000c`. At every iteration it takes the loop counter, sign-extends it to native int (`movsxd`), uses it to index from the initial reference (that `[rax+r8*4]` offset calculation), and then writes to that address. In cases such as this, that per-iteration indexing logic is unnecessary.
## What is the new behavior?
<!-- Describe how was this issue resolved or changed? -->
Refactored some loops to operate within a target address range, with all indexing moved out of the loop body.
```csharp
public static void M2(Span<int> span)
{
ref int r0 = ref MemoryMarshal.GetReference(span);
ref int r1 = ref Unsafe.Add(ref r0, span.Length);
while (Unsafe.IsAddressLessThan(ref r0, ref r1))
{
r0 = 0;
r0 = ref Unsafe.Add(ref r0, 1);
}
}
```
```asm
C.M2(System.Span`1<Int32>)
L0000: mov rax, [rcx]
L0003: mov edx, [rcx+8]
L0006: movsxd rdx, edx
L0009: lea rdx, [rax+rdx*4]
L000d: cmp rax, rdx
L0010: jae short L001f
L0012: xor ecx, ecx
L0014: mov [rax], ecx
L0016: add rax, 4
L001a: cmp rax, rdx
L001d: jb short L0012
L001f: ret
```
Here instead we pre-calculate the target address just once, outside the loop, and then iterate until the moving reference reaches that point. This makes the actual loop more compact, with no indexing logic needed. We just write directly through the moving reference, and then increment it by a fixed amount at the end of each iteration. Not groundbreaking, but better 🚀
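The same pattern also applies to loops that read rather than write. Below is a minimal sketch, not code from this PR: the `LoopSketch` class and `Sum` method are hypothetical names used only for illustration, and the example assumes `System.Runtime.CompilerServices.Unsafe` is available (it is built in on .NET 5).

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical helper, for illustration only (not part of the toolkit)
public static class LoopSketch
{
    public static int Sum(ReadOnlySpan<int> span)
    {
        // Moving reference, starting at the first item
        ref int current = ref MemoryMarshal.GetReference(span);

        // Target address (one past the last item), computed once outside the loop
        ref int end = ref Unsafe.Add(ref current, span.Length);

        int sum = 0;

        while (Unsafe.IsAddressLessThan(ref current, ref end))
        {
            // Read directly through the moving reference, no indexing needed
            sum += current;

            // Advance the reference by one item
            current = ref Unsafe.Add(ref current, 1);
        }

        return sum;
    }
}
```

An empty span is handled naturally here, since the start and end references are equal and the loop body never runs.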
## PR Checklist
Please check if your PR fulfills the following requirements:
- [X] Tested code with current [supported SDKs](../readme.md#supported)
- [ ] ~~Pull Request has been submitted to the documentation repository [instructions](..\contributing.md#docs). Link: <!-- docs PR link -->~~
- [ ] ~~Sample in sample app has been added / updated (for bug fixes / features)~~
- [ ] ~~Icon has been created (if new sample) following the [Thumbnail Style Guide and templates](https://github.com/windows-toolkit/WindowsCommunityToolkit-design-assets)~~
- [X] Tests for the changes have been added (for bug fixes / features) (if applicable)
- [X] Header has been added to all new source files (run *build/UpdateHeaders.bat*)
- [X] Contains **NO** breaking changes
<!-- 🚨 Please Do Not skip any instructions and information mentioned below as they are all required and essential to evaluate and test the PR. By fulfilling all the required information you will be able to reduce the volume of questions and most likely help merge the PR faster 🚨 -->
<!-- 📝 It is preferred if you keep the "☑️ Allow edits by maintainers" checked in the Pull Request Template as it increases collaboration with the Toolkit maintainers by permitting commits to your PR branch (only) created from your fork. This can let us quickly make fixes for minor typos or forgotten StyleCop issues during review without needing to wait on you doing extra work. Let us help you help us! 🎉 -->
## Follow up from #3520
<!-- Add the relevant issue number after the "#" mentioned above (for ex: Fixes#1234) which will automatically close the issue once the PR is merged. -->
<!-- Add a brief overview here of the feature/bug & fix. -->
## PR Type
What kind of change does this PR introduce?
<!-- Please uncomment one or more that apply to this PR. -->
- Optimization
<!-- - Bugfix -->
<!-- - Feature -->
<!-- - Code style update (formatting) -->
<!-- - Refactoring (no functional changes, no api changes) -->
<!-- - Build or CI related changes -->
<!-- - Documentation content changes -->
<!-- - Sample app changes -->
<!-- - Other... Please describe: -->
## What is the current behavior?
<!-- Please describe the current behavior that you are modifying, or link to a relevant issue. -->
The codegen for the second branch in `RuntimeHelpers.ConvertLength` does a signed division:
9b75c9f910/Microsoft.Toolkit.HighPerformance/Helpers/Internals/RuntimeHelpers.cs (L43-L46)
This is not ideal for the codegen, as the JIT has to account for the sign in that division, resulting in the following:
```asm
; [System.Byte, System.Private.CoreLib],[System.Numerics.Vector4, System.Numerics.Vectors]
ConvertLength[TFrom, TTo](Int32)
L0000: mov eax, ecx
L0002: sar eax, 0x1f
L0005: and eax, 0xf
L0008: add eax, ecx
L000a: sar eax, 4
L000d: ret
```
## What is the new behavior?
<!-- Describe how was this issue resolved or changed? -->
Avoided that with a cast to `uint`, since the length is guaranteed to be a non-negative value in `[0, int.MaxValue]` anyway:
```asm
; [System.Byte, System.Private.CoreLib],[System.Numerics.Vector4, System.Numerics.Vectors]
L0000: mov eax, ecx
L0002: shr eax, 4
L0005: ret
```
Perfect! 😄🎉
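As a rough illustration of the idea, here is a minimal sketch of the two division styles. This is not the actual `RuntimeHelpers.ConvertLength` code: the `LengthSketch` class, its method names and the hard-coded `ElementSize` of 16 (the size of a `Vector4`, matching the disassembly above) are assumptions made for the example.

```csharp
using System;

// Hypothetical helper, for illustration only (not the toolkit implementation)
public static class LengthSketch
{
    // Assumed target element size in bytes (e.g. a Vector4)
    private const int ElementSize = 16;

    // Signed division: the JIT must round toward zero for negative inputs,
    // which needs extra instructions even for a power-of-two divisor
    public static int ConvertSigned(int byteLength)
    {
        return byteLength / ElementSize;
    }

    // Unsigned division: valid only because the length is never negative,
    // so the division can become a single logical shift
    public static int ConvertUnsigned(int byteLength)
    {
        return (int)((uint)byteLength / (uint)ElementSize);
    }
}
```

For any non-negative length both methods return the same value, for instance `4` for an input of `64`. They only differ for negative inputs, which cannot occur for a span length.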
## PR Checklist
Please check if your PR fulfills the following requirements:
- [X] Tested code with current [supported SDKs](../readme.md#supported)
- [ ] ~~Pull Request has been submitted to the documentation repository [instructions](..\contributing.md#docs). Link: <!-- docs PR link -->~~
- [ ] ~~Sample in sample app has been added / updated (for bug fixes / features)~~
- [ ] ~~Icon has been created (if new sample) following the [Thumbnail Style Guide and templates](https://github.com/windows-toolkit/WindowsCommunityToolkit-design-assets)~~
- [X] Tests for the changes have been added (for bug fixes / features) (if applicable)
- [X] Header has been added to all new source files (run *build/UpdateHeaders.bat*)
- [X] Contains **NO** breaking changes
## Adds the .NET 5 target to `Microsoft.Toolkit.HighPerformance`
## PR Type
What kind of change does this PR introduce?
<!-- Please uncomment one or more that apply to this PR. -->
<!-- - Bugfix -->
- Feature
<!-- - Code style update (formatting) -->
<!-- - Refactoring (no functional changes, no api changes) -->
<!-- - Build or CI related changes -->
<!-- - Documentation content changes -->
<!-- - Sample app changes -->
<!-- - Other... Please describe: -->
## What is the current behavior?
<!-- Please describe the current behavior that you are modifying, or link to a relevant issue. -->
The `Microsoft.Toolkit.HighPerformance` package maxes out at .NET Core 3.1.
The `Microsoft.Toolkit` package maxes out at .NET Standard 2.1.
Additionally, `Microsoft.Toolkit` doesn't have proper nullability annotations, and it pulls in an additional dependency when installed in a .NET 5 app. The extra dependency is `System.Runtime.CompilerServices.Unsafe`, which is actually built in on .NET 5, but consumers not aware of this would still see NuGet report it as an extra indirect dependency in the installation prompt.
## What is the new behavior?
<!-- Describe how was this issue resolved or changed? -->
✅ Added .NET 5 target to `Microsoft.Toolkit.HighPerformance`
✅ Added .NET 5 target to `Microsoft.Toolkit`
✅ Enabled global nullability annotations in `Microsoft.Toolkit` and improved the codebase.
✅ Enabled C# 9 in both projects, with some extra code tweaks.
## PR Checklist
Please check if your PR fulfills the following requirements:
- [X] Tested code with current [supported SDKs](../readme.md#supported)
- [ ] Pull Request has been submitted to the documentation repository [instructions](..\contributing.md#docs). Link: <!-- docs PR link -->
- [ ] Sample in sample app has been added / updated (for bug fixes / features)
- [ ] Icon has been created (if new sample) following the [Thumbnail Style Guide and templates](https://github.com/windows-toolkit/WindowsCommunityToolkit-design-assets)
- [ ] Tests for the changes have been added (for bug fixes / features) (if applicable)
- [X] Header has been added to all new source files (run *build/UpdateHeaders.bat*)
- [X] Contains **NO** breaking changes
<!-- If this PR contains a breaking change, please describe the impact and migration path for existing applications below.
Please note that breaking changes are likely to be rejected. -->
## PR Type
What kind of change does this PR introduce?
<!-- Please uncomment one or more that apply to this PR. -->
- Performance improvement
<!-- - Bugfix -->
<!-- - Feature -->
<!-- - Code style update (formatting) -->
<!-- - Refactoring (no functional changes, no api changes) -->
<!-- - Build or CI related changes -->
<!-- - Documentation content changes -->
<!-- - Sample app changes -->
<!-- - Other... Please describe: -->
## What is the new behavior?
<!-- Describe how was this issue resolved or changed? -->
About 20% improvement on .NET 5 when working on `char` types (or larger):
![image](https://user-images.githubusercontent.com/10199417/97509236-ff526e80-1981-11eb-8a90-f8aa72f1551e.png)
This was done by adding an unrolled loop for the vectorized path of the SIMD accelerated version of `Count<T>`.
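To give an idea of what an unrolled vectorized counting loop can look like, here is a minimal sketch using `System.Numerics.Vector<T>`. It is not the implementation from this PR: the `CountSketch` class is a hypothetical name, the unroll factor of 2 is arbitrary, and the example is restricted to `int` so that the per-lane counters cannot overflow.

```csharp
using System;
using System.Numerics;

// Hypothetical helper, for illustration only (not the toolkit implementation)
public static class CountSketch
{
    public static int Count(ReadOnlySpan<int> span, int value)
    {
        int i = 0, result = 0;

        if (Vector.IsHardwareAccelerated && span.Length >= Vector<int>.Count)
        {
            int step = Vector<int>.Count;
            Vector<int> target = new Vector<int>(value);
            Vector<int> partials = Vector<int>.Zero;

            // Unrolled vectorized path: two vectors per iteration.
            // Vector.Equals sets matching lanes to -1, so subtracting
            // the result adds 1 to the counter of each matching lane.
            for (; i <= span.Length - 2 * step; i += 2 * step)
            {
                partials -= Vector.Equals(new Vector<int>(span.Slice(i)), target);
                partials -= Vector.Equals(new Vector<int>(span.Slice(i + step)), target);
            }

            // At most one full vector can be left after the unrolled loop
            for (; i <= span.Length - step; i += step)
            {
                partials -= Vector.Equals(new Vector<int>(span.Slice(i)), target);
            }

            // Horizontal sum of the per-lane counters
            result = Vector.Dot(partials, Vector<int>.One);
        }

        // Scalar path for the remaining items (or for the whole span)
        for (; i < span.Length; i++)
        {
            if (span[i] == value)
            {
                result++;
            }
        }

        return result;
    }
}
```

Smaller element types would need extra care in a real implementation, since narrow per-lane counters can overflow on long inputs.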
## PR Checklist
Please check if your PR fulfills the following requirements:
- [X] Tested code with current [supported SDKs](../readme.md#supported)
- [ ] ~~Pull Request has been submitted to the documentation repository [instructions](..\contributing.md#docs). Link: <!-- docs PR link -->~~
- [ ] ~~Sample in sample app has been added / updated (for bug fixes / features)~~
- [ ] ~~Icon has been created (if new sample) following the [Thumbnail Style Guide and templates](https://github.com/windows-toolkit/WindowsCommunityToolkit-design-assets)~~
- [X] Tests for the changes have been added (for bug fixes / features) (if applicable)
- [X] Header has been added to all new source files (run *build/UpdateHeaders.bat*)
- [X] Contains **NO** breaking changes
## PR Type
What kind of change does this PR introduce?
<!-- Please uncomment one or more that apply to this PR. -->
<!-- - Bugfix -->
- Feature
<!-- - Code style update (formatting) -->
<!-- - Refactoring (no functional changes, no api changes) -->
<!-- - Build or CI related changes -->
<!-- - Documentation content changes -->
<!-- - Sample app changes -->
<!-- - Other... Please describe: -->
## What is the current behavior?
<!-- Please describe the current behavior that you are modifying, or link to a relevant issue. -->
There is currently no way to interoperate between the `IBufferWriter<T>` interface and the `Stream` class. Many APIs in the BCL and in 3rd party libraries use `Stream` as the standard way to accept an instance that can be written to or read from, and there is no built-in way to have a memory stream that is also using memory pooling, because none of the types in the BCL and in the `HighPerformance` package currently support both features at the same time. This PR fixes that 😄🚀
Consider this example that I saw from a user in the C# Discord server:
```csharp
public byte[] Compress(byte[] source)
{
MemoryStream output = new MemoryStream();
using (DeflateStream dstream = new DeflateStream(output, CompressionLevel.Optimal))
{
dstream.Write(source, 0, source.Length);
}
return output.ToArray();
}
public byte[] Decompress(byte[] source)
{
MemoryStream input = new MemoryStream(source);
MemoryStream output = new MemoryStream();
using (DeflateStream dstream = new DeflateStream(input, CompressionMode.Decompress))
{
dstream.CopyTo(output);
}
return output.ToArray();
}
```
You can see how the code is very memory inefficient: the `MemoryStream` type will just `new`-up arrays as it grows, and at the end `ToArray()` is called as well, which copies the data into yet another array. Even if that call is removed, the main issue within `MemoryStream` remains. With the new extension introduced in this PR, these two APIs can be rewritten much more efficiently, like this:
```csharp
public IMemoryOwner<byte> Compress(ReadOnlySpan<byte> span)
{
ArrayPoolBufferWriter<byte> bufferWriter = new ArrayPoolBufferWriter<byte>();
using DeflateStream deflateStream = new DeflateStream(bufferWriter.AsStream(), CompressionLevel.Optimal);
deflateStream.Write(span);
return bufferWriter;
}
public IMemoryOwner<byte> Decompress(ReadOnlyMemory<byte> memory)
{
ArrayPoolBufferWriter<byte> bufferWriter = new ArrayPoolBufferWriter<byte>(memory.Length);
using DeflateStream deflateStream = new DeflateStream(memory.AsStream(), CompressionMode.Decompress);
deflateStream.CopyTo(bufferWriter.AsStream());
return bufferWriter;
}
```
This heavily leverages the various APIs and helpers in the `HighPerformance` package, and gives us the following results:
| Method | Categories | Mean | Error | StdDev | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
|------- |----------- |------------:|----------:|----------:|------:|---------:|---------:|---------:|----------:|
| new[] | COMPRESS | 29,923.5 us | 174.19 us | 162.94 us | 1.00 | 312.5000 | 312.5000 | 312.5000 | 3089853 B |
| **pool** | COMPRESS | **29,116.0 us** | 120.55 us | 106.87 us | **0.97** | - | - | - | **297 B** |
| | | | | | | | | | |
| new[] | DECOMPRESS | 832.9 us | 9.96 us | 8.83 us | 1.00 | 337.8906 | 336.9141 | 336.9141 | 2966680 B |
| **pool** | DECOMPRESS | **119.6 us** | 0.70 us | 0.62 us | **0.14** | - | - | - | **392 B** |
This benchmark compresses and decompresses a 1MB buffer, using the two methods detailed above.
You can see the vastly reduced memory allocations using the pooled writer backed stream 🚀
## What is the new behavior?
<!-- Describe how was this issue resolved or changed? -->
This PR introduces this new extension:
```csharp
namespace Microsoft.Toolkit.HighPerformance.Extensions
{
public static class ArrayPoolBufferWriterExtensions
{
public static Stream AsStream(this ArrayPoolBufferWriter<byte> writer);
}
public static class IBufferWriterExtensions
{
public static Stream AsStream(this IBufferWriter<byte> writer);
}
}
```
This helps to interoperate between the `IBufferWriter<T>` interface and the `Stream` class. In particular, since the `HighPerformance` package includes the `ArrayPoolBufferWriter<T>` type, this extension allows users to use that as a `Stream`, and then keep working with the resulting `ReadOnlyMemory<T>` produced by that type, as shown above.
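For readers curious about how such an adapter can work, here is a minimal sketch of a write-only `Stream` that forwards to an `IBufferWriter<byte>`. It is not the implementation from this PR: the `BufferWriterStream` name is hypothetical, and only the synchronous write path is covered.

```csharp
using System;
using System.Buffers;
using System.IO;

// Hypothetical adapter, for illustration only (not the toolkit implementation)
public sealed class BufferWriterStream : Stream
{
    private readonly IBufferWriter<byte> writer;

    public BufferWriterStream(IBufferWriter<byte> writer)
    {
        this.writer = writer;
    }

    // The stream is write-only and not seekable
    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => throw new NotSupportedException();

    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    // Nothing to flush: data is committed to the writer on every write
    public override void Flush() { }

    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();

    public override void Write(byte[] buffer, int offset, int count)
    {
        Write(new ReadOnlySpan<byte>(buffer, offset, count));
    }

    public override void Write(ReadOnlySpan<byte> buffer)
    {
        // Ask the writer for enough space, copy the data, then commit it
        Span<byte> destination = this.writer.GetSpan(buffer.Length);
        buffer.CopyTo(destination);
        this.writer.Advance(buffer.Length);
    }
}
```

The sketch can be tried with the BCL `ArrayBufferWriter<byte>` type: after writing three bytes through the stream, the writer's `WrittenCount` is `3`.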
## PR Checklist
Please check if your PR fulfills the following requirements:
- [X] Tested code with current [supported SDKs](../readme.md#supported)
- [ ] ~~Pull Request has been submitted to the documentation repository [instructions](..\contributing.md#docs). Link: <!-- docs PR link -->~~
- [ ] ~~Sample in sample app has been added / updated (for bug fixes / features)~~
- [ ] ~~Icon has been created (if new sample) following the [Thumbnail Style Guide and templates](https://github.com/windows-toolkit/WindowsCommunityToolkit-design-assets)~~
- [X] Tests for the changes have been added (for bug fixes / features) (if applicable)
- [X] Header has been added to all new source files (run *build/UpdateHeaders.bat*)
- [X] Contains **NO** breaking changes
## PR Type
What kind of change does this PR introduce?
<!-- Please uncomment one or more that apply to this PR. -->
- Feature
## What is the current behavior?
<!-- Please describe the current behavior that you are modifying, or link to a relevant issue. -->
Right now there is no (easy) way to cast a `Memory<TFrom>` instance to a `Memory<TTo>` instance. There are APIs to do that for `Span<T>` instances, but not for `Memory<T>`. The reason is that with a `Span<T>` it's just a matter of retrieving the wrapped reference, reinterpreting it, adjusting the size, and then creating a new `Span<T>` instance. But a `Memory<T>` instance is completely different: it wraps an object which could be either a `T[]` array, a `MemoryManager<T>` instance, etc. The result is that currently there are no APIs in the BCL nor in the toolkit to just "cast" a `Memory<T>`.
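For reference, the existing `Span<T>` cast mentioned above is exposed in the BCL as `MemoryMarshal.Cast`. The small sketch below shows it in action; the `SpanCastSketch` class and its method are hypothetical names used only for this example.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper, for illustration only
public static class SpanCastSketch
{
    // Writes a float through a reinterpreted view over the input bytes,
    // and returns the length of that float view
    public static int WriteFirstFloat(byte[] buffer, float value)
    {
        Span<byte> bytes = buffer;

        // Same memory, reinterpreted: the length is scaled by the size ratio
        Span<float> floats = MemoryMarshal.Cast<byte, float>(bytes);

        floats[0] = value;

        return floats.Length;
    }
}
```

With a 16-byte buffer the float view has 4 items, and the write is visible through the original array, since no copy is made.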
This feature has been requested by a number of developers, including in a well known library such as `ImageSharp`:
> Yes, that's exactly what I would need. But I'm wondering how would you implement it.
> It's certainly non trivial to cast a `Memory<byte>` to a `Memory<TPixel>` and if there's an API for that I would gladly want to know...
> So I pressume `ImageSharp` would need to do some work under the hood.
(_`ImageSharp` issue, [here](https://github.com/SixLabors/ImageSharp/issues/1097#issuecomment-580639914)_)
To solve that, I opened a PR [here](https://github.com/SixLabors/ImageSharp/pull/1314) with a very simplified version of the code included in this PR.
Having this available right out of the box in the `HighPerformance` package would be helpful in a number of similar situations, especially with `Memory<T>` APIs becoming more and more common across libraries now (as they've been out for a while).
## What is the new behavior?
<!-- Describe how was this issue resolved or changed? -->
This PR includes 4 new extensions for the `Memory<T>` and `ReadOnlyMemory<T>` types that enable the following:
```csharp
// Cast between two Memory<T> instances...
Memory<byte> memoryOfBytes = new byte[128].AsMemory();
Memory<float> memoryOfFloats = memoryOfBytes.Cast<byte, float>();
// ...as many times as needed
Memory<int> memoryOfInts = memoryOfFloats.Cast<float, int>();
Memory<byte> backToBytesMemory = memoryOfInts.Cast<int, byte>();
// Or just convert into bytes directly
Memory<int> sourceAsInts = new int[128].AsMemory();
Memory<byte> sourceAsBytes = sourceAsInts.AsBytes();
// Want to get a stream from a string? Why not! 😄
using (Stream stream = "Hello world".AsMemory().AsBytes().AsStream())
{
// Use the stream here, which reads *directly* from the string data!
}
```
Here is the full list of the new APIs introduced in this PR:
```csharp
namespace Microsoft.Toolkit.HighPerformance.Extensions
{
public static class MemoryExtensions
{
public static Memory<byte> AsBytes<T>(this Memory<T> memory)
where T : unmanaged;
public static Memory<TTo> Cast<TFrom, TTo>(this Memory<TFrom> memory)
where TFrom : unmanaged
where TTo : unmanaged;
}
public static class ReadOnlyMemoryExtensions
{
public static ReadOnlyMemory<byte> AsBytes<T>(this ReadOnlyMemory<T> memory)
where T : unmanaged;
public static ReadOnlyMemory<TTo> Cast<TFrom, TTo>(this ReadOnlyMemory<TFrom> memory)
where TFrom : unmanaged
where TTo : unmanaged;
}
}
```
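One possible way to approach such a cast is a custom `MemoryManager<TTo>` that wraps the source `Memory<TFrom>` and reinterprets its span on demand. The sketch below only illustrates that general idea and is not the implementation from this PR: `CastMemoryManager<TFrom, TTo>` is a hypothetical name, pinning is deliberately left out, and no special handling for arrays or strings is included.

```csharp
using System;
using System.Buffers;
using System.Runtime.InteropServices;

// Hypothetical type, for illustration only (not the toolkit implementation)
public sealed class CastMemoryManager<TFrom, TTo> : MemoryManager<TTo>
    where TFrom : unmanaged
    where TTo : unmanaged
{
    private readonly Memory<TFrom> source;

    public CastMemoryManager(Memory<TFrom> source)
    {
        this.source = source;
    }

    // Reinterpret the wrapped memory every time a span is requested
    public override Span<TTo> GetSpan()
    {
        return MemoryMarshal.Cast<TFrom, TTo>(this.source.Span);
    }

    // Assumption: pinning is omitted to keep the sketch free of unsafe code
    public override MemoryHandle Pin(int elementIndex = 0)
    {
        throw new NotSupportedException("Pinning is not supported in this sketch.");
    }

    public override void Unpin() { }

    // The wrapped memory is not owned by this instance, so nothing to release
    protected override void Dispose(bool disposing) { }
}
```

The `Memory` property inherited from `MemoryManager<TTo>` then exposes the cast view, for instance `new CastMemoryManager<byte, float>(new byte[16]).Memory` has a length of `4`.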
## Notes
Marking as draft as this is still being worked on, but feedback and reviews are welcome! 😄
## PR Checklist
Please check if your PR fulfills the following requirements:
- [X] Tested code with current [supported SDKs](../readme.md#supported)
- [ ] ~~Pull Request has been submitted to the documentation repository [instructions](..\contributing.md#docs). Link: <!-- docs PR link -->~~
- [ ] ~~Sample in sample app has been added / updated (for bug fixes / features)~~
- [ ] ~~Icon has been created (if new sample) following the [Thumbnail Style Guide and templates](https://github.com/windows-toolkit/WindowsCommunityToolkit-design-assets)~~
- [X] Tests for the changes have been added (for bug fixes / features) (if applicable)
- [X] Header has been added to all new source files (run *build/UpdateHeaders.bat*)
- [X] Contains **NO** breaking changes