LinqToArray 1.0.0
dotnet add package LinqToArray --version 1.0.0
NuGet\Install-Package LinqToArray -Version 1.0.0
<PackageReference Include="LinqToArray" Version="1.0.0" />
paket add LinqToArray --version 1.0.0
#r "nuget: LinqToArray, 1.0.0"
// Install LinqToArray as a Cake Addin
#addin nuget:?package=LinqToArray&version=1.0.0
// Install LinqToArray as a Cake Tool
#tool nuget:?package=LinqToArray&version=1.0.0
Optimized LINQ subset dedicated to arrays
Background
One day when I was looking at our code, I noticed that about 80% of the places using LINQ take an array as the argument and call ToArray immediately. In many cases we do not need as much versatility as System.Linq.Enumerable provides, so I wanted an optimized LINQ subset dedicated to the array → array situation.
Therefore, I implemented such a LINQ subset under the following assumptions.
- Dedicated to arrays
- Does not assume pipelining<sup>※1</sup>
- Length, or at least the upper limit<sup>※2</sup> of length, is known
- The length is short enough<sup>※3</sup>
(Though these restrictions are tremendously strong, the speedup ranges from about 30% (commonly) to 400% (very rarely), so I am not sure whether it is useful or not...)
<sup>※1</sup> A multistage query such as Select().Where().Select().OrderBy()... . Array-to-array optimization reduces overhead, but allocating an array at every pipeline stage costs much more than the overhead it saves.
<sup>※2</sup> For example, in the case of Where or Distinct, the exact length of the output is not known, but the maximum length is: the output is never longer than the input.
<sup>※3</sup> This limit is needed because the temporary buffer is allocated with the stackalloc operator, and stackallocing a large block of memory easily causes a stack overflow. In the near future, with C# 7.2, this issue can be avoided with Span<T> buffer = len < LimitSize ? (Span<T>)stackalloc T[len] : (Span<T>)new T[len].
Optimization method
Dedicated to array
In general, the foreach statement is expanded as follows.
Before:
foreach (var x in data)
{
}
After (in general):
{
    var e = data.GetEnumerator();
    try
    {
        while (e.MoveNext())
        {
            var x = e.Current;
        }
    }
    finally
    {
        ((IDisposable)e).Dispose();
    }
}
For an array, however, this is expanded as follows.
After (for an array):
for (int i = 0; i < data.Length; i++)
{
    var x = data[i];
}
This code is faster than the general IEnumerable<T> code. As described later, two costs disappear: the allocation of an IEnumerator and the range check on array accesses.
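For illustration, a Select dedicated to arrays can use this for-loop form directly. The sketch below is only illustrative (the class name ArrayEnumerable and the exact signature are assumptions, not necessarily the library's actual API):

```csharp
using System;

public static partial class ArrayEnumerable
{
    // Minimal sketch of an array-dedicated Select: no enumerator is allocated,
    // the result array is allocated once because the length is known up front,
    // and the JIT can drop the range check on source[i].
    public static TResult[] Select<TSource, TResult>(
        this TSource[] source, Func<TSource, TResult> selector)
    {
        var result = new TResult[source.Length];
        for (int i = 0; i < source.Length; i++)
            result[i] = selector(source[i]);
        return result;
    }
}
```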
Restriction
This optimization really does require an array.
(The other optimizations only need the length to be known, so IList<T> or IEnumerable<T> plus a known length can be used.)
(Supplement) IEnumerator allocation
In the general case of the foreach statement, GetEnumerator allocates an IEnumerator instance.
GetEnumerator in some classes such as List<T> returns a struct enumerator, but that enumerator is boxed when GetEnumerator is called through the IEnumerable<T> interface. This causes a heap allocation, a small but not trivial cost.
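For example, the boxing can be seen directly with List<T> (a small illustration):

```csharp
using System.Collections.Generic;

class EnumeratorBoxingDemo
{
    static void Main()
    {
        var list = new List<int> { 1, 2, 3 };

        // List<T>.GetEnumerator() returns a struct enumerator: no heap allocation.
        List<int>.Enumerator direct = list.GetEnumerator();
        direct.MoveNext();

        // Getting the enumerator through the interface boxes that struct,
        // which is a small heap allocation.
        IEnumerator<int> boxed = ((IEnumerable<int>)list).GetEnumerator();
        boxed.MoveNext();
    }
}
```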
(Supplement) Range check of array
Normally, the JIT compiler inserts a range check for each index access into an array. This is indispensable to prevent buffer-overrun vulnerabilities, but it costs a considerable amount.
However, when a for statement is written like the following, the range check in the index access a[i] is eliminated during JIT optimization.
for (int i = 0; i < a.Length; i++)
{
    var x = a[i];
}
For IList<T> or IReadOnlyList<T>, such an optimization does not work even if you write the same for statement.
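For instance, the same loop shape over IList<T> goes through the interface indexer, so there is no direct array access for the JIT to optimize (a small illustration):

```csharp
using System.Collections.Generic;

static class RangeCheckDemo
{
    // a[i] here is an interface (virtual) call to the indexer, not a direct array
    // access, so the array bounds-check elimination described above does not apply.
    static int Sum(IList<int> a)
    {
        int sum = 0;
        for (int i = 0; i < a.Count; i++)
            sum += a[i];
        return sum;
    }
}
```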
Make the length known
In general, the ToArray operation for an IEnumerable<T> (whose size is unknown) needs a temporary buffer that is resized on demand.
On the other hand, if the length is known at the beginning, the output array needs to be allocated only once.
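For example, for Where only the upper limit of the output length is known (see ※2), so one fixed-size buffer of the input length is enough and no on-demand resizing is needed. A minimal sketch, with illustrative names:

```csharp
using System;

public static partial class ArrayEnumerable
{
    // Minimal sketch of an array-dedicated Where: the input length is an upper
    // bound for the output length, so a single fixed-size temporary buffer is
    // allocated up front and copied to an exact-size result at the end.
    public static T[] Where<T>(this T[] source, Func<T, bool> predicate)
    {
        var buffer = new T[source.Length];   // upper bound of the result length
        int count = 0;
        for (int i = 0; i < source.Length; i++)
            if (predicate(source[i]))
                buffer[count++] = source[i];

        if (count == buffer.Length) return buffer;
        var result = new T[count];
        Array.Copy(buffer, result, count);
        return result;
    }
}
```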
Restriction
As well as arrays, IList<T> can also be used here because its length is known. However, IEnumerable<T> in general cannot.
Dedicated implementation of hash table
Implementations such as Distinct and GroupBy use a hash table internally.
There are some points where such a hash table can be optimized compared to the standard implementations, Dictionary<TKey, TValue> and HashSet<T>:
- The Remove method is not needed for Distinct and GroupBy
  - The hash table algorithm can be much simpler if the Remove method is not needed (see the sketch after this list)
- If the length is known, there is no need to resize the temporary buffer
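The points above can be illustrated with a minimal sketch of such a simplified hash set (the names and layout are illustrative, not the library's actual implementation):

```csharp
using System.Collections.Generic;

// Minimal sketch of an add-only hash set with a capacity fixed at construction.
// Because Remove is never needed and the maximum number of elements is known,
// two flat arrays are enough: no resizing, no free list, no tombstones.
sealed class FixedHashSet<T>
{
    struct Entry
    {
        public T Key;
        public int HashCode;
        public int Next;        // index of the next entry in the same bucket, or -1
    }

    readonly int[] _buckets;    // bucket -> index of its first entry, or -1
    readonly Entry[] _entries;
    readonly IEqualityComparer<T> _comparer;
    int _count;

    public FixedHashSet(int capacity, IEqualityComparer<T> comparer)
    {
        _buckets = new int[capacity];
        for (int i = 0; i < capacity; i++) _buckets[i] = -1;
        _entries = new Entry[capacity];
        _comparer = comparer;
    }

    // Returns true if the key was newly added, false if it was already present.
    public bool Add(T key)
    {
        int hash = _comparer.GetHashCode(key) & 0x7FFFFFFF;
        int bucket = hash % _buckets.Length;

        for (int i = _buckets[bucket]; i >= 0; i = _entries[i].Next)
            if (_entries[i].HashCode == hash && _comparer.Equals(_entries[i].Key, key))
                return false;

        _entries[_count] = new Entry { Key = key, HashCode = hash, Next = _buckets[bucket] };
        _buckets[bucket] = _count;
        _count++;
        return true;
    }
}
```

A Distinct over an array of length n would create this set with capacity n, call Add for every element, and copy the elements for which Add returned true into the result.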
Restriction
There is no particular restriction in this optimization.
Temporary buffer with stackalloc
The temporary buffer in the hash table is used only inside the method.
It is wasteful to use the managed heap for something whose lifetime is this definite; the buffer can instead be allocated on the stack with the stackalloc operator.
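As a minimal sketch of this idea (int keys only; StackAllocLimit is an assumed threshold, and this conditional form of stackalloc requires a newer C# compiler):

```csharp
using System;

static class StackBufferDemo
{
    const int StackAllocLimit = 1024;   // assumed threshold; see the restrictions below

    // Counts distinct values with a temporary "seen" buffer whose lifetime ends
    // when the method returns, so it can live on the stack for small inputs.
    static int CountDistinct(int[] source)
    {
        int len = source.Length;
        Span<int> seen = len <= StackAllocLimit
            ? stackalloc int[len]       // small input: no managed heap allocation
            : new int[len];             // large input: fall back to the heap

        int count = 0;
        for (int i = 0; i < len; i++)
        {
            bool found = false;
            for (int j = 0; j < count; j++)
            {
                if (seen[j] == source[i]) { found = true; break; }
            }
            if (!found) seen[count++] = source[i];
        }
        return count;
    }
}
```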
Restriction
The stackalloc operator has the following restrictions.
- It cannot allocate a very large size
  - A stack overflow happens easily
  - Most code in corefx and coreclr allocates memory with stackalloc in blocks of 1 KB or smaller
- It cannot be used across yield return and await expressions
Value type generics (inlining EqualityComparer)
A hash table needs an IEqualityComparer<T> to obtain hash values and compare keys.
Here, for example, suppose that there is a comparer such as the following.
public struct StructEquatableComparer<T> : IEqualityComparer<T>
    where T : IEquatable<T>
{
    public bool Equals(T x, T y) => x.Equals(y);
    public int GetHashCode(T obj) => obj.GetHashCode();
}
First, if you use it as follows:
public class HashSet<T>
{
    IEqualityComparer<T> _comparer;

    public bool Contains(T key)
    {
        ...
        if (_comparer.Equals(key, bucket.Key))
        ...
    }
}
This has the following costs.
- Having an instance in _comparer
  - The size of the Dictionary increases
  - The comparer instance is boxed because StructEquatableComparer is a struct
- Virtual calls of interface members
  - Virtual calls prevent method inlining in general
  - The penalty of preventing method inlining is very large, especially for primitives such as int
Therefore, we consider the following implementation.
public class HashSet<T, TComparer>
    where TComparer : struct, IEqualityComparer<T>
{
    public bool Contains(T key)
    {
        ...
        if (default(TComparer).Equals(key, bucket.Key))
        ...
    }
}
In .NET generics, when a type parameter is a struct, the generic type is expanded for each actual type argument. As a result, the virtual call disappears and the method call can be inlined, so the default(TComparer).Equals call becomes dramatically faster.
Restriction
This optimization has no restriction in terms of types.
However, code using this optimization is very ugly. For example, you have to write HashSet<string, StructEquatableComparer<string>> (very complicated) for the optimized type, while you can write just HashSet<string> for the normal one.
Also, type inference of generic type parameters becomes less effective. For example, the Distinct call is source.Distinct<string, StructEquatableComparer<string>>() for the optimized version but source.Distinct() for the normal one.
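To make the call-site difference concrete, the two shapes could correspond to signatures like the following sketch (ArrayEnumerable, the naive bodies, and the exact signatures are assumptions; StructEquatableComparer is the comparer defined earlier):

```csharp
using System;
using System.Collections.Generic;

public static partial class ArrayEnumerable
{
    // Optimized form: TComparer appears only as a type argument, so the compiler
    // cannot infer it and the caller must spell out both type arguments:
    //     source.Distinct<string, StructEquatableComparer<string>>()
    public static T[] Distinct<T, TComparer>(this T[] source)
        where TComparer : struct, IEqualityComparer<T>
    {
        // Naive stand-in for the real hash-table-based implementation;
        // the point here is the default(TComparer) call, which can be inlined.
        var buffer = new T[source.Length];
        int count = 0;
        for (int i = 0; i < source.Length; i++)
        {
            bool found = false;
            for (int j = 0; j < count; j++)
                if (default(TComparer).Equals(buffer[j], source[i])) { found = true; break; }
            if (!found) buffer[count++] = source[i];
        }
        if (count == buffer.Length) return buffer;
        var result = new T[count];
        Array.Copy(buffer, result, count);
        return result;
    }

    // Plain form: T is inferred from the argument, so the caller just writes
    //     source.Distinct()
    public static T[] Distinct<T>(this T[] source)
        where T : IEquatable<T>
        => source.Distinct<T, StructEquatableComparer<T>>();
}
```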
Compatible and additional computed target framework versions:

| Product | Versions |
|---|---|
| .NET | net5.0, net5.0-windows, net6.0, net6.0-android, net6.0-ios, net6.0-maccatalyst, net6.0-macos, net6.0-tvos, net6.0-windows, net7.0, net7.0-android, net7.0-ios, net7.0-maccatalyst, net7.0-macos, net7.0-tvos, net7.0-windows, net8.0, net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows (all computed) |
| .NET Core | netcoreapp1.0, netcoreapp1.1, netcoreapp2.0, netcoreapp2.1, netcoreapp2.2, netcoreapp3.0, netcoreapp3.1 (all computed) |
| .NET Standard | netstandard1.0 (compatible); netstandard1.1, netstandard1.2, netstandard1.3, netstandard1.4, netstandard1.5, netstandard1.6, netstandard2.0, netstandard2.1 (computed) |
| .NET Framework | net35 (compatible); net40, net403, net45, net451, net452, net46, net461, net462, net463, net47, net471, net472, net48, net481 (computed) |
| MonoAndroid | monoandroid (computed) |
| MonoMac | monomac (computed) |
| MonoTouch | monotouch (computed) |
| Tizen | tizen30, tizen40, tizen60 (computed) |
| Universal Windows Platform | uap, uap10.0 (computed) |
| Windows Phone | wp8, wp81, wpa81 (computed) |
| Windows Store | netcore, netcore45, netcore451 (computed) |
| Xamarin.iOS | xamarinios (computed) |
| Xamarin.Mac | xamarinmac (computed) |
| Xamarin.TVOS | xamarintvos (computed) |
| Xamarin.WatchOS | xamarinwatchos (computed) |
Dependencies
- .NETFramework 3.5
  - No dependencies.
- .NETStandard 1.0
  - NETStandard.Library (>= 1.6.1)