SoftCircuits.Parsing.Helper
5.2.0
dotnet add package SoftCircuits.Parsing.Helper --version 5.2.0
NuGet\Install-Package SoftCircuits.Parsing.Helper -Version 5.2.0
<PackageReference Include="SoftCircuits.Parsing.Helper" Version="5.2.0" />
paket add SoftCircuits.Parsing.Helper --version 5.2.0
#r "nuget: SoftCircuits.Parsing.Helper, 5.2.0"
// Install SoftCircuits.Parsing.Helper as a Cake Addin
#addin nuget:?package=SoftCircuits.Parsing.Helper&version=5.2.0
// Install SoftCircuits.Parsing.Helper as a Cake Tool
#tool nuget:?package=SoftCircuits.Parsing.Helper&version=5.2.0
ParsingHelper
Install-Package SoftCircuits.Parsing.Helper
Introduction
ParsingHelper is a .NET class library that makes it much easier to parse text. The library tracks the current position within the text, ensures your code never accesses characters at an invalid index, and includes many methods that make parsing easier. The result is text-parsing code that is more concise and more robust.
Getting Started
To parse a string, call the ParsingHelper constructor with the string you want to parse. If the string argument is null, it will be safely treated as an empty string. The constructor initializes the class instance to parse the given string and sets the current position to the start of that string.
Use the Peek() method to read the character at the current position without changing the current position. The Peek() method can optionally accept an integer argument that specifies the character position as the number of characters ahead of the current position. For example, Peek(1) would return the character that comes after the character at the current position. (Calling Peek() is equivalent to calling Peek(0).) If the position specified is out of bounds for the current string, Peek() returns ParsingHelper.NullChar (equal to '\0').
Use the Get() method to read the character at the current position and then increment the current position to the next character.
You can call the Reset() method to reset the current position back to the start of the string. The Reset() method accepts an optional string argument and, if supplied, will configure the class to begin parsing the new string.
The Text property returns the string being parsed, and the Index property returns the current position within that string. Although you would normally use the navigation methods to change the Index value, you can set it directly. If you set the Index property to an invalid value, it will be adjusted so it is always in the range of 0 to Text.Length.
The EndOfText property returns true when you have reached the end of the text, and the Remaining property returns the number of characters still to be parsed (calculated as Text.Length - Index).
ParsingHelper helper = new ParsingHelper("The quick brown fox jumps over the lazy dog.");
char c = helper.Peek(); // Returns 'T'
c = helper.Get(); // Returns 'T'
c = helper.Get(); // Returns 'h'
helper.Reset(); // Returns to start of string
string text = helper.Text; // Returns "The quick brown fox jumps over the lazy dog."
int index = helper.Index; // Returns 0
bool endOfText = helper.EndOfText; // Returns false
int remaining = helper.Remaining; // Returns helper.Text.Length
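The bounds handling described above can be seen in a short sketch. It assumes the library's namespace matches the package name (SoftCircuits.Parsing.Helper) and uses only the members described in this section.

```csharp
using System;
using SoftCircuits.Parsing.Helper; // assumed namespace, matching the package name

ParsingHelper helper = new ParsingHelper("abc");

char ahead = helper.Peek(1);    // one character past the current position
char beyond = helper.Peek(10);  // out of bounds for this string

Console.WriteLine(ahead);                             // b
Console.WriteLine(beyond == ParsingHelper.NullChar);  // True

helper.Index = 100;                   // invalid value; adjusted into the range 0..Text.Length
Console.WriteLine(helper.Index);      // 3
Console.WriteLine(helper.EndOfText);  // True
Console.WriteLine(helper.Remaining);  // 0

helper.Reset("xyz");                  // begin parsing a new string from position 0
Console.WriteLine(helper.Get());      // x
```

Because out-of-range reads return NullChar rather than throwing, a loop can test the character returned by Peek() without first checking the index.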
Navigation
To advance the parser to the next character position, use the Next() method. This method can also accept an optional argument that specifies the number of characters to advance. For example, if you pass 5, the current position will be advanced five characters. (Calling Next() with no arguments is equivalent to calling Next(1).) The argument to Next() can be a negative value if you want to move backwards.
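As a brief illustration of the three forms of Next() described above (a sketch, assuming the namespace matches the package name):

```csharp
using System;
using SoftCircuits.Parsing.Helper; // assumed namespace, matching the package name

ParsingHelper helper = new ParsingHelper("The quick brown fox");

helper.Next();     // advance one character: Index is now 1
helper.Next(5);    // advance five characters: Index is now 6
helper.Next(-2);   // move back two characters: Index is now 4

Console.WriteLine(helper.Index);   // 4
Console.WriteLine(helper.Peek());  // q
```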
As an alternative to the Next() method, ParsingHelper overloads several operators that can be used as a shortcut to change the current position. These are demonstrated in the following example.
helper++; // Same as helper.Next()
helper--; // Same as helper.Next(-1)
helper += 2; // Same as helper.Next(2)
helper -= 2; // Same as helper.Next(-2)
helper = helper + 3; // Same as helper.Next(3)
helper = helper - 3; // Same as helper.Next(-3)
int i = helper; // Same as i = helper.Index
// Safely moves to the end of the text if you add a number that is too large
helper += 1000000;
// Safely moves to the start of the text if you subtract a number that is too large
helper -= 1000000;
This simple example shows how you might print each character in the text being parsed.
while (!helper.EndOfText)
{
    Console.WriteLine(helper.Peek());
    helper++;
}
Tracking Line and Column Position
For performance reasons, ParsingHelper does not track the current line and column values as it parses. However, you can use the GetLineColumn() method to calculate the line and column values that correspond to the current position. This is useful for providing more information when reporting parsing errors back to the end user.
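To show why this is computed on demand rather than tracked, here is a minimal, library-independent sketch of what such a calculation involves. ComputeLineColumn is a hypothetical helper written for illustration; it is not the library's implementation, and the return type of GetLineColumn() is not shown here. The 1-based numbering is also an assumption.

```csharp
using System;

// Hypothetical helper, not part of the library: converts a character index
// into 1-based line and column numbers by scanning the text up to that index.
static (int Line, int Column) ComputeLineColumn(string text, int index)
{
    int line = 1, column = 1;
    int end = Math.Min(Math.Max(index, 0), text.Length);
    for (int i = 0; i < end; i++)
    {
        char c = text[i];
        if (c == '\r' || c == '\n')
        {
            // Count "\r\n" as a single line break
            if (c == '\r' && i + 1 < end && text[i + 1] == '\n')
                i++;
            line++;
            column = 1;
        }
        else
        {
            column++;
        }
    }
    return (line, column);
}

var position = ComputeLineColumn("ab\ncd", 4);
Console.WriteLine($"Line {position.Line}, column {position.Column}"); // Line 2, column 2
```

The scan is linear in the index, which is why a parser would call something like this only when reporting an error rather than on every character.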
Skipping Over Characters
To skip over a group of characters, you can use the Skip() method. This method accepts any number of char arguments (or a char array). It will advance the current position to the first character that is not one of the arguments.
The following example would skip over any numeric digits.
helper.Skip('1', '2', '3', '4', '5', '6', '7', '8', '9', '0');
The SkipWhile() method accepts a predicate and skips characters for as long as the predicate returns true. The following example would skip over any characters that are not an equal sign:
helper.SkipWhile(c => c != '=');
A common task when parsing is to skip over any whitespace characters. Use the SkipWhiteSpace() method to advance the current position to the next character that is not a whitespace character.
The library has a number of other methods and overloads that support skipping over characters.
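The three skipping methods described in this section can be combined as in the following sketch (assuming the namespace matches the package name):

```csharp
using System;
using SoftCircuits.Parsing.Helper; // assumed namespace, matching the package name

ParsingHelper helper = new ParsingHelper("   42 apples");

helper.SkipWhiteSpace();              // skips the three leading spaces: Index is 3
helper.Skip('4', '2');                // skips any run of the listed characters: Index is 5
helper.SkipWhile(char.IsWhiteSpace);  // skips while the predicate returns true: Index is 6

Console.WriteLine(helper.Index);      // 6
Console.WriteLine(helper.Peek());     // a
```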
Skipping to Characters
In addition to skipping specified characters, the library also provides ways to advance to specified characters.
The SkipTo() method advances to the next occurrence of the given string.
helper.SkipTo("fox");
This example advances the current position to the start of the next occurrence of "fox". If no more occurrences are found, this method advances to the very end of the text and returns false. The SkipTo() method supports an optional StringComparison value to specify how characters should be compared.
The SkipTo() method is overloaded to also accept any number of char arguments (or a char array).
helper.SkipTo('x', 'y', 'z');
This example will advance the current position to the first occurrence of any one of the specified characters. If none of the characters are found, this method advances to the end of the text and returns false.
Use the SkipToEndOfLine() method to advance the current position to the first character that is a new-line character (i.e., '\r' or '\n'). If neither character is found, this method advances to the end of the text and returns false. Use the SkipToNextLine() method to advance the current position to the first character in the next line. If no next line is found, this method advances to the end of the text and returns false.
The library also has a number of other methods and overloads for skipping to characters.
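The sketch below exercises the return values described in this section. It assumes the namespace matches the package name and that the StringComparison value is passed as the second argument to SkipTo().

```csharp
using System;
using SoftCircuits.Parsing.Helper; // assumed namespace, matching the package name

ParsingHelper helper = new ParsingHelper("first line\nsecond line");

// Case-insensitive search; argument order is an assumption
bool found = helper.SkipTo("LINE", StringComparison.OrdinalIgnoreCase);
Console.WriteLine(found);             // True
Console.WriteLine(helper.Index);      // 6

helper.SkipToNextLine();              // moves to the first character of the second line
Console.WriteLine(helper.Peek());     // s

bool hasHash = helper.SkipTo('#');    // not present, so the position moves to the end
Console.WriteLine(hasHash);           // False
Console.WriteLine(helper.EndOfText);  // True
```

Checking the Boolean result is the simplest way to tell a real match from having run off the end of the text.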
Parsing Characters
The ParseWhile() method accepts a predicate and parses characters for as long as the predicate returns true. It works like the SkipWhile() method except that ParseWhile() will return the characters that were skipped. (Note that SkipWhile() is faster and should be used when you do not need the skipped characters.)
The following example will parse all letters starting from the current position.
string token = helper.ParseWhile(char.IsLetter);
The ParseTo() method parses characters until a delimiter character is found, and returns the characters that were parsed. There are two versions of this method: one takes a params array of characters that specify the delimiters, and the other accepts a predicate that returns true for characters that are delimiters.
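A small sketch of both ParseTo() versions, splitting a key and value (assuming the namespace matches the package name). Skip() is used to step past each delimiter so the example does not depend on exactly where ParseTo() leaves the current position.

```csharp
using System;
using SoftCircuits.Parsing.Helper; // assumed namespace, matching the package name

ParsingHelper helper = new ParsingHelper("name=value;rest");

string key = helper.ParseTo('=');              // delimiters given as characters
helper.Skip('=');                              // step past the delimiter
string value = helper.ParseTo(c => c == ';');  // delimiters given as a predicate
helper.Skip(';');
string rest = helper.Extract(helper.Index);    // everything that remains

Console.WriteLine(key);    // name
Console.WriteLine(value);  // value
Console.WriteLine(rest);   // rest
```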
In addition, the library also defines the ParseToken() method. This method takes a list of delimiters and will skip all characters that are a delimiter, then parse all characters that are not a delimiter and return the parsed characters. Delimiters can be specified as character parameters, a character array, or a predicate that returns true if the given character is a delimiter.
string token;
token = helper.ParseToken(' ', '\t', '\r', '\n');
token = helper.ParseToken(char.IsWhiteSpace);
The library also has a number of other methods and overloads for parsing characters.
Note: Methods that return a string also have an AsSpan version, which returns a ReadOnlySpan<char>. Use this version for fewer memory allocations and better performance.
Parsing Quoted Text
You may have occasion to parse quoted text. In this case, you will probably want the quoted text (without the quotes). The ParseQuotedText() method makes this easy.
Call this method with the current position at the first quote character. The method will use the character at the current position to determine what the quote character is. (So the quote character can be any character you choose.)
This method will parse characters until the closing quote is found. If the closing quote is found, it will set the current position to the character after the closing quote and return the text within the quotes. If the closing quote is not found, it will return everything after the starting quote to the end of the string, and will advance the current position to the end of the string.
If ParseQuotedText() encounters two quote characters together, it will interpret them as a single quote character and not the end of the quoted text. Consider the following example:
ParsingHelper helper = new ParsingHelper("One two \"three and \"\"four\"\"!");
helper.SkipTo('"');
string token = helper.ParseQuotedText();
This example would set the token variable to three and "four". The two pairs of quotes are each interpreted as one quote in the text and not the end of the quoted text.
The ParseQuotedText() method has a second overload that allows you to specify the escape character (including no escape character), whether or not the escape character is included in the result, and whether or not the enclosing quotes are included in the result.
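Because the quote character is taken from the current position, the same call handles other quote styles. The following sketch uses single quotes with the parameterless overload (assuming the namespace matches the package name):

```csharp
using System;
using SoftCircuits.Parsing.Helper; // assumed namespace, matching the package name

ParsingHelper helper = new ParsingHelper("say 'it''s done' now");

helper.SkipTo('\'');                       // position at the opening quote
string quoted = helper.ParseQuotedText();  // the doubled quote becomes a single quote

Console.WriteLine(quoted);                 // it's done
Console.WriteLine(helper.Peek() == ' ');   // True: positioned just after the closing quote
```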
Extracting Text
It is common to want to extract text tokens as you parse them. You can use the Extract() method to do this. The Extract() method accepts two integer arguments that specify the 0-based position of the first character to be extracted and the 0-based position of the character that follows the last character to be extracted.
string token = helper.Extract(start, end);
This method is overloaded with a version that only accepts one integer argument. The argument specifies the 0-based position of the first character to be extracted, and this method will extract everything from that position to the end of the text.
Neither of these methods changes the current position.
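A typical pattern is to record Index before skipping over a token and then extract the range afterwards. This is a sketch, assuming the namespace matches the package name:

```csharp
using System;
using SoftCircuits.Parsing.Helper; // assumed namespace, matching the package name

ParsingHelper helper = new ParsingHelper("width: 100px");

helper.SkipTo(':');
helper++;                 // step past the colon
helper.SkipWhiteSpace();

int start = helper.Index;           // first digit
helper.SkipWhile(char.IsDigit);     // Index now follows the last digit

string number = helper.Extract(start, helper.Index);
string unit = helper.Extract(helper.Index);  // from the current position to the end

Console.WriteLine(number);  // 100
Console.WriteLine(unit);    // px
```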
Comparing Text
Finally, you may need to test if a predefined string is equal to the text at the current location. The MatchesCurrentPosition() method tests this. It accepts a string argument and returns a Boolean value that indicates if the specified string matches the text starting at the current location. The MatchesCurrentPosition() method supports an optional StringComparison value to specify how characters should be compared. Note that while this method can be handy, it's less performant than most methods in this class. Any type of search function that works by calling this method at each successive position should be avoided where performance matters.
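For instance, a keyword check at the current position might look like the sketch below. It assumes the namespace matches the package name and that the StringComparison value is passed as the second argument.

```csharp
using System;
using SoftCircuits.Parsing.Helper; // assumed namespace, matching the package name

ParsingHelper helper = new ParsingHelper("SELECT name FROM users");
const string keyword = "select";

// Case-insensitive test at the current position; argument order is an assumption
if (helper.MatchesCurrentPosition(keyword, StringComparison.OrdinalIgnoreCase))
{
    helper += keyword.Length;  // step past the keyword
    helper.SkipWhiteSpace();
}

string column = helper.ParseWhile(char.IsLetter);
Console.WriteLine(column);     // name
```

This tests a single position, which is the intended use. To search for a string, SkipTo() is the better choice.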
Examples
Here are a couple of examples to illustrate use of the library.
Parse a Sentence into Words
This example parses a sentence into words. This implementation only considers spaces and periods as word delimiters. But you could easily add more characters, or use the overload of ParsingHelper.ParseTokens() that accepts a lambda expression.
ParsingHelper helper = new ParsingHelper("The quick brown fox jumps over the lazy dog.");
List<string> words = helper.ParseTokens(' ', '.').ToList();
CollectionAssert.AreEqual(new[] {
"The",
"quick",
"brown",
"fox",
"jumps",
"over",
"the",
"lazy",
"dog" }, words);
Command Line
This example parses a command line. It detects both arguments and flags (arguments preceded with '-' or '/'). It's okay with whitespace between the flag character and flag. And any argument or flag that contains whitespace can be enclosed in quotes.
ParsingHelper helper = new ParsingHelper("app -v -f /d-o file1 \"file 2\"");
List<string> arguments = new List<string>();
List<string> flags = new List<string>();
char[] flagCharacters = new char[] { '-', '/' };
string arg;
bool isFlag = false;
while (!helper.EndOfText)
{
    // Skip any whitespace
    helper.SkipWhiteSpace();
    // Is this a flag?
    if (flagCharacters.Contains(helper.Peek()))
    {
        isFlag = true;
        // Skip over flag character
        helper++;
        // Allow whitespace between flag character and flag
        helper.SkipWhiteSpace();
    }
    else isFlag = false;
    // Parse item
    if (helper.Peek() == '"')
        arg = helper.ParseQuotedText();
    else
        arg = helper.ParseWhile(c => !char.IsWhiteSpace(c) && !flagCharacters.Contains(c));
    // Add argument to appropriate collection
    if (isFlag)
        flags.Add(arg);
    else
        arguments.Add(arg);
}
CollectionAssert.AreEqual(new[] { "app", "file1", "file 2" }, arguments);
CollectionAssert.AreEqual(new[] { "v", "f", "d", "o" }, flags);
Regular Expressions
This example uses a regular expression to find all the words in a string that start with the letter "J".
ParsingHelper helper = new ParsingHelper("Jim Jack Sally Jennifer Bob Gary Jonathan Bill");
IEnumerable<string> results = helper.ParseTokensRegEx(@"\b[J]\w+");
CollectionAssert.AreEqual(new[]
{
"Jim",
"Jack",
"Jennifer",
"Jonathan"
}, results.ToList());
Product | Versions Compatible and additional computed target framework versions. |
---|---|
.NET | net5.0 was computed. net5.0-windows was computed. net6.0 is compatible. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 is compatible. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 is compatible. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. |
.NET Core | netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
.NET Standard | netstandard2.0 is compatible. netstandard2.1 was computed. |
.NET Framework | net461 was computed. net462 was computed. net463 was computed. net47 was computed. net471 was computed. net472 was computed. net48 was computed. net481 was computed. |
MonoAndroid | monoandroid was computed. |
MonoMac | monomac was computed. |
MonoTouch | monotouch was computed. |
Tizen | tizen40 was computed. tizen60 was computed. |
Xamarin.iOS | xamarinios was computed. |
Xamarin.Mac | xamarinmac was computed. |
Xamarin.TVOS | xamarintvos was computed. |
Xamarin.WatchOS | xamarinwatchos was computed. |
- .NETStandard 2.0: No dependencies.
- net6.0: No dependencies.
- net7.0: No dependencies.
- net8.0: No dependencies.
- net9.0: No dependencies.
NuGet packages (4)
Showing the top 4 NuGet packages that depend on SoftCircuits.Parsing.Helper:
Package | Description |
---|---|
SoftCircuits.ExpressionEvaluator | .NET library that will evaluate a string expression. Expressions can include integer, double and string operands. Operators can include "+", "-", "*", "/", "%", "^" and "&". Supports custom functions and symbols. Provides easy integration with any application. |
SoftCircuits.CommandLineParser | Simple and lightweight command-line parser. Supports regular arguments, flag arguments and extended arguments (in the form: -mode:extArg). Arguments, flag arguments, and extended arguments can all be wrapped in quotes in order to include whitespace. Now targets .NET Standard 2.0 or .NET 5.0 and supports nullable reference types. |
SoftCircuits.CodeColorizer | Code Colorizer is a .NET class library to convert source code to HTML with syntax coloring. The library is language-agnostic, meaning that the same code is used for all supported languages. Only the language rules change for each language. Library now targets either .NET 5.0 or .NET Standard 2.0 and supports nullable reference types. |
BitMagic.Compiler | X16 Emulator and Debugger |
GitHub repositories
This package is not used by any popular GitHub repositories.
Version | Downloads | Last updated |
---|---|---|
5.2.0 | 121 | 12/18/2024 |
5.1.0 | 198 | 3/17/2024 |
5.0.0 | 352 | 7/16/2023 |
4.2.0 | 959 | 10/2/2022 |
4.1.0 | 1,302 | 12/12/2021 |
4.0.1 | 1,397 | 8/29/2021 |
4.0.0 | 538 | 2/19/2021 |
3.0.4 | 423 | 11/29/2020 |
3.0.3 | 396 | 11/24/2020 |
3.0.2 | 995 | 10/26/2020 |
3.0.1 | 386 | 10/26/2020 |
3.0.0 | 442 | 10/25/2020 |
2.4.0 | 502 | 10/25/2020 |
2.3.2 | 410 | 10/24/2020 |
2.3.1 | 448 | 10/23/2020 |
2.3.0 | 799 | 10/23/2020 |
2.2.0 | 419 | 10/21/2020 |
2.1.1 | 487 | 10/21/2020 |
2.1.0 | 492 | 10/18/2020 |
2.0.2 | 467 | 7/3/2020 |
2.0.1 | 511 | 6/28/2020 |
2.0.0 | 455 | 6/22/2020 |
1.2.0 | 466 | 6/7/2020 |
1.0.17 | 496 | 5/23/2020 |
1.0.16 | 460 | 4/15/2020 |
1.0.15 | 485 | 4/3/2020 |
1.0.14 | 521 | 3/31/2020 |
1.0.13 | 491 | 3/24/2020 |
1.0.12 | 483 | 3/16/2020 |
1.0.11 | 472 | 3/13/2020 |
1.0.10 | 9,793 | 2/27/2020 |
1.0.9 | 506 | 2/26/2020 |
1.0.8 | 533 | 2/26/2020 |
1.0.7 | 472 | 2/24/2020 |
1.0.6 | 467 | 2/23/2020 |
1.0.5 | 585 | 2/16/2020 |
1.0.4 | 455 | 2/14/2020 |
1.0.3 | 461 | 2/14/2020 |
1.0.2 | 533 | 12/26/2019 |
1.0.1 | 520 | 12/11/2019 |
1.0.0 | 534 | 12/11/2019 |
Added direct support for .NET 9.0; Code clean up.