FastCode
FastCode is an open source programming project that aims to provide optimized runtime library routines for Embarcadero Delphi and C++ Builder. This community-driven project was started in 2003 by Dennis Kjaer Christensen and has since contributed optimized functionality to the 32-bit Delphi runtime library (RTL).
Organized as a competition divided into challenges, FastCode focuses on optimizing specific functions against multiple targets. The project offers benchmarking tools and validation processes for each function contribution. Contributions are scored, with points awarded based on performance against the targets. Embarcadero recognizes and incorporates the code created by the FastCode team into their Delphi codebase. Most participants in this project are assembler developers who utilize processor-specific code. The list of challenges tackled by the FastCode project is extensive; it covers diverse areas ranging from string manipulation functions like PosEx or CompareText to mathematical operations such as Power or Int64Mul.
Structure
The project is organized as a competition divided into challenges. Each challenge takes one function and optimizes it against a number of targets. The project provides tools for benchmarking and validating each function contribution. One point is given per contribution (at most one function per target earns points), and ten points are awarded to a target winner. A list of all contributors and their scores is maintained, and at the end of each year until 2008 a winner was celebrated. Borland, CodeGear and Embarcadero, the owners of Delphi and C++ Builder, have historically sponsored prizes.
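The scoring rule can be illustrated with a short Python sketch. The data layout, the contributor names and the function names are hypothetical, and two points are assumptions rather than documented rules: that the fastest entry wins a target, and that the winner's ten points are added on top of the contribution point.

```python
from collections import defaultdict

CONTRIBUTION_POINTS = 1  # one point per contribution, as described above
WINNER_POINTS = 10       # ten points for a target winner, as described above


def score_challenge(results):
    """Score one challenge.

    `results` maps a target name to a list of
    (contributor, function_name, benchmark_time) tuples.
    Lower benchmark_time is better. The layout is hypothetical.
    """
    scores = defaultdict(int)
    for target, entries in results.items():
        if not entries:
            continue
        # At most one function per contributor and target earns the point.
        for contributor in {c for c, _, _ in entries}:
            scores[contributor] += CONTRIBUTION_POINTS
        # Assumption: the fastest entry wins the target, and the ten
        # points are added to the contribution point.
        winner = min(entries, key=lambda e: e[2])[0]
        scores[winner] += WINNER_POINTS
    return dict(scores)


results = {
    "Pentium 4": [
        ("Alice", "PosEx_A1", 950),
        ("Alice", "PosEx_A2", 900),
        ("Bob", "PosEx_B1", 1000),
    ],
    "Opteron": [
        ("Bob", "PosEx_B1", 800),
        ("Alice", "PosEx_A1", 850),
    ],
}
scores = score_challenge(results)
```

In this made-up example, Alice submits two functions for the Pentium 4 target but receives only one contribution point for it.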
The majority of participants in the competition are assembler developers who often utilize processor-specific 32-bit code and extra instruction sets, such as MMX, SSE, SSE2, SSE3, SSSE3 and SSE4.[1]
The project enjoys the support of Embarcadero, which recognizes the contributions of the FastCode team and incorporates their code into the Delphi codebase.[2] FastMM4, the default memory manager for Embarcadero Delphi, is the winner of the FastCode Memory Manager challenge.[3]
The project was first hosted on Robert Lee's OptimalCode site. The home page for its source code is [1], last updated in 2008. The source code contains both the enhanced routines and the test suites used to benchmark them. In 2017, the benchmark routines for Move, FillChar and the memory manager were ported to 64-bit; they are available at [2].
Testing
The FastCode project puts considerable effort into testing, with a focus on delivering high-quality software. Testing is split into two categories: testing for correctness and testing for speed.
Validation
Validation is done on all CPUs from the target set and very often on other CPUs and operating systems (Windows XP, Windows Vista, Windows 7, etc.) as well. It covers many different function inputs, including both normal usage cases and error usage cases. Results are checked against known correct values and against reference implementations such as existing RTL functions.
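A minimal Python sketch of validating a candidate against a reference implementation follows. It is not the project's validation tool: both functions are simplified, hypothetical models of a PosEx-style search (1-based position, 0 when nothing is found), and the test cases are invented for illustration.

```python
def reference_pos_ex(sub, s, offset=1):
    """Reference model: 1-based position of sub in s at or after offset.

    Returns 0 when sub is empty, offset is out of range, or sub is absent.
    """
    if not sub or offset < 1 or offset > len(s):
        return 0
    return s.find(sub, offset - 1) + 1  # find() gives -1 when absent


def candidate_pos_ex(sub, s, offset=1):
    """Hypothetical contribution: a hand-written search loop."""
    n, m = len(s), len(sub)
    if m == 0 or offset < 1:
        return 0
    for i in range(offset - 1, n - m + 1):
        if s[i:i + m] == sub:
            return i + 1
    return 0


def validate(candidate, reference, cases):
    """Return every input on which candidate and reference disagree."""
    return [args for args in cases if candidate(*args) != reference(*args)]


cases = [
    ("lo", "hello", 1),      # normal usage
    ("l", "hello", 4),       # search starts mid-string
    ("xyz", "hello", 1),     # substring absent
    ("", "hello", 1),        # error usage: empty substring
    ("lo", "hello", 0),      # error usage: offset below range
    ("lo", "hello", 99),     # error usage: offset past the end
    ("hello!", "hello", 1),  # substring longer than the string
]

# Check against a known correct value, then against the reference.
known_value_ok = candidate_pos_ex("lo", "hello", 1) == 4
failures = validate(candidate_pos_ex, reference_pos_ex, cases)
```

An empty `failures` list means the candidate agrees with the reference on every listed input, including the error cases.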
Benchmarking
Benchmarking is done on all the CPUs that are part of the current target set at the given time. The following CPUs have been or are part of target sets: Intel Pentium 3, Intel Pentium M, Intel Pentium 4, Intel Core, Intel Core 2, AMD Athlon XP, AMD Opteron and AMD Phenom. Great care has been taken to make the benchmarks stable and realistic. The memory manager challenge benchmark was especially hard to make fair; memory manager usage logs were recorded from normal usage of real-world applications and then played back by the benchmark.
Targets
Testing is done on the entire target set. A new target set is decided each year by a poll in which the FastCode community can vote. A target set typically consists of six CPUs, four from Intel and two from AMD; this ratio was chosen to mimic market shares. In addition to these six CPU targets, there are ten targets defined as a blend of the six CPUs. These ten targets are called computed targets and can be speed only or a combination of speed and size. The maximum allowed instruction set is different for each target; a target could be, for example, "IA32 size penalty" or "SSE2". The penalty for size is decided for each challenge by a poll.
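The idea of a computed target can be sketched in Python. The actual blending formula is not given here, so this sketch assumes a plain average of the per-CPU benchmark times plus a per-byte size penalty. The function names, timings, code sizes and the ordering of instruction sets are all hypothetical.

```python
# Assumed ordering of instruction sets, from most to least restrictive.
INSTRUCTION_SETS = ["IA32", "MMX", "SSE", "SSE2", "SSE3"]


def computed_target_score(cpu_times, code_size=0, size_penalty=0.0):
    """Blend per-CPU times into one score; lower is better.

    Assumption: the blend is the mean time over all CPUs, and the size
    penalty is charged per byte of code.
    """
    blend = sum(cpu_times.values()) / len(cpu_times)
    return blend + size_penalty * code_size


def eligible(function, max_set):
    """A function may compete only if the target allows its instruction set."""
    needed = INSTRUCTION_SETS.index(function["needs"])
    return needed <= INSTRUCTION_SETS.index(max_set)


def pick_winner(functions, max_set, size_penalty=0.0):
    """Return the name of the best eligible function for one target."""
    candidates = [f for f in functions if eligible(f, max_set)]
    best = min(
        candidates,
        key=lambda f: computed_target_score(f["times"], f["size"], size_penalty),
    )
    return best["name"]


CPUS = ["P3", "PM", "P4", "Core2", "AthlonXP", "Opteron"]  # six CPU targets

functions = [
    {"name": "Small_IA32", "needs": "IA32", "size": 100,
     "times": {cpu: 110 for cpu in CPUS}},
    {"name": "Fast_IA32", "needs": "IA32", "size": 400,
     "times": {cpu: 100 for cpu in CPUS}},
    {"name": "Fast_SSE2", "needs": "SSE2", "size": 300,
     "times": {cpu: 80 for cpu in CPUS}},
]

speed_only_ia32 = pick_winner(functions, "IA32")
size_penalty_ia32 = pick_winner(functions, "IA32", size_penalty=0.05)
speed_only_sse2 = pick_winner(functions, "SSE2")
```

With these invented numbers, the larger function wins the speed-only IA32 target, the smaller one wins once size is penalized, and the SSE2 function competes only where SSE2 is allowed.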
List of challenges
The FastCode project has run the following challenges [3]:
- AES (Advanced Encryption Standard)
- AnsiStringReplace (Replaces occurrences of a substring within a string.)
- ArcCos (Calculates the inverse cosine. Overloaded versions for Single, Double and Extended precision.)
- ArcSin (Calculates the inverse sine. Overloaded versions for Single, Double and Extended precision.)
- Ceil32 (Returns the smallest integer greater than or equal to the argument, as a 32-bit integer.)
- Ceil64 (Returns the smallest integer greater than or equal to the argument, as a 64-bit integer.)
- CharPos (Searches for the first occurrence of a Char in a String. It returns the position of this occurrence.)
- CharPosIEx (Case insensitive search for the first occurrence of a Char in a String starting from an index passed as parameter. It returns the position of this occurrence.)
- CharPosEy (Searches for the n'th occurrence of a Char in a string starting from an index passed as parameter. It returns the position of this occurrence.)
- CharPosRev (Searches for the last occurrence of a Char in a String. It returns the position of this occurrence.)
- CompareMem (Compares two blocks of memory.)
- CompareStr (Compares two strings of type AnsiString.)
- CompareText (Compares two strings.)
- FillChar (Fills a buffer with FillCount copies of the byte or character FillValue.)
- Floor32 (Returns the largest integer less than or equal to the argument, as a 32-bit integer.)
- Floor64 (Returns the largest integer less than or equal to the argument, as a 64-bit integer.)
- GCD32 (Greatest Common Divisor 32 bit)
- IDCT (Inverse Discrete Cosine Transform)
- Int64Div (Divides two 64 bit integers)
- Int64Mul (Multiplies two 64 bit integers)
- IntToStr (Converts an integer to a string)
- IsPrime (Tests a 32 bit integer for primality)
- LowerCase (Converts a string to lowercase)
- MaxFP (Returns the maximum of two Single, Double or Extended floating point values)
- MaxInt (Returns the maximum of two integer values)
- MaxInt64 (Returns the maximum of two 64 bit integer values)
- Memory Manager
- MinFP (Returns the minimum of two Single, Double or Extended floating point values)
- MinInt (Returns the minimum of two integer values)
- MinInt64 (Returns the minimum of two 64 bit integer values)
- Move (Copies N bytes from source to destination)
- Polar Complex Number Addition
- Polar Complex Number Subtraction
- Polar Complex Number Multiplication
- Polar Complex Number Division
- Polar To Rectangular Format Conversion
- Pos (Searches for the first occurrence of a substring in a String. It returns the position of this occurrence.)
- PosEx (Searches for the first occurrence of a substring in a String starting from an index passed as parameter. It returns the position of this occurrence.)
- PosIEx (Case insensitive search for the first occurrence of a substring in a String starting from an index passed as parameter. It returns the position of this occurrence.)
- Power (Returns base raised to exponent)
- Rectangular Complex Number Addition
- Rectangular Complex Number Subtraction
- Rectangular Complex Number Multiplication
- Rectangular Complex Number Division
- Rectangular To Polar Format Conversion
- RGBA To BGRA (Bitmap Format Conversion)
- Round (Banker's rounding on Single, Double or Extended value. Returns 64 bit integer)
- RoundToEx (Rounds an Extended precision floating-point value to a specified digit or power of ten using "Banker's rounding".)
- Round32 (Banker's rounding on Single, Double or Extended value. Returns 32 bit integer)
- Scale Down (Bitmap Scaling)
- Sort
- StrComp (Compares two null-terminated strings, with case sensitivity)
- StrCopy (Copies one null-terminated string to another)
- StrIComp (Compares two null-terminated strings, without case sensitivity)
- StrLen (Returns the length of a null-terminated string)
- StrLComp (Compares two null-terminated strings up to a length, with case sensitivity)
- StrLIComp (Compares two null-terminated strings up to a length, without case sensitivity)
- StrToInt32 (Converts a string to a 32 bit integer)
- Trim (Removes blank and control characters from the start and end of a string)
- TList.Sort
- Trunc (Truncates Single, Double or Extended value. Returns 64 bit integer)
- Trunc32 (Truncates Single, Double or Extended value. Returns 32 bit integer)
- UpperCase (Converts a string to uppercase)
- Val
Contributions to Delphi RTL
FastCode functions included in the Delphi RTL:
- Delphi 2005: CompareText, Int64Div and FillChar.
- Delphi 2006:[4] CompareText, Int64Div and FillChar, FastMM4 memory manager.[3]
- Delphi 2007 to Delphi XE: ArcCos, ArcSin, Power, PosEx, Move, Memory Manager, FillChar, Pos, __lldiv, LowerCase, UpperCase, CompareStr, CompareMem, CompareText, StrLen, StrCopy, StrComp.[5]
The Mastering Delphi books by Marco Cantu contain a chapter about FastCode listing the contributions to the Delphi RTL.[6]
The FastCode Library
All the challenge winners are included in the FastCode library (http://fastcode.sourceforge.net/challenge_content/rtl_replcmnt_pkg.html). This library is open source, released under the MPL license. The library can be used in two ways: 1) calling functions directly, and 2) using the patching functionality.
When calling functions directly, it is entirely up to the application developer to call the version of a function they consider fastest.
When using the patching functionality the library automatically detects the CPU type at application load, and uses this information to redirect all function calls to the FastCode winner function for that specific CPU.
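The dispatch idea behind the patching mode can be sketched in Python. This is only an analogy: the sketch swaps entries in a lookup table rather than redirecting compiled function calls, and the CPU names, function names and the `detect_cpu` placeholder are hypothetical.

```python
def upper_case_generic(s):
    """Fallback used when no CPU-specific winner is known."""
    return s.upper()


def upper_case_p4(s):
    """Stand-in for a Pentium 4 winner (same result, different code)."""
    return "".join(ch.upper() for ch in s)


def upper_case_opteron(s):
    """Stand-in for an Opteron winner (same result, different code)."""
    return s.upper()


# Hypothetical table of challenge winners per CPU type.
WINNERS = {
    "Pentium 4": {"UpperCase": upper_case_p4},
    "Opteron": {"UpperCase": upper_case_opteron},
}


def detect_cpu():
    """Placeholder: a real library would query the processor itself."""
    return "Opteron"


def apply_patches(rtl, cpu):
    """Redirect each patched routine to the winner for this CPU."""
    for name, function in WINNERS.get(cpu, {}).items():
        rtl[name] = function


# The dictionary stands in for the runtime library's routines.
rtl = {"UpperCase": upper_case_generic}
apply_patches(rtl, detect_cpu())
result = rtl["UpperCase"]("fastcode")
```

Callers keep using `rtl["UpperCase"]` unchanged; only the function it points to depends on the detected CPU, and an unknown CPU leaves the generic version in place.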
FastMM4 memory manager
The FastMM memory manager, used by Delphi and C++ Builder since 2006, is also the winner of a FastCode competition.[7] It replaced the standard memory manager of Delphi. It is less prone to memory fragmentation and provides improved debugging facilities, such as reporting memory leaks when the application is being closed,[8] and detecting use of memory after release as well as double releases.
FastMM4 is also used as the memory manager for applications developed in Lazarus.[9]
FastMM4 is often listed as a "must have" tool for Delphi developers.[10]
NexusDB comes with FastMM4 integration for leak checking.[11]
FastMM usage is documented in "The New Memory Manager In BDS 2006".[12]
Applications using FastCode
An application developed in Delphi or C++ Builder will typically use the default memory manager, which is FastMM4. The FastCode functions in the RTL were selected because they are among the most commonly used, so an application will typically also use some of them, especially if it does any string handling. Most Delphi/C++ Builder applications will therefore use code developed by the FastCode project. Some examples are Skype, FL Studio, and Embarcadero’s own RAD Studio. Hallvard's blog describes FastMM4 and why it is being used as the memory manager in "The Online Trader" application.
References
- "How to Optimize Delphi Application Performance to the Max using FastCode Library". Retrieved 3 September 2015.
- "Nick Hodges". Retrieved 3 September 2015.
- "The Oracle at Delphi". Retrieved 3 September 2015.
- Long, Brian & Swart, Bob, "Borland Developer Studio 2006 Reviewed", The Delphi Magazine, Issue 124, December 2005
- "Community contributions improve Delphi 2007 RTL performance". Archived from the original on 6 March 2016. Retrieved 3 September 2015.
- "Mastering Delphi Update for Delphi 2006". Retrieved 3 September 2015.
- Gabrijelcic, Primoz, "To Manage Memory", The Delphi Magazine, Issue 126, February 2006
- "FastMM4". Retrieved 3 September 2015.
- "Lazarus Free Pascal". Retrieved 3 September 2015.
- "Good Tools for Delphi Developers". Delphi Programming. Retrieved 3 September 2015.
- "NexusDB". Retrieved 3 September 2015.
- "The New Memory Manager In BDS 2006 - by Pierre le Riche". Retrieved 3 September 2015.