StringBuilder performance
Posted: Thu Mar 12, 2020 4:05 pm
by wriedmann
Hi all interested people,
please see this code:
Code:
cBuffer := DateTime.Now:ToString()
foreach oTag as PlanTag in _oPlanTage
    cBuffer := cBuffer + oTag:DebugString( 1 )
    foreach oPosition as PlanPosition in _oPlanPositionen
        cBuffer := cBuffer + oPosition:DebugString( 1 )
    next
next
cBuffer := cBuffer + DateTime.Now:ToString()
In my application this code creates a text file with over 95,000 lines.
The code takes a lot of time (5 minutes 36 seconds) and uses an entire processor core.
A simple optimization makes it behave better:
Code:
cBuffer := DateTime.Now:ToString()
foreach oTag as PlanTag in _oPlanTage
    cBuffer := cBuffer + oTag:DebugString( 1 )
    cPosition := ""
    foreach oPosition as PlanPosition in _oPlanPositionen
        cPosition := cPosition + oPosition:DebugString( 1 )
    next
    cBuffer := cBuffer + cPosition
next
cBuffer := cBuffer + DateTime.Now:ToString()
The only change is that the substrings of the inner loop are first collected in a small intermediate buffer, which is then added to the main buffer once per outer iteration, so the large main buffer is copied far less often.
This reduces the time needed to about 4 seconds!
Using the StringBuilder class makes the code faster still:
Code:
oSB := StringBuilder{}
oSB:AppendLine( DateTime.Now:ToString() )
foreach oTag as PlanTag in _oPlanTage
    oSB:Append( oTag:DebugString( 1 ) )
    foreach oPosition as PlanPosition in _oPlanPositionen
        oSB:Append( oPosition:DebugString( 1 ) )
    next
next
oSB:AppendLine( DateTime.Now:ToString() )
cBuffer := oSB:ToString()
The code now takes only 2 seconds!
Wolfgang
P.S. in VO you can see similar differences, but there is no StringBuilder class available.
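For anyone who wants to reproduce the comparison without the original classes, here is a minimal X# sketch. The test line, the loop count of 20,000 and the Start() function are assumptions made up for illustration; the real DebugString() output is not used, so the absolute timings will differ from the ones reported above.

```xsharp
using System
using System.Diagnostics
using System.Text

function Start() as void
    local oWatch as Stopwatch
    local oSB as StringBuilder
    local cBuffer as string
    local cLine as string
    local nI as int
    // Hypothetical test data standing in for the DebugString() output
    cLine := "some debug text for one plan position" + Environment.NewLine

    // Variant 1: plain concatenation, every + copies the whole buffer built so far
    oWatch := Stopwatch.StartNew()
    cBuffer := ""
    for nI := 1 upto 20000
        cBuffer := cBuffer + cLine
    next
    oWatch:Stop()
    Console.WriteLine( "concatenation: " + oWatch:ElapsedMilliseconds:ToString() + " ms" )

    // Variant 2: StringBuilder, appends go into a growing internal buffer
    oWatch := Stopwatch.StartNew()
    oSB := StringBuilder{}
    for nI := 1 upto 20000
        oSB:Append( cLine )
    next
    cBuffer := oSB:ToString()
    oWatch:Stop()
    Console.WriteLine( "StringBuilder: " + oWatch:ElapsedMilliseconds:ToString() + " ms" )
    return
```

Raising the loop count should widen the gap, because the cost of the first variant grows with the length of the buffer on every append.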
Posted: Thu Mar 12, 2020 4:31 pm
by Chris
Hi Wolfgang,
Very good sample!
Furthermore, if you know the (approximate) size of the final string in advance, specify it in the constructor of the StringBuilder object. This makes sure that its internal buffer is allocated only once (instead of dozens of times if you do not specify a starting size), which further improves performance.
Also, if you do this very often in your app, it is a good idea to always (re)use a single StringBuilder object instead of creating a new one every time. Just reset it to zero string size after you are done with it (with oSB:Length := 0). This keeps the internal buffer intact, which prevents any further memory allocation when you generate new text in the string builder. The only further memory allocation happens when converting it to a normal string.
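A minimal sketch of both suggestions, assuming a made-up size estimate of about 1,000 lines of up to 32 characters. The report texts and the Start() function are hypothetical; the constructor with a capacity argument and the Length property are the standard System.Text.StringBuilder members.

```xsharp
using System
using System.Text

function Start() as void
    local oSB as StringBuilder
    local cFirst as string
    local cSecond as string
    local nI as int
    // Assumed estimate: about 1,000 lines of up to 32 characters each
    oSB := StringBuilder{ 32 * 1000 }
    for nI := 1 upto 1000
        oSB:AppendLine( "first report, line " + nI:ToString() )
    next
    cFirst := oSB:ToString()     // ToString() allocates the result string

    oSB:Length := 0              // empty the builder, keep its internal buffer
    for nI := 1 upto 1000
        oSB:AppendLine( "second report, line " + nI:ToString() )
    next
    cSecond := oSB:ToString()

    Console.WriteLine( cFirst:Length:ToString() + " / " + cSecond:Length:ToString() )
    return
```

The capacity is only a starting size: if the estimate turns out too small, the builder still grows on its own, so a rough guess is enough.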
Posted: Thu Mar 12, 2020 6:47 pm
by wriedmann
Hi Chris,
I had tried to build a StringBuilder class in VO, but unfortunately it was slower than a simple string concatenation as in the 2nd sample.
This is the relevant VO code:
Code:
class StringBuilder
    protect _aElements as array
    declare method Append
    declare method GetString

method Init() class StringBuilder
    _aElements := {}
    return self

method Append( cString as string ) as void pascal class StringBuilder
    AAdd( _aElements, cString )
    return

method GetString() as string pascal class StringBuilder
    local ptrResult as byte ptr
    local ptrTemp as byte ptr
    local nLen as dword
    local nI as dword
    local nBufLen as dword
    local nTotalLen as dword
    local nIndex as dword
    local cBuffer as string
    local cResult as string
    nLen := ALen( _aElements )
    nBufLen := 0
    // first pass: compute the total length of all collected strings
    for nI := 1 upto nLen
        cBuffer := _aElements[nI]
        nTotalLen := nTotalLen + SLen( cBuffer )
    next
    if nTotalLen == 0
        cResult := ""
    else
        ptrResult := MemAlloc( nTotalLen )
        if ptrResult == null_ptr
            _Break( "memory allocation error - failed to allocate " + NTrim( nTotalLen ) + " bytes" )
        endif
        ptrTemp := ptrResult
        nIndex := 0
        // second pass: copy every element into the static memory block
        for nI := 1 upto nLen
            cBuffer := _aElements[nI]
            nBufLen := SLen( cBuffer )
            MemCopyString( ptrTemp, cBuffer, nBufLen )
            ptrTemp := ptrTemp + nBufLen
        next
        cResult := Mem2String( ptrResult, nTotalLen )
        MemFree( ptrResult )
    endif
    return cResult
I'm pretty sure this code can be enhanced, but after the first checks I decided not to put more time into this class.
Wolfgang
Posted: Thu Mar 12, 2020 8:18 pm
by Jamal
Hi Wolfgang,
While you are at it, I am just wondering: if you create an X# or C# COM object, initialize a StringBuilder object as Chris suggested and then use it in a similar fashion, what would the performance be beyond the initial COM object call?
Jamal
Posted: Fri Mar 13, 2020 5:45 am
by wriedmann
Hi Jamal,
I have not tested that, but in my experience (and I do a LOT of COM interaction between X# modules and VO applications) the COM interface is not very fast (and cannot be very fast because there is a lot of code and a lot of conversions involved).
Wolfgang
Posted: Fri Mar 13, 2020 8:30 am
by Serggio
You're welcome (see the attachment)
Posted: Sat Mar 14, 2020 8:42 am
by Karl-Heinz
wriedmann wrote: Hi Chris,
I had tried to build a StringBuilder class in VO, but unfortunately it was slower than a simple string concatenation as in the 2nd sample.
Hi Wolfgang,
I agree: even when I use only static memory, I see no speed advantage. Maybe I overlooked something, but when I compare the results of your StringBuilder with mine, the speed differences are not as large as I would expect.
Code:
CLASS StringbuilderMem INHERIT Vobject
    PROTECT _ptrValue AS BYTE PTR
    PROTECT _dwCurrentPos AS DWORD
    PROTECT _dwStep := 2000 AS DWORD
    DECLARE METHOD Append
    DECLARE METHOD GetString

METHOD Append ( cValue AS STRING ) AS VOID PASCAL CLASS StringbuilderMem
    LOCAL dwLen AS DWORD
    dwLen := SLen ( cValue )
    IF dwLen > 0
        // The buffer grows by _dwStep only, so this is safe as long as dwLen <= _dwStep
        IF _dwCurrentPos + dwLen > MemLen ( _ptrValue )
            // ? "MemRealloc" , MemLen ( _ptrValue ) , dwLen , _dwCurrentPos
            _ptrValue := MemRealloc ( _ptrValue , MemLen (_ptrValue ) + _dwStep )
        ENDIF
        MemCopyString ( PTR ( _CAST , _ptrValue + _dwCurrentPos ) , cValue , dwLen )
        _dwCurrentPos += dwLen
    ENDIF
    RETURN

METHOD Destroy() CLASS StringbuilderMem
    UnRegisterAxit(SELF)
    IF _ptrValue != NULL_PTR
        MemFree ( _ptrValue )
    ENDIF
    RETURN NIL

METHOD GetString() AS STRING PASCAL CLASS StringbuilderMem
    IF _ptrValue == NULL_PTR .OR. _dwCurrentPos == 0
        RETURN NULL_STRING
    ENDIF
    RETURN Mem2String ( _ptrValue , _dwCurrentPos )

METHOD Init( nCapacity ) CLASS StringbuilderMem
    Default (@nCapacity, _dwStep )
    _ptrValue := MemAlloc ( nCapacity )
    _dwStep := nCapacity
    RegisterAxit ( SELF )
    RETURN SELF
regards
Karl-Heinz
Posted: Sat Mar 14, 2020 3:50 pm
by ArneOrtlinghaus
My experience is also that frequently repeated string operations on strings of 1,000 characters or more get very expensive. In VO, many years ago, I wrote a class similar to StringBuilder that uses the MemAlloc functions to avoid triggering the garbage collector, and there was a huge difference in speed. With X# it is very similar: dynamic memory can get costly. Tests with a performance profiler show that much of the time goes into handling strings, even in fully strongly typed code.
Posted: Mon Mar 16, 2020 3:23 pm
by mainhatten
wriedmann wrote: I have not tested that, but in my experience (and I do a LOT of COM interaction between X# modules and VO applications) the COM interface is not very fast (and cannot be very fast because there is a lot of code and a lot of conversions involved).
Hi Wolfgang,
My gut reaction is to follow my second programming mantra, "chunky, not chatty", when it comes to calling across layers, as such layers sometimes have real physical borders, in this case the marshalling code. I am pretty certain that your first example, done across COM into a StringBuilder, would be slower, at least at first and for strings that are not really large. With the second example, which first concatenates lots of tiny strings into an intermediate string and then does one large append, the benefit of not burdening memory management with large discarded memory areas might be greater, as the target string is in the multi-megabyte range.
In vfp we have similar issues, typically when string sizes rise above 10K and the memory allotment is set for a small VM. The typical response is similar to your second way (as we have no StringBuilder type), although often with the twist of adding the small strings not into one string but into a small array of strings, which can then be concatenated in one line:
Code:
local array laTmp[7]   && declaration added so that the snippet is complete
laTmp = "" && setting all elements to 1 start value is nice in this context
for lnRun = 1 to 7
    *-- build laTmps
next
lcLargeString = lcLargeString + laTmp[1] + laTmp[2] + laTmp[3] + laTmp[4] + laTmp[5] + laTmp[6] + laTmp[7]
The slow part is not the concatenation of one or more strings, but releasing the previous variable, claiming new memory and assigning the total of the right-hand side of the line. This can be seen by measuring: as lcLargeString grows, adding strings of identical length gets slower.
But the easiest way (even if it goes against the "RAM is always faster" reflex) is to open a buffered low-level file and just append the new strings until the result is finished. If needed, loading the result once with FileToStr() for further processing is often faster than constantly memcpying it around in process space, as all internal memory allotment and garbage collection is sidestepped until the final load.
Unixoid behaviour makes sense there and is even easier to code and read. (I noticed that xSharp does not differentiate between buffered and unbuffered LLF, but as buffered was and is the vfp default behaviour, the xSharp LLF implementation probably defaults to buffered as well. I have already raised the question on GIT.)
That was already true with old hard disks, and SSDs have improved write throughput further.
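A minimal X# sketch of the "append to a buffered file, load once at the end" idea. Note the swap: it uses the .NET StreamWriter class rather than the low-level file functions mentioned above, and the file name, the line texts and the Start() function are assumptions made up for illustration.

```xsharp
using System
using System.IO
using System.Text

function Start() as void
    local cFile as string
    local oWriter as StreamWriter
    local cResult as string
    local nI as int
    // Hypothetical temp file standing in for the real output file
    cFile := Path.Combine( Path.GetTempPath(), "planbuffer_demo.txt" )
    oWriter := StreamWriter{ cFile, false, Encoding.UTF8 }
    try
        for nI := 1 upto 1000
            // every small string goes to the writer's buffer, not to a growing string
            oWriter:WriteLine( "line " + nI:ToString() )
        next
    finally
        oWriter:Dispose()     // flushes the buffer and closes the file
    end try
    // Load the finished text once, only if it is needed in memory
    cResult := File.ReadAllText( cFile )
    Console.WriteLine( cResult:Length )
    File.Delete( cFile )
    return
```

If the text only has to end up in a file anyway, the final ReadAllText() call can be dropped entirely, which avoids building the large string in memory at all.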
regards
thomas