Fastest way to determine if a string is contained in another

Public forum to share code snippets, screenshots, experiences, etc.
lumberjack
Posts: 727
Joined: Fri Sep 25, 2015 3:11 pm
Location: South Africa

Post by lumberjack »

Hi all Pearlers,

While doing some research, I came across this webpage, which compares different ways of determining whether a string contains a substring.

Hope it is of interest to some.
______________________
Johan Nel
Boshof, South Africa
FoxProMatt

Post by FoxProMatt »

Did you intend to share a link with us?

EDIT - I was on my iPhone when I first read the message, and did not notice the link in your message.
FFF
Posts: 1580
Joined: Fri Sep 25, 2015 4:52 pm
Location: Germany

Post by FFF »

? It works ;)
BTW, while reading and experimenting with the X# runtime sample, I wrote:
? (STRING)uXSharpUsual:Contains( "run")
resulting in:
error XS1061: 'XSharp.__Usual' does not contain a definition for 'Contains' and no accessible extension method 'Contains' accepting a first argument of type 'XSharp.__Usual' could be found (are you missing a using directive or an assembly reference?) 9,1 Start.prg strings
while
VAR x :=(STRING)uXSharpUsual
? x:Contains("run")
works flawlessly.
Is that to be expected?
Regards
Karl
(on Win8.1/64, Xide32 2.20, X#2.20.0.3)
SHirsch
Posts: 286
Joined: Tue Jan 30, 2018 8:23 am
Location: Germany

Post by SHirsch »

Hi,

try:
? ((STRING)uXSharpUsual):Contains( "run")

Stefan
lumberjack
Posts: 727
Joined: Fri Sep 25, 2015 3:11 pm
Location: South Africa

Post by lumberjack »

FoxProMatt_MattSlay wrote: Did you intend to share a link with us?
Yes, "this webpage" is the link...
______________________
Johan Nel
Boshof, South Africa
FFF
Posts: 1580
Joined: Fri Sep 25, 2015 4:52 pm
Location: Germany

Post by FFF »

Stefan,
That works, thanks.
So it seems the implicit priority of the conversion is higher than that of the method call.

Karl
Regards
Karl
(on Win8.1/64, Xide32 2.20, X#2.20.0.3)
lumberjack
Posts: 727
Joined: Fri Sep 25, 2015 3:11 pm
Location: South Africa

Post by lumberjack »

Karl,
FFF wrote: So it seems the implicit priority of the conversion is higher than that of the method call.
Usual doesn't contain a :Contains() method, from what I read in the error message.
Your example is casting the Logic returned from Contains() to a string, which is fine.

Code:

? uUsual:Contains(" ") // Error
? " " $ uUsual
? At(" ", uUsual) > 0
______________________
Johan Nel
Boshof, South Africa
FFF
Posts: 1580
Joined: Fri Sep 25, 2015 4:52 pm
Location: Germany

Post by FFF »

Johan,
Nope, I cast the usual to a string, for which Contains IS defined.
I was fooled into thinking that "(String)myUsual" would be evaluated before myUsual:Contains... ;)
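
A minimal sketch of the point above, assuming a dialect with USUAL support (such as VO) and a hypothetical variable uValue. The error message shows that the method call is bound to the usual before the cast is applied, so the cast has to be wrapped in parentheses or stored in a typed local first:

```xsharp
FUNCTION Start() AS VOID
    LOCAL uValue AS USUAL      // hypothetical usual holding a string
    LOCAL cValue AS STRING
    uValue := "X# runtime"
    // Parenthesize the cast so that Contains() is called on the STRING
    ? ((STRING) uValue):Contains("run")
    // Alternative: cast into a typed local first, then call the method
    cValue := (STRING) uValue
    ? cValue:Contains("run")
    RETURN
```

Both forms call String.Contains() on the converted value, which is what the unparenthesized version was meant to do.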

Anyway, the link made an interesting read, THX!
Regards
Karl
(on Win8.1/64, Xide32 2.20, X#2.20.0.3)
Chris
Posts: 4898
Joined: Thu Oct 08, 2015 7:48 am
Location: Greece

Post by Chris »

Strange part about his tests is that apparently string:Replace() is (slightly) faster than string:Contains(), at least for when the search string is not actually included in the target string. When the string is actually part of the target string, then the Contains() method is as expected a lot faster than the "replace" method. But because his tests use random strings (so not a common real life scenario), his method with replace is a little faster. In real life conditions, Contains() is better.

Also, the reason why IndexOf() appears to be slower is that by default it does culture-dependent string comparisons, while Contains() only does an ordinal comparison (compares byte by byte). It is easy to make IndexOf() perform an ordinal comparison as well (with the StringComparison.Ordinal parameter), in which case it has the same performance as Contains().

Plus, IndexOf() is a lot more powerful than Contains(): it allows for optionally ignoring case and for specifically searching for single chars, which is even faster. So the best, most powerful and fastest method to use is IndexOf(). It's just that Contains() is much more self-explanatory when read in the code, so I personally tend to use it more often.
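
The options described above could be sketched as follows. This is only an illustration with a made-up sample string, not a benchmark; it uses the standard .NET String.IndexOf(String, StringComparison) overload:

```xsharp
USING System

FUNCTION Start() AS VOID
    LOCAL cText AS STRING
    cText := "The quick brown fox"   // sample data, chosen for illustration
    // Contains() performs an ordinal comparison
    ? cText:Contains("brown")
    // IndexOf() with an explicit ordinal comparison; -1 means "not found"
    ? cText:IndexOf("brown", StringComparison.Ordinal) >= 0
    // The same search, ignoring case
    ? cText:IndexOf("BROWN", StringComparison.OrdinalIgnoreCase) >= 0
    RETURN
```

Without the StringComparison argument, IndexOf(String) falls back to a culture-sensitive comparison, which is the slower default mentioned above.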
Chris Pyrgas

XSharp Development Team
chris(at)xsharp.eu
lumberjack
Posts: 727
Joined: Fri Sep 25, 2015 3:11 pm
Location: South Africa

Post by lumberjack »

Hi Chris,
Chris wrote: Strange part about his tests is that apparently string:Replace() is (slightly) faster than string:Contains()
Yes, I found that quite interesting; my initial thought was that this would be extremely slow...
Chris wrote: then the Contains() method is as expected a lot faster than the "replace" method
Yes, this is what I also expected.
Chris wrote: Also the reason why IndexOf() appears to be slower, is because by default it does culture-dependent string comparisons, while Contains() only does ordinal comparison (compares byte by byte). It is easy to make IndexOf() perform an ordinal comparison as well (with the StringComparison.Ordinal parameter), in which case it has the same performance as Contains().
I think this just highlights one of the issues with the "standard" implementation. IndexOf() is so much more powerful, and I prefer using it, especially instead of doing:

Code:

If myStr:Contains("blah blah")
  // Second search over the same string, just to get the position
  iPos := myStr:IndexOf("blah blah")
EndIf
One might think this gives a "slight" performance improvement, but obviously it does not.
Chris wrote: Plus, IndexOf() is a lot more powerful than Contains(), allows for optionally ignoring case and for specifically searching for single chars which is even faster, so the best/more powerful/faster method to use is that, IndexOf(). It's just that "Contains()" is much more self explanatory when read in the code, so I personally tend to use it more often.
I also use Contains() quite a lot, but by making intelligent use of IndexOf() one can eliminate the speed penalty.
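
As a rough sketch of that idea (the sample string and variable names are made up for illustration), a single ordinal IndexOf() call can answer both "is it there?" and "where is it?", so the string is searched only once:

```xsharp
USING System

FUNCTION Start() AS VOID
    LOCAL cText AS STRING
    LOCAL iPos AS INT
    cText := "some text with blah blah in it"   // sample data
    // One ordinal search yields both the answer and the position
    iPos := cText:IndexOf("blah blah", StringComparison.Ordinal)
    IF iPos >= 0
        ? "Found at zero-based position", iPos
    ELSE
        ? "Not found"
    ENDIF
    RETURN
```

Note that IndexOf() returns a zero-based position and -1 when nothing is found, unlike At(), which is one-based and returns 0.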
______________________
Johan Nel
Boshof, South Africa