Remove [html] (Like HTML Stripper but with [] instead of <>)

By ProIcons on Jun 06, 2010

I don't know how to use regular expressions, and I wanted to remove everything enclosed in [] from a string, keeping only the text outside the brackets. Example:

[Quax] LoLen [dasfjals]

$a4([Quax] LoLen [dasfjals]) == LoLen

alias a4 { 
  var %z = 1
  while (%z <= $numtok($1-,91) ) {
    var %b = $iif(%b,$+(%b,$chr(44),[,$gettok($gettok($1-,%z,91),1,93),]),$+([,$gettok($gettok($1-,%z,91),1,93),]))
    inc %z
  }
  var %i = 1
  var %c = $1-
  while (%i <= $numtok(%b,44) ) {
    ; Always strip from %c: falling back to %b when %c becomes empty returned leftover commas (e.g. for [a][a])
    var %c = $remove(%c,$gettok(%b,%i,44))
    inc %i
  }
  return %c
}
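
For readers who don't use mIRC, the same token-based idea can be sketched in Python. This is only an illustration, not the author's code: the function name strip_brackets is hypothetical, and unlike the alias it skips pieces that contain no closing bracket. Python also keeps the spaces around the remaining text, whereas mIRC collapses them.

```python
def strip_brackets(text: str) -> str:
    """Remove every [..] segment without using regular expressions."""
    # Split on '[' and keep what precedes the first ']' in each piece,
    # roughly what the nested $gettok(...,91) / $gettok(...,93) calls do.
    tags = []
    for piece in text.split("["):
        if "]" in piece:
            tags.append("[" + piece.split("]", 1)[0] + "]")

    # Remove each collected tag from the original text.
    result = text
    for tag in tags:
        result = result.replace(tag, "")
    return result


if __name__ == "__main__":
    print(repr(strip_brackets("[Quax] LoLen [dasfjals]")))
```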

Comments

ataraxia - Jun 07, 2010

I apologise for misrepresenting the efficiency of the expression as I posted it. I also apologise because, after a little further research, it seems that mIRC's handling of inline if statements is far less efficient than I'd thought. In any other language, interpreted or not, they're handled a great deal faster than in mIRC.
I was taught that, regardless of how the regular expression is implemented, running one without first checking whether the text needs it at all should be frowned upon; a simpler if statement should decide whether the complexity of a regular expression is required. Unfortunately, in mIRC this approach is less efficient than I'd hoped, but provided the average percentage of lines containing 'BBCode' is under 90%, there's still an advantage to using an if statement. When only 5% of lines contain 'BBCode', execution time is nearly halved, while when 95% of lines contain 'BBCode', execution is about 10% slower. (For reference, the two methods apparently reach equal execution times at 89.7%, although I doubt that our method of testing is accurate to that level of precision.)

And, in case you're curious, the adaptation of your speedcheck alias which I used to test this is:

alias speedcheck {

  ;Play with average percentage of lines containing []s
  var %perc 897
  ;var %perc 500

  var %b = 20000
  !var  %t = $ticks
  while (%b) {
    var %a = $iif($r(1,1000) > %perc,$str(abcd,100),$str(aaa[ $str(b,30) ]ccc,10))
    if ([ isin %a) { !noop $regsubex(%a,/\[[^]]+\]/g,) }
    else { !noop %a }
    !dec %b
  }
  !var %c = $ticks - %t

  var %b = 20000
  !var %t = $ticks
  while (%b) {
    var %a = $iif($r(1,1000) > %perc,$str(abcd,100),$str(aaa[ $str(b,30) ]ccc,10))
    !noop $regsubex(%a,/\[[^]]+\]/g,)
    !dec %b
  }
  !var %d = $ticks - %t

  echo -a First method: %c ms -- Second method: %d ms
}
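
The guard-then-regex idea tested above can also be sketched outside mIRC. The Python version below is an assumption-laden illustration: the names strip_guarded, strip_unguarded and compare are hypothetical, the sample strings are only loosely modelled on the ones in the alias, and timings in Python say nothing about mIRC's own $iif or if overhead.

```python
import random
import re
import time

BRACKETED = re.compile(r"\[[^]]+\]")


def strip_guarded(line: str) -> str:
    # Cheap substring check first; only run the regex when a '[' is present.
    if "[" not in line:
        return line
    return BRACKETED.sub("", line)


def strip_unguarded(line: str) -> str:
    # Always run the regex, whether or not the line contains brackets.
    return BRACKETED.sub("", line)


def compare(share_with_brackets: float, lines: int = 20000):
    """Return (guarded_seconds, unguarded_seconds) for a random sample."""
    plain = "abcd" * 100
    tagged = ("aaa[ " + "b" * 30 + " ]ccc") * 10
    sample = [tagged if random.random() < share_with_brackets else plain
              for _ in range(lines)]

    start = time.perf_counter()
    for line in sample:
        strip_guarded(line)
    guarded = time.perf_counter() - start

    start = time.perf_counter()
    for line in sample:
        strip_unguarded(line)
    unguarded = time.perf_counter() - start
    return guarded, unguarded
```

Calling compare(0.05) and compare(0.95) gives a rough feel for how the share of bracketed lines shifts the balance on a given machine.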

Anyway, Jaytea, thank you for giving me the opportunity to learn a little more.

jaytea - Jun 07, 2010

you're missing a crucial detail here which is the means whereby .+? stops matching. once an extra character has been consumed, the engine tries to match the rest of the expression, in this case ]. there is extra overhead involved in this approach, and the difference becomes apparent in the following tests:

alias speedcheck {

  ;var %a = $str(abc,100)
  var %a = $str(aaaaa[ $str(b,30) ]cccc,10)
  ;var %a = [[ $str(abc,100) ]]

  var %b = 20000
  !var  %t = $ticks 
  while (%b) { 
    !noop $regsubex(%a,/\[.+?\]/g,) 
    !dec %b 
  } 
  !var %c = $ticks - %t 

  var %b = 20000
  !var %t = $ticks 
  while (%b) { 
    !noop $regsubex(%a,/\[[^]]+\]/g,) 
    !dec %b 
  } 
  !var %d = $ticks - %t

  echo -a First method: %c ms -- Second method: %d ms

}

.+? is a construct that, in general, describes languages that aren't strictly regular. our intuition should tell us that if we can use a regex that can be converted into a DFA (http://en.wikipedia.org/wiki/Deterministic_finite_state_machine) then it should be preferred to one that cannot. as you correctly stated, the byte size of the regex has little to do with its performance
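
a rough way to try the same comparison outside mIRC is the Python sketch below. it is not the speedcheck alias: the names LAZY, NEGATED, strip_with and time_pattern are made up for illustration, and Python's re module is a different engine from the one mIRC uses, so the timings are only indicative.

```python
import re
import timeit

LAZY = re.compile(r"\[.+?\]")       # lazy dot, as in the first loop
NEGATED = re.compile(r"\[[^]]+\]")  # negated class, as in the second loop


def strip_with(pattern, text):
    # Remove every match of the given pattern from the text.
    return pattern.sub("", text)


def time_pattern(pattern, text, repeats=20000):
    # Total seconds spent running the substitution `repeats` times.
    return timeit.timeit(lambda: pattern.sub("", text), number=repeats)


if __name__ == "__main__":
    sample = ("aaaaa[ " + "b" * 30 + " ]cccc") * 10
    print("lazy:   ", time_pattern(LAZY, sample))
    print("negated:", time_pattern(NEGATED, sample))
```

on ordinary input both patterns strip the same text. they are not interchangeable on every input, though: for a string such as []x] the lazy form can consume the first ] and match the whole string, while the negated class matches nothing.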

ataraxia - Jun 07, 2010

You're missing the point. Using a character class, especially a negated one, creates additional work for the regular expression engine. The simpler the match, the quicker the function. A character class is slower than a wildcard match; the . wildcard is particularly efficient, since the regular expression engine doesn't have to care so much about what it matches.
The byte size of your regular expression has very little to do with its efficiency.
Strictly speaking, if you wanted to improve the execution speed when dealing with large amounts of text, you'd use something along the lines of
alias a4 return $iif([ isin $1-,$regsubex($1-,/\[.+?\]/g,),$1-)
since an if/find operation is vastly quicker in execution than a regular expression.
Hope my explanation helps.

Jethro - Jun 06, 2010

Darn it, I was right. :/ Gee, thanks for the reconfirmation, Jaytea. I've changed it back to the one I had. I suppose I was mistaken in thinking that less is better.

jaytea - Jun 06, 2010

more efficient still is $regsubex($1,/\[[^]]+\]/g,)

Jethro - Jun 06, 2010

Yeah you're right. Yours is slightly more efficient indeed. I've changed them as per your suggestion, making the regsubex one 45 bytes in size.

ataraxia - Jun 06, 2010

Your regular expression is long and requires an unnecessary amount of parsing. /\[.+?\]/g is slightly more efficient.

Jethro - Jun 06, 2010

You can also do:

regsubex version:

alias a4 { return $regsubex($1,/\[[^]]+\]/g,) }

regsub version:

alias a4 { var %r | noop $regsub($1,/\[[^]]+\]/g,,%r) | return %r }

.echo -q can be used in place of noop if someone uses an older copy of mIRC that doesn't recognize the noop command.

Spoofing - Jun 06, 2010

yep, it is copied and adapted for []. =)
thanks for the more streamlined version. saved.

jaytea - Jun 06, 2010

Spoofing, that looks eerily similar to a challenge entry of mine from a while back (http://forum.swiftirc.net/viewtopic.php?f=35&t=13872&start=32)

we can keep it nice and simple with:

alias nobbcode if (],[) return $($!remove($1, $replace(#$1,[,],],$v1) ),2)

but it's certainly not as general purpose as others since it will mess up if $1 contains members of mirc's syntax

Spoofing - Jun 06, 2010

That's nice, but, for example, I like only native mIRC scripting... no DLLs, no regex... =)

Of course, you make it simple:

alias nobb {
  return $remove($1, [ $replace($remove(#$1,$chr(32)),$([,),$(],),$(],),$+($(],),$chr(44),$([,))) ] )
}

$nobb([Quax] LoLen [dasfjals]) = LoLen

Jonesy44 - Jun 06, 2010

alias a4 return $regsubex($1-,/\[(\/)?[\w\d\s]*\]/gSi,)
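
This pattern is stricter than the others in the thread: it only removes tags whose contents are word characters, digits or whitespace, optionally preceded by a slash. The Python sketch below illustrates that behaviour under a few assumptions: the names SIMPLE_TAG and strip_simple_tags are hypothetical, the i modifier is mapped to re.IGNORECASE, and the S modifier from the original is dropped (Python's re.S means something different, DOTALL).

```python
import re

# Same pattern body as the alias above; the /g behaviour comes from re.sub,
# which replaces every match by default.
SIMPLE_TAG = re.compile(r"\[(\/)?[\w\d\s]*\]", re.IGNORECASE)


def strip_simple_tags(text: str) -> str:
    # Remove tags such as [b] or [/b]; tags containing other characters
    # (for example '=' or ':') do not match and are left in place.
    return SIMPLE_TAG.sub("", text)


if __name__ == "__main__":
    print(strip_simple_tags("[b]bold[/b]"))
    print(strip_simple_tags("[url=http://x.y]link[/url]"))
```

So a tag carrying an attribute, like the opening url tag in the second call, stays in the output, while its plain closing tag is removed.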
