Remove [html] (Like HTML Stripper but with [] instead of <>)

By ProIcons on Jun 06, 2010

I don't know how to use regular expressions, and I wanted to remove every value enclosed in [], brackets included. Example:

[Quax] LoLen [dasfjals]

$a4([Quax] LoLen [dasfjals]) == LoLen

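alias a4 {
  ; Pass 1: build %b, a comma-separated list of every "[...]" group in the text
  ; ($chr(91) is "[", $chr(93) is "]", $chr(44) is ",")
  var %z = 1
  while (%z <= $numtok($1-,91)) {
    var %b = $iif(%b,$+(%b,$chr(44),[,$gettok($gettok($1-,%z,91),1,93),]),$+([,$gettok($gettok($1-,%z,91),1,93),]))
    inc %z
  }
  ; Pass 2: remove each collected group from the original text
  ; (always from %c, so an emptied %c is never replaced by the contents of %b)
  var %i = 1
  var %c = $1-
  while (%i <= $numtok(%b,44)) {
    var %c = $remove(%c,$gettok(%b,%i,44))
    inc %i
  }
  return %c
}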
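For readers who don't use mIRC, here is a rough Python analogue of the same two-pass idea: split the text on `[`, take everything up to the first `]` in each chunk as a candidate group, then delete every collected `[group]` from the input. It is a sketch, not a line-by-line port; the function name `strip_bracket_groups` is hypothetical, and the final whitespace clean-up is an assumption added so the result reads `LoLen`.

```python
def strip_bracket_groups(text):
    """Two-pass removal of [..] groups, modelled loosely on the a4 alias."""
    # Pass 1: every chunk after a "[" starts a candidate group that runs
    # up to the first "]" in that chunk (the text before the first "[" is skipped).
    groups = []
    for chunk in text.split("[")[1:]:
        if "]" in chunk:
            groups.append("[" + chunk.split("]", 1)[0] + "]")
    # Pass 2: delete each collected group wherever it occurs in the text.
    for group in groups:
        text = text.replace(group, "")
    # Collapse leftover runs of spaces (an assumption, for tidy output).
    return " ".join(text.split())


if __name__ == "__main__":
    print(strip_bracket_groups("[Quax] LoLen [dasfjals]"))
```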

Comments

ataraxia   -  Jun 07, 2010

I apologise for misjudging the efficiency of the expression I posted. I also apologise because, from a little further research, it seems that mIRC's handling of inline if statements is far less efficient than I'd thought. In most other languages, interpreted or not, they're handled a great deal faster than in mIRC.
I was taught that, regardless of how the regular expression engine is implemented, invoking a regular expression without regard to the specific content of the text should be frowned upon in favour of a simpler if statement that first checks whether the complexity of a regular expression is needed at all. Unfortunately, in mIRC this approach is less effective than I'd like, but as long as fewer than about 90% of lines contain 'BBCode', there is still an advantage to using an if statement. When only 5% of lines contain 'BBCode', execution time is nearly halved, while when 95% of lines contain 'BBCode', execution is about 10% slower. (For reference, the two methods apparently reach equal execution times at 89.7%, although I doubt that our testing method is accurate to that level of precision.)
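The guard-before-regex idea is not specific to mIRC. A minimal Python sketch, assuming Python's `re` module as a stand-in for mIRC's regex engine (relative costs differ between the two, so the 89.7% break-even figure will not carry over); `strip_guarded` and `strip_unguarded` are hypothetical names:

```python
import re

# Same pattern as the regsubex examples in this thread:
# "[" then one or more non-"]" characters, then "]".
BRACKET_GROUP = re.compile(r"\[[^\]]+\]")


def strip_guarded(line):
    # Cheap substring test first; only run the regex when a "[" is present.
    if "[" in line:
        return BRACKET_GROUP.sub("", line)
    return line


def strip_unguarded(line):
    # Always invoke the regex, whatever the line contains.
    return BRACKET_GROUP.sub("", line)
```

Both functions return the same result for every input, since a line without `[` can never match; the guard only decides whether the regex is worth running.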

And, in case you're curious, here is the adaptation of your speedcheck alias that I used to test this:

alias speedcheck {

  ; %perc is per mille: about %perc of every 1000 test lines contain []s
  var %perc 897
  ;var %perc 500

  var %b = 20000
  !var %t = $ticks
  while (%b) {
    var %a = $iif($r(1,1000) > %perc,$str(abcd,100),$str(aaa[ $str(b,30) ]ccc,10))
    if ([ isin %a) { !noop $regsubex(%a,/\[[^]]+\]/g,) }
    else { !noop %a }
    !dec %b
  }
  !var %c = $ticks - %t

  var %b = 20000
  !var %t = $ticks
  while (%b) {
    var %a = $iif($r(1,1000) > %perc,$str(abcd,100),$str(aaa[ $str(b,30) ]ccc,10))
    !noop $regsubex(%a,/\[[^]]+\]/g,)
    !dec %b
  }
  !var %d = $ticks - %t

  echo -a First method: %c ms -- Second method: %d ms
}
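The same mixed-workload experiment can be sketched outside mIRC. This Python harness is an analogue, not a reproduction: Python's `re` engine and interpreter overheads stand in for mIRC's, the helper names (`make_lines`, `guarded`, `unguarded`) are hypothetical, and the two test strings only approximate the `$str(...)` ones above, so absolute timings and the break-even point will differ.

```python
import random
import re
import timeit

BRACKET_GROUP = re.compile(r"\[[^\]]+\]")
PLAIN = "abcd" * 100                          # roughly $str(abcd,100)
BRACKETED = ("aaa[ " + "b" * 30 + " ]ccc") * 10  # roughly the bracketed $str


def make_lines(per_mille, n=20000, seed=1):
    # A line contains [] groups when randint(1, 1000) <= per_mille,
    # mirroring the $iif($r(1,1000) > %perc, plain, bracketed) split above.
    rng = random.Random(seed)
    return [BRACKETED if rng.randint(1, 1000) <= per_mille else PLAIN
            for _ in range(n)]


def guarded(lines):
    # Substring check first, regex only when needed.
    return [BRACKET_GROUP.sub("", s) if "[" in s else s for s in lines]


def unguarded(lines):
    # Regex on every line.
    return [BRACKET_GROUP.sub("", s) for s in lines]


if __name__ == "__main__":
    lines = make_lines(897)
    for fn in (guarded, unguarded):
        secs = timeit.timeit(lambda: fn(lines), number=1)
        print(f"{fn.__name__}: {secs * 1000:.1f} ms")
```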

Anyway, Jaytea, thank you for giving me the opportunity to learn a little more.

jaytea   -  Jun 07, 2010

You're missing a crucial detail here: the way .+? stops matching. Each time the lazy quantifier consumes one more character, the engine tries to match the rest of the expression, in this case \]. That repeated attempt adds overhead, and the difference becomes apparent in the following tests:

alias speedcheck {

  ; Uncomment exactly one test string
  ;var %a = $str(abc,100)
  var %a = $str(aaaaa[ $str(b,30) ]cccc,10)
  ;var %a = [[ $str(abc,100) ]]

  ; First method: lazy wildcard, /\[.+?\]/g
  var %b = 20000
  !var %t = $ticks
  while (%b) {
    !noop $regsubex(%a,/\[.+?\]/g,)
    !dec %b
  }
  !var %c = $ticks - %t

  ; Second method: negated character class, /\[[^]]+\]/g
  var %b = 20000
  !var %t = $ticks
  while (%b) {
    !noop $regsubex(%a,/\[[^]]+\]/g,)
    !dec %b
  }
  !var %d = $ticks - %t

  echo -a First method: %c ms -- Second method: %d ms

}
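For comparison outside mIRC, here is a Python sketch of the same two-pattern test. It assumes Python's `re` module (also a backtracking engine, but not PCRE), so the numbers will not match mIRC's, and it only approximates the three `$str(...)` inputs; `compare` and `SAMPLES` are illustrative names.

```python
import re
import timeit

LAZY = re.compile(r"\[.+?\]")         # /\[.+?\]/g
NEGATED = re.compile(r"\[[^\]]+\]")   # /\[[^]]+\]/g

# Approximations of the three test strings in the alias above.
SAMPLES = {
    "no brackets": "abc" * 100,
    "short groups": ("aaaaa[ " + "b" * 30 + " ]cccc") * 10,
    "one long group": "[[ " + "abc" * 100 + " ]]",
}


def compare(runs=20000):
    for name, text in SAMPLES.items():
        # The timing comparison is only fair if both patterns agree here.
        assert LAZY.sub("", text) == NEGATED.sub("", text)
        lazy = timeit.timeit(lambda: LAZY.sub("", text), number=runs)
        neg = timeit.timeit(lambda: NEGATED.sub("", text), number=runs)
        print(f"{name}: lazy {lazy * 1000:.0f} ms, negated {neg * 1000:.0f} ms")


if __name__ == "__main__":
    compare()
```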

In a backtracking engine like mIRC's, .+? has to retry the rest of the pattern after every character it consumes, whereas [^]]+ can be matched in a single left-to-right pass. Our intuition should tell us that if we can use a regex that can be matched like a DFA (http://en.wikipedia.org/wiki/Deterministic_finite_state_machine), it should be preferred to one that cannot. As you correctly stated, the byte size of a regex has little to do with its performance.
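To make the DFA point concrete, here is a hedged Python sketch of a hand-written finite-state scanner that should produce the same result as `re.sub(r"\[[^\]]+\]", "", text)` in one left-to-right pass with no backtracking. The function name `strip_groups_fsm` and the state names are hypothetical. Beyond its current state, the only memory it keeps is the text of a group it has opened but not yet closed, which it emits unchanged if that text turns out not to be a group.

```python
import re


def strip_groups_fsm(text):
    # Single pass intended to match re.sub(r"\[[^\]]+\]", "", text).
    out = []          # characters we are keeping
    pending = ""      # a group we have opened but not yet closed
    state = "OUTSIDE"
    for ch in text:
        if state == "OUTSIDE":
            if ch == "[":
                pending, state = ch, "OPEN"
            else:
                out.append(ch)
        elif state == "OPEN":          # just saw "[", need at least one non-"]"
            if ch == "]":              # "[]" is not a group: keep it as text
                out.append(pending + ch)
                pending, state = "", "OUTSIDE"
            else:
                pending, state = pending + ch, "INSIDE"
        else:                          # INSIDE: consuming [^\]]+
            if ch == "]":              # group complete: drop it
                pending, state = "", "OUTSIDE"
            else:
                pending += ch
    out.append(pending)                # an unterminated group stays as text
    return "".join(out)


if __name__ == "__main__":
    sample = "a[]b] [Quax] LoLen [x"
    print(strip_groups_fsm(sample) == re.sub(r"\[[^\]]+\]", "", sample))
```

Each character is examined exactly once, which is the property the DFA argument is about.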

ataraxia   -  Jun 07, 2010

You're missing the point. Using a character class - especially a negated one - creates additional work for the regular expression engine. The simpler the match, the quicker the function. A character class is slower than a wildcard match; the . wildcard is particularly efficient, since the regular expression engine doesn't have to care so much about what it matches.
The byte-size of your regular expression has very little to do with its efficiency.
Strictly speaking, if you wanted to improve execution speed when dealing with large amounts of text, you'd use something along the lines of
alias a4 return $iif([ isin $1-,$regsubex($1,/\[.+?\]/g,),$1-)
since a simple if/isin check is vastly quicker to execute than a regular expression.
Hope my explanation helps.

Jethro   -  Jun 06, 2010

Darn it, I was right. :/ Gee, thanks for the reconfirmation, Jaytea. I've changed it back to the one I had. I suppose I was mistaken in thinking that less is better.

jaytea   -  Jun 06, 2010

More efficient still is $regsubex($1,/\[[^]]+\]/g,)

Jethro   -  Jun 06, 2010

Yeah, you're right. Yours is slightly more efficient indeed. I've changed them as per your suggestion, making the regsubex one 45 bytes in size.

ataraxia   -  Jun 06, 2010

Your regular expression is long and requires an unnecessary amount of parsing. /\[.+?\]/g is slightly more efficient.
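The two patterns under discussion are not exact substitutes, which is worth knowing before comparing their speed. A short Python check (Python's `re` standing in for mIRC's engine; `LAZY` and `NEGATED` are illustrative names) shows two inputs where they disagree:

```python
import re

LAZY = re.compile(r"\[.+?\]")        # the /\[.+?\]/g suggestion
NEGATED = re.compile(r"\[[^\]]+\]")  # the /\[[^]]+\]/g version

for text in ["[Quax] LoLen [dasfjals]", "a[]b]c", "[a\nb]"]:
    print(repr(LAZY.sub("", text)), repr(NEGATED.sub("", text)))
# ' LoLen ' ' LoLen '
# 'ac' 'a[]b]c'
# '[a\nb]' ''
```

With `a[]b]c`, `.+?` must consume at least one character, so it swallows the `]` of `[]` and runs on to the next `]`; the negated class refuses to match `[]` at all. With an embedded newline, `.` stops (it does not match `\n` by default) while `[^\]]` carries on.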

Jethro   -  Jun 06, 2010

You can also do:

regsubex version:

alias a4 { return $regsubex($1,/\[[^]]+\]/g,) }

regsub version:

alias a4 { var %r | noop $regsub($1,/\[[^]]+\]/g,,%r) | return %r }

.echo -q can be used in place of noop if you're using an older copy of mIRC that doesn't recognize the noop command.
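The two versions differ in what they hand back: $regsubex returns the substituted text directly, while $regsub returns the number of substitutions and writes the text into %r. Python's `re` module has the same split between `re.sub` and `re.subn`, which returns a `(new_string, count)` tuple. A small sketch, with `a4_sub` and `a4_subn` as hypothetical names:

```python
import re

PATTERN = r"\[[^\]]+\]"


def a4_sub(text):
    # Like the $regsubex version: return the new text directly.
    return re.sub(PATTERN, "", text)


def a4_subn(text):
    # Like the $regsub version: get the new text and how many groups were removed.
    new_text, count = re.subn(PATTERN, "", text)
    return new_text, count
```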

Spoofing   -  Jun 06, 2010

Yep, it is copied and adapted for []. =)
Thanks for the more streamlined version. Saved.

jaytea   -  Jun 06, 2010

Spoofing, that looks eerily similar to a challenge entry of mine from a while back (http://forum.swiftirc.net/viewtopic.php?f=35&t=13872&start=32).

We can keep it nice and simple with:

alias nobbcode if (],[) return $($!remove($1, $replace(#$1,[,],],$v1) ),2)

But it's certainly not as general-purpose as the others, since it will break if $1 contains characters that are part of mIRC's syntax.

Spoofing   -  Jun 06, 2010

That's nice, but personally I prefer pure native mIRC scripting: no DLLs, no regex... =)

Of course, you can make it simple:

alias nobb {
  return $remove($1, [ $replace($remove(#$1,$chr(32)),$([,),$(],),$(],),$+($(],),$chr(44),$([,))) ] )
}

$nobb([Quax] LoLen [dasfjals]) = LoLen

=)

Jonesy44   -  Jun 06, 2010
alias a4 return $regsubex($1-,/\[(\/)?[\w\d\s]*\]/gSi,)
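Unlike the other answers, this pattern only removes tag-like groups: an optional leading `/` followed by letters, digits, underscores or whitespace (possibly none). A Python approximation, dropping the mIRC-specific `S` modifier and `i` (case-insensitivity has no effect on these classes), with `\d` omitted because `\w` already covers digits; note that Python's `\w` also matches non-ASCII letters by default. `TAG` is an illustrative name.

```python
import re

# Approximation of /\[(\/)?[\w\d\s]*\]/gSi:
# "[" + optional "/" + word characters or whitespace (possibly none) + "]".
TAG = re.compile(r"\[/?[\w\s]*\]")

print(TAG.sub("", "[b]bold[/b] [a-b] []"))
```

This keeps `[a-b]` (the `-` is neither a word character nor whitespace) but removes the empty `[]`, which the negated-class versions above leave alone.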