Bad Word Detection Engine (Using Socket)

By Jethro on Oct 16, 2011

I've submitted this snippet for fun. It's inspired by another snippet previously submitted by Someus. I thought I'd experiment and fiddle with a socket. And here it is.

The list of bad words is brought to you courtesy of: http://aha-irc.net/list.txt

Of course, all constructive suggestions are welcome to improve this snippet.

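; Check every channel message, action, and notice. The @ prefix limits the events to channels where you are an op.
on @*:text:*:#: bw $1-
on @*:action:*:#: bw $1-
on @*:notice:*:#: bw $1-
on *:sockopen:bw*:{
  if (!$sockerr) {
    ; Put "sockwrite -n <socket name>" into both $1 and $2, so the next two lines run as sockwrite commands.
    tokenize 96 $str($chr(96) sockwrite -n $sockname,2)
    $1 GET /list.txt HTTP/1.1
    $2 Host: $+($sock($sockname).addr,$str($crlf,2))
  }
}
alias -l bw {
  ; Open a new socket for each message, with a name that is unlikely to repeat.
  var %b = $+(bw,$site,$r(1,9999),$ticks)
  if ($sock(%b)) sockclose $v1
  sockopen %b aha-irc.net 80
  ; Store the stripped message and the kick command in global variables for the sockread event.
  set -e %1- $strip($1-) | set -e %1-- kick # $nick No Swearing Please!
}
on *:sockread:bw*:{
  if (!$sockerr) {
    ; Until the socket is marked, read header lines; the first empty line ends the HTTP headers.
    if (!$sock($sockname).mark) {
      var %b | sockread %b
      if (!%b) sockmark $sockname 1 | halt
    }
    ; Test the stored message against each line of the list as it arrives.
    while ($sock($sockname).rq > 0) {
      sockread -fn &bw
      ; On a match, run the stored kick command, clear the variables, and close the socket.
      if ($regex(%1-,/(^| )\Q $+ $bvar(&bw,1-).text $+ \E\b/)) {
        %1-- | unset %1* | sockclose $sockname
      }
    }
  }
}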

Comments

extio   -  Oct 22, 2011

So why not just fetch the text list and store it locally every so often, for people with lag or slow connections?

[85]   -  Oct 20, 2011

very nice...

Jethro   -  Oct 20, 2011

Amen to that.

alabama   -  Oct 19, 2011

;lol

jaytea   -  Oct 19, 2011

In Jesus name!

and this is where i burst out laughing at work ;P

alabama   -  Oct 19, 2011

wat?

Someus   -  Oct 19, 2011

Why are you criticizing Jethro's work so much? He has done well for his level! Don't be so egotistical! I feel some people here, like jaytea, are only here to criticize others! Yes, you can always make a script better if that's possible! But the best way is for the scripter to find the mistakes himself! Experience is the point here! And your teachings don't make much sense unless the scripter finds the improvements and mistakes by himself! In Jesus name!

alabama   -  Oct 18, 2011

o.o

Jethro   -  Oct 18, 2011

Shut the hell up, I ain't ur everyday nigga...im as white as Aryan baby butt.

just messin with ya, alabama. lol

alabama   -  Oct 18, 2011

yo jetrho nigga!
What about timing it with $ticks and comparing the socket against a text file to find out which is faster? :D

Wims   -  Oct 17, 2011

And what if $1- contains a '^'? :) The same problem exists with his $chr(96) anyway, but that has already been said.
What hasn't:
-You are using sockwrite -n even when -n isn't necessary, with this super-useless tokenize command.
-There's no point in trying everything possible to avoid reusing a socket name if, in the end, you close the socket whenever the same name comes up (it's possible). Why not just var %b = $sock(bw,0) + 1,%name = bw $+ %b | sockopen %name for example?
-Checking whether the socket is open before closing it isn't necessary. Just close it; mIRC handles the case where it doesn't exist for you.
-You should have one connection and keep it alive, then use it for each new request. That would avoid opening a connection every time (= much faster).
-Using unset %1* is not really a safe thing to do.

That script is unusable as it is anyway; it would freeze mIRC depending on the size of the text file and the activity of the channel.
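
To illustrate the socket-naming point, here is a minimal sketch rather than anyone's actual code. The alias name bw.open is made up for the example; the host and port are the ones from the snippet. It probes for the first unused name instead of counting open sockets, on the assumption that a count-plus-one name could collide with a socket that is still open after an earlier one has closed.

```mirc
; bw.open is a hypothetical helper name, not part of the original snippet
alias -l bw.open {
  var %i = 1
  ; try bw1, bw2, ... until a name is not in use
  while ($sock(bw $+ %i)) {
    inc %i
  }
  sockopen bw $+ %i aha-irc.net 80
}
```

Because the name is known to be free, no sockclose is needed before sockopen.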

_Dean_   -  Oct 17, 2011

As I said, I don't know why you make simple things look hard.
By the way, I don't understand why you use /set -e
and then /unset %1* in the sockread event. Since /set -e already unsets the variables when mIRC closes, what is the point of using -e?

One question:
you have used two different variables (I haven't tested the code yet),
but why not use just one set and then tokenize %1-?

 set %1- $+($strip($1-),^,msg # $nick No Swearing Please!)

and then

tokenize 94 %1-

The code will kick ops too, and if the bot kicks ops, that will probably start a kick-revenge war in the channel.

My advice is to add an on load event that fetches all the bad words and saves them to a file, instead of opening a socket every time someone talks.
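
One way the local-list idea could look, as a rough and untested sketch. It assumes the list has already been saved as badwords.txt (one entry per line) in the mIRC folder; the file name and the table name badwords are made up for the example. It uses on start rather than on load so the table is rebuilt each time mIRC starts, and it would replace the snippet's text event rather than run alongside it.

```mirc
; Assumption: badwords.txt already exists locally, one entry per line
on *:start:{
  if ($hget(badwords)) hfree badwords
  hmake badwords 100
  ; -n loads each line of the file as data under a numbered item
  if ($isfile(badwords.txt)) hload -n badwords badwords.txt
}
on @*:text:*:#:{
  ; leave ops alone to avoid kick wars
  if ($nick isop $chan) return
  var %text = $strip($1-)
  var %i = 1
  while (%i <= $numtok(%text,32)) {
    ; W treats each stored entry as wildcard text and matches it against the word
    if ($hfind(badwords,$gettok(%text,%i,32),1,W).data != $null) {
      kick $chan $nick No Swearing Please!
      return
    }
    inc %i
  }
}
```

Since each word is compared whole, an entry without wildcards will not catch a word with punctuation attached (for example "hell!"), so this is a starting point rather than a finished filter.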

Wims   -  Oct 17, 2011

Your logic doesn't make sense. If you didn't want 'hello' to be seen as 'hell', you would have used ( |$) the same way you used (^| ). \b is still better here for matching word boundaries, but use it on both sides; that's what jaytea meant.
However, as for the list including wildcards, \Q\E handles them.

Jethro   -  Oct 17, 2011

lol Spam me with your likes. I'll spam right back at ya.

alabama   -  Oct 17, 2011

i would give ur comments likes but ur at 600 lol
tomorrow ill go and like all ur comments so maybe u get 700 ?

Jethro   -  Oct 17, 2011

I knew jaytea would eventually come along to reward me with his viewpoint. I'm much obliged, as not every snippet earns jaytea's input. I love to have something to read up on as well. :p But as I said, this snippet was merely written for fun's sake. I was trying something rather unconventional, if I may say so. At least give me some credit for the effort of making it. ^^

So, jaytea, shed some light for me on the matter of wildcards. For instance, the word hell is regarded as a bad word on the list, and I certainly don't want it triggered for "hello" without using \b.

 Respond  
jaytea   -  Oct 17, 2011

@ ( |\b), doesn't \b match a space too?

\b doesn't match any characters, it matches positions. the space is useless there only if whatever precedes it is a word character. 'a(?= |\b)' is functionally equivalent to 'a\b' but the same cannot be said for ';(?= |\b)' and ';\b'.

But there are times you want to be creative, and creativity and innovation go hand in hand, Dean.

there's a difference between creativity that results in something new and better, and creativity that produces code that can mislead others and only serves to make the code worse overall. this seems a prime example of what i always say: too much time is spent on the trivial 0.001% and not nearly enough time spent on the function and logic of the overall idea. for instance: a script that pulls a data file from a web server EVERY TIME a message is received, especially when that data file is likely to change far less frequently than that, and given that having an out of date data file is not at all detrimental to the functioning of the script in this case, is simply a bad idea. it is slow as alabama pointed out (possibly too slow to function reliably in certain cases), but not just that, it is overkill to the point where it just plain doesn't make sense.

The only advantage of using a socket is, you don't have to store the file in your hard drive, saving some space.

this isn't a real advantage because this doesn't concern anyone. modern hard disks (say, 300GB) are capable of storing over 100,000,000 files of that size :P

$regex(%1-,/(^| )\Q $+ $bvar(&bw,1-).text $+ \E\b/)

why \b on one side and not the other? does that make sense given the purpose of the script and the nature of the data? did you look at the whole data file? some of the words in the list include wildcards making them incompatible with your current method.
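
A small test alias can make the one-sided \b question concrete. This is only a sketch: bw.demo is a made-up name, and the word hell stands in for one line of the list. Type /bw.demo to see the three results in the active window.

```mirc
; bw.demo is a hypothetical alias used only for this comparison
alias bw.demo {
  var %text = well...hell
  ; leading (^| ) needs a space or the start of the text before the word: expected 0
  echo -a $regex(%text,/(^| )\Qhell\E\b/)
  ; \b on both sides accepts any non-word character before the word: expected 1
  echo -a $regex(%text,/\b\Qhell\E\b/)
  ; the trailing \b rejects "hello": expected 0
  echo -a $regex(hello there,/\b\Qhell\E\b/)
}
```

Note that \Q...\E makes any * in a list entry a literal asterisk, so entries written as wildcards would need separate handling.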

Jethro   -  Oct 16, 2011

FelicianoX, you're correct. I've edited it.

FelicianoX   -  Oct 16, 2011

@ ( |\b), doesn't \b match a space too?

alabama   -  Oct 16, 2011
Jethro   -  Oct 16, 2011

Why will it be slow? The variable is stored in your PC's memory, which is accessed quickly compared to the text file method, which is stored on your hard disk drive, especially if you store a mass amount of data in it. The only advantage of using a socket is, you don't have to store the file in your hard drive, saving some space. But nothing is perfect, as the socket can fail if aha-irc.net goes down for any reason. These days, modern computers come with an ample amount of RAM and hard drive space, the speed issue shouldn't be a major issue to fuss over.

P.S. You can see FordLawnmower is updating his socket scripts every once in a while on account of site source code changes and all that. A socket script requires maintenance and that's the price one has to pay if you choose to use it.

alabama   -  Oct 16, 2011

but everytime a var is set it uses the socket? isnt that slow

Jethro   -  Oct 16, 2011

No, just one variable. Every time someone talks, the variable is overwritten with the new text. There is no need to set a separate, identifiable variable for each message. The script basically looks for bad words that match the list at aha-irc.net and kicks the offender if one is found.

alabama   -  Oct 16, 2011

wait. i dont quite understand. so everyone who talks, gets their text stored, then it searches that .txt for a badword?

Jethro   -  Oct 16, 2011

alabama, every time a person talks, his text will temporarily be stored in the variables until he or she curses with a bad word within their sentence that matches the list from aha-irc.net. Then the script is executed to kick the offender and the variables are unset.

Jethro   -  Oct 16, 2011

You meant "admire" not "admit" sarcastically speaking. :p

Well, it was for fun, using a different way to achieve the same result. I know you can do it repetitively or use a single var. But there are times you want to be creative, and creativity and innovation go hand in hand, Dean.

I have nothing against people wanting to follow the orthodox route to be considered of the old school.

alabama   -  Oct 16, 2011

so everytime someone talks, it goes through a socket?

_Dean_   -  Oct 16, 2011

i have to admit your capacity of creation... really

tokenize 96 $str($chr(96) sockwrite -n $sockname,2)

That's why I say: why make things look hard when you can do them easily?

next time i will use

var %x = $regsubex(^115^111^99^107^119^114^105^116^101^32^45^110,/\^(\d+)/g,$chr(\1)) $sockname
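
For comparison, the tokenize line quoted above only sets $1 and $2 to "sockwrite -n <socket name>". A plainer sockopen event that should send the same request might look like this (a sketch, not the author's code):

```mirc
on *:sockopen:bw*:{
  if ($sockerr) return
  sockwrite -n $sockname GET /list.txt HTTP/1.1
  ; two $crlf: one ends the Host line, one is the blank line that ends the headers
  sockwrite -n $sockname Host: $+($sock($sockname).addr,$str($crlf,2))
}
```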