Why no register?

spam

on
December 14, 2007 - 10:37am

I have temporarily disabled user creation due to a recent deluge of abuse by spammers. I am working on a solution. Things are slower going than normal, as I'm currently on vacation through the end of the year.

Sorry for the inconvenience.

re-enabled

on
April 30, 2008 - 6:49pm

User accounts are finally re-enabled.

what was changed to get rid of "abuse by spammers"?

Tomasz Chmielewski (not verified)
on
May 5, 2008 - 4:28am

What was changed to get rid of "abuse by spammers"?

Still, some things need to be changed, at least in forums (captcha? report as spam button?) - see this spam: kerneltrap.org/node/16105

new spam filter

on
May 5, 2008 - 9:01am

I've upgraded to the alpha version of my 3.x Drupal spam module. I modified the Bayesian filter a little, so I'm currently re-training it, and a few are still slipping through. The new filter is working well, and is easy to improve as spammers change their tactics.

3.x Drupal spam module

on
May 5, 2008 - 6:56pm

I've got spam in blog. I'm happy that at least spam bots are "reading" that stuff, but there's no "mark as spam" button any more.

> 3.x Drupal spam module

I see a request for support there. While i'm not a php or other web coder, let me share an idea.

Near [submit] button you are generating N random strings, which are visually alternated with CSS, and user is asked to include <!-- visible string --> in post.

This random "ticket" works only once, thus as many strings are placed and CSS-ed away (by fgcolor=bgcolor, display:none, etc), less chances to get spam posted.
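A rough sketch of the one-time "ticket" idea described above, in Python rather than the PHP the site runs on. Everything here is hypothetical (the `_pending` store, `issue_form`, `accept_post`); it only illustrates the logic, assuming a template hides the decoy strings with CSS and shows the user the single visible one:

```python
import secrets

# Hypothetical in-memory store of outstanding forms:
# form_id -> (visible ticket, list of hidden decoy tickets)
_pending = {}

def issue_form(n_decoys=4):
    """Create one visible ticket and n_decoys hidden ones for a new form."""
    tickets = set()
    while len(tickets) < n_decoys + 1:      # keep all strings distinct
        tickets.add(secrets.token_hex(4))
    tickets = list(tickets)
    visible = tickets[secrets.randbelow(len(tickets))]
    decoys = [t for t in tickets if t != visible]
    form_id = secrets.token_hex(8)
    _pending[form_id] = (visible, decoys)
    # (ticket, shown) pairs; a template would CSS-hide those with shown=False
    fields = [(t, t == visible) for t in tickets]
    return form_id, fields

def accept_post(form_id, body):
    """Accept only if the body quotes the visible ticket and no decoy."""
    entry = _pending.pop(form_id, None)     # pop makes the form single-use
    if entry is None:
        return False
    visible, decoys = entry
    return visible in body and not any(d in body for d in decoys)
```

A bot that pastes every string it finds in the page trips a decoy, and replaying an old form fails because its entry is already gone.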

When spam will become smarter, then we will have other nice CSS-supporting web browser :)
____

smtp

on
May 5, 2008 - 7:02pm

i've had other ideas about SMTP and spam in LKML and debian lists, but was flamed. A bit more order and culture for users/posters isn't the right way(tm) there.

All that CPU-sucking Bayesian and SpamAssassin stuff still sucks on the job, letting spam there (everywhere).
____

the most simple example

on
May 6, 2008 - 4:21am

The most simple example of using usual and somewhat useless after few previews content of web page:
http://kerneltrap.org/node/16107#comment-302738
____

spam

on
May 6, 2008 - 8:49am

Send me links to the spam in your blog if I've somehow missed them. At this time there's no "report as spam" link available, as I've not had time to implement it. I hope to find the time soon.

Regarding captchas, I seriously do not like them, and as such I don't use them on this website.

I'm still training my new filters -- overall (including training) they've had over 98.7% accuracy, and in the past few days they've been up to about 99.5% accuracy. What spam does slip through I try and clean up within 12 hours, further training the filters.

It's alright, there was just

on
May 6, 2008 - 11:39am

It's alright, there was just one.

As of captcha, i think [mark as spam] button is kind of doing the same thing, but after the fact: struggling with consequences, not causes.

Maybe there's a way to visually disable with CSS multiple [Post comment] buttons, so hitting/using just one available (with random option or something) will automatically send correct form, which will be checked?

If javascript is available, then a simple check of keyboard or mouse activity looks good also. (but i use `lynx` and disable otherwise useless javascript quite frequently.)

"/files/css/e07d16f75ead26d32750f40a613edb4d.css"

i see, that some kind of "random" CSS is already included.
____

spam filtering

on
May 6, 2008 - 2:01pm

"As of captcha, i think [mark as spam] button is kind doing the same thing, but after the fact: struggling consequences, not causes."

The intention of the "report as spam" buttons is to allow KernelTrap readers to report if spam slips through my spam filters. My filters then learn from this spam, and hopefully block similar posts in the future.

These links are nothing like captchas, as they are opt-in -- you choose if you want to participate in the spam prevention effort. If you don't care about spam, you don't have to participate -- you simply enter your comments and/or forum posts, and away you go.

I know that my filters will never be 100% accurate, and that some spam will always slip through, but they do the heavy lifting for me, and make it possible for me to allow anonymous posting w/o captchas.

Alright then. I've put (in

on
May 6, 2008 - 2:46pm

Alright then.

I've put (in blog) all CSS-possible ways i can see to do it in a non-captcha way. I'd be glad to help with development of design and testing, if it is usable (i don't do php and such).
____

captcha

on
May 6, 2008 - 10:36pm

The ideas you're describing in your blog are just captchas in disguise as far as I see -- I don't want to have to solve a puzzle just to post a comment, and I don't want to require this from anyone else, either. I want to type out my comment and hit submit, nice and simple. That is what my spam module is all about.

> The ideas you're

on
May 7, 2008 - 11:19am

> The ideas you're describing in your blog are just captchas

Well, if hiding all but one correct button is a captcha, so be it.

But this thing is non usable only without CSS support. Why having more input buttons that are hidden in various ways isn't a good front-end for filters to learn, for example?

The output of hidden buttons can be a loophole for automatic bots with neverending requests for input or just big writing delays.

(sidenote: sometimes i feel myself too stupid with web; not when i'm being asked to fill captcha when in `lynx`, but when i'm being told to download flash player to see some news videos.

Player is there, but javascript is switched off. Anyway i just hit: ctrl+u, ctrl+f "flv", select+copy filename, run mplayer. Even with a player things are too inflexible. So i just go to browser's cache and run mplayer on cached copy of file, thus i have all volume/speed/other tuning buttons right there.

If somebody don't know: make file read-only and you will have it for your own. It will sometimes save you another login/registration time for "download" capability, will prevent from installing useless download software, will enable you to have a "backup" copy if content have no official "download". Sometimes design flaws of new tech are quite useful.:)
____

Yet another spam

Tomasz Chmielewski (not verified)
on
May 8, 2008 - 6:04am

Yet another spam -> kerneltrap.org/node/5515

I've had two in blog right

on
May 8, 2008 - 11:50am

I've had two in blog right after i've posted "why they have such simple life without visually hidden random input buttons in preview, post stage etc." :)

I can go further with imagination

on
May 8, 2008 - 9:08pm

I'm opted as non-spammer and i take this with very big responsibility. So, let me propose this for funny start:

provide stdin/stdout-only (jail) facility for `sed`, configured via user settings, where i can:

* set number of `sed`s in the pipe:
comment_web ==> | sed "$S1" | sed "$S2" | sed "$S3" | ... | ==> web_output
* set individual scripts for each

jail must have no open() and exec*() (and all other security-breaking) calls, because `sed` can read and write files and `chroot` is available only for root (for some crazy reason).

What i will do, is inserting style tags and wrap as more space with <input css=hidden> traps as possible. One but: there must be random string and/or number to peek somewhere in html, so script have no static, easy-to-avoid output. This info is used and stripped in output, of course.

And then let's see which blog is spammed more. If somebody don't like my captchas or CSS games, they will not post comments (never saw human replies there, but anyway). Don't say site runs on msft products or you have no standard `sed` which in turn can be statically linked for easy in-jail run. This is real fun with web!

:D

Who is the hacker after that? xml-php-java*-web2.0 or what?
--
sed 'sed && sh + olecom = love' << ''
-o--=O`C
 #oo'L O
<___=E M

restricted sed

on
May 8, 2008 - 9:46pm

BTW, technically restricted `sed` can be organized easily: you strip out all open+exec functionality from sources (by same `sed`), build it statically and call it `rsed`.

:)

The next question is whether you can trust all that over-complicated RE codebase not to do stupid stack overflows and other exploitable "fun". Oh, gee...

ou, this one must be called rsed.bash or rsed.sh at least.
http://freshmeat.net/projects/rsed/

simple wrapper is the whole project...
____

As far as I can tell the

on
May 18, 2008 - 6:08pm

As far as I can tell the filters are now blocking 99.5% of legitimate comments as well. I have tried posting several comments recently (without having bothered to log in) and got blocked by the spam filter every time.

Seriously, I'd rather fill in a CAPTCHA than deal with this "you're blocked and there's nothing you can do about it" approach.

Here's some spam that slipped through:
* http://kerneltrap.org/node/16160
* http://kerneltrap.org/node/16159
* http://kerneltrap.org/node/16158
* http://kerneltrap.org/node/16157

I wonder, why filters can't

on
May 18, 2008 - 7:29pm

I wonder, why filters can't get spam mark on trivial patterns of phpBB urls, which are 99% of spam i've seen so far here.

tuning

on
May 19, 2008 - 11:24am

"I wonder, why filters can't get spam mark on trivial patterns of phpBB urls,"

The filter does match on those types of patterns, and does heavily weight them as probable spam -- indeed, that's the vast majority of what the filters are already blocking of the 1 spam comment a minute 24/7 currently hitting these web pages. But, the filters look at much more than that, and so occasionally some still slip through. (I intentionally have not enabled any "instant-death" type logic yet, while I continue tuning the other logic -- that will come next, at which point these urls will never slip through.)
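To illustrate the difference between heavily weighting a URL pattern and an "instant-death" rule, here is a minimal Python sketch. It is not the spam module's code: the `PHPBB_URL` regex, the `url_weight` value and the way the two signals are combined are all assumptions made for the example.

```python
import re

# Hypothetical pattern for phpBB-style links; not taken from the real module.
PHPBB_URL = re.compile(
    r"https?://\S+/(?:viewtopic|profile|memberlist)\.php\S*", re.I)

def spam_score(text, bayes_probability, url_weight=0.9, instant_death=False):
    """Combine a Bayesian estimate (0..1) with a URL-pattern signal."""
    if PHPBB_URL.search(text) is None:
        return bayes_probability            # pattern absent: Bayes score only
    if instant_death:
        return 1.0                          # hard rule: a match is always spam
    # Otherwise treat the match as one more piece of independent evidence.
    p, q = bayes_probability, url_weight
    return p * q / (p * q + (1 - p) * (1 - q))
```

With weighting alone, a comment that otherwise looks harmless (Bayes estimate 0.2) plus a matching link scores about 0.69, so it can still land below a strict blocking threshold; with `instant_death=True` the same comment scores 1.0.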

When I switched to the new filters, I made a choice to throw away all the previous tuning that had been learned by the old filters, and as such the filters are having to be retrained. Perhaps this was a mistake, certainly it proves a little more annoying, but it allows me to fully analyze how well the updated Bayesian logic is working and to continue tuning it for optimal matching.

filters

on
May 19, 2008 - 11:19am

"As far as I can tell the filters are now blocking 99.5% of legitimate comments as well."

97.8% of all statistics are made up, right on the spot (and amazingly, 83.4% of the people believe them, whether they're accurate statistics or not).

"I have tried posting several comments recently (without having bothered to log in) and got blocked by the spam filter every time."

A few things to realize:

  • KernelTrap is a hobby website -- I maintain it in my spare time because I enjoy it, no other reason.
  • I recently rewrote my spam filters from the ground up to address some serious shortcomings in the previous version -- I got about 60% of the way done and then deployed them on KernelTrap with the intention of continuing to develop and debug them with real world testing.
  • Unfortunately I then became too busy with other aspects of my life to have time to focus on this particular hobby, and as such we've been running far longer than intended with very suboptimal filtering logic in place.
  • From a quick analysis of the logs, 99.62% of what has been blocked in the past 14 days was actually spam, with an average of one spam comment posted every minute, 24 hours a day 7 days a week.
  • It is not in any way acceptable to me that 0.38% of the postings being blocked are valid content -- I have every intention of fixing the filters to bring this number ever closer to 0, as user contributed comments are a massively important part of this community.
  • Yesterday while some less-than-thrilling spam slipped through the filters, I was flying through the sky on a jet plane, returning from Ottawa and working on the KernelTrap spam filters, addressing some known shortcomings.

"Seriously, I'd rather fill in a CAPTCHA than deal with this 'you're blocked and there's nothing you can do about it' approach."

The day you have to fill out a CAPTCHA to leave a comment on KernelTrap is the day that I've lost interest in this hobby and moved on to something else. I detest the things. Yes, I know the current state of the filters is equally or even more obnoxious, but it's a very temporary state until I have time to fix them.

"Here's some spam that slipped through:"

I read every single forum posting, blog posting and comment posted to these pages -- posting links to mind-numbingly obvious spam is not especially helpful. If you find spam that is well hidden (ie, it looks like an applicable comment, but contains a link to a spam website, etc), it can be very helpful to point that out. Otherwise, you'll simply have to wait for my airplane to land, or for me to wake up after a night's sleep before I'll clean up any spam that's slipped through.

The spam filters used by KernelTrap are open source and available here, if you're interested in helping to improve them. Otherwise, it'll just take a little more time.

I still can't get, why

on
May 19, 2008 - 1:11pm

I still can't get, why simple randomizing in preview + post forms with applied hidden traps (CSS or tag soup, etc) is not possible technically. This is not captcha, is it?
____

Yes it is

on
May 19, 2008 - 1:25pm

CAPTCHA stands for "Completely Automated Public Turing test to tell Computers and Humans Apart".

Technically, visibility hacks also fit under this definition. Obfuscated images are simply the most popular (or at least the most user-visible) form of CAPTCHAs.

Yes. I propose

on
May 19, 2008 - 2:48pm

Yes. I propose "parser+randomizer" vs "parser" thing and (actually more complicated parser + engine) CSS hide or visual effects in last turn.

IMHO images are too heavy (and bad for real users) against obviously simple brute force script. I use `lynx` and apart from that, i cannot figure out some captcha images when in graphic :(
____

because...

on
May 19, 2008 - 1:45pm

"why simple randomizing in prewiew + post forms with applied hidden traps (CSS or tag soup, etc) is not possible technically"

Of course it's possible technically, and you've even described a way to implement it in your blog. It's just that I don't see it buying much, at least not for long -- it would be easy to program around, and could potentially be very confusing to anyone reading the website with an older browser or with some browser functionality disabled.

And at the end of the day, I get far more pleasure out of improving my Bayesian filter, creative pattern matching, and things like that.

Oh man! I use `lynx` and

on
May 19, 2008 - 2:31pm

Oh man! I use `lynx` and know what it is, i use `shell && nc && sed` for browsing and pure text console for watching useful web flash content.

That's why i mentioned CSS *and* html tag soup, because it's not easy to figure out

a='[...soup...]
<script class="random123" id="random-IFGNKG">
[...soup...]
trap-post-web-from-trap-rnd1
[...soup...]
trap-post-web-from-trap-rndN
[...soup...]
</script>
[...soup...]'

content
`a rnd=get_rnd()`
content
`a rnd=get_rnd()`
`a rnd=get_rnd()`
content
con`a rnd=get_rnd()`tent
...

where 'script' or any other invisible tag can be used. I bet even simple unmatched HTML comments and dummy javascript with var = "-->" (+ text randomization) can fool and catch+ban in one hit all naive script kiddies.

Template for this site is too simple. You know, it's like ASLR (address space layout randomization), which cannot be implemented and deployed for decades, thus stupid "stack-overflow + _known_ libc/app address" exploits are still alive (and will be so, dammit!). Even brute force is easy, as long as no audit of SEGV crashes is done.

The only reason i didn't come up with "the patch" (LKML-way) is that i don't run any web server anymore, i don't do PHP, i myself, actually, don't care.

97.8% of all statistics are

on
May 19, 2008 - 2:08pm

"97.8% of all statistics are made up, right on the spot"

This was the point; the 99.5% figure was a reference to your earlier "they've been up to about 99.5% accuracy" quote, to turn your attention to the fact that the filters are not only blocking spam; it was never meant as a serious figure. I find false positives more significant than false negatives; I think you should weigh them differently when making statistics.

"The day you have to fill out a CAPTCHA to leave a comment on KernelTrap is the day that I've lost interest in this hobby and moved on to something else"

Why are you so hell bent on CAPTCHAs? (I tried searching Google, but I could not find any rants on this topic by you)

ok

on
May 19, 2008 - 3:01pm

"to turn your attention to the fact that the filters are not only blocking spam"

Ah, absolutely, thanks. Indeed, and it's only been getting worse. The problem was quite simple: I only got so far as coding a way to mark content as spam, with no way to mark content as not being spam. As such, the Bayesian filter was getting a larger and larger collection of "spam tokens", without getting any "not spam tokens", and thus it was matching a greater and greater number of valid comments.
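The effect described above can be reproduced with a toy model. This is a deliberately simplified token-frequency score (per-token spam ratio, averaged), not the module's actual Bayesian math; the `TokenFilter` class and the sample texts are made up for illustration:

```python
from collections import Counter

class TokenFilter:
    """Toy filter: counts how often each token was seen in spam vs. valid text."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def train(self, text, is_spam):
        (self.spam if is_spam else self.ham).update(text.lower().split())

    def token_spamminess(self, token):
        s, h = self.spam[token], self.ham[token]
        # Tokens never seen in training are treated as neutral.
        return 0.5 if s + h == 0 else s / (s + h)

    def score(self, text):
        tokens = text.lower().split()
        if not tokens:
            return 0.5
        return sum(self.token_spamminess(t) for t in tokens) / len(tokens)
```

Trained only on the spam "cheap pills for the kernel crowd", the valid comment "the patch for the kernel" scores 0.9, because every common word it shares with the spam counts as a pure spam token. After also training on the valid text "the kernel scheduler patch looks fine for now", the same comment drops to 0.4.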

On my flights back yesterday I finally got around to implementing a user interface for training the Bayesian filter about valid tokens, which was deployed today, and as of about fifteen minutes ago the filters should be allowing valid content through again.

"I find false positives more significant than false negatives; I think you should weigh them differently when making statistics."

Of course agreed! Otherwise it would be trivial to build a perfect spam filter by simply disabling comments. ;)
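A small Python sketch of that point, under assumed numbers: the 10:1 weighting and the sample data are hypothetical, chosen only to show why blocked valid comments should cost more than missed spam.

```python
def error_rates(decisions):
    """decisions: iterable of (is_spam, was_blocked) pairs.

    Returns (false_positive_rate, false_negative_rate), where a false
    positive is a valid comment that was blocked.
    """
    spam_total = ham_total = missed_spam = blocked_ham = 0
    for is_spam, blocked in decisions:
        if is_spam:
            spam_total += 1
            missed_spam += not blocked
        else:
            ham_total += 1
            blocked_ham += blocked
    fn_rate = missed_spam / spam_total if spam_total else 0.0
    fp_rate = blocked_ham / ham_total if ham_total else 0.0
    return fp_rate, fn_rate

def weighted_cost(decisions, fp_weight=10.0, fn_weight=1.0):
    """Penalize blocked valid comments more heavily than missed spam."""
    fp_rate, fn_rate = error_rates(decisions)
    return fp_weight * fp_rate + fn_weight * fn_rate
```

A "filter" that blocks everything misses no spam but blocks every valid comment, so it gets the worst possible false positive rate and a high weighted cost, while a filter that lets one spam in eight through and blocks no valid comments costs far less.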

"Why are you so hell bent on CAPTCHAs?"

The same reason I dislike Java -- too many people have created awful things with it.

In any case, the next step for my spam filters is the ability to report when they inappropriately block valid content with the click of a button -- and there I will very likely be using a captcha. This inability to easily report when the filters are getting it wrong is an ugly but temporary state due to my own lack of time.
