Wow, a 48-hour BarCamp dedicated to mobile devices, with an agenda covering every open platform and as many as six sessions running in parallel. How cool is that! Over the Air is an event organized by Mobile Monday London, with sponsors including Nokia, Google, Yahoo, Sun and Adobe. It starts tomorrow.

Over the Air

Looking at Mobile Monday Taipei in Taiwan, its tone leans toward a polished business-networking event and lacks the hacker and developer atmosphere. By comparison, the S60 user meetup that Nokia Taiwan held for the Mobile01 community tempts me far more to drop in.

Because debian.org.tw and many other community sites are maintained by teams, many members hold root access. Without good coordination, one person changes a setting and the next person changes it back, which causes endless configuration headaches. So we need a system that can track and record every configuration change.

One approach is to put the configuration directory under a version control system such as svk, bzr or mercurial. But people often forget to commit their changes, so later nobody remembers who touched what. I recently installed metche and found it largely meets my needs. metche's basic job is to watch the configuration files under /etc and, whenever something changes, email a diff to a designated address for review. By default the diff only lists which files changed; you can also ask metche to show the full details of every modification. Showing the full content, however, risks mailing out sensitive information such as passwords. To avoid leaking that, metche can be configured to PGP-encrypt the outgoing mail; you only need to make sure the recipient's PGP key is in root's keyring.
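The version-control approach mentioned above can be sketched in a few commands. This demo uses git standing in for svk/bzr/mercurial, and a scratch directory standing in for a real /etc (all paths here are illustrative):

```shell
# Sketch: track a config directory in a VCS (git here, in place of
# svk/bzr/mercurial). /tmp/etc-demo stands in for a real /etc.
mkdir -p /tmp/etc-demo && cd /tmp/etc-demo
git init -q .
git config user.email "root@example.org"   # hypothetical identity
git config user.name  "root"
echo "PermitRootLogin no" > sshd_config
git add sshd_config
git commit -q -m "initial sshd_config"
# Another admin changes a setting without telling anyone...
echo "PermitRootLogin yes" >> sshd_config
# ...and the uncommitted change is immediately visible:
git diff --stat
```

The weakness the post describes shows up right at the end: nothing forces that final `git commit`, which is exactly why a watcher like metche is useful.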


It’s been almost a year since I last maintained the wiki.debian.org.tw site. Since I joined my current company, I have spent all my time dealing with routine work every single day. After finishing the day’s work, I don’t even want to open my laptop at home.

Lately, wiki.debian.org.tw has become more and more unstable. For the last couple of weeks people have frequently hit `Service is not available’ pages. One reason is that the disk is full; the other is that there are too many spam articles.

Finally, I spent a few hours this weekend on the site. The first thing I did was upgrade the server and the MediaWiki software. Frankly speaking, it wasn’t hard at all, since the wiki is installed in a Debian-based vserver. All I needed to do was run `aptitude dist-upgrade’ to move the distribution from sarge to etch, and then sync the MediaWiki source tree from 1.7.1 to 1.11.1. That was also easy, since MediaWiki provides an upgrade script that checks and migrates the database schema.
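The whole upgrade boils down to a handful of commands. This is only a sketch; the wiki path is illustrative, and you must point sources.list at etch before the dist-upgrade:

```shell
# Distribution upgrade (sarge -> etch), after editing /etc/apt/sources.list:
aptitude update && aptitude dist-upgrade

# MediaWiki upgrade: sync the source tree to 1.11.1, then let the bundled
# maintenance script check and migrate the database schema:
cd /var/www/wiki            # illustrative path to the wiki root
php maintenance/update.php
```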

The real problem was the thousands of spam articles. I had not dealt with spam for a long time, and most of the wiki moderators did not check for it frequently, so spammers found it easy to post piles of articles without supervision. Even when moderators did visit the wiki often, deleting the spam through the web interface was still impossible; there were simply too many spammers.

Anyhow, the result was thousands of spam articles in the database. Most of them were advertisements for venereal-disease treatments, offering to help you deal with syphilis, gonorrhea and herpes. I almost wanted to rename the wiki `SafeSexpedia‘; it had become quite an informative knowledge base.

Still, I could not stand the spam situation. The first two things I did were to install the reCAPTCHA MediaWiki extension, so people have to pass a CAPTCHA when registering an account, and to enable $wgEmailConfirmToEdit, so only accounts with a confirmed email address can edit pages. These two measures should be enough to stop new spammers. The real problem, however, was the spam already in the database.
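The email-confirmation switch is a one-liner in LocalSettings.php ($wgEmailConfirmToEdit is a standard MediaWiki setting; the reCAPTCHA extension has its own installation steps, described in its README). Appending it might look like this, with /tmp/wiki-demo standing in for the real wiki root:

```shell
# Sketch: enable email-confirmed editing in LocalSettings.php.
# /tmp/wiki-demo is an illustrative stand-in for the wiki root; the
# reCAPTCHA extension itself is installed separately per its README.
mkdir -p /tmp/wiki-demo && cd /tmp/wiki-demo
cat >> LocalSettings.php <<'EOF'
# Only accounts with a confirmed email address may edit pages.
$wgEmailConfirmToEdit = true;
EOF
grep wgEmailConfirmToEdit LocalSettings.php
```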

In order to clean up the database, I checked several extensions such as Nuke. However, I found them inconvenient for removing thousands of spam articles, so I decided to use the maintenance APIs. The good news is that there are two scripts in the mediawiki/maintenance folder, cleanupSpam.php and removeUnusedAccounts.php. cleanupSpam.php seemed to fit my requirements: it takes a URL as an argument, finds every article containing that URL, and removes it.

However, I didn’t want to inspect the articles one by one looking for URLs. Most of the spammers on wiki.debian.org.tw are from China, and most of them use email addresses at 163.com. The easiest route for me was to clean up every account from 163.com along with every article those accounts had posted. Of course I could not simply delete all of those articles: spammers can modify any article they like, so blind deletion might destroy important articles that spammers had merely edited.

So I needed a script that finds the accounts matching a given email address or nickname, then finds every article those accounts have modified. For each such article:

  • If the account is not the latest editor, ignore the article; someone may already have fixed the content manually.
  • If the account is the latest editor, it also created the article, and the article has only a single revision, simply delete it.
  • If the account is the latest editor but earlier revisions exist, find the last revision made by a valid account and restore the article to that revision, so we recover the correct content from before the spammer inserted the links.
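The three rules above condense into a small decision function. This is only a sketch of the branching, not the actual PHP script; `decide` and its arguments are hypothetical names:

```shell
# Hypothetical sketch of the per-article decision; not the real script.
# $1: is the suspect account the latest editor? (yes/no)
# $2: is the article a single-revision page created by that account? (yes/no)
decide() {
  if [ "$1" = "no" ]; then
    echo "ignore"    # someone may already have fixed the content manually
  elif [ "$2" = "yes" ]; then
    echo "delete"    # pure spam page: safe to remove entirely
  else
    echo "revert"    # restore the last revision by a valid account
  fi
}
decide no  yes   # -> ignore
decide yes yes   # -> delete
decide yes no    # -> revert
```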

I created another script based on the maintenance samples; thanks to those developers. With the script, I deleted hundreds of accounts and more than two thousand articles in a few hours. If you are interested in the script, you can download it from here. Put it in your mediawiki/maintenance folder. The usage is very simple:

USAGE: php removeSpamAccountsAndPost.php [--delete] email

It takes only one parameter; you can match the accounts by nickname or by email address. My database is MySQL, so you can use ‘%’ as a wildcard in the LIKE pattern.

php removeSpamAccountsAndPost.php chihchun
php removeSpamAccountsAndPost.php chihchun%

By default the script only gives you a preview list. If you are sure those accounts and articles should be deleted, add `--delete’ to make the script actually delete the accounts and articles for you.

php removeSpamAccountsAndPost.php --delete chihchun

Last night, at the Tossg meetup at 魯米爺咖啡, I shared a few notes on running dot.

Calling it sharing is generous; it was really just rambling. I briefly introduced the current server architecture of Debian.org.tw. I have always disliked exhaustive blow-by-blow "how-to" write-ups: there is usually nothing Google does not already know, and writing those steps up takes far more effort than the few minutes the setup itself needs when everything goes smoothly, so I have no appetite for that kind of chore. Besides, I mostly use stable, well-documented free software, and showing it off in front of a room full of experts would be embarrassing. What is genuinely worth offering, perhaps, is direction and experience.
