As a network administrator I often need to calculate and tally network addresses. The math itself is not complicated, and a pencil plus a bit of thought will produce the subnets you need, but a tool always makes it more convenient. I used to use ipcalc, but having to type a command for every calculation got tedious, so I recently switched to gip. gip is a GTK2-based graphical IPv4 address calculator, and it is quite handy as a little network-planning tool. (In practice I usually only use its IPv4 Address Analyzer.)
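For comparison, this is the kind of one-off calculation I used to run with ipcalc; the address below is just an example:

# Print the netmask, network, broadcast and host range for a subnet.
ipcalc 192.168.1.0/26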
Metche
Since debian.org.tw, like many community sites, is maintained by a team, many people on the team have root access. If the team does not coordinate well, one person changes a setting and the next person changes it back, which makes configuration a headache. So we need a system that can track and record every kind of configuration change.
One approach is to put the configuration directory into a version control system such as svk, bzr, or mercurial. But people often forget to commit the changed files, so later nobody remembers who touched what. I recently installed metche and found that it largely meets my needs. metche's basic job is to watch the configuration files under /etc and, whenever something changes, email a diff to a designated address for reference. By default the diff only lists the changed files, but you can also ask metche to show the full details of every modification. If you do show everything, though, sensitive information such as passwords may be mailed out. To avoid leaking information, metche can also be configured to PGP-encrypt the outgoing mail; you just need to make sure the recipient's PGP key is in root's keyring.
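Getting the encryption side ready only takes a key import; a minimal sketch, assuming the recipient's public key sits in a file (the file name is just an example):

# As root: import the recipient's public key into root's keyring,
# then confirm it is there, so metche can encrypt its mails to it.
gpg --import /root/admin-pubkey.asc
gpg --list-keys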
Script for removing spam messages on a MediaWiki site
It’s been almost a year since I last maintained the wiki.debian.org.tw web site. Since I joined my current company I have spent all my time dealing with routine jobs every single day; after finishing the day's work I don't even want to touch my laptop at home.
Lately, wiki.debian.org.tw has become increasingly unstable; for the last couple of weeks people have regularly seen `Service is not available' pages. One reason is that the disk is full; the other is that there are too many spam articles.
Finally, I spent a few hours on the site this weekend. The first thing I did was upgrade the server and the MediaWiki software. Frankly, that was not hard at all, since the wiki is installed in a Debian-based vserver. All I had to do was run `aptitude dist-upgrade' to move the distribution from sarge to etch, and then sync the MediaWiki source tree from 1.7.1 to 1.11.1. That was also very easy, since MediaWiki provides an upgrade script that checks and modifies the database schema.
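The whole upgrade boils down to a couple of commands; a sketch, run on the vserver and then from the MediaWiki directory:

# Upgrade the Debian vserver from sarge to etch.
aptitude update
aptitude dist-upgrade
# Let MediaWiki check and migrate the database schema.
php maintenance/update.php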
The real problem was the thousands of spam articles. I had not handled the spam problem for a long time, and most of the wiki moderators did not check for spam frequently either, so the spammers could easily post lots of articles without supervision. Even when moderators did visit the site often, deleting the spam through the web interface was hopeless; there were simply too many spammers.
Anyhow, the result was thousands of spam articles in the database. Most of them were advertisements for venereal-disease treatments, offering to help you deal with syphilis, gonorrhea and herpes. I almost wanted to rename the wiki `SafeSexpedia'; it had become quite an informative knowledge base.
Still, I could not stand the spamming situation. The first two things I did were to install the reCAPTCHA MediaWiki Extension, so people have to pass a CAPTCHA when they register an account, and to enable $wgEmailConfirmToEdit, which only allows accounts with a confirmed email address to edit pages. These two measures should be enough to stop new spammers. The real problem, however, was the spam articles already in the database.
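For reference, the LocalSettings.php side looks roughly like this. $wgEmailConfirmToEdit is a standard MediaWiki setting, but the extension path and key variables depend on which version of the reCAPTCHA extension you install, so treat those as assumptions:

cat >> LocalSettings.php <<'EOF'
require_once("$IP/extensions/recaptcha/ReCaptcha.php"); # path is an assumption
$wgReCaptchaPublicKey  = 'your-public-key';  # keys issued by recaptcha.net
$wgReCaptchaPrivateKey = 'your-private-key';
$wgEmailConfirmToEdit  = true; # only email-confirmed accounts may edit
EOF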
In order to clean up the database, I looked at several extensions such as Nuke, but found them inconvenient for cleaning up thousands of spam articles, so I decided to work at the API level. The good news is that there are two scripts in the mediawiki/maintenance folder, cleanupSpam.php and removeUnusedAccounts.php. cleanupSpam seems to fit my requirements: it takes a URL as an argument, finds all the articles containing that URL, and removes them.
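It is invoked once per spam host, roughly like this (the hostname is an example; check the script's own help output for the exact options in your MediaWiki version):

# Clean up every article that links to the given host.
php maintenance/cleanupSpam.php spam.example.com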
However, I did not want to check the articles one by one looking for URLs. Most of the spammers on wiki.debian.org.tw are from China, and most of them use email addresses at 163.com. The easiest way for me was simply to remove all the accounts from 163.com together with all the articles those accounts posted. Of course I could not just delete the articles outright: spammers can modify any article they like, so I might end up removing important articles that a spammer had merely edited.
So I needed a script that finds the accounts matching a given email address or nickname, and then finds all the articles modified by those accounts. For each article the rules are as follows (a rough sketch of the selection step follows the list):
- If the account is not the latest editor, ignore the article, because someone may already have fixed the content manually.
- If the account is the latest editor, the article was created by that account, and it has only a single revision, simply delete it.
- If the account is the latest editor but earlier revisions exist, find the last revision edited by a valid account and restore the article to that version, so we get back the correct content from before the spammer put the links in.
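In SQL terms the selection step looks roughly like the query below. This is only a sketch against the standard MediaWiki table layout (user, revision, page) with placeholder database credentials; it is not the actual script:

# Preview: list pages whose latest revision was made by an account
# whose email address matches the spam pattern.
mysql -u wikiuser -p wikidb <<'EOF'
SELECT p.page_id, p.page_title, u.user_name, u.user_email
FROM page p
JOIN revision r ON r.rev_id = p.page_latest
JOIN user u ON u.user_id = r.rev_user
WHERE u.user_email LIKE '%@163.com';
EOF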
I created another script based on the maintenance samples; thanks to those developers. With the script I deleted hundreds of accounts and more than two thousand articles in a few hours. If you are interested in the script, you can download it from here and put it in your mediawiki/maintenance folder. The usage is very simple:
USAGE: php removeSpamAccountsAndPost.php [--delete] email
It takes only one parameter, which matches either the nickname or the email address. My database is MySQL, so you can use `%' as the pattern-matching wildcard of the LIKE statement.
php removeSpamAccountsAndPost.php chihchun
php removeSpamAccountsAndPost.php chihchun%
By default the script only gives you a preview list. If you are sure that these accounts and articles should be deleted, add `--delete' to make the script REALLY DELETE THE ACCOUNTS AND ARTICLES for you.
php removeSpamAccountsAndPost.php --delete chihchun
Net-SMS-PChome updated
In the middle of last year I set up SmokePing to monitor network-latency problems for several services of a certain company. I chose SmokePing because it supports several protocols, so I can monitor DNS, the SSH daemon, RADIUS, web, SMTP and so on all at once. SmokePing's architecture is also quite modular: a few small changes to a couple of Perl scripts were enough to meet my needs.
But once the services were being probed around the clock, email-only notification felt not quite immediate enough, so I decided to add SMS notification. I looked at a few SMS service providers and settled on PCHOME's cheap one-dollar SMS service. Thanks to SnowFLY (飄然似雪), who wrote the Net-SMS-PChome Perl module for the PCHOME SMS service, I saved a lot of work. It also means I now regularly get woken up by text messages in the middle of the night.
However, the version on CPAN is from 2006 and no longer quite matches the current PCHOME pages; it works again after a few small modifications.
Little script for backing up svn repositories
Here is my little script for `incremental' dumps of svn revision trees. It simply walks over every svn repository located under /home/svn and saves each revision into /home/backup, one gzipped dump file per revision.
#!/bin/bash
# Dump every revision of every repository under /home/svn as a
# separate gzipped incremental dump in /home/backup. Revisions
# that already have a dump file are skipped, so re-runs are cheap.
for dir in /home/svn/* ; do
    name=$(basename ${dir})
    version=$(svnlook youngest ${dir})
    # Use <= so the youngest revision is backed up too.
    for ((r=1; r<=${version}; r++)) ; do
        if [ ! -f "/home/backup/${name}-$r.gz" ] ; then
            svnadmin dump ${dir} -r $r --incremental | \
                gzip -9 > "/home/backup/${name}-$r.gz"
        fi
    done
done
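To restore, replay the dumps in revision order into a fresh repository. A minimal sketch, assuming the file layout above (the repository name is an example):

#!/bin/sh
# Rebuild a repository from the per-revision dump files.
name=myrepo
svnadmin create /home/svn/${name}-restored
r=1
while [ -f "/home/backup/${name}-$r.gz" ] ; do
    zcat "/home/backup/${name}-$r.gz" | svnadmin load /home/svn/${name}-restored
    r=$((r+1))
done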
Have fun! This is a tip.
reverse proxy add forward module for apache
If you ever read my blog entry about setting up Debian.org.tw, you probably already know that I love to put a reverse proxy in front of my web servers. This approach solves the single-IP-address-for-multiple-vservers problem, and it also provides a web cache that reduces the server load.
Since the proxy server (Squid) passes the HTTP sessions on to the real web servers, one problem is that my web servers always see a single source IP address: the proxy's. Even though the proxy puts the client's IP in the `X-Forwarded-For' HTTP header, it is still painful to retrieve the correct address from that header in every web application.
Thanks to Thomas Eibner, who wrote the reverse proxy add forward (rpaf) module for Apache. The module simply checks whether a request comes from the proxy server; if it does, it replaces the remote address with the client IP taken from the `X-Forwarded-For' header, and can also restore the `Host' header from `X-Forwarded-Host' or `X-Host'. So you no longer need to worry about wrong IP addresses, and tracking HTTP requests becomes much easier.
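On Debian the setup is short. A sketch, assuming the rpaf 0.6 directive names (RPAFenable, RPAFsethostname, RPAFproxy_ips) and that Squid talks to Apache from 127.0.0.1:

# Install the module, point it at the trusted proxy, then enable it.
aptitude install libapache2-mod-rpaf
cat > /etc/apache2/mods-available/rpaf.conf <<'EOF'
RPAFenable On
RPAFsethostname On
RPAFproxy_ips 127.0.0.1
EOF
a2enmod rpaf
apache2ctl graceful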
The Debian package is maintained by Piotr Roszatycki, but it is still the old 0.5 version. Since 0.6 is out, I filed a bug report to remind him. For my etch servers I back-ported the package to the latest version; you can download it from my personal repository.
By the way, Piotr Roszatycki uses yada for libapache2-mod-rpaf; he is also the maintainer of yada. After reading yada's `debian/packages' script file, I really felt like I was back in my `good' old days with RPM specs. :p