Building a developer community with containers

Posted on 2016-10-02

Rex Tsai

This is a lightning talk I gave at SoTM Asia 2016, held at the University of the Philippines Diliman.

If you can read Chinese, please read my post – 使用 Docker 玩轉開放街圖

Putting the Ministry of the Interior's 20-meter grid digital terrain model data to use

Posted on 2016-09-14

Rex Tsai

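The challenges of open government data

On 2016/09/09, Taiwan's Ministry of the Interior released an open 20-meter grid DTM covering all of Taiwan and the Penghu area. This is a major milestone for open government geographic information!

Before this release, the Taiwanese government had already published, bit by bit, a considerable number of map datasets in formats such as WMS, KML, and GeoJSON. In particular, the digital files of the economic-construction edition topographic maps (經建版地形圖; scales 1:25,000, 1:50,000, and 1:100,000), released at the end of July, went in one stroke from costing NT$150 (non-value-added license) or NT$600 (value-added license) per sheet to being distributed under the Open Government Data License!

The "digital data files of the economic-construction edition topographic maps (scales 1:25,000, 1:50,000, and 1:100,000)" of the Ministry of the Interior's National Land Surveying and Mapping Center were previously listed as Category A data by the 2nd meeting of the Executive Yuan Open Data Advisory Group in ROC year 105 (2016). By this Ministry's Order Tai-Nei-Di-Zi No. 1051306149 of July 26, ROC year 105, which amended and promulgated Attachment 2 of the table annexed to Article 2 of the 國土測繪成果資料收費標準 (the fee standards for national land surveying and mapping data), the data were opened for free download and use under the Executive Yuan's "Open Government Data License, version 1".

The files released on 2016/07/28 comprise 262 sheets of the 1:25,000 maps, 80 sheets of the 1:50,000 maps, and 7 sheets of the 1:100,000 maps, 349 sheets in total.

They do not include a contour layer, but they do include layers for water systems, roads, administrative boundaries, railways, high-voltage lines, built-up areas, and so on, as well as vector layers for legends and Chinese annotations.

That said, some of these map files are quite old. The coordinate systems vary with the preferences at the time of drafting: TWD67 (119 zone), TWD67 (121 zone), TWD97 (121 zone), and others. The format is AC1027, which requires the proprietary AutoCAD 2013/2014 software. In practice the files still need to be cleaned up before they are useful.

But this open policy means government agencies are finally willing to change their default position, trading the revenue this map data could bring in for the economic value of open data being put to active use. That default position was a product of older legal provisions. For example, Articles 7 and 8 of the Charges and Fees Act (規費法) state: "When handling the following matters for the rights and interests of specific persons, administrative fees shall be collected; when delivering the following items to specific persons or providing them for their use, usage fees shall be collected." Under this premise of licensing to "specific persons", the various government data management regulations defined under the Charges and Fees Act end up:

restricting the purposes of use;
prohibiting the free transfer or distribution of the data or of value-added/derivative products;
requiring the user's client to control how the data are used.

As a result, the data become much less likely to be reused. If this dataset had not been released under the Open Government Data License, I could not have done the initial processing and then opened the data up to the OSM community to produce maps in other formats, even after buying a value-added license under the fee schedule for surveying and mapping data.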

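Community engagement

This did not happen overnight. Along the way, many experts and scholars from various non-profit organizations met and exchanged views with government officials.

Take the OpenStreetMap community as an example: community representatives started engaging with the Executive Yuan as early as mid-2014, and quite a few meetings followed.

These meetings basically revolved around:

Defining "open", and explaining the conceptual difference between free-of-charge access and freedom to use.
Lobbying for an open-by-default policy that treats open geographic information as a public good.
Combing through the old related laws and administrative orders, such as the Charges and Fees Act and the Government Procurement Act within the government budgeting system, and the usage controls imposed by the Ministry of National Defense or other ministries.
Clarifying the compatibility between the government's license and the licenses used by the international open data community.
Sharing and discussing cases from abroad, such as the European Environment Agency's land-use analysis data, the New York City government's building footprints, and Natural Resources Canada's model of cooperation with OSM.
Promoting exemplary domestic applications, such as disaster prevention, and models of cooperation between government units and the open data community.

The aim was to discuss, item by item, the feasibility of releasing each dataset.

The first meetings were quite frustrating. The two sides often expected different degrees of "openness", so the discussions struggled to find common ground. Add to that the complex regulations that needed adjusting and the unpredictable public reaction to policy changes, and progress was often slow. Fortunately, there were always proactive political appointees who supported pushing things forward.

Even with support from the highest-ranking officials, it takes a long time to go from a willingness to accept suggestions to an actual data release.

Part of this work is adjusting regulations. Released data must still comply with the Personal Data Protection Act and the Copyright Act, and the Administrative Penalty Act, the Charges and Fees Act, and others all affect how supportive each agency is. The regulations are incomplete, and government agencies lack a positive incentive: keeping data accurate and up to date takes budget and resources, yet the agencies are often not the ones who benefit from data reuse, and publishing data only brings the risk of breaking the law.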

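Applications of digital terrain model data

In fact, the Ministry of the Interior's 100-meter grid DTM was the earliest data to be opened, but a 100-meter resolution has little practical value, and as early as 2011 scholars were suggesting that national security agencies should gradually open up higher-resolution DTM data. Many users of open map data rely on DEM data from the NASA Shuttle Radar Topography Mission (SRTM). Its nominal resolution is 1 arc-second (about 30 meters), but because it is interpolated, it still has defects. Still, it is the free data that ordinary research units or OpenStreetMap can get, which is why many people who use OpenStreetMap as a base map for outdoor activities take SRTM as the reference data for contour maps.

As an OpenStreetMap user whose main application is mountaineering, I proposed at the first meeting at the Executive Yuan on 2014/07/27 that the digital terrain model be opened first.

A full two years after that first meeting, the 20-meter grid DTM has finally arrived.

The HDR metadata of this dataset shows that it was surveyed in 2006, and that the NCKU Research and Development Foundation (財團法人成大研究發展基金會) produced the 20 m grid by thinning 5 m grid data. The coordinate system is TWD97 / TM2 zone 121. A DTM is not only useful for producing contour lines and color hillshade maps; it can also be used for engineering simulation, terrain queries, 3D terrain maps, slope calculation, and more.

For Taiwan's mountaineering community, it can immediately replace the SRTM data used so far.
For example, 地圖產生器 (Map Generator) has used this data as the reference for terrain elevation queries since v4.02.
It can also be used to produce Mapsforge offline vector maps. Offline map sources that the Taiwanese community has long provided include Jing's ASTER.OSM and 綠野遊蹤. These maps can be used in mobile apps such as OsmAnd, Locus Map, OruxMaps, and 綠野遊蹤.

It can also be turned into the map format needed by Garmin devices; in Taiwan there are ASTER.OSM and 台灣登山地圖 – Taiwan TOPO, among others.

Because this dataset uses a two-degree transverse Mercator zone (TWD97, central meridian 121°E), I converted all of the original GeoTIFF / ASCII gridded XYZ raster datasets to WGS84 to make further post-processing easier.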
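To make the coordinate conversion concrete, here is a minimal sketch of the inverse transverse Mercator calculation for a single point, using only the Python standard library. It is not the conversion procedure used for the published files (those commands are in the download below). The projection parameters are the published ones for TWD97 / TM2 zone 121 (GRS80 ellipsoid, central meridian 121°E, scale factor 0.9999, false easting 250,000 m). The function name `tm2_to_lonlat` is made up for illustration, and the sketch assumes that TWD97 longitude/latitude can be treated as WGS84 for mapping purposes, a common approximation.

```python
import math

# GRS80 ellipsoid, the reference ellipsoid of TWD97
A = 6378137.0
F = 1 / 298.257222101
E2 = 2 * F - F * F                 # first eccentricity squared

# TWD97 / TM2 zone 121 projection parameters
LON0 = 121.0                       # central meridian, degrees east
K0 = 0.9999                        # scale factor on the central meridian
FALSE_EASTING = 250000.0           # metres; false northing is 0


def tm2_to_lonlat(x, y):
    """Inverse transverse Mercator (Snyder's series); returns (lon, lat) in degrees."""
    ep2 = E2 / (1 - E2)            # second eccentricity squared
    m = y / K0                     # meridian arc length from the equator
    mu = m / (A * (1 - E2 / 4 - 3 * E2**2 / 64 - 5 * E2**3 / 256))
    e1 = (1 - math.sqrt(1 - E2)) / (1 + math.sqrt(1 - E2))
    # Footpoint latitude
    phi1 = (mu
            + (3 * e1 / 2 - 27 * e1**3 / 32) * math.sin(2 * mu)
            + (21 * e1**2 / 16 - 55 * e1**4 / 32) * math.sin(4 * mu)
            + (151 * e1**3 / 96) * math.sin(6 * mu)
            + (1097 * e1**4 / 512) * math.sin(8 * mu))
    sin1, cos1, tan1 = math.sin(phi1), math.cos(phi1), math.tan(phi1)
    c1 = ep2 * cos1**2
    t1 = tan1**2
    n1 = A / math.sqrt(1 - E2 * sin1**2)
    r1 = A * (1 - E2) / (1 - E2 * sin1**2) ** 1.5
    d = (x - FALSE_EASTING) / (n1 * K0)
    lat = phi1 - (n1 * tan1 / r1) * (
        d**2 / 2
        - (5 + 3 * t1 + 10 * c1 - 4 * c1**2 - 9 * ep2) * d**4 / 24
        + (61 + 90 * t1 + 298 * c1 + 45 * t1**2 - 252 * ep2 - 3 * c1**2) * d**6 / 720)
    dlon = (d
            - (1 + 2 * t1 + c1) * d**3 / 6
            + (5 - 2 * c1 + 28 * t1 - 3 * c1**2 + 8 * ep2 + 24 * t1**2) * d**5 / 120) / cos1
    return LON0 + math.degrees(dlon), math.degrees(lat)


if __name__ == "__main__":
    # A point on the central meridian, roughly at the latitude of central Taiwan
    print(tm2_to_lonlat(250000.0, 2600000.0))
```

For whole rasters a GIS library is the practical choice; the point of the sketch is only to show what "two-degree zone, central meridian 121" means numerically.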

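The formats converted so far are:

GeoTIFF and LZW-compressed GeoTIFF. Can be used to produce color hillshade maps, terrain queries, and so on.
Shapefile – contour vector maps at 10 m, 20 m, 50 m, and 250 m intervals. Can be used in QGIS or other GIS software.
Postgres/PostGIS SQL – MULTILINESTRING contour layers at 10 m, 20 m, 50 m, and 250 m intervals. Can be displayed in QGIS through PostgreSQL/PostGIS, or rendered into tiles (slippy map) with tools such as mapnik/CartoCSS.
SRTM HGT – because of the requirements of the HGT format, this is a version downsampled to 3601×3601. It can be loaded on a phone, and some apps can draw contours or terrain directly from it. Because it has been downsampled, it is not recommended as raw material for further production.
OSM PBF – contour data at 10 m intervals. Can be merged with OSM data to generate offline Mapsforge maps for phones or Garmin maps.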
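For readers curious about the SRTM HGT entry in the list above, the following sketch shows how an elevation can be looked up in such a file. It relies on the generally documented layout of HGT tiles: a headerless grid of big-endian signed 16-bit integers, rows running north to south, named after the south-west corner of the one-degree tile, with -32768 marking voids. The helper names are hypothetical, and the small `samples` parameter exists only so the example can be tried on a tiny in-memory grid.

```python
import math
import struct

SAMPLES = 3601       # samples per row/column in a 1 arc-second tile
VOID = -32768        # value used for missing data


def hgt_tile_name(lat, lon):
    """Name of the tile containing (lat, lon), e.g. N23E120.hgt."""
    lat0, lon0 = math.floor(lat), math.floor(lon)
    ns = "N" if lat0 >= 0 else "S"
    ew = "E" if lon0 >= 0 else "W"
    return f"{ns}{abs(lat0):02d}{ew}{abs(lon0):03d}.hgt"


def hgt_offset(lat, lon, samples=SAMPLES):
    """Byte offset of the sample nearest to (lat, lon) within its tile."""
    lat0, lon0 = math.floor(lat), math.floor(lon)
    row = round((lat0 + 1 - lat) * (samples - 1))   # row 0 is the northern edge
    col = round((lon - lon0) * (samples - 1))       # column 0 is the western edge
    return (row * samples + col) * 2                # two bytes per sample


def read_elevation(data, lat, lon, samples=SAMPLES):
    """Elevation in metres from raw HGT bytes, or None for a void cell."""
    (value,) = struct.unpack_from(">h", data, hgt_offset(lat, lon, samples))
    return None if value == VOID else value


if __name__ == "__main__":
    print(hgt_tile_name(23.47, 120.957))
    # With a real file: data = open("N23E120.hgt", "rb").read()
```

Nearest-sample lookup is the simplest choice; apps that draw contours usually interpolate between neighbouring samples instead.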

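The data in these formats, and the command syntax used for the conversion, can be downloaded here: http://goo.gl/Wku11y

If you have questions about using the map data, you are welcome to discuss them in the OpenStreetMap台灣社群 (the OpenStreetMap Taiwan community). For questions about using this data on a phone, you can ask 手機GPS登山推廣計畫及其經驗交流聯誼會, 別讓自己迷失（手機GPS應用）, or 福爾摩沙山難預防協會, which is currently being set up.

This dataset is licensed under CC0. If users suffer damage or loss from using this data, or cause damage or loss to a third party and face a claim as a result, the provider bears no liability for compensation or indemnification.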

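Map preview

There are a few known problems so far. First, the offshore parts have negative elevations, which produce boxes or artifacts along the coastline. Second, in both the tiled Yunlin County data and the untiled all-Taiwan data, there are strange "alien structures" as high as six thousand meters at (120.682650, 23.608483) and (120.682283, 23.604167).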
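Both problems can be caught with a simple range check before producing contours. The sketch below is only an illustration under stated assumptions: the grid is a plain list of rows of elevations in meters, the function name `find_suspect_cells` is hypothetical, and the upper bound is taken from Yushan (3,952 m), the highest summit in Taiwan, plus a small margin.

```python
YUSHAN_M = 3952.0   # highest summit in Taiwan, in metres


def find_suspect_cells(grid, low=0.0, high=YUSHAN_M + 50.0):
    """Return (row, col, elevation) for cells outside the plausible range."""
    suspects = []
    for r, row in enumerate(grid):
        for c, z in enumerate(row):
            if z < low or z > high:
                suspects.append((r, c, z))
    return suspects


if __name__ == "__main__":
    sample = [
        [12.0, 15.5, -3.0],      # negative value, as seen offshore
        [20.0, 6000.0, 18.0],    # a six-thousand-metre spike
    ]
    for cell in find_suspect_cells(sample):
        print(cell)
```

How the flagged cells should then be treated (masked, clamped to zero, or filled from their neighbours) depends on the product being made.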

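A comparison with SRTM shows that this data is far more usable. In the past, SRTM's insufficient resolution made the contours of many river valleys wrong, so that valleys were easily misread as ridges.

Below is a comparison with the 2001 third-edition 1:25,000 economic-construction paper map. Many hikers probably buy maps from 上河文化 when climbing the Baiyue (Taiwan's "hundred peaks"), or, when climbing mid-elevation mountains, print the third-edition map with 地圖產生器, waterproof it with tape, and carry it up the mountain.

With the data released by the Ministry of the Interior this time, combined with OSM's mountain trails, hikers will be able to produce up-to-date, easy-to-use offline maps for phones or dedicated handheld GPS devices.

Thanks to all the government agencies, experts, and friends who worked so hard along the way to find consensus!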

Artificial intelligence techniques for OpenStreetMap

Posted on 2016-09-09

Rex Tsai

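Collaboration through human-powered intelligence

Doctors Without Borders/Médecins Sans Frontières released a new mobile app in July, MapSwipe. It lets volunteers help pre-screen satellite imagery on their phones, helping medical teams work out which remote areas have clustered settlements, so that people can be sent there to provide vaccination and other medical services.

MapSwipe is open source software and very simple to use: the back-end system cuts the satellite imagery into tiles, and through the phone interface volunteers pick out the tiles that may contain roads or buildings. The project can work together with the Missing Maps Project of OpenStreetMap's Humanitarian Team: when a large-scale disaster happens, volunteers can use the latest satellite imagery to quickly identify the buildings that are still standing. The results are then published on the HOT Task Manager, where mappers take over the detailed mapping work. This buys time by letting volunteers around the world provide ground teams with updated map intelligence.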
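As an illustration of what "cutting imagery into tiles" means, the sketch below computes which tile contains a given point in the standard slippy-map tiling scheme documented on the OpenStreetMap wiki. Whether MapSwipe's back end uses exactly this scheme and which zoom level it works at are assumptions here; the function name `deg2tile` is chosen for the example.

```python
import math


def deg2tile(lat, lon, zoom):
    """Tile (x, y) containing the point, in the Web Mercator slippy-map scheme."""
    n = 2 ** zoom                                    # tiles per side at this zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


if __name__ == "__main__":
    # Approximate coordinates of a point in Quezon City, at zoom level 18
    print(deg2tile(14.6549, 121.0645, 18))
```

Each volunteer decision then only has to refer to a `(zoom, x, y)` triple, which makes it easy to collect answers from many people about the same tile.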

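Machine learning for reading socio-economic development

However, all of this is human-powered interpretation of satellite imagery: it relies on a large amount of manpower to read maps.

Large technology companies such as Facebook, in order to achieve their vision of connecting the whole world, also need to work out population distribution, which is information that traditional demographics cannot provide. So they partnered with the satellite imagery company DigitalGlobe and used machine learning image processing to find man-made structures and estimate population distribution. A technical article from the Facebook Connectivity Lab explains the concepts.

Non-profit organizations and academia have to make do with open data. Similar work includes that of Patrick Doupe, Senior Data Analyst at the Arnhold Institute for Global Health, who uses LANDSAT7 satellite imagery to predict the socio-economic status of specific areas. The initial code has been published on github.

A team led by Stanford University researchers Marshall Burke and Stefano Ermon goes a step further and builds a model by analyzing daytime and nighttime satellite imagery, since electricity infrastructure reflects the local economic level. They then use ground infrastructure features as filters and transfer the model to predicting poverty (article in Chinese). With this technique, satellite imagery can replace costly, labor-intensive surveys for long-term observation of how remote areas are developing. However, satellite data still has its limitations, and the team is also considering how to collect mobile phone metadata as raw data. The code for the paper, implemented in R 3.2.4 and Python 2.7, and the setup instructions have been put on github.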

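Feature search with deep learning

All of these techniques use convolutional neural networks, which, unlike traditional Geographic Object-Based Image Analysis techniques, open up more application scenarios.

One example is quickly searching satellite imagery for specific visual features.

Terrapattern provides a reverse search engine that the general public can easily use. It classifies satellite images by OSM tags and trains a Deep Convolutional Neural Network on them, but its purpose is not to automatically distinguish building types. Instead, it lets users quickly search for landscapes by image features through a Cover Tree. It is best used to find facilities that do not easily show up on maps, such as abandoned swimming pools and air-conditioning units. The same technique can also be applied to aerial imagery of disaster areas to quickly identify damaged buildings, bridges, and roads. The code is released under the MIT license, and the material on the website is also a valuable reference.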

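AI for automation and error detection

The human-powered inspection of satellite imagery mentioned at the beginning of this post, meant to speed up mapping during emergency relief, can in fact also be accelerated with AI techniques. Stanford student Lars Roemheld, in his report for the course CS231n: Convolutional Neural Networks for Visual Recognition, describes in detail the methods he tried and the problems he ran into.

Anand Thakker of Development Seed, in his talk Skynet at SoTM US 2016, successfully demonstrated how to use machine learning to find roads. He noted that although OSM data already contains many roads that can serve as training data, road widths are not accurately tagged, so some adjustment is still needed before a model can be trained properly. For details, see the slides, test data, and code.

Tracing the routes is only a small part of the job; AI and satellite imagery cannot provide road names.

In some regions Facebook has already begun using OSM as base map data for check-ins. Sadi Khan, Yin Wang, Luke Walsh, and other Facebook engineers shared at SoTM US some experiments they ran on Egypt, which they predict could add 20%-30% to the road network. They have also begun testing extracting road names from POIs and then letting users report back and choose the correct name, to further improve completeness. Although part of the road network had already been imported, it was reverted for rework because of procedural problems and poor quality.

Other community experiments include Crosswalk-Detection by Samuel Kurath of Geometalab, which detects zebra crossings on roads and reportedly already reaches 98% accuracy.

Within the OpenStreetMap community, some are already using these techniques to improve data quality. TrailBehind, Inc, maker of the professional hiking map app Gaia GPS, used TensorFlow to develop DeepOSM, an automatic error-detection tool that recognizes roads in satellite imagery and checks them against the existing map data. It is one of the earliest projects to apply neural network technology to OSM.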

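Future developments

The barrier to entry for deep learning/machine learning keeps getting lower. The major cloud service providers, including Amazon, Microsoft Azure, and Google, have been rolling out GPU servers suited to neural network computation, so computing resources for experiments and development are easy to get.
In addition, DigitalGlobe, CosmiQ Works, and Amazon have jointly launched the SpaceNet dataset. The imagery comes from DigitalGlobe's WorldView-2 commercial satellite: 50 cm high-resolution imagery including 8-band multispectral data, plus 220,594 building images that can be used as AI training data.

Judging from what Development Seed has shared, the community has already started using these techniques to complete the Dar es Salaam road network and to work with the World Bank. But these new techniques have caused some political conflict within the OSM community. OSM has always been drawn from nothing by volunteers surveying on foot. These volunteers treasure the community built through meetups, and mapping that only requires sitting at home at a keyboard has never been very welcome.

We have now reached a crossroads between robot mapping and craft mapping.

As a pragmatist, I personally believe robot mapping can maintain high-quality map data at much lower cost, while craft mapping can complement it by focusing on providing local knowledge.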

The never-ending work of system security

Posted on 2016-09-05

Rex Tsai

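This post collects some thoughts after reading the slides of Kuon Ding's COSCUP 2016 talk 「開源編譯器，如何實現系統安全最後一哩路」 (Open source compilers: how to achieve the last mile of system security). I spent COSCUP chatting outside and never went into the lecture hall, so these notes are based only on the slides.

In my opinion there is no last mile in information security[1]; it takes discipline to keep building bridges and paving roads, layer upon layer.

The talk covered how open source compiler tools have developed with respect to system security. The toolchain is indeed an important link. Take Ubuntu as an example[2]: its gcc carries extra patches to strengthen security, including Stack Protector, built as PIE for exec ASLR, Fortify Source, and Read-only relocation. However, the toolchain cannot provide security protection on its own; Address Space Layout Randomization (ASLR), for example, has to be done at the kernel level. Whether for Ubuntu, which converges desktop and phone environments, or for phone-centric Android, security development is about reducing the attack surface as much as possible and stacking security restrictions layer upon layer.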
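On a stock gcc without such distribution patches, roughly the same hardening has to be requested explicitly. The sketch below assembles a command line with commonly used gcc and linker options for the four features named above. The flag-to-feature mapping follows general gcc usage rather than Ubuntu's exact defaults, and the helper name `gcc_command` and the file names are hypothetical.

```python
# Commonly used gcc options for each hardening feature (assumed mapping)
HARDENING_FLAGS = {
    "stack_protector": ["-fstack-protector-strong"],
    "pie": ["-fPIE", "-pie"],                        # position-independent executable
    "fortify_source": ["-O2", "-D_FORTIFY_SOURCE=2"],  # needs optimization enabled
    "relro": ["-Wl,-z,relro"],                       # read-only relocations
}


def gcc_command(source, output, features=tuple(HARDENING_FLAGS)):
    """Build an argument list for compiling one C file with the chosen features."""
    cmd = ["gcc"]
    for name in features:
        cmd.extend(HARDENING_FLAGS[name])
    cmd.extend(["-o", output, source])
    return cmd


if __name__ == "__main__":
    print(" ".join(gcc_command("hello.c", "hello")))
```

Even with every flag enabled, PIE only pays off when the kernel actually randomizes the load address, which is the dependency on the kernel described above.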

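Take the recently released Android 7[7][8] as an example. System-level protections were improved by reducing the whitelist of allowed ioctl calls in SELinux and the seccomp sandbox, by Library ASLR[3], by CONFIG_DEBUG_RODATA, which was learned from Grsecurity, and so on. Each of these reduces the exposed attack surface step by step.

For example, the slides mention the QuadRooter vulnerabilities[4] published at DEF CON 24 in 2016/08. Many of them come from design flaws in SoC code, and such flaws are hard to find through code review. Because of intellectual property restrictions, many of the problematic drivers are distributed as binaries, and operating system vendors or device brands cannot get the source code. These flaws can only be guarded against through system security mechanisms[15].

Take CVE-2016-2059 from QuadRooter (Linux IPC router binding any port as a control port): the attack presupposes that kASLR[5] is turned off on the system. Only then is heap spraying possible, and going on to get root also requires SELinux to be disabled. The first step of the attack, the ioctl command, can be suppressed by SELinux policy. For example, for CVE-2016-0820, the private ioctl vulnerability in MediaTek's WiFi driver, access to device private commands by ordinary programs can be turned off[6].

The compiler cannot prevent problems like these; other mechanisms must be relied on to protect the system.

The KASan (Kernel Address Sanitizer)[21] feature implemented in the compiler[9] can find use-after-free[22] problems such as CVE-2016-2503/CVE-2016-2504 in QuadRooter, but it likewise needs kernel support[10]. And it will take some time before this feature, which landed in 4.4, reaches users' hands[14]; updating the toolchain and recompiling is not enough.

Not every theoretical technique brings benefits across security, convenience, and performance; operating systems often have to make trade-offs.

For example, enabling the vtable verification feature[27] mentioned in the slides[1] crashes important software such as Firefox[11], because developers play clever tricks with vtables.
For example, the built as PIE feature in Ubuntu mentioned above causes a 5-10% performance loss on the i686 platform[12], so it could only be applied to selected important libraries. It only became the default in 16.10, once the 64-bit environment had matured.
For example, once Kernel Address Space Layout Randomisation (kASLR) is enabled, the computer can no longer hibernate on x86[13], which is unacceptable for laptop users who need emergency hibernation when the battery runs out.

No security design can be viewed from one side only; it needs an overall assessment. Some features that cannot be implemented in the compiler can be done in the kernel, and problems in the kernel can be mitigated through app sandboxing.

The trend in operating system development in recent years is toward isolation (sandboxing): Android uses an SELinux sandbox, Chrome OS uses Minijail[16], the Linux desktop has xdg-app/Flatpak[17][18], and Ubuntu uses Snappy (AppArmor)[19][20]. Beyond Linux, there is Apple OS X's Sandbox[23][24][25], based on the TrustedBSD Mandatory Access Control (MAC) Framework, and Microsoft's Windows Runtime sandbox[26]. All of these systems are designed to protect users' data: besides guarding against malware, they ensure that if a program is compromised, the damage it can do is confined to the sandbox.

One of the biggest challenges is perhaps designing flexible APIs for the new security model, while still providing a friendly, convenient user experience in a runtime environment with multiple layers of restrictions.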

[1] 開源編譯器，如何實現系統安全最後一哩路 by Funny Systems – https://speakerdeck.com/FunnySystems/kai-yuan-bian-yi-qi-ru-he-shi-xian-xi-tong-an-quan-zui-hou-li-lu
[2] https://wiki.ubuntu.com/Security/Features
[3] Implement Library Load Order Randomization – https://android.googlesource.com/platform/bionic/+/4f7a7ad3fed2ea90d454ec9f3cabfffb0deda8c4%5E%21/
[4] QuadRooter Research Report – https://www.checkpoint.com/downloads/resources/quadRooter-vulnerability-research-report.pdf
[5] Kernel address space layout randomization [LWN.net] – https://lwn.net/Articles/569635/
[6] Only allow shell user to access unprivileged socket ioctl commands. – https://android.googlesource.com/platform/external/sepolicy/+/57531ca%5E%21/
[7] Security | Android Open Source Project – https://source.android.com/security/
[8] Security Enhancements in Android 7.0 | Android Open Source Project – https://source.android.com/security/enhancements/enhancements70.html
[9] [ASan] Initial support for Kernel AddressSanitizer · llvm-mirror/llvm@e9149f4 – https://github.com/llvm-mirror/llvm/commit/e9149f4f8cd3b915ada134d80452c6eae7875ca4
[10] KASan support for arm64 – http://lkml.iu.edu/hypermail/linux/kernel/1511.0/02583.html
[11] Crash in mozJSComponentLoader::ModuleEntry::GetFactory when compiled with GCC 4.9.0 and VTV – https://bugzilla.mozilla.org/show_bug.cgi?id=1046600
[12] PIE has a large (5-10%) performance penalty on architectures with small numbers of general registers (e.g. x86) – https://wiki.ubuntu.com/Security/Features#pie
[13] Prefer kASLR over Hibernation – Patchwork – https://patchwork.kernel.org/patch/8765121/
[14] KASan support for arm64 – http://lkml.iu.edu/hypermail/linux/kernel/1511.0/02583.html
[15] Google Online Security Blog: Protecting Android with more Linux kernel defenses – https://security.googleblog.com/2016/07/protecting-android-with-more-linux.html
[16] Chromium OS Sandboxing – The Chromium Projects – https://www.chromium.org/chromium-os/developer-guide/chromium-os-sandboxing#h.l7ou90opzirq
[17] Projects/SandboxedApps – GNOME Wiki! – https://wiki.gnome.org/Projects/SandboxedApps
[18] Sandbox · flatpak/flatpak Wiki – https://github.com/flatpak/flatpak/wiki/Sandbox
[19] snapcraft – Snaps are universal Linux packages – http://snapcraft.io/
[20] Snappy Interfaces | Labix Blog – http://blog.labix.org/2016/04/22/snappy-interfaces
[21] Kernel Address Sanitizer – https://github.com/google/kasan/wiki
[22] Four new Android privilege escalations [LWN.net] – https://lwn.net/Articles/696716/
[23] The Apple Sandbox https://media.blackhat.com/bh-dc-11/Blazakis/BlackHat_DC_2011_Blazakis_Apple%20Sandbox-Slides.pdf
[24] The Apple Sandbox https://media.blackhat.com/bh-dc-11/Blazakis/BlackHat_DC_2011_Blazakis_Apple_Sandbox-wp.pdf
[25] SandBlaster: Reversing the Apple Sandbox – https://arxiv.org/pdf/1608.04303.pdf
[26] WinRT: The Metro-politan Museum of Security https://conference.hitb.org/hitbsecconf2012ams/materials/D1T2%20-%20Sebastien%20Renaud%20and%20Kevin%20Szkudlapski%20-%20WinRT.pdf
[27] Improving Function Pointer Security for Virtual Method Dispatches https://gcc.gnu.org/wiki/cauldron2012?action=AttachFile&do=get&target=cmtice.pdf

Have I been hacked? Please use a password manager

Posted on 2016-09-02

Rex Tsai

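In 2012 it was reported that Dropbox had been breached and 6.9 million personal records stolen. The media recently found[1] that 68,680,741 records were actually stolen, of which 31,865,280 passwords were hashed with bcrypt and the other 36,815,461 with SHA1. Troy Hunt[3], who runs "Have I been pwned"[2], obtained the data and, after verifying it[4], confirmed that it was stolen from Dropbox and is not fake.

The data contains email addresses and passwords. Although the passwords are hashed, they can still be cracked or guessed. If you are not in the habit of using different passwords, someone else may be able to log in to other services with the same account and password. And the data is not only available to people who know where to look: sites like LeakedSource[5] also sell a paid API[6] for retrieving victims' raw data (that is, the hashed passwords and so on).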
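Why a hashed password can still be guessed is easy to show with a small dictionary attack. The sketch below uses only the Python standard library. Since bcrypt is not part of it, PBKDF2 stands in as the deliberately slow hash; the salts, the word list, and the function names are made up for illustration and say nothing about how Dropbox actually stored its hashes.

```python
import hashlib


def sha1_hash(password: bytes, salt: bytes) -> str:
    """A single round of SHA-1: very cheap to compute, so cheap to attack."""
    return hashlib.sha1(salt + password).hexdigest()


def slow_hash(password: bytes, salt: bytes, iterations: int = 100_000) -> str:
    """PBKDF2-HMAC-SHA256, used here as a stand-in for a slow hash like bcrypt."""
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations).hex()


def dictionary_attack(target: str, salt: bytes, wordlist, hash_fn):
    """Hash each candidate with the leaked salt and compare with the leaked hash."""
    for word in wordlist:
        if hash_fn(word, salt) == target:
            return word
    return None


if __name__ == "__main__":
    wordlist = [b"123456", b"password", b"letmein"]
    leaked = sha1_hash(b"letmein", b"salt-1")
    print(dictionary_attack(leaked, b"salt-1", wordlist, sha1_hash))
```

The attack works against either function; the slow hash only makes every guess far more expensive. A common password falls quickly in both cases, which is why a long random password per site matters more than the hashing scheme the service happens to use.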

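A better habit is to use a password manager and two-factor authentication (2FA)[9].

The basic job of a password manager is to generate random passwords for you, log you in automatically, and so on, so you can easily use hard-to-crack passwords across different services. There are many password managers to choose from[7][8]. I use Lastpass[11], which integrates conveniently with browsers and Android. It costs 12 USD a year, which is cheaper than other software. The company now offers free accounts, but I have already been paying for several years. Lastpass was also breached in 2015[12], but thanks to sound security design, that did not cause major problems.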
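The "generate random passwords" part is simple to sketch with Python's `secrets` module, which draws from the operating system's cryptographically secure random source. This is only an illustration of the idea, not how Lastpass or any other product does it; the function name, default length, and alphabet are arbitrary choices.

```python
import secrets
import string

DEFAULT_ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length=20, alphabet=DEFAULT_ALPHABET):
    """Return a random password drawn uniformly from the given alphabet."""
    return "".join(secrets.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    # One distinct password per service, so a leak at one site exposes nothing else
    for site in ("example-mail", "example-bank"):
        print(site, generate_password())
```

The manager's real value is remembering these passwords and filling them in for you, which is what makes using a different one for every service practical.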

Once a month I run the Lastpass security challenge[10]. It checks password strength and whether passwords are reused. Like Have I been pwned[2] and LeakedSource[5], it also looks up databases of known breach victims and warns you if you are at high risk and need to take action.

Go pick a password manager. https://lastpass.com/f?4133426

Bonus: Password manager security papers | Wilders Security Forums – http://www.wilderssecurity.com/threads/password-manager-security-papers.365724/

“Password Managers: Risks, Pitfalls, and Improvements” (2014)

We study the security of popular password managers and their policies on automatically filling in passwords in web pages. We examine browser built-in password managers, mobile password managers, and 3rd party managers. We show that there are significant differences in autofill policies among password managers. Many autofill policies can lead to disastrous consequences where a remote network attacker can extract multiple passwords from the user’s password manager without any interaction with the user. We experiment with these attacks and with techniques to enhance the security of password managers. We show that our enhancements can be adopted by existing managers.
“Protecting Users Against XSS-based Password Manager Abuse” (2014)

To ease the burden of repeated password authentication on multiple sites, modern Web browsers provide password managers, which offer to automatically complete password fields on Web pages, after the password has been stored once. Unfortunately, these managers operate by simply inserting the clear-text password into the document’s DOM, where it is accessible by JavaScript. Thus, a successful Cross-site Scripting attack can be leveraged by the attacker to read and leak password data which has been provided by the password manager. In this paper, we assess this potential threat through a thorough survey of the current password manager generation and observable characteristics of password fields in popular Web sites. Furthermore, we propose an alternative password manager design, which robustly prevents the identified attacks, while maintaining compatibility with the established functionality of the existing approaches.
“Vulnerability and Risk Analysis of Two Commercial Browser and Cloud Based Password Managers” (2013)

Web users are confronted with the daunting challenges of managing more and more passwords to protect their valuable assets on different online services. Password manager is one of the most popular solutions designed to address such challenges by saving users’ passwords and later auto-filling the login forms on behalf of users. All the major browser vendors have provided password manager as a built-in feature; third-party vendors have also provided many password managers. In this paper, we analyze the security of two very popular commercial password managers: LastPass and RoboForm. Both of them are Browser and Cloud based Password Managers (BCPMs), and both of them have millions of active users worldwide. We investigate the security design and implementation of these two BCPMs with the focus on their underlying cryptographic mechanisms. We identify several critical, high, and medium risk level vulnerabilities that could be exploited by different types of attackers to break the security of these two BCPMs. Moreover, we provide some general suggestions to help improve the security design of these and similar BCPMs. We hope our analysis and suggestions could also be valuable to other cloud-based data security products and research.
“Automated Password Extraction Attack on Modern Password Managers” (2013)

To encourage users to use stronger and more secure passwords, modern web browsers offer users password management services, allowing users to save previously entered passwords locally onto their hard drives. We present Lupin, a tool that automatically extracts these saved passwords without the user’s knowledge. Lupin allows a network adversary to obtain passwords as long as the login form appears on a non-HTTPS page. Unlike existing password sniffing tools, Lupin can obtain passwords for websites users are not visiting. Furthermore, Lupin can extract passwords embedded in login forms with a destination address served in HTTPS. To determine the number of websites vulnerable to our attack, we crawled the top 45,000 most popular websites from Alexa’s top website list and discovered that at least 28% of these sites are vulnerable. To further demonstrate the feasibility of our attack, we tested Lupin under controlled conditions using one of the authors’ computers. Lupin was able to extract passwords from 1,000 websites in less than 35 seconds. We suggest techniques for web developers to protect their web applications from attack, and we propose alternative designs for a secure password manager.
“Keys to the Cloud: Formal Analysis and Concrete Attacks on Encrypted Web Storage” (2013)

To protect sensitive user data against server-side attacks, a number of security-conscious web applications have turned to client-side encryption, where only encrypted user data is ever stored in the cloud. We formally investigate the security of a number of such applications, including password managers, cloud storage providers, an e-voting website and a conference management system. We find that their security relies on both their use of cryptography and the way it combines with common web security mechanisms as implemented in the browser. We model these applications using the WebSpi web security library for ProVerif, we discuss novel attacks found by automated formal analysis, and we propose robust countermeasures.
“On The Security of Password Manager Database Formats” (2012)

Password managers are critical pieces of software relied upon by users to securely store valuable and sensitive information, from online banking passwords and login credentials to passport- and social security numbers. Surprisingly, there has been very little academic research on the security these applications provide.
This paper presents the first rigorous analysis of storage formats used by popular password managers. We define two realistic security models, designed to represent the capabilities of real-world adversaries. We then show how specific vulnerabilities in our models allow an adversary to implement practical attacks. Our analysis shows that most password manager database formats are broken even against weak adversaries.
“Web-based Attacks on Host-Proof Encrypted Storage” (2012)

Cloud-based storage services, such as Wuala, and password managers, such as LastPass, are examples of so-called host-proof web applications that aim to protect users from attacks on the servers that host their data. To this end, user data is encrypted on the client and the server is used only as a backup data store. Authorized users may access their data through client-side software, but for ease of use, many commercial applications also offer browser-based interfaces that enable features such as remote access, form-filling, and secure sharing.
We describe a series of web-based attacks on popular host-proof applications that completely circumvent their cryptographic protections. Our attacks exploit standard web application vulnerabilities to expose flaws in the encryption mechanisms, authorization policies, and key management implemented by these applications. Our analysis suggests that host-proofing by itself is not enough to protect users from web attackers, who will simply shift their focus to flaws in client-side interfaces.

Disclosure: the lastpass invitation link is my personal premium referral link.

[1] Hackers Stole Account Details for Over 60 Million Dropbox Users | Motherboard – http://motherboard.vice.com/read/hackers-stole-over-60-million-dropbox-accounts
[2] Have I been pwned? Check if your email has been compromised in a data breach – https://haveibeenpwned.com/
[3] Troy Hunt, a Microsoft Regional Director and Most Valuable Professional awardee for Developer Security – https://haveibeenpwned.com/About
[4] Troy Hunt: The Dropbox hack is real – https://www.troyhunt.com/the-dropbox-hack-is-real/
[5] Find the source of your leaks – https://www.leakedsource.com/
[6] LeakedSource API Purchase – https://www.leakedsource.com/api/purchase
[7] Password Managers Compared http://www.howtogeek.com/?p=240255
[8] Best Password Manager http://www.asecurelife.com/dashlane-vs-lastpass-vs-1password-vs-roboform-vs-keepass/
[9] https://en.wikipedia.org/wiki/Multi-factor_authentication
[10] https://blog.lastpass.com/tag/lastpass-security-challenge/
[11] https://lastpass.com/f?4133426
[12] LastPass Hacked http://lifehacker.com/lastpass-hacked-time-to-change-your-master-password-1711463571

Licensing of OpenStreetMap vector map data

Posted on 2016-09-01

Rex Tsai

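OpenStreetMap data from 2012/09/12 onward is distributed under the ODbL (Open Database License)[1]. The ODbL terms[2][3][4] combine characteristics of permissive licensing and copyleft licensing. These characteristics determine whether other works built on OSM must also be redistributed under the ODbL (section 4.4 of the license), or whether it is enough to state that the data comes from the OSM database under the ODbL (section 4.3).

Works are divided into:

Produced Work
Derived Work

For example, exporting OSM as an image file or a paper map yields a "Produced Work". Directly modifying the original database yields a "Derived Work".

Vector map data for mobile devices (Garmin img or MapsForge[10]) is a gray area, because the map is not actually distributed as a raster graphic. Instead, the original OSM database is converted into another kind of database for distribution. But since the main purpose of these vector map files is still offline map display, OSM community discussions[8][9] accept them as "Produced Works"[6]. They are regarded as "Derived Works"[11] only if extra information is added to the database, the vector files are edited, or the files are used as a database.

So if what you distribute is unmodified vector map data, follow section 4.3 and state in the license notice:

"This work contains information from OpenStreetMap https://wiki.openstreetmap.org/wiki/Downloading_data , which is made available under the Open Database License (ODbL)."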

[1] Open Database License – OpenStreetMap Wiki – https://wiki.openstreetmap.org/wiki/Open_Database_License
[2] Open Database License (ODbL) v1.0 | Open Data Commons – http://opendatacommons.org/licenses/odbl/1-0/
[3] 20121120-ODbL-1.0非官方正體中文翻譯 | Lu-six Person’s Notes – http://lucien.cc/?p=2358
[4] 20121120-DbCL-1.0非官方正體中文翻譯 | Lu-six Person’s Notes – http://lucien.cc/?p=2360
[5] 20121018-從開源軟體到開放資料－論 Open Database License v1.0 | Lu-six Person’s Notes – http://lucien.cc/?p=2348
[6] Licence/Community Guidelines/Produced Work – Guideline – OpenStreetMap Foundation Wiki – http://wiki.osmfoundation.org/wiki/Licence/Community_Guidelines/Produced_Work_-_Guideline
[7] Open Data License/Produced Work – Guideline – OpenStreetMap Wiki – https://wiki.openstreetmap.org/wiki/Open_Data_License/Produced_Work_-_Guideline
[8] [OSM-legal-talk] Garmin maps and license – https://lists.openstreetmap.org/pipermail/legal-talk/2016-February/008382.html
[9] Selling routable OSM maps for garmin – OSM Help – http://help.openstreetmap.org/questions/48251/selling-routable-osm-maps-for-garmin
[10] 台灣 MapsForge 圖資檔案 – osmtw.hackpad.com – https://osmtw.hackpad.com/%E5%8F%B0%E7%81%A3-MapsForge-%E5%9C%96%E8%B3%87%E6%AA%94%E6%A1%88-tcm2Owggcqb
[11] Legal FAQ – OpenStreetMap Wiki – https://wiki.openstreetmap.org/wiki/Legal_FAQ#3c._If_I_make_something_with_OSM_data.2C_do_I_now_have_to_apply_your_license_to_my_whole_work.3F
[12] Legal FAQ – OpenStreetMap Wiki – https://wiki.openstreetmap.org/wiki/Legal_FAQ#3._Using