
wget. I integrated it with a C# application just fine. It writes the pages to files and produces a nice, parsable crawl log. It is single-threaded, but unless you're crawling Wikipedia, you won't have a problem. Small tools that work well are a good start. I had problems with many of the multi-threaded crawlers; they seemed to trip over themselves, whereas wget was fast and rock solid.
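
For anyone wanting to try the same approach, here is a rough sketch of driving wget from another program and reading back its log. It is in Python rather than C#, and it is only an illustration, not the setup described above. The function names, the depth and wait values, and the paths are all made up for the example. The log patterns assume the default English-language log that GNU wget 1.x writes with --output-file, so check them against your own log before relying on them.

```python
import re
import subprocess

# Assumed shape of GNU wget 1.x log lines (verify against your own log):
#   --2015-03-01 10:00:00--  http://example.com/
#   HTTP request sent, awaiting response... 200 OK
REQUEST_LINE = re.compile(r"^--\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}--\s+(\S+)")
STATUS_LINE = re.compile(r"awaiting response\.\.\. (\d{3})")


def build_wget_command(start_url, out_dir, log_path, depth=2):
    """Build the argument list for a polite recursive crawl (hypothetical helper)."""
    return [
        "wget",
        "--recursive",                    # follow links from the start page
        f"--level={depth}",               # limit recursion depth
        "--wait=1",                       # pause one second between requests
        f"--directory-prefix={out_dir}",  # save pages under this directory
        f"--output-file={log_path}",      # write the crawl log here
        start_url,
    ]


def parse_crawl_log(log_text):
    """Return (url, http_status) pairs found in the log text."""
    results = []
    current_url = None
    for line in log_text.splitlines():
        request = REQUEST_LINE.match(line)
        if request:
            current_url = request.group(1)
            continue
        status = STATUS_LINE.search(line)
        if status and current_url is not None:
            results.append((current_url, int(status.group(1))))
            current_url = None
    return results


def crawl(start_url, out_dir, log_path):
    """Run wget, then parse the log it wrote."""
    # wget exits non-zero if any request failed, so do not raise on that.
    subprocess.run(build_wget_command(start_url, out_dir, log_path), check=False)
    with open(log_path, encoding="utf-8", errors="replace") as log_file:
        return parse_crawl_log(log_file.read())
```

Calling crawl("http://example.com/", "pages", "crawl.log") would leave the fetched pages under pages/ and return the list of URLs with their status codes. Keeping the parser separate from the process call means it can be checked against a saved log without touching the network.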

