Spotlight customer reviews:
Customer Rating:      Summary: Does the basics. Comment: "Webbots, Spiders, and Screen Scrapers" is a solid book for building basic scripts to do web scraping. Michael Schrenk covers the "should you do this" aspect very well, and devotes much of the book to these kinds of topics. For that reason alone I give him major kudos: "just because you CAN do a thing, doesn't mean you SHOULD."
Technically, the book and its examples are very basic and beginner level. All code is procedural and makes no reference to object-oriented programming. This is great for a simple project, but building anything larger than a targeted webbot or two is beyond the scope of this book.
I was very dismayed at Mr. Schrenk's opinion of regular expressions:
"The use of regular expressions is a parsing language in itself, and most modern programming languages support aspects of regular expressions. In the right hands, regular expressions are also useful for parsing and substituting text; however, they are famous for their sharp learning curve and cryptic syntax. I avoid regular expressions whenever possible."
This disregard for regular expressions effectively wipes out a powerful toolset for budding developers. Regular expressions are no harder to learn than PHP. The reasoning behind his disdain for them is also flawed:
"The regular expression engine used by PHP is not as efficient as engines used in other languages, and is certainly less efficient than PHP's built-in functions for parsing HTML."
PHP's preg_* functions use PCRE, a Perl-compatible regular expression engine modeled on the one used (very effectively) in Perl. There have been many studies that show preg_* style expressions outperform basic text matching in PHP. In this assessment the author is terribly wrong.
The book does a great job of explaining how to make single-use scripts for scraping, but never how to create a larger infrastructure. There is no focus on creating multi-process engines with pcntl_fork() or proc_open(), which are critical for scaling web scraping applications. A single script scraping a few hundred websites on a single thread would take ages compared with a multi-process engine.
If you are looking to break into web scraping and are not sure where to start, this is likely the best (and possibly only) book on the market. If you are intermediate or advanced, you will quickly question the author's logic and see that scaling will become the number one issue you have to overcome.
Customer Rating:      Summary: Must buy for any Webbot programmer Comment: Great book. Very well organized, and the code in the book is well documented and available for download.
Customer Rating:      Summary: Great Book with Lots of Information Comment: This book covers every aspect I could ever hope a book on web bots would cover. It goes into great detail and provides lots of background information about things such as why you should use web bots, security issues, how to authenticate a bot with password protected sites, writing search engine crawlers, parsing HTML, how to handle cookies, HTTP headers, dealing with forms and a lot more.
I was very pleased with how this book covered concepts. The book uses PHP and the cURL library as a teaching tool instead of trying to give a lesson in how to use PHP as a crawler language. The way the code is explained makes it very easy to translate into whatever language you are most comfortable coding in. The book uses fundamental functional programming concepts which make it easy to pick up the general idea without actually knowing PHP.
My boss bought this book to help my group with a project we were working on, and even my co-workers who had no background with PHP were able to use this book to write a web bot in C# (using the cURL library) very easily. The concepts from this book easily transferred over to object-oriented concepts.
Customer Rating:      Summary: Scour The Internet = FUN FUN FUN Comment: 'Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL' by Michael Schrenk is an absolute GEM of a book for all internet computer nerds that love trying new things and hacking for a hobby!! If you are one of the aforementioned and love to try new things and see how you can scour the internet with the greatest of ease, you owe it to yourself to read and DO this guide!! When I say DO I mean don't just read, but input the examples you'll find within and play around with the power of PHP and CURL to be able to quickly and efficiently traverse the web, not for the purpose of mayhem but enjoyment!
A perfect example contained within this book is writing code to programmatically download entire websites. Instead of just right-clicking an image, imagine running code to grab the ENTIRE contents simply and easily?!?
Other fun tasks you'll learn how to do include sending SMS alerts to your cell phone, decoding encrypted files, parsing web site data... the list could go on and on!!
If you like to play with the web and create cool apps that will do cool things, pick up this wonderful book, sit back and PLAY!!
***** HIGHLY RECOMMENDED
Customer Rating:      Summary: It's a top pick any comprehensive computer library needs. Comment: Programmers and businesspeople who want to use the Web's resources to make the most of locating or promoting data will find Webbots, Spiders, and Screen Scrapers a key to successful use of the web. From how to decode encrypted files and automate form submissions to unlocking password-protected websites and placing automatic bids on web auction sites, this comes from a developer who has developed webbots and spiders for clients across North America, and who has all the insider keys to usage. It's a top pick any comprehensive computer library needs.
Diane C. Donovan
California Bookwatch