Internet Cross: your one-stop web tutorial website

Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL

List Price: $39.95
Our Price: $21.96
You Save: $17.99 (45%)
Availability: Usually ships in 24 hours
Manufacturer: No Starch Press
Average Customer Rating: 4.5/5







Binding: Paperback
Dewey Decimal Number: 025.04
EAN: 9781593271206
Format: Illustrated
ISBN: 1593271204
Label: No Starch Press
Manufacturer: No Starch Press
Number Of Items: 1
Number Of Pages: 328
Publication Date: 2007-03-30
Publisher: No Starch Press
Studio: No Starch Press


Spotlight customer reviews:

Customer Rating: 5/5
Summary: :-) bots
Comment: This book is a great reference and/or introduction to the cURL library. After reading it, I realized it is not intended as a single solution for bot programming. It covers many features of cURL and should be in any bot programmer's library. Also, as mentioned elsewhere in the reviews, the book's statement about regular expressions is almost completely inaccurate.

Customer Rating: 5/5
Summary: Excellent Source
Comment: I can't say enough about this book. It's informative and well laid out, with dynamic examples and an awesome website tie-in. I would recommend this book to anyone interested in learning how to scrape websites for data.

Customer Rating: 5/5
Summary: Excellent cURL primer
Comment: This is an excellent book used as an introduction to the cURL library. The author has created a set of his own functions that are well written and, with the help of the book, easy to understand.

It does presuppose some knowledge of PHP and data transfer protocols, but if you are already armed with that, this is an excellent intro to data exchange across servers. Each chapter introduces a new concept and a simple usage of that concept. I seldom read tech-related books cover to cover, but this book was an exception. I have been programming for over 20 years, so being excited by new stuff is somewhat rare. I enjoy new stuff, but this book whets the imagination!

Customer Rating: 5/5
Summary: barry naice!
Comment: This book is simply awesome. You will need to come armed with at least a basic knowledge of PHP, but everything is pretty straightforward. The projects are well explained and applicable to a wide range of work that you might be getting yourself into.

Customer Rating: 2/5
Summary: Does the basics.
Comment: "Webbots, Spiders, and Screen Scrapers" is a solid book for building basic scripts to do web scraping. Michael Schrenk covers the "should you do this" aspect very well and devotes much of the book to these kinds of topics. For that reason alone I give him major kudos: "just because you CAN do a thing doesn't mean you SHOULD."

Technically, the book and its examples are very basic and beginner level. All code is procedural, with no reference to object-oriented programming at all. This is great for a simple project, but building anything larger than a targeted webbot or two is beyond the scope of this book.

I was very dismayed at Mr. Schrenk's opinion of regular expressions:
"The use of regular expressions is a parsing language in itself, and most modern programming languages support aspects of regular expressions. In the right hands, regular expressions are also useful for parsing and substituting text; however, they are famous for their sharp learning curve and cryptic syntax. I avoid regular expressions whenever possible."

This disregard for regular expressions effectively takes a powerful toolset away from budding developers. Regular expressions are no harder to learn than PHP. His reasoning for this disdain is also flawed:

"The regular expression engine used by PHP is not as efficient as engines used in other languages, and is certainly less efficient than PHP's built-in functions for parsing HTML."

Through the preg_* functions, PHP uses PCRE, a regular expression engine compatible with the one used (very effectively) in Perl. There have been many studies showing that preg_*-style expressions outperform basic text matching in PHP. In this assessment the author is terribly wrong.
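As a rough illustration of the two approaches being compared, the sketch below pulls link targets out of a small piece of markup, first with a preg_* pattern and then with plain string functions. The sample HTML and both function names are hypothetical and are not taken from the book. It shows only that the two styles can produce the same result and says nothing about their relative speed.

```php
<?php
// Hypothetical sample markup, used only for this illustration.
$html = '<p><a href="http://example.com/a">A</a> and <a href="http://example.com/b">B</a></p>';

// Collect every double-quoted href value with a single PCRE pattern.
function extract_links_regex(string $html): array
{
    preg_match_all('/<a\s[^>]*href="([^"]*)"/i', $html, $matches);
    return $matches[1];
}

// The same job done with strpos() and substr() only.
function extract_links_strings(string $html): array
{
    $links = [];
    $offset = 0;
    $needle = 'href="';
    while (($start = strpos($html, $needle, $offset)) !== false) {
        $start += strlen($needle);          // first character of the URL
        $end = strpos($html, '"', $start);  // closing quote of the attribute
        if ($end === false) {
            break;                          // malformed attribute, stop scanning
        }
        $links[] = substr($html, $start, $end - $start);
        $offset = $end + 1;
    }
    return $links;
}

print_r(extract_links_regex($html));
print_r(extract_links_strings($html));
```

The pattern version is shorter and tolerates other attributes before href. The string version is case sensitive and matches any href, not just those on anchor tags.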

The book does a great job of explaining how to write single-use scraping scripts, but it never explains how to build a larger infrastructure. There is no coverage of multi-process engines built with pcntl_fork() or proc_open(), which are critical for scaling web scraping applications. A single script scraping a few hundred websites one after another would take ages compared with an engine that works on many sites in parallel.
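To give a sense of what such a multi-process engine might look like, here is a minimal sketch built on proc_open(). It assumes PHP 7.4 or later run from the command line, where PHP_BINARY points at the interpreter. The function name run_in_parallel() and the task snippets are hypothetical stand-ins for real fetch-and-parse work, and the short sleeps only simulate network delay.

```php
<?php
// Start one child PHP process per task, then collect each child's output.
// $tasks maps a label to a snippet of PHP code (hypothetical placeholder
// for "download and parse one website").
function run_in_parallel(array $tasks): array
{
    $procs = [];
    foreach ($tasks as $label => $code) {
        $pipes = [];
        // The array form of the command (PHP 7.4+) bypasses the shell.
        $proc = proc_open([PHP_BINARY, '-r', $code], [1 => ['pipe', 'w']], $pipes);
        if (is_resource($proc)) {
            $procs[$label] = [$proc, $pipes[1]];
        }
    }

    // All children are already running; now read their results in turn.
    $results = [];
    foreach ($procs as $label => [$proc, $stdout]) {
        $results[$label] = stream_get_contents($stdout);
        fclose($stdout);
        proc_close($proc);
    }
    return $results;
}

$results = run_in_parallel([
    'site-a' => 'usleep(200000); echo "a done";',
    'site-b' => 'usleep(200000); echo "b done";',
]);
print_r($results);
```

Because every child is launched before any output is read, the simulated delays overlap instead of adding up. A production engine would also cap the number of simultaneous children and handle failed workers.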

If you are looking to break into web scraping and are not sure where to start, this is likely the best (and possibly only) book on the market. If you are intermediate or advanced, you will quickly question the author's logic and see that scaling will become the number one issue you have to overcome.

 

Editorial Reviews:

The Internet is bigger and better than what a mere browser allows. Webbots, Spiders, and Screen Scrapers is for programmers and businesspeople who want to take full advantage of the vast resources available on the Web. There's no reason to let browsers limit your online experience, especially when you can easily automate online tasks to suit your individual needs.

Learn how to write webbots and spiders that do all this and more:

  • Programmatically download entire websites
  • Effectively parse data from web pages
  • Manage cookies
  • Decode encrypted files
  • Automate form submissions
  • Send and receive email
  • Send SMS alerts to your cell phone
  • Unlock password-protected websites
  • Automatically bid in online auctions
  • Exchange data with FTP and NNTP servers

Sample projects using standard code libraries reinforce these new skills. You'll learn how to create your own webbots and spiders that track online prices, aggregate different data sources into a single web page, and archive the online data you just can't live without. You'll learn inside information from an experienced webbot developer on how and when to write stealthy webbots that mimic human behavior, tips for developing fault-tolerant designs, and various methods for launching and scheduling webbots. You'll also get advice on how to write webbots and spiders that respect website owner property rights, plus techniques for shielding websites from unwanted robots.

As a bonus, visit the author's website to test your webbots on sample target pages, and to download the scripts and code libraries used in the book.

Some tasks are just too tedious (or too important!) to leave to humans. Once you've automated your online life, you'll never let a browser limit the way you use the Internet again.

