Customer Rating:      Summary: :-) bots Comment: This book is a great reference and/or introduction to the cURL library. After reading this book, I realized it is not intended as a single solution for bot programming. This book covers many features of cURL and should be in any bot programmer's library. Also, as mentioned elsewhere in the reviews, the book's statement about regular expressions is almost completely inaccurate.
Customer Rating:      Summary: Excellent Source Comment: I can't say enough about this book. It's informative and well laid out, and it has dynamic examples and an awesome website tie-in. I would recommend this book to anyone interested in learning how to scrape websites for data.
Customer Rating:      Summary: Excellent cURL primer Comment: This is an excellent book used as an introduction to the cURL library. The author has created a set of his own functions that are well written and, with the help of the book, easy to understand.
It does presuppose some knowledge of PHP and data transfer protocols, but if you are already armed with that, this is an excellent intro to data exchange across servers. Each chapter introduces a new concept and a simple usage of that concept. I seldom read tech-related books cover to cover, but this book was an exception. I have been programming for over 20 years, so being excited by new stuff is somewhat rare. I enjoy new stuff, but this book whets the imagination!
Customer Rating:      Summary: barry naice! Comment: This book is simply awesome. You will need to come armed with at least a basic knowledge of PHP, but everything is pretty straightforward. The projects are well explained and applicable to a wide range of projects that you might be getting yourself into.
Customer Rating:      Summary: Does the basics. Comment: "Webbots, Spiders, and Screen Scrapers" is a solid book for building basic scripts to do web scraping. Michael Schrenk covers the "should you do this" aspect very well, and devotes much of the book to these kinds of topics. For that reason alone I give him major kudos: "just because you CAN do a thing, doesn't mean you SHOULD."
Technically, the book and its examples are very basic and beginner level. All code is procedural, with no reference to object-oriented programming at all. This is great for a simple project, but building anything larger than a targeted webbot or two is beyond the scope of this book.
I was very dismayed at Mr. Schrenk's opinion of regular expressions:
"The use of regular expressions is a parsing language in itself, and most modern programming languages support aspects of regular expressions. In the right hands, regular expressions are also useful for parsing and substituting text; however, they are famous for their sharp learning curve and cryptic syntax. I avoid regular expressions whenever possible."
This disregard for regular expressions effectively wipes out a powerful toolset for budding developers. Regular expressions are no harder to learn than PHP. His reasoning for this disdain is also flawed:
"The regular expression engine used by PHP is not as efficient as engines used in other languages, and is certainly less efficient than PHP's built-in functions for parsing HTML."
PHP's preg_* functions are built on the PCRE (Perl Compatible Regular Expressions) library, which provides the same style of regular expressions used very effectively in Perl. There have been many studies that show preg_* style expressions outperform basic text matching in PHP. In this assessment the author is terribly wrong.
The book does a great job of explaining how to make single-use scripts for scraping, but never how to create a larger infrastructure. There is no focus on creating multi-process engines with pcntl_fork() or proc_open(), which are critical for scaling web scraping applications. A single script scraping a few hundred websites in a single process would take ages compared with a multi-process engine.
If you are looking to break into web scraping and are not sure where to start, this is likely the best (and possibly only) book on the market. If you are intermediate or advanced, you will quickly question the author's logic and see that scaling will become the number one issue you have to overcome.