Posts tagged 'about'

Book review - Guide to Web Scraping with PHP

published on May 25, 2011.

It took me a while to grab myself a copy of Matthew Turland’s “Guide to Web Scraping with PHP”, but a few weeks ago a copy finally arrived and I had the pleasure of reading it. I planned to buy it as soon as the print copy was announced, but then realised that php|arch accepts only PayPal as the payment method, which doesn’t work from Serbia, so I had to postpone the shopping for some better times. Fast forward 5-6 months and I found a copy on the Book Depository, which has no shipping costs! Yay!

My overall impression of the book is that it was worth the time and I’m really glad that I bought it. Matthew did a great job explaining all the tools we have at our disposal for writing web scrapers and how to use them. The chapter on HTTP at the beginning and the chapter with some tips and tricks at the end of the book fit in great with the rest of the chapters, which are full of code examples. For the first reading, I’d recommend reading the book cover to cover, to get an overall view of all the tools presented, but later the chapters can be read independently.

As I said, the first chapter (actually, the second one, the first one is the introductory chapter :p) deals with HTTP, especially with the parts of it which are needed for understanding, using and creating web scrapers.

The book then moves on to the different client libraries we can use to send HTTP requests and receive responses. Libraries like cURL or Zend_Http_Client are explained, but it is also explained how one can create one’s own using streams (the author does note that you’d be better off with an existing one!). For each of the tools, the book describes how to handle things like authentication, redirects and timeouts, amongst others…
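
To give an idea of what handling redirects, timeouts and authentication looks like with cURL, here is a minimal sketch of my own. It is not taken from the book; the function names `build_curl_options()` and `fetch()` and the timeout values are made up for illustration, and it assumes the cURL extension is installed.

```php
<?php
// Hypothetical helper (not from the book): collect the cURL options
// for redirects, timeouts and optional HTTP basic authentication.
function build_curl_options(string $url, ?string $user = null, ?string $password = null): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects...
        CURLOPT_MAXREDIRS      => 5,    // ...but give up after 5 of them
        CURLOPT_CONNECTTIMEOUT => 5,    // seconds to wait for the connection
        CURLOPT_TIMEOUT        => 15,   // seconds allowed for the whole request
    ];

    if ($user !== null) {
        $options[CURLOPT_HTTPAUTH] = CURLAUTH_BASIC;
        $options[CURLOPT_USERPWD]  = $user . ':' . $password;
    }

    return $options;
}

// Hypothetical helper: send the request and return the response body.
function fetch(string $url, ?string $user = null, ?string $password = null): string
{
    $handle = curl_init();
    curl_setopt_array($handle, build_curl_options($url, $user, $password));

    $body = curl_exec($handle);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($handle));
    }

    return $body;
}
```

Calling `fetch('http://example.com/')` would return the page body as a string, or throw if the request fails or times out.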

The second part of the book deals with preparing the documents for parsing, and with the actual parsing of the data from these documents. Again, different tools are presented, along with which one to use when and why. If none of the parsing tools can help, an overview of the essentials of the PCRE extension is given, too.
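
As a rough illustration of the PCRE fallback, here is a small sketch of my own (again, not code from the book) that pulls the `href` values out of anchor tags with `preg_match_all()`. The `extract_links()` name and the pattern are assumptions for the example; a regular expression like this is fragile on real-world markup, which is why a proper parser is usually the first choice.

```php
<?php
// Hypothetical helper (not from the book): extract href values
// from anchor tags using a regular expression.
function extract_links(string $html): array
{
    // Group 1 captures the opening quote, group 2 the URL up to the
    // matching closing quote (\1). "i" ignores case, "s" lets "." match newlines.
    $pattern = '/<a\s[^>]*href\s*=\s*(["\'])(.*?)\1/is';

    preg_match_all($pattern, $html, $matches);

    return $matches[2];
}
```

For a snippet like `<a href="/one">One</a> <a class="x" href='/two'>Two</a>` this returns an array with `/one` and `/two`.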

The book finishes with a nice “Tips and Tricks” chapter, which discusses real-time vs batch job scrapers, how to work with forms, the importance of unit testing… IMHO, without this last chapter, the book would not be complete.

I’m thinking hard right now about what bad things I could say about this book, but I can’t think of any. It is a guide, clear and straight to the point, explaining what tools are out there, which one to use and how to use them for writing scrapers, and that’s exactly what I wanted to know.

Yep, I’d recommend this book to anyone interested in web scraping with PHP :)

New adventures ahead!

published on May 23, 2011.

After a month or two of pondering and thinking and planning and thinking and some more thinking, today I finally told the management at work that I’ll be leaving in a month from today. Actually, I won’t be extending my contract with them which will end on June 24th.

Why? I don’t like the road the leadership of the company has taken (if this can be called a road at all…), the amount of energy the whole team is wasting on some small and silly things, the fact that extra effort is not recognised, thanked or paid and that there’s currently 8 of us in a roughly 25 m^2 room.

I know, there are bad moments everywhere when one just has to suck it up and deal with it for the whole team/company, but I’ve been doing that for quite some time and I’ve had enough of that.

The only thing that makes me sad about this decision is that I’ll be leaving @milosija on his own here, a great mentor and friend from whom I’ve learnt a lot.

On the other hand, as of June 25th I’ll be starting my own company, which should be an exciting new experience. Still need to wrap my head around that one, so more on that in some future post…

Until then,

happy hackin’!

Tags: about, company, future, job, me.
Categories: Blablabla.

Moved

published on July 11, 2010.

As I said 2 weeks earlier, I decided to move my stuff over to linode. Well, I did it. Kinda.

The first step was to change the nameservers of the domain. I thought it was gonna take a while, so I took my time with moving the files and the database, but (at least on my end) the DNS changes were alive and kickin’ in under an hour.

My original idea was to run everything on nginx, but that soon turned out to be a bad idea cause there was no way I could set up the rewriting properly: if PHP was working right, CSS was broken. If CSS was working right, PHP was broken. At one point I broke everything. Hooray for me. Then I just took down nginx and all that php-fastcgi stuff and installed apache. Everything is lovely once again, the world is all shiny and pink and full of rainbows and unicorns. But fear not, I will not let nginx beat me in this mad game of rewrites! Just have to do that somewhere else, not on a live server.

Now to set up the emails and my job here is done. Oh, and the sidebar is a tad broken. Sorry bout that.

Carry on now, nothing left to see here.

Tags: about, me, moving.
Categories: Blablabla.

I'll be moving soon...

published on June 27, 2010.

Just a little heads up to all of you who stumble upon this place: I’ll be moving servers and stuff in the coming month to linode and most likely there’ll be some downtimes and fuckups so just thought to let you all know (this sounds like there’s someone reading this blog at all, heh).

Hope I won’t forget to make backups.

Tags: about, me, moving.
Categories: Blablabla.

2009 in a few words

published on January 02, 2010.

In 2009 some good stuff happened and some bad stuff happened. All in all, a crappy year. Hopefully, this year will be a lot better…

I graduated on June 26th; the topic was a Python desktop application that communicates with a web service, both sending and receiving data. Started working on July 1st at Online Solutions as a PHP dev and started to “officially” give back to the Open Source community by joining the ZF Bug Hunt Days: so far a few minor patches submitted and applied. Wrote a review of a ZF book, and another one on jQuery and PHP is in the drafts.

I ain’t making plans for this year, cause I have failed miserably to realize my most important plan for 2009; I’ll just improvise throughout the year.

Dear 2009 - up yours. 2010 - bring it on.

Happy new year!

Tags: about, fail, me, random.
Categories: Blablabla, Free time.
Robert Basic

Software developer making web applications better.

Robert Basic © 2008 — 2020