
Google Reader and Zend_Http_Client Saves Time

I’ve been working a lot on Florida Death Metal lately. Part of that means I need to know about news as it happens, from a lot of different sources. That’s difficult (and a pain in the ass) to keep track of. The last thing I want to do is visit 50 different sites every 15 minutes to see if there’s any news I should know about.

The first thing I did to filter the input was set up a Google Reader account for Florida Death Metal. That at least let me see updates from a variety of other sites in a single place. It also lets me search through those new listings for keywords related to bands and events I think are important to know about.

This still required me to visit Google Reader and dig through a lot of stuff (routinely over 1000 new items) to find out what’s going on in the world. The real dilemma was the alert system I had for news going on around me. It was an active system, and the active part was me, searching through Google Reader by hand for anything relevant to Florida Death Metal.

Needless to say, this started to suck. I have a great full-time job, a wonderful wife, and a million other things I like to do with my spare time. Spending my days and nights in Google Reader, struggling to keep up with news for a project, blew.

I love Florida Death Metal. I’ve never been involved in something that means so much to me. However, there are only so many days in the week, and so much time in each of those days. I needed a way to passively keep up with relevant news and events. I needed some automation, so I could spend time hanging drapes for Melissa, or having a beer with Rob.

Enter Zend_Http_Client...

One of my favorite programming tricks is breaking down requests to servers and finding a way to make them programmatically. So I got to thinking about how I could parse through all of my stuff in Google Reader. Each view in Google Reader has an associated RSS feed. Well, that makes things simple enough: I could just grab the RSS feed and parse it.
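Parsing the feed is the easy part. The script further down reads `entry`, `title`, `summary` and `link['href']`, which means it treats the Reader feed as Atom. Here’s a minimal sketch of pulling those fields into plain arrays with SimpleXML, assuming PHP 7+ with the SimpleXML extension. `readEntries()` is a made-up helper name, and the feed string is sample data, not a real Reader feed.

```php
<?php
// Hypothetical helper: turn an Atom feed string into a list of
// ['title' => ..., 'summary' => ..., 'link' => ...] arrays.
// Assumes the feed is Atom (entry/title/summary/link@href), like the script does.
function readEntries(string $xml): array
{
    libxml_use_internal_errors(true); // keep XML parse warnings off stdout
    $feed = simplexml_load_string($xml);
    if ($feed === false) {
        return [];                    // not well-formed XML, nothing to read
    }
    $entries = [];
    foreach ($feed->entry as $entry) {
        $entries[] = [
            'title'   => (string) $entry->title,
            'summary' => (string) $entry->summary,
            'link'    => (string) $entry->link['href'],
        ];
    }
    return $entries;
}

// Sample Atom data (made up) with a single entry.
$atom = '<feed xmlns="http://www.w3.org/2005/Atom">'
      . '<entry><title>Show tonight</title>'
      . '<link href="http://example.com/1"/>'
      . '<summary>Doors at 8</summary></entry>'
      . '</feed>';

print_r(readEntries($atom));
```

SimpleXML resolves the Atom default namespace transparently, which is why plain `$feed->entry` works here without registering a prefix.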

One minor detail: the RSS feeds aren’t publicly available. You have to be logged in to use them. This makes sense. I imagine Google doesn’t want Reader’s aggregated feeds to be used as a public proxy for other sites. Sorry :/
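Being "logged in" boils down to cookies: the login response hands back session cookies, and every later request has to send them back. That’s the job of `setCookieJar()` on `Zend_Http_Client`. To show the idea, here’s a deliberately simplified sketch with two hypothetical helpers. It ignores Domain, Path, Expires and Secure, which a real jar (like `Zend_Http_CookieJar`) has to honour, and the cookie names are only example values.

```php
<?php
// Remember name=value pairs from Set-Cookie headers (simplified: attributes ignored).
function rememberCookies(array $jar, array $setCookieHeaders): array
{
    foreach ($setCookieHeaders as $header) {
        // "SID=abc123; Path=/; Secure" -> keep only the leading "SID=abc123"
        $pair  = explode(';', $header, 2)[0];
        $parts = explode('=', $pair, 2);
        if (count($parts) === 2) {
            $jar[trim($parts[0])] = trim($parts[1]);
        }
    }
    return $jar;
}

// Build the Cookie header value to send on the next request.
function cookieHeader(array $jar): string
{
    $pairs = [];
    foreach ($jar as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

// Example values only: cookies from a login response, replayed on the feed request.
$jar = rememberCookies([], ['SID=abc123; Path=/; Secure', 'LSID=xyz; HttpOnly']);
echo cookieHeader($jar), "\n"; // SID=abc123; LSID=xyz
```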

So, here’s a breakdown of what I needed the script to do:

  1. Log in to Google
  2. Grab the RSS Feed for ‘All Items’
  3. Parse the RSS Feed for keywords relevant to Florida Death Metal
  4. Email me alerts (if there are matches)
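Those four steps can be sketched as one function that takes each step as a callable. That keeps the control flow testable without touching Google at all, and it makes room for bailing out when the login fails, which the script below doesn’t do. `runAlertCheck()` and the callables’ shapes are hypothetical; in the real script each step is inline `Zend_Http_Client` / `Zend_Mail` code. Assumes PHP 7.4+ for the arrow functions.

```php
<?php
// Hypothetical skeleton of the four steps; returns the number of matched links.
function runAlertCheck(
    callable $login,        // step 1: returns true if the login worked
    callable $fetchFeed,    // step 2: returns the feed body as a string
    callable $findMatches,  // step 3: (string $body) => list of matching links
    callable $sendAlert     // step 4: (array $links) => void, e.g. send an email
): int {
    if (!$login()) {
        return 0;           // stop early instead of parsing a login page
    }
    $links = $findMatches($fetchFeed());
    if (count($links) > 0) {
        $sendAlert($links); // only email when there is something to report
    }
    return count($links);
}

// Usage with stub steps standing in for the network and mail calls.
$sent  = [];
$found = runAlertCheck(
    fn () => true,
    fn () => '<feed/>',
    fn (string $body) => ['http://example.com/1'],
    function (array $links) use (&$sent) { $sent = $links; }
);
echo $found, "\n"; // 1
```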

Not too bad. So, here’s the script I came up with. I hope you like it:

<?php
/*
Description: Script to parse through a google reader aggregation of content for keywords
Version: 1.0
Author: Cory Collier
Author URI: http://corycollier.com/
*/
 
//Define the constants required for the script
define('GOOGLE_PASSWORD', 	'<your password here>');
define('GOOGLE_RSS_URI', 		'<the url for your google reader rss feed>');
define('GOOGLE_LOGIN_URI',		'https://www.google.com/accounts/ServiceLoginAuth?service=reader');
define('GOOGLE_LOGIN_EMAIL', 	'<your gmail account username / email>');
define('MAILER_FROM_ADDR',		'<where the email should come from>');
define('MAILER_TO_ADDR',		'<where the email should go to>');
define('MAILER_SUBJECT',		'Google Reader Parsing');
 
/*
* a Zend Framework Installation _MUST_ be located on the include path for PHP
*/
require 'Zend/Loader.php';
Zend_Loader::registerAutoload();
 
//Instantiate a new Zend_Http_Client, with the google login url
$client = new Zend_Http_Client(GOOGLE_LOGIN_URI);
 
/*
 * Set the client cookie jar ...
 * Set the method to POST ..
 * Set the parameters to post with
 */
$client->setCookieJar()
	->setMethod(Zend_Http_Client::POST)
	->setParameterPost(array(
		'continue'		=> GOOGLE_RSS_URI,
		'service'		=> 'reader',
		'niu'			=> 1,
		'hl'			=> 'en', 
		'Email'		=> GOOGLE_LOGIN_EMAIL,
		'Passwd'		=> GOOGLE_PASSWORD,
		'PersistentCookie'	=> 'yes',
		'asts'			=> ''
));
 
//make the login request, and store the response in the $response variable ...
$response = $client->request('POST');
 
//Assuming the login worked, point the client at the RSS feed URI.
// The cookie jar keeps the session cookies from the login request.
$client->setUri(GOOGLE_RSS_URI)
	//Change the request method to GET
	->setMethod(Zend_Http_Client::GET);
 
//send the request, and store the results of it
$response = $client->request();
 
//Initialize an array of keywords to look for
$keywords = array (
	//Whatever your keywords are you're looking for
);
 
//SimpleXML is great!
$sx = simplexml_load_string ($response->getBody());
 
//Start with an empty list, so count() below works even when nothing matches
$matches = array();

//Iterate through each of the retrieved entries
foreach ($sx->entry as $entry) 
{	//Now, iterate through each of the defined keywords / keyphrases
    foreach ($keywords as $keyword) 
    {	//Check whether the title or the summary contains the keyword / keyphrase
        if (stripos((string)$entry->title, $keyword) !== false
            || stripos((string)$entry->summary, $keyword) !== false) 
        {	//Append any matches to the matches array
            $matches[] = (string)$entry->link['href'];
        }
    } //END keyword iteration
 
} // END posting iteration

//An entry can match several keywords, so drop duplicate links
$matches = array_unique($matches);
 
//IF matches were found, send an email
if(count($matches)) 
{	//Initialize a variable to store a message in
    $message = '';
 
    //Iterate through each of the matches
    foreach($matches as $match)
    {	//For each of the matches, append them to the message string, separated 
    	// by newlines
        $message .= "\n" .  $match;  
    }
    //Initialize a new Zend_Mail object 
    $mail = new Zend_Mail;
 
    //Set the parameters necessary to send a message 
    $mail->addTo(MAILER_TO_ADDR)
        ->setFrom(MAILER_FROM_ADDR)
        ->setSubject(MAILER_SUBJECT)
        ->setBodyText($message);
 
    //I never trust email, so wrap the email execution in a try/catch statement
    try  
    {	//Send the mail
        $mail->send();   
    } catch (Exception $e) {
        //Log the failure, so it shows up in the error logs
        error_log('Google Reader alert mail failed: ' . $e->getMessage());
    }
}
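If you want to test the keyword matching on its own, here’s one way to pull that loop out into a function. This is a sketch, not a drop-in replacement: `findMatches()` is a hypothetical name, it uses `stripos()` for a yes/no test, skips empty keywords, and records each entry’s link at most once even when several keywords hit. The feed below is made-up sample data.

```php
<?php
// Hypothetical refactor of the keyword loop: returns unique links of matching entries.
function findMatches(string $xml, array $keywords): array
{
    libxml_use_internal_errors(true); // keep XML parse warnings off stdout
    $feed = simplexml_load_string($xml);
    if ($feed === false) {
        return [];                    // not well-formed XML, so no matches
    }
    $matches = [];
    foreach ($feed->entry as $entry) {
        $title   = (string) $entry->title;
        $summary = (string) $entry->summary;
        foreach ($keywords as $keyword) {
            if ($keyword === '') {
                continue;             // an empty keyword would match everything
            }
            if (stripos($title, $keyword) !== false
                || stripos($summary, $keyword) !== false) {
                $matches[] = (string) $entry->link['href'];
                break;                // one hit is enough for this entry
            }
        }
    }
    // Two entries could still share a link, so de-duplicate and reindex.
    return array_values(array_unique($matches));
}

$atom = '<feed xmlns="http://www.w3.org/2005/Atom">'
      . '<entry><title>Tampa show announced</title><link href="http://example.com/a"/>'
      . '<summary>Doors at 8</summary></entry>'
      . '<entry><title>Cooking tips</title><link href="http://example.com/b"/>'
      . '<summary>Nothing relevant</summary></entry>'
      . '</feed>';

echo implode("\n", findMatches($atom, ['tampa', 'doors'])), "\n"; // http://example.com/a
```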

Wrapping all of this up, I stuck this script on a spare Debian box at my house and set up a cron job to run it every 15 minutes. It saves me a lot of time. I’d love to hear some feedback from y’all about how I got this done. I’m a script guy at heart, so this stuff is super fun for me.

I know I should have checked whether the initial login attempt succeeded before assuming the second request would get anything at all. Keep in mind, though, that this is a script I use to make my own life easier. Exceptions thrown here cause me no harm: if I don’t get anything, I can check my error logs for issues. Not a perfect solution, but it’s working pretty well for me now.
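A cheap way to get most of that check is to look at what the second request actually returned before parsing it. The assumption here (not verified against Google) is that a failed login leads to an HTML page instead of an Atom feed. `looksLikeAtomFeed()` is a hypothetical helper that only checks for well-formed XML with a `<feed>` root in the Atom namespace.

```php
<?php
// Hypothetical sanity check: is this body an Atom feed, or something else (e.g. a login page)?
function looksLikeAtomFeed(string $body): bool
{
    $previous = libxml_use_internal_errors(true); // silence parse warnings
    $xml = simplexml_load_string($body);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the caller's setting

    if ($xml === false) {
        return false;                             // not well-formed XML at all
    }
    // Namespaces used on the root element, e.g. ['' => 'http://www.w3.org/2005/Atom']
    $namespaces = $xml->getNamespaces();
    return $xml->getName() === 'feed'
        && in_array('http://www.w3.org/2005/Atom', $namespaces, true);
}

var_dump(looksLikeAtomFeed('<feed xmlns="http://www.w3.org/2005/Atom"></feed>')); // bool(true)
var_dump(looksLikeAtomFeed('<html><body>Sign in<br></body></html>'));              // bool(false)
```

In the script, this would sit right before `simplexml_load_string()`: if `looksLikeAtomFeed($response->getBody())` is false, write something with `error_log()` and exit instead of parsing.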

By ftpcory

I am a fitness, technology, and music enthusiast. I enjoy introspection and reflection. I believe in decentralization of the web and freedom of expression.
