Feed Automation: WP-Cron

This is the first in a series about ways to automate content into WordPress.  However good it is to write fresh content yourself, other blogs, news feeds, or in fact any website might have content that you want imported onto your blog.  Alongside automating new posts for your blog, feed automation can also take WordPress beyond its normal scope, for instance by reading in email and turning WordPress into your new email client!

The first approach we shall investigate uses functionality that is built into WordPress: WP-Cron.


Explained

WP-Cron works in the background of WordPress and is triggered by page hits: on each page load, WordPress checks whether any scheduled tasks are due and runs them.  On a blog with high traffic you can therefore reasonably expect these triggers to run regularly and your content to be imported on time.
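To make the page-hit trigger concrete, here is a deliberately simplified model of the idea in plain PHP.  It is not WordPress’s internal code; the function name model_due_hooks and the shape of the $schedule array are assumptions made purely for illustration.

```php
<?php
// Simplified model only: this is NOT how WordPress stores or runs its cron.
// $schedule maps a due timestamp to the hook names waiting at that time.
function model_due_hooks( array $schedule, int $now ): array {
    $due = array();
    ksort( $schedule ); // oldest due time first
    foreach ( $schedule as $timestamp => $hooks ) {
        if ( $timestamp > $now ) {
            break; // everything from here on is still in the future
        }
        foreach ( $hooks as $hook ) {
            $due[] = $hook;
        }
    }
    return $due;
}

// One task was due at t=100, another becomes due at t=500.
$schedule = array(
    100 => array( 'import_feed' ),
    500 => array( 'cleanup' ),
);

// Nobody visits until t=460, so 'import_feed' runs 360 seconds late.
$ran = model_due_hooks( $schedule, 460 );
```

The point of the model is that nothing happens at t=100 itself; the task only runs when the next visitor arrives.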

Usage

So in what situations will WP-Cron do everything you want?  It will serve most use cases: the feed is processed and imported soon after a user makes a request to the site, which triggers the cron to run.  The page that user is viewing will not have the updated content, but within their next couple of clicks it should appear.  Unless your site only sees one visitor once in a while, the delay in importing items should not cause many problems.

Drawbacks

Where this option falls down is if you need it to poll the feed reliably at regular intervals.  If the feed only shows the 10 latest items and your site doesn’t poll it for an extended period, you are likely to start missing items.  This may or may not matter to you.  If you are importing news, a few missing articles wouldn’t be ideal but would not have a great impact.  If you need every item, as with the previously mentioned use of checking your email, then you might be quite annoyed to find a big block of items missing.
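A quick back-of-the-envelope calculation shows how the gap between polls turns into missed items.  The helper below is hypothetical and assumes the feed publishes at a steady rate, which real feeds rarely do, so treat the numbers as a rough guide only.

```php
<?php
// Hypothetical helper: estimates how many items drop out of the feed window
// between two polls, assuming a steady publishing rate.
function estimate_missed_items( float $items_per_hour, float $hours_between_polls, int $feed_size ): int {
    $published = (int) floor( $items_per_hour * $hours_between_polls );
    return max( 0, $published - $feed_size );
}

// Quiet night: no visitors for 8 hours while the feed publishes 2 items an hour.
$quiet_night = estimate_missed_items( 2.0, 8.0, 10 ); // 16 published, only 10 still visible

// Busy day: a visitor triggers the import every hour.
$busy_day = estimate_missed_items( 2.0, 1.0, 10 ); // 2 published, all still visible
```

In the quiet-night case six items have already scrolled out of the feed by the time the next visitor triggers the import; in the busy-day case nothing is lost.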

The other main drawback of this solution is the additional load on your website.  While you are only importing a feed or two this will be minimal, but if you want your site to parse many feeds, and perhaps apply some logic so that only relevant items are imported, the load will be much higher.  PHP doesn’t offer a great range of options for running these import processes concurrently, and if you do run them side by side they will tie up many instances that should otherwise be serving your content to users.

Implementation

A basic task can be registered using the wp_schedule_event function.  You can set a built-in interval such as hourly, twicedaily, or daily, and WordPress will then call a hook at the appropriate time.  This makes it simple to use, as you just need to attach your function to the action you have defined.

Let’s look at an example:

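// Schedule the event once; wp_next_scheduled() returns false if it is not queued yet.
if ( ! wp_next_scheduled( 'cron_hook' ) ) {
  wp_schedule_event( time(), 'hourly', 'cron_hook' );
}

// Callback that runs each time the cron_hook action fires.
function cron_run() {
  // run feed handler
}

// Attach the callback to the scheduled action.
add_action( 'cron_hook', 'cron_run' );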

This stub shows a basic implementation that would trigger the cron_hook action hourly.  Using that action you could then run your feed importer to grab the new articles.
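As a rough idea of what the feed handler itself might look like, here is a minimal sketch built on WordPress’s fetch_feed function.  The function name example_import_feed, the feed URL and the feed_item_guid meta key are all placeholders invented for this example, and saving items as drafts is just one cautious choice.  You could call it from cron_run in place of the comment.

```php
<?php
// Sketch only: the function name, feed URL and 'feed_item_guid' meta key
// are placeholders for this example, not part of WordPress.
function example_import_feed() {
    // fetch_feed() is defined in wp-includes/feed.php.
    include_once ABSPATH . WPINC . '/feed.php';

    $feed = fetch_feed( 'https://example.com/feed/' );
    if ( is_wp_error( $feed ) ) {
        return; // feed unreachable or invalid; try again on the next run
    }

    // Look at the 10 newest items in the feed.
    foreach ( $feed->get_items( 0, 10 ) as $item ) {
        $guid = $item->get_id();

        // Skip items that an earlier run has already imported.
        $existing = get_posts( array(
            'post_type'   => 'post',
            'post_status' => 'any',
            'meta_key'    => 'feed_item_guid',
            'meta_value'  => $guid,
            'fields'      => 'ids',
            'numberposts' => 1,
        ) );
        if ( ! empty( $existing ) ) {
            continue;
        }

        // Save as a draft so nothing is published without review.
        $post_id = wp_insert_post( array(
            'post_title'   => wp_strip_all_tags( $item->get_title() ),
            'post_content' => wp_kses_post( $item->get_content() ),
            'post_status'  => 'draft',
        ) );

        // Remember the item's ID so the next run can recognise it.
        if ( $post_id && ! is_wp_error( $post_id ) ) {
            add_post_meta( $post_id, 'feed_item_guid', $guid, true );
        }
    }
}
```

Storing the item’s ID against the post is what stops the same article being imported again on every run.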

Custom Time Period

If hourly just isn’t often enough for you, or you want any other custom period, don’t let the built-in intervals limit you.  There is a useful cron_schedules hook that you can use to add your own intervals.

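// Register a custom five-minute interval alongside the built-in ones.
function custom_schedules( $schedules ) {
  $schedules['fivemin'] = array(
    'interval' => 300, // seconds between runs
    'display'  => __( 'Every 5 minutes' ),
  );
  return $schedules;
}

// The callback name must match the function defined above.
add_filter( 'cron_schedules', 'custom_schedules' );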

Using this you can now trigger your action as many times as you want during the day.  Just update the interval value to the number of seconds that should pass between triggers; as soon as a user hits the site after that period has expired, the action will run.
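Defining the interval does not by itself change an event that is already scheduled.  The sketch below is one way to move the cron_hook action onto the fivemin interval, assuming the cron_schedules filter has already been added earlier in the same file so that WordPress recognises the new interval name.  It would take the place of the earlier hourly scheduling check rather than sit alongside it.

```php
<?php
// Assumes the 'fivemin' interval and the 'cron_hook' action from the examples above.
// An event that is already queued keeps its old recurrence, so clear it first.
if ( 'fivemin' !== wp_get_schedule( 'cron_hook' ) ) {
    wp_clear_scheduled_hook( 'cron_hook' );
    wp_schedule_event( time(), 'fivemin', 'cron_hook' );
}
```

wp_get_schedule returns the name of the current recurrence, or false if the hook is not scheduled at all, so this check covers both a fresh install and a switch from hourly.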

Coming Next

The next part in the series will look at using a scripted feed handler running as a service on a separate Linux server to overcome the drawbacks highlighted.