Drupal Feeds Importer: Hacking the Item Log

I use the Feeds module for a variety of tasks, usually external data acquisition, but I also often use it to synchronize my live and development copies of Drupal installs.  For example, I have a module that contains a dictionary of variable names and definitions, which provides context for a variety of data relationships in my analysis data that lives as Drupal content.  The authoritative copy of this dictionary lives on the live install, but during development I am often testing out new variables and variable names, so the bleeding-edge set of variable definitions lives on my dev install.  Once I get the variable definitions tweaked to where I like them, I want to migrate them from dev to live.

Feeds Importer can do this, but there’s a catch: a given Feeds Importer will create a new, duplicate entry for an existing piece of content if that particular importer did not itself import that piece of content.  So, to prevent the duplicates, you need to trick the importer into thinking that it has already imported the content in question.
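To make the catch concrete, here is a toy model of the lookup, assuming Feeds decides whether an incoming item already exists by searching feeds_item for a row with the same importer id, entity type and GUID.  The table is a cut-down stand-in, and the function name existing_entity_id, the importer name "some_other_importer" and the GUID "flow_cfs" are all made up for illustration; this is a sketch of the behavior described above, not the module's actual code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified stand-in for feeds_item (only the columns this sketch needs).
conn.execute(
    "create table feeds_item "
    "(id text, entity_type text, entity_id integer, guid text)"
)

# One existing entity, tracked by a *different* (hypothetical) importer.
conn.execute(
    "insert into feeds_item values "
    "('some_other_importer', 'dh_variabledefinition', 7, 'flow_cfs')"
)


def existing_entity_id(importer_id, entity_type, guid):
    """Return the entity id this importer already tracks for guid, or None."""
    row = conn.execute(
        "select entity_id from feeds_item "
        "where id = ? and entity_type = ? and guid = ?",
        (importer_id, entity_type, guid),
    ).fetchone()
    return row[0] if row else None


# The importer that created the row finds it ...
other = existing_entity_id("some_other_importer", "dh_variabledefinition", "flow_cfs")
# ... but my importer does not, so under this model it would create a duplicate.
mine = existing_entity_id("dh_import_variable_definition", "dh_variabledefinition", "flow_cfs")
print(other, mine)
```

Under this model the fix is simply to give the second importer a matching row of its own for each existing entity.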

  1. Create a GUID field on your Feeds Importer, mapped from a unique identifier in the incoming data.
  2. Create a feed on the source Drupal install (my “beta”) that serves the data to the destination install (my “live”).
  3. Insert entries in the “feeds_item” table for any content pieces that already exist but were NOT imported via the Feeds Importer in question (maybe they were created by another Feeds Importer, or they were entered manually through the Drupal UI).

Code 1 accomplishes step #3 in this process.  This might not be the most efficient method in terms of storage, but it works.  And it’s a whole lot more efficient than having to pick through your tables and root out duplicate entries from an ill-advised use of a Feeds Importer.

Code 1: Inserting “phantom” entries into the feeds_item table for content that you wish to protect from duplication.  In this example, my Feeds Importer machine name is “dh_import_variable_definition”, my entity type is “dh_variabledefinition”, and my entity id column on the content entity table is “hydroid”.  The GUID comes from the “varkey” column.  For a standard node-based content type, you would use ‘node’ as the ‘entity_type’ value, ‘a.nid’ in place of ‘a.hydroid’, and ‘from node as a’ in place of ‘from dh_variabledefinition as a’; you would also replace ‘a.varkey’ with whichever column holds your GUID.

insert into feeds_item (id, entity_type, entity_id, guid, hash, feed_nid, url)
select 'dh_import_variable_definition', 'dh_variabledefinition', a.hydroid,
  a.varkey, '', 0, ''
from dh_variabledefinition as a
left outer join feeds_item as b
  on (a.hydroid = b.entity_id and b.id = 'dh_import_variable_definition')
where b.entity_id is null;
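Before running a statement like this against a real Drupal database, it can help to dry-run it on throwaway data.  The sketch below does that with Python's built-in sqlite3 module.  The two tables are cut-down stand-ins with only the columns Code 1 touches, and the sample rows (var_a, var_b, var_c and the hash 'abc') are invented; the point is only to show that the left join skips entities the importer already tracks, and that running the statement twice adds nothing new.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table dh_variabledefinition (hydroid integer primary key, varkey text);
create table feeds_item (
  id text, entity_type text, entity_id integer,
  guid text, hash text, feed_nid integer, url text
);
insert into dh_variabledefinition values (1, 'var_a'), (2, 'var_b'), (3, 'var_c');
-- hydroid 2 was already imported by this importer and has a real hash.
insert into feeds_item values
  ('dh_import_variable_definition', 'dh_variabledefinition', 2, 'var_b', 'abc', 0, '');
""")

# Same statement as Code 1, with plain ASCII quotes.
PHANTOM_INSERT = """
insert into feeds_item (id, entity_type, entity_id, guid, hash, feed_nid, url)
select 'dh_import_variable_definition', 'dh_variabledefinition', a.hydroid,
       a.varkey, '', 0, ''
from dh_variabledefinition as a
left outer join feeds_item as b
  on (a.hydroid = b.entity_id and b.id = 'dh_import_variable_definition')
where b.entity_id is null
"""


def tracked():
    """(entity_id, hash) pairs the importer currently tracks, in id order."""
    return conn.execute(
        "select entity_id, hash from feeds_item "
        "where id = 'dh_import_variable_definition' order by entity_id"
    ).fetchall()


conn.execute(PHANTOM_INSERT)
after_first = tracked()   # phantom rows added for hydroid 1 and 3 only
conn.execute(PHANTOM_INSERT)
after_second = tracked()  # second run finds nothing left to insert
print(after_first, after_second)
```

The row for hydroid 2 keeps its original hash, while the phantom rows carry an empty one.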

