Importing Catalog Data Using Exchange

Exchange is an extension for Joomla 1.5. It requires Joomla version 1.5.7 or higher and PHP 5.2 or higher.

Exchange an exceptional support tool for Catalog. With the data migration functionality in Exchange, you can migrate data from existing databases, or data export files (such as CSV files).

This tutorial will introduce you to the main concepts of how to migrate data from an external source into Catalog.

Using Exchange for data migration requires a good working knowledge of database design in general.

Setting up a Simple Business Directory via CSV

In this example we will create a very simple class that will form the basis of a simple business or contact directory.  We will also create categories into which we will put the contacts.

Creating the Catalog Class

First we need to create the class.  We want to have four custom text fields in this class to describe the person's position and record their phone number, mobile number and email address.

Select Components -> Catalog from the Menubar.

Select Classes from the Linkbar.

Click New in the Toolbar.

Screenshot

Enter Contact in the Title field for the class.

Click the Attributes tab.

Copy and paste the following XML code into the text area for the Standard Attributes.

<field name="phone"
 type="text"
 size="16"
 class="inputbox"
 label="Phone"
 default="" />

<field name="mobile"
 type="text"
 size="16"
 class="inputbox"
 label="Mobile"
 default="" />

<field name="email"
 type="text"
 size="35"
 class="inputbox"
 label="E-Mail"
 default="" />

Click Save in the Toolbar.

When the class list returns, click the Info link next to the name of the class to see a popup summary of its definition.

Screenshot

Click the (X) in the top-right to close the popup.

Creating the Categories

Our next step is to create several categories to demonstrate how we can automatically put nodes into categories.

Select Components -> Catalog from the Menubar.

Select Categories from the Linkbar.

Click New in the Toolbar.

Screenshot

Enter Worker in the Title field for the class.  Press the tab key to jump to the next field.  A suggestion for the Alias will be created for you.
We also need to consider how we are going to look up the category for mapping.  We could use the Title or the Alias field, but another option is using the Reference field.  This could remain static even if the Title or Alias change slightly over time.  It could also be used, for example, to map to an internal organisational code.

For our purposes, copy the Alias into the Reference field or simple type worker into the reference field.

Finally, ensure that the Parent is set to ROOT in the selection list.

Click Save in the Toolbar.

Now repeat the previous steps to create a second category called Admin.

Once you have saved the category, the list screen should look similar to the following screen image.

Screenshot

Setting up the Migration

Now that we've prepared the class, we turn our attention to creating a migration file.

Exchange allows for custom reports to be placed in a location off the extension folder so that these reports will not be at risk of accidental deletion when a component is upgraded.

You should put all your custom reports in the following directory:

/administrator/custom/com_exchange/migrate/

For the report we are going to use in this example, creating an empty file at the following location:

/administrator/custom/com_exchange/migrate/contacts.xml

Let's start off with looking at the main structure of the file:

<?xml version="1.0" encoding="utf-8" ?>
<migrate>
  <name>Contact Import from CSV</name>
  <author>Andrew Eddie</author>
  <description>
    Imports my contacts into a catalog class from a CSV file
  </description>

  <!-- Put parameters here -->

  <transforms>
    <!-- Put main node migration here -->
    <!-- Put custom attribute migration here -->
    <!-- Put category mapping here -->
  </transforms>
</migrate>

We always start an XML file off with what is called the XML prologue.  Next we open the structure XML document with a migrate tag and close this at the end of the file.  All other instructions are placed within the migrate tag.

You must include a name tag.  This is the name of the migration that is shown in the list in the Exchange component.

You should provide some information about what the migration does in the description tag, and also who the author is in an author tag.

Next is a placeholder for input parameters. These are fields that provide user supplied information for the migration, such as a file upload control, a radio set to run the migration in test mode, and many others.

Following this is a transforms tag.  Within this tag we provide the logic for the data migration itself. In this example we will be doing two types of migration.  The first will be to import data into the nodes table (a node is the basic data unit used in Catalog, called a Catalog Item in the User Interface).  Following this will be mapping code for each of the custom attributes.  Finally we will map some of the node records to categories.

Parameters

There are a number of parameters you can provide to give the user finer control over the migration.  In our case we are going to use the following set (replace the "Put parameters here" XML comment with the code):

  <params addpath="/administrator/components/com_exchange/helpers/elements">
    <param name="src_files[]"
      type="file"
      default=""
      label="Source File"
      description="" />

    <param name="delimiter"
      type="text"
      default=","
      label="Delimiter"
      description="" />

    <param name="enclosure"
      type="text"
      default=""
      label="Enclosure"
      description="" />

    <param name="dest_prefix"
      type="text_dbprefix"
      size="12"
      default="jos_"
      label="Destination Prefix"
      description="" />

    <param name="test_mode"
      type="radio"
      default="0"
      label="Test Mode"
      description="Run all queries in test mode">
      <option value="0">No</option>
      <option value="1">Yes</option>
    </param>

    <param name="show_data"
      type="radio"
      default="0"
      label="Show Data"
      description="Shows the source data in a table">
      <option value="0">No</option>
      <option value="1">Yes</option>
    </param>

    <param name="debug_mode"
      type="radio"
      default="0"
      label="Debug Mode"
      description="Turns on debugging information">
      <option value="0">No</option>
      <option value="1">Yes</option>
    </param>

    <param name="limit"
      type="text"
      size="12"
      default=""
      label="Selection Limit"
      description="The number of records to select from the source data" />
  </params>

We enclose all of the parameters in a params tag. We then define each parameter for which we want user information. These parameters are the same format as used in components, modules, views and other elements of the Joomla! API.

Source File

This is the CSV file you will upload.

Delimiter

Allows you to specify the field delimited.

Enclosure

Allows you to specify the field enclosure (usually a single or double quote).

Destination Prefix

This is the database prefix for the destination tables. It defaults to the setting for the site.

Test Mode

This option will record all the queries but not actually execute them.

Show Data

This option will display a table of the input data.

Debug Mode

Debug mode displays additional diagnostic information to help identify problems in the migration.

Selection Limit

This field is used to control the number of records processed. Leave it empty to do all records. However, if your source data set is quite large, you might want to set a limit of 20 or so in test mode initially.

Transformations

Now that we have the input data set up, we need to look at how to process it. We do this in two parts. First we migrate as much data as we can into the main node table. Following this, we migrate each attribute individually.

But before we do that, let's make some assumptions about the source data we are using.

Source File

We are going to assume this is the format of the file that has come out of some export process. The file can be located anywhere.

Emp ID Last Name First Name Position Phone Email Mobile Department Notes
21 Doe John Worker 1234 This e-mail address is being protected from spambots. You need JavaScript enabled to view it 5678 Sales Good worker

When the file is uploaded, the values of the columns are made available in variables based on the headings in the first row of the file. Any non-alphanumeric character is replaced with an underscore. In our example the variables that will be made available are:

$emp_id $last_name $first_name $position $phone $email $mobile $department $notes

Migrating to the Nodes Table

The first task is to import the base nodes for the directory. For this we need a maptable tag and in this case it takes the following arguments:

name

The name of the destination table (without the database prefix).

from

Is this case it is from a file.

Within the maptable tag we place a maprecords tag and this takes just one argument in our case:

replace

This will use a REPLACE INTO database instruction instead of an INSERT INTO instruction. We do this when the import may not be a one-off event and there is the possibility of importing more data in the same format at a later date.

The complete XML block looks like this:

    <!-- Put main node migration here -->
    <maptable name="jxcatalog_nodes" from="file" enabled="y">
      <maprecords replace="y">
        <field name="reference" from="emp_id" />
        <field name="id">
          <expression>lookupId( $emp_id, 'jxcatalog_nodes', 'reference' )</expression>
        </field>
        <field name="title">
          <expression>$last_name.', '.$first_name</expression>
        </field>
        <field name="subtitle" from="position" />
        <field name="class_id">
          <expression>lookupId( 'Contact', 'jxcatalog_classes', 'title' )</expression>
        </field>
        <field name="body" from="notes" />
        <field name="published" value="1" />
      </maprecords>
    </maptable>

We try to massage as much data as we can into the node. Within the maprecords tag we specify a field tag for each field that we want to process. The name attribute in the field tag is the field name in the nodes table to which the data (or value) is to be mapped.

reference

We are assigning the node reference field the value of the emp_id column (a unique number for the employee) by using the from attribute. The reference field is specifically provided to allow for referencing to external data sources like this. This is the most basic mapping that can occur being from a column in the CSV file to a column in the database table.

id

This field tag uses a very special and powerful child tag called expression. This tag evaluates a PHP expression and returns the result.
What we are saying in this case is to evaluate an expression, and then assign it to the id field. As we have chosen the replace method (see above) we need to ensure the primary key of the table is set if it can be found.

Exchange provides a helper function for this purpose called lookupId and it takes three argument: the value to look up, the table to use, and the field in the table for which to find a matching value. In this case we are asking the function to find any reference field matching the Employee ID and if so, return the ID of the row in the nodes table. In this way we can replace the information of existing node records rather than adding new ones.

title

For the title field, we are again using the expression tag. For the title we are simply joining together the Last Name and First Name separated by a comma and space. To do this we just use PHP's string joining operators.

subtitle

The subtitle is a simple map from the position column in the CSV file.

class_id

This is the internal field in the nodes table that specifies which class is used. We use the lookupId helper function again to find the ID of our

Contact class (matching by the title of the Class).

body

The body is a simple map from the notes column in the CSV file.

published

For the published field, we are manually setting the value to 1, meaning it will be automatically published.

Migrating the Attributes

The next step in our migration is to bring across the custom attributes.  Once again we use the maptable tag but this time we are pushing information into the jxcatalog_attributes table (the from attribute is file as it was in the preceding cases).

When dealing with attributes and the possibility that we are replacing data, we must first clear all existing attributes for the node if it exists.  If we don’t do this, we will double up on the information stored potentially leading to a horrible mix of new and out-of-date data.

To clear the existing attributes for a node, we use the deleterecords tag.  This tag takes no attributes and contains one field tag.  Using the principles we observed before with field and expression tags, we can see that for each line of the CSV, we will lookup the node_id based on the

Employee ID (stored in the reference field if the node exists) and then delete all records in the table where node_id equals the value returned by the expression.

Please note that this technique only works if you do not add more custom fields and edit them between the times you run the migration.  Any custom data you add will be deleted when you run the migration.

    <!-- Put custom attribute migration here -->
    <maptable name="jxcatalog_attributes" from="file" enabled="y">
      <!-- Only need to delete the attributes for this node once -->
      <deleterecords>
        <field  name="node_id">
          <expression> lookupId( $emp_id, 'jxcatalog_nodes', 'reference' ) </expression>
        </field>
      </deleterecords>

      <maprecords>
        <field  name="node_id">
          <expression> lookupId( $emp_id, 'jxcatalog_nodes', 'reference' ) </expression>
        </field>
        <field  name="name"      value="phone" />
        <field  name="value"     from="phone" />
        <field  name="ordering"  value="1" />
      </maprecords>

      <maprecords>
        <field  name="node_id">
          <expression> lookupId( $emp_id, 'jxcatalog_nodes', 'reference' ) </expression>
        </field>
        <field  name="name"      value="mobile" />
        <field  name="value"     from="mobile" />
        <field  name="ordering"  value="2" />
      </maprecords>

      <maprecords>
        <field  name="node_id">
          <expression> lookupId( $emp_id, 'jxcatalog_nodes', 'reference' ) </expression>
        </field>
        <field  name="name"      value="email" />
        <field  name="value"     from="email" />
        <field  name="ordering"  value="3" />
      </maprecords>
    </maptable>

We only need to delete the existing attributes once.  After this we then add the required data for the three custom fields.  For this we use a maprecords tag.  We don’t use the replace attribute because we have already deleted all the attributes for the node.

We are adding values for the following four fields:

node_id

The attributes table has what is called a foreign key to the nodes table called node_id. In this case we again use the expression tag to lookup the ID (called the primary key) from the nodes table based on the Employee ID external reference.

name

The name attribute must match the name attribute as you defined it in the Class XML.

value

The value will be take from the matching column in the CSV file.  In this case we will be pulling values from the  phone, mobile and email columns respectively.

ordering

If you want to ensure that your attributes appear in a particular order, such as the order in which they are defined in the Class XML then you can provide the ordering attribute as set values appropriately.  Setting this field is optional.

Mapping the Catagories

The final step in our migration is to apply the category mappings.  As for the attributes, we again we use a combination of the deleterecords tag and the maprecords tag, within the maptable tag, to remove and insert records in the jos_jxcatalog_categories table.

The deleterecords tag this time uses a dual condition to delete existing records where the left_id matches the node ID field and the right_id matches the category reference field

The maprecords tag then inserts records where the left_id matches the node ID field and the right_id matches the category reference field.

    <!-- Put category mapping here -->
    <maptable name="jxcatalog_node_category_map" from="file" enabled="y">
      <deleterecords>
        <field  name="node_id">
          <expression>
            lookupId( $emp_id, 'jxcatalog_nodes', 'reference' )
          </expression>
          <expression>
            lookupId( $position, 'jxcatalog_categories', 'reference' )
          </expression>
        </field>
      </deleterecords>

      <maprecords>
        <field  name="left_id">
          <expression>
            lookupId( $emp_id, 'jxcatalog_nodes', 'reference' )
          </expression>
        </field>
        <field  name="right_id">
          <expression>
            lookupId( $position, 'jxcatalog_categories', 'reference' )
          </expression>
        </field>
      </maprecords>
    </maptable>

Performing the Migration

When you have set up our Catalog Class, and prepared the migration XML file you are ready to test and perform the migration.

Select Components -> Exchange from the Menubar.

Select Migrate from the Linkbar.

Screenshot

You should see the title of the XML migration file that you created.  Click that link.

On the next page you will see all the options that you set up.

Screenshot

Click the Browse button and find the CSV file that you created.

For the Delimiter, you will normally enter a comma (this is the default).

For the Enclosure you might enter a single quote or a double quote depending on how you enclose long text strings in the CSV file.

The Destination Prefix will normally be jos_.

The Test Mode option allows you to display all the queries for the migration without actually running them.  We usually recommended selecting Yes for the test mode on your first run.

The Show Data option allows you to see a table of the data as parsed by Exchange.  You may chose to display this if you are having trouble getting the migration results you want.  Please note that if your CSV contains a great many rows of data, showing the data table will slow down your browser (see the Selection Limit below for how to alleviate this problem).

The Debug Mode option allows you to see additional diagnostic information.  Using this option is useful if you are trying to debug problematical results from an expression tag.

Finally, the Selection Limit allows you to work on a subset of the rows in the CSV file.  This is useful for debugging large data sets that would otherwise take a long time to process.  Using a small number (for example, 20) will allow you to see the data table (using the Show Data option) and used in conjunction with Test Mode, can verify that a migration will run successfully.

In this first example we will use the options Test Mode and Show Data.

Screenshot

You can see how Exchange has imported the CSV data and also shows the query it expects to run.

If you look carefully at the queries, you might notice that the queries that are run for the custom attributes have empty node_id values.  Don’t immediately panic if you see this.  Because you are running in Test Mode, you haven’t actually created the nodes to which that attributes will be assigned.  When you run the script out of Test Mode the situation will correct itself.

Now click Back in the Toolbar.  If the Source File is empty then browse for it again.  This time select Test Mode and Debug Mode and then click Migrate in the Toolbar.

Screenshot

You will now see additional diagnostic information returned by the various tags.  For example, the maptable tag displays some of the attribute settings.  The expression tag shows the code it is trying to evaluate followed by the value returned by the expression.

If you see potential problems, click Back in the Toolbar again, edit the XML migration file and rerun it in Test Mode and Debug Mode.

When you are satisfied that you have done everything to ensure the migration will run with the desired results.

Screenshot

With Test Mode off, you can see that records have been created in the nodes table and that the attributes are now picking up a valid node_id value.  You can also see the category mapping at the end.

Conclusion

This completes the general theory behind migrating data from a CSV file into Catalog.

Please refer to the Appendix as this contain additional schema information that may be of assistance to you.

Appendix

Catalog Database Table Reference

The following outlines the table structure and field data types for Catalog version 1.1.  Please note that the tables are named with the default database prefix of jos_.  Replace this prefix with the one used for your site appropriately.

jos_jxcatalog_attributes

This table stores the values of the custom attributes mapped to a node.

id            int unsigned
node_id        int unsigned
taxon_type_id    int unsigned
taxon_id        int unsigned
name            varchar
value            varchar
ordering        int
access        int unsigned

jos_jxcatalog_categories

This table stores the catagories for Catalog.  Please note that Catalog does not use the Joomla! jos_categories table.  Catalog also supports deep category trees.

id            int unsigned
parent_id        int unsigned
right_id        int
left_id        int
title            varchar
alias            varchar
subtitle        varchar
reference        varchar
path            varchar
body            text
published        tinyint
ordering        int
params        text
media            text
checked_out        int
checked_out_time    datetime
created_date    datetime
modified_date    datetime
metakey        text
metadesc        text
georef_x        double
georef_y        double
georef_search    varchar
georef_params    mediumtext

Catalog uses a pre-order traversal technique (this is what the left_id and right_id fields are used for) to improve the efficiency of queries.  If you use Exchange to migrate categories from another source, please ensure that the parent_id is set to the ROOT node (usually 1).  Once migrated you will need to trigger a rebuild.  To do this, edit any Catalog category and then save it.  Once you have done this, you should see the L-R values change (they will have been all zero previously).  Do not use Exchange to set values for the left_id and right_id unless you are familiar with Nested Set theory.

jos_jxcatalog_classes

This table stores information about the Catalog classes.

id            int unsigned
title            varchar
description        text
comments        text
attributes_std    mediumtext
attributes_opt    mediumtext
params        mediumtext
group_id        int unsigned
published        tinyint
ordering        tinyint
checked_out        int unsigned
checked_out_time    datetime
created_date    datetime
created_user_id    int unsigned
modified_date    datetime
modified_user_id    int unsigned

jos_jxcatalog_currencies

This table stores information about monetary currencies.

id                int
title            varchar
code                varchar
symbol_left        varchar
symbol_right        varchar
decimal_point    char
thousands_point    char
decimal_places    int unsigned
rate                double
modified_date    datetime
published        int
ordering            int

jos_jxcatalog_node_category_map

This table maps nodes (left_id is a foreign key to the node ID) to categories (right_id is a foreign key to the category ID).  Note that by using this table, nodes can be mapped to multiple tables.

left_id        int
right_id        int
ordering        int

jos_jxcatalog_node_user_map

This table maps nodes (left_id is a foreign key to the node ID) to users (right_id is a foreign key to the user ID in the jos_users table).

left_id        int unsigned
right_id        int unsigned

jos_jxcatalog_nodes

This table stores the information for Catalog nodes.  These are referred to as Catalog Items in the Administrator.

id            int unsigned
parent_id        int unsigned
left_id        int unsigned
right_id        int unsigned
class_id        int unsigned
status_id        int unsigned
reference        varchar
title            varchar
alias            varchar
subtitle        varchar
abstract        text
body            text
comments        text
node_date        datetime
node_user_id    int unsigned
start_date        datetime
finish_date        datetime
params        text
media            text
metakey        text
metadesc        text
language        varchar
version        int unsigned
score            int unsigned
score_count        int unsigned
georef_x        double
georef_y        double
georef_search    varchar
georef_params    mediumtext
price1        double
price2        double
currency_id        int unsigned
available        int unsigned
available_date    datetime
stock            int unsigned
published        tinyint
access        int
ordering        int
checked_out        int unsigned
checked_out_time    datetime
created_date    datetime
created_user_id    int unsigned
modified_date    datetime
modified_user_id    int unsigned

jxcatalog_nodes_compare

This table allows comparison relationships to be created between nodes.  Any number of comparisons can be supported including one-to-one, one-to-many, many-to-one and many-to-many.

You can use this table, for example, to compare prices between nodes in a Product class and nodes in a Retail Outlet class.

id            int unsigned
node1_id        int unsigned
node2_id        int unsigned
title            varchar
alias            varchar
reference        varchar
currency_id        int
price1        double
price2        double
stock            int unsigned
description        mediumtext
comments        mediumtext
params        mediumtext
published        int
ordering        int
access        int unsigned
checked_out        int unsigned
checked_out_time    datetime
created_date    datetime
created_user_id    int unsigned
modified_date    datetime
modified_user_id    int

jxcatalog_options

This table is used to store additional optional attributes for a node.

id            int unsigned
node_id        int unsigned
taxon_type_id    int unsigned
taxon_id        int unsigned
label            varchar
modifier_value    double
modifier_type    int unsigned
ordering        int
access        int unsigned

jxcatalog_taxa

This table stores the taxa (plural of taxon) used in Catalog.  These are referred to as List Items in the Administrator.

id            int unsigned
title            varchar
description        text
type_id        int unsigned
icon            varchar
color            varchar
factor        double
plugin_id        int unsigned
published        tinyint
ordering        tinyint

jxcatalog_taxon_types

This table stores the taxon types used in Catalog.  These are referred to as List Types in the Administrator.

id            int unsigned
name            varchar
title            varchar
ordering        tinyint
is_core        tinyint