Tweaked notes from the amazing tutorial and Git resource put out by jigarius.
Usually, when a huge site makes the (wise) decision to migrate to Drupal, one of the biggest concerns of the site owners is how to migrate the old site's data into the new Drupal site. The old site might or might not be a Drupal site, but given that the new site is on Drupal, we can make use of the cool migrate module to import data from a variety of data sources including, but not limited to, XML, JSON, CSV and SQL databases.
This project is an example module showing how to go about importing basic data from a CSV data source though things would work pretty similarly for other types of data sources. Apart from a basic data import, I have also included certain other important things which a migration might involve, for example, import of 2 different types of entities and the relation between them.
Though written as a simple example to demonstrate the basics of Drupal 8 migrations, this project covers:
- Import of basic content as Drupal node entities or taxonomy term entities.
- Certain ways of basic data manipulation during the data import process.
- Import of basic relationships between two entities, e.g., articles and tags.
- Import of images / files as Drupal file entities and relating them to the relevant content.
- Create a new D8 site. For this step-by-step, I'm using DevDesktop 2 but will walk back through the entire process with a more stable stack. Download the latest version at https://dev.acquia.com/downloads
- [ Create New Drupal Site ], select the latest D8 release ( not a distribution, i.e. Lightning or Open Atrium )
- Download the files from this repo and put them in the
- Install the module.
drush en xyz_migrate -y
- See current status of the migrations.
drush migrate-status
- Run / re-run the migrations introduced by this module.
drush migrate-import --group=xyz --update
As per project requirements, we wish to import certain data for an educational and cultural institution.
- Academic programs: We have a CSV file containing details related to academic programs. We are required to create nodes of type program with the data.
- Tags: We have a CSV file containing details related to tags for these academic programs. We are required to import these as terms of the vocabulary named tags.
- Images: We have images for each academic program. The base names of the images are listed in the CSV file for academic programs. To make things easy, we have only one image per program.
Before we start with actual migrations, there are certain things which I would point out so as to ensure that you can run your migrations without trouble.
- Though the basic migration framework is a part of the D8 core as the migrate module, to be able to execute migrations, you must install the migrate_tools module. You can use the command drush migrate-import --all to execute all migrations. In this tutorial, we also install some other modules such as migrate_plus and migrate_source_csv.
- Migration definitions in Drupal 8 are in YAML files, which is great. But the fact that they are located in the
config/install directory implies that these YAML files are imported only when the module is installed. Hence, any subsequent changes to the YAML files would not be detected until the module is re-installed. Alternatively, we can re-import the relevant configuration with
drush config-import --partial --source=path/to/module/config/install.
- While writing a migration, you would usually be updating it over and over and re-running it to see how things go. So, to do this quickly, you can re-import the configuration of the module containing your custom migrations (in this case the xyz_migrate module) and execute the relevant migrations in a single command like
drush config-import --partial --source=modules/custom/xyz_migrate/config/install -y && drush migrate-import --group=xyz --update -y.
- To execute the migrations in this example, you can download this repo and rename the downloaded directory to xyz_migrate. It should work without issues with a standard Drupal 8 install.
I usually prefer to name project-specific custom modules with a prefix of xyz (the original tutorial uses c11n, the numeronym for customization). This way, I have a naming convention for custom modules and I can copy any custom module to another site without worrying about having to change prefixes. Also note that we have a .info.yml file instead of the .info file we were used to in Drupal 7.
Nothing fancy about the module file as such. It includes a basic project definition with certain dependencies on other modules. Though the migrate module is in Drupal 8 core, we need most of these dependencies to enable / enhance migrations on the site:
- migrate: Without the migrate module, we cannot migrate!
- migrate_plus: Improves the core migrate module by adding certain functionality like migration groups and usage of YML files to define migrations. Apart from that this module includes an example module which I referred to on various occasions while writing my example module.
- migrate_tools: General-purpose drush commands and basic UI for managing migrations.
- migrate_source_csv: The core migrate module provides a basic framework for migrations, which does not include support for specific data sources. This module makes the migrate module work with CSV data sources. There are other modules which provide support for other data sources like JSON, XML, etc.
- node: Academic Programs is our only Content Type, obviously created as nodes... we need the node module.
- taxonomy: We will be importing tags as taxonomy terms. Thus, we need the taxonomy module.
xyz_migrate.module (not needed)
In Drupal 8, unlike Drupal 7, a module provides a .module file only if required. In our example, we do not need the
.module file, so I have not created one.
Just the basics here, just like the old D7 days:
# Package info
name: 'xyz: Migrate'
description: Module for performing site-specific migrations.
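For reference, a fuller xyz_migrate.info.yml might look like the sketch below. Only name and description come from the notes above; the type, core and package keys and the exact dependency list are my assumptions, pieced together from the dependencies described earlier rather than copied from the repo.

```yaml
# xyz_migrate.info.yml (sketch; everything except name/description is assumed)
name: 'xyz: Migrate'
description: Module for performing site-specific migrations.
type: module
core: 8.x
package: Custom  # assumed package label

# Modules discussed above; adjust to what your site actually needs.
dependencies:
  - migrate
  - migrate_plus
  - migrate_tools
  - migrate_source_csv
  - node
  - taxonomy
```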
To import the data, we obviously require a data source. NEVER implement a process that reads data from anywhere besides the
public:// data space! For the sake of this example, I have provided the source data in an import directory inside the module, and the data is copied to the
public:// directory using
hook_install. The install file defines $dirname, sets up the destination directory, and copies the data files.
Like we used to implement hook_migrate_api() in Drupal 7 to declare the API version, migration groups, individual migrations and more, in Drupal 8, we do something similar. Instead of implementing a hook, we create a migration group declaration inside the config/install directory of our module. The file must be named something like migrate_plus.migration_group.NAME.yml where NAME (xyz) is the machine name for the migration group.
In this example, we define a migration group xyz to provide general information like:
- id: A unique ID for the migration group. This is usually the NAME part of the migration group declaration file name as discussed above.
- label: A human-friendly name of the migration group as it would appear in the UI.
- description: A brief description about the migration group.
- source_type: This would appear in the UI to provide a general hint as to where the data for this migration comes from.
- dependencies: Though this might sound a bit strange for Drupal 7 users (like me), this segment is used to define modules on which the migration group depends. When one of these required modules is missing / removed, the migration group is automatically removed.
We can execute all migrations in a given group with the command
drush migrate-import --group=GROUP.
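Putting the keys above together, a group declaration could look roughly like this. It is a sketch, not the repo's file: the label, description and source_type texts are placeholders I made up, and the enforced dependency on xyz_migrate is my assumption about how the module ties the group to itself.

```yaml
# config/install/migrate_plus.migration_group.xyz.yml (sketch)
id: xyz
label: Custom migrations                 # assumed label
description: Custom data migrations.     # assumed text
source_type: CSV files                   # assumed hint shown in the UI
dependencies:
  enforced:
    module:
      # Removing this module removes the group's configuration too.
      - xyz_migrate
```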
Migration definition: Metadata
Now that we have a module to put our migration scripts in and a migration group for grouping them together, it's time we write a basic migration! To get started, we import basic data about academic programs, ignoring complex stuff such as tags, files, etc. In Drupal 8, like many other things, we do this in a YML file.
In the migration declaration file, we declare some metadata about the migration:
- id: A unique identifier for the migration. In this example, I allocated the ID program_data, hence, the migration declaration file has been named migrate_plus.migration.program_data.yml. We can execute specific migrations with the command
drush migrate-import ID.
- label: A human-friendly name of the migration as it would appear in the UI.
- migration_group: This puts the migration into the migration group xyz we created above. We can execute all migrations in a given group with the command
drush migrate-import --group=GROUP.
- migration_tags: Here we provide multiple tags for the migration and just like groups, we can execute all migrations with the same tag using the command
drush migrate-import --tag=TAG
- dependencies: Just like in the case of migration groups, this segment is used to define modules on which the migration depends. When one of these required modules is missing / removed, the migration is automatically removed.
- migration_dependencies: This element is used to mention IDs of other migrations which must be run before this migration. For example, if we are importing articles and their authors, we need to import author data first so that we can refer to the author's ID while importing the articles. Note that we can leave this undefined for now as we do not have any other migrations defined. I defined this section only after I finished writing the migrations for tags, files, etc.
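The metadata described above might be written like this. Treat it as a sketch: the label text and the node tag are assumptions of mine, and migration_dependencies is deliberately left out because no other migrations exist yet at this point.

```yaml
# config/install/migrate_plus.migration.program_data.yml (metadata only, sketch)
id: program_data
label: Academic programs and associated data   # assumed label
migration_group: xyz
migration_tags:
  - node                # assumed tag
  - academic program
dependencies:
  enforced:
    module:
      - xyz_migrate
# migration_dependencies is added later, once the tag and image
# migrations have been written.
```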
Once done with the meta-data, we define the source of the migration data with the source element in the YAML.
- plugin: The plugin responsible for reading the source data. In our case we use the migrate_source_csv module which provides the source plugin csv.
- path: Path to the data source file - in this case, the program.data.csv file.
- header_row_count: This is a plugin-specific parameter which allows us to skip a number of rows from the top of the CSV. I found this parameter in the plugin class file, but I'm sure it must also be mentioned in the documentation for the migrate_source_csv module.
- keys: This parameter defines a number of columns in the source data which form a unique key in the source data. Luckily in our case, the program.data.csv provides a unique ID column so things get easy for us in this migration. This unique key will be used by the migrate module to relate records from the source with the records created in our Drupal site. With this relation, the migrate module can interpret changes in the source data and update the relevant data on the site. To execute an update, we use the parameter
--update with our
drush migrate-import command.
- fields: This parameter provides a description for the various columns available in the CSV data source. These descriptions just appear in the UI and explain the purpose behind each column of the CSV.
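As a rough illustration, the source element for the academic programs could be sketched as follows. The path and the column descriptions are assumptions (I'm guessing where hook_install drops the file), and the column names ID, Title and Body are inferred from the text rather than taken from the actual CSV.

```yaml
source:
  plugin: csv
  # Assumed location; point this at wherever the CSV was copied.
  path: 'public://import/program/program.data.csv'
  # Skip the single header row at the top of the file.
  header_row_count: 1
  # Column(s) that uniquely identify a row in the source.
  keys:
    - ID
  # Descriptions only; they show up in the UI.
  fields:
    ID: Unique identifier for the program in the data source.
    Title: Name of the program.
    Body: Description of the program.
```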
Similarly, we need to tell the migrate module how we want it to use the source data. We do this with the destination element in the YAML.
- plugin: Just like source data is handled by separate plugins, we have destination plugins to handle the output of the migrations. In this case, we want Drupal to create node entities with the academic program data, so we use the entity:node plugin.
- default_bundle: Here, we define the type of nodes we wish to obtain using the migration. Though we can override the bundle for individual imports, this parameter provides a default bundle for entities created by this migration. We will be creating only program nodes, so we mention that here.
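In YAML, the destination boils down to a couple of lines. As far as I know the node destination is registered under the plugin ID entity:node, which needs quoting because of the colon; the bundle name program is the content type from the requirements above.

```yaml
destination:
  # Create node entities...
  plugin: 'entity:node'
  # ...of type "program" unless a row says otherwise.
  default_bundle: program
```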
If you ever wrote a migration in an earlier version of Drupal, you might already know that migrations are usually not as simple as copying data from one column of a CSV file to a given property of the relevant entity. We need to process certain columns and eliminate certain columns and much more. In Drupal 8, we define these processes using a process element in the migration declaration. This is where we put our YAML skills to real use.
- title: An easy property to start with, we just assign the Title column of the CSV as the title property of the node.
- sticky: Though Drupal can apply the default value for this property if we skip it, I wanted to demonstrate how to specify a default value for a property. We use the default_value plugin with the default_value parameter to make the imports non-sticky with sticky = 0.
- uid: Similarly, we specify the default owner for the node as the administrative user with uid = 1.
- body: The body is a filtered long text field and has various sub-properties we can set. So, we copy the Body column from the CSV file to the body/value property (instead of assigning it to just body). In the next line, we specify the body/format property as restricted_html. Similarly, one can also add a custom summary for the nodes using the body/summary property. However, we should keep in mind that while defining these sub-properties, we need to wrap the property name in quotes because we have a
/ in the property name.
- field_program_level: With this property we get to try out the useful static_map plugin. Here, the source data uses the values graduate/undergraduate whereas the destination field only accepts gr/ug. In Drupal 7, we would have written a few lines of code in a ProgramDataMigration::prepareRow() method, but in Drupal 8, we just write some more YAML. Here, we have the plugin specifications as usual, but the small dashes mean we are actually defining an array of plugins, or a plugin pipeline. With the first plugin, the callback plugin, we call the function strtolower (with
callable: strtolower) on the Level property (with
source: Level). Once the old value is in lower case, we pass it through the static_map (with
plugin: static_map) and define a map of new values which should be used instead of old values (with the
map element). Done!
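Here is how I would sketch the process instructions described above. One thing to watch: as far as I know, core's callback plugin reads the function name from a key called callable. The column names Title, Body and Level come from the text; everything else follows the bullets one by one.

```yaml
process:
  # Straight copy from the Title column.
  title: Title
  # Fixed default values.
  sticky:
    plugin: default_value
    default_value: 0
  uid:
    plugin: default_value
    default_value: 1
  # Sub-properties need quotes because of the slash.
  'body/value': Body
  'body/format':
    plugin: default_value
    default_value: restricted_html
  # Plugin pipeline: lower-case the value, then map it.
  field_program_level:
    - plugin: callback
      callable: strtolower
      source: Level
    - plugin: static_map
      map:
        graduate: gr
        undergraduate: ug
```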
With the parameters above, we can write basic migrations with basic data manipulation. If you wish to see another basic migration, you can take a look at migrate_plus.migration.program_tags.yml. In the sections below, I explain how to do some more complex tasks like importing taxonomy terms and their relations with nodes, and uploading files / images and associating them with their relevant nodes.
Migrating taxonomy terms
This section is about migrating relations between two entities. Before jumping to this section, one must ensure that a migration has already been written to import the target entities. In our example, we wish to associate tags (taxonomy terms) to academic programs (nodes). For that, we need to import taxonomy terms and we do that in
In the migration for tags data, we use the tag text as a unique key for the tags. This is because:
- The tags data-source, program.tags.csv, does not provide any unique key for the tags.
- The academic programs data-source, program.data.csv, refers to the tags using the tag text (instead of unique IDs).
Once the tag data is imported, all we have to do is add some simple lines of YAML in the migration definition for academic programs to tell Drupal how to migrate the field_tags property of academic programs. As we did above for program_level we will be specifying multiple plugins for this property:
- explode: Taking a look at the data-source, we notice that academic programs have multiple tags separated by commas. So, as a first step, we use the explode plugin which would split/explode the tags by the delimiter , (comma), thereby creating an array of tags. We do this using
plugin: explode and
delimiter: ', '.
- migration: Now that we have an array of tags, each tag identifying itself using its unique tag text, we tell the migrate module that these tags are the same ones we imported in migrate_plus.migration.program_tags.yml and that the tags generated during that migration are to be used here in order to associate them to the academic programs. We do this using
plugin: migration and
As simple as it may sound, this is all that is needed to associate the tags to the academic programs!
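The two-step pipeline for tags could be sketched like this. The column name Tags is my assumption about the CSV header. Also note that in later Drupal 8 releases the migration plugin goes by the name migration_lookup, so you may need that ID instead depending on your core version.

```yaml
process:
  field_tags:
    # Step 1: split "tag one, tag two" into an array of tag texts.
    - plugin: explode
      delimiter: ', '
      source: Tags            # assumed column name
    # Step 2: swap each tag text for the term ID created by program_tags.
    - plugin: migration       # called migration_lookup in later releases
      migration: program_tags
```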
For the sake of demonstration, I also included an alternative approach for the migration of the field_program_type property. For program type, I used the entity_generate plugin, which does the following:
- Looks up an entity of a particular type (in this case, taxonomy_term) based on a particular property (in this case, name).
- If no matching entity is found, an entity is created on the fly.
- The ID of the existing / created entity is returned for use in the migration.
So, in the process instructions for field_program_type, I use
plugin: entity_generate. During migration, the entity_generate plugin is called for every program type, and the matching taxonomy term is associated to the academic programs. The disadvantage of the entity_generate method is that when we roll back the migration, the taxonomy terms created during the migration are not deleted.
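A sketch of the entity_generate approach, using the configuration keys I know from migrate_plus. The source column Type and the vocabulary machine name program_types are hypothetical; only the entity type (taxonomy_term) and the lookup property (name) come from the description above.

```yaml
process:
  field_program_type:
    plugin: entity_generate
    source: Type                 # assumed column name
    entity_type: taxonomy_term
    # Which property holds the bundle, and which bundle to use.
    bundle_key: vid
    bundle: program_types        # hypothetical vocabulary
    # Property to match on; a term is created if nothing matches.
    value_key: name
```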
To make sure that tag data is imported and available during the academic program migration, we specify the
program_tags migration in the
migration_dependencies for the
program_data migration. Now, when you re-run these migrations, the taxonomy terms get associated to the academic program nodes.
Migrating files / images
In our example, every academic program has an associated image file. Say, the client wants us to associate these files to the academic programs created during the migration. Though it might sound difficult, the solution involves only two steps:
Like we did with the taxonomy terms above, first we need to create file entities for each file. This is because Drupal treats files as file entities which have their own ID and then Drupal treats node-file associations as entity references, referring to the file entities with their IDs.
We create the file entities in the
file, but this time, using some other process plugins. Following are some important notes on the program_image migration:
- We specify the key parameter in source as the column containing file names, i.e., Image file. This way, we would be able to refer to these files in other migrations using their names, e.g.,
- We mention an additional parameter constants in the source element.
- file_source_uri is used to refer to the path from which files are to be read during the import.
- file_dest_uri is used to refer to the destination path where files should be copied to. The newly created file entities would refer to files stored in this directory.
public:// URI refers to the files directory inside the site in question. This is where all public files related to the site are stored.
- In the process element, we prepare two paths - the file source path and the file destination path.
- file_source is obtained by concatenating the file_source_uri with the Image file column which stores the file's basename. Using
delimiter: / we tell the migrate module to join the two strings with a
/ (slash) in between to ensure we have a valid file name. In short, we do
file_source_uri . '/' . basename using the
- file_dest, in a similar way, is
file_dest_uri . '/' . basename. This is where we utilize the constants we defined in the source element.
- Now, we use the file_source and file_dest paths generated above with
plugin: file_copy. The file_copy plugin simply copies the files from the
file_source path to the
file_dest path. All the steps we did above were just for being able to copy the files.
- Finally, since the destination of the migration is
entity:file, the migrate module would use the file created in the previous step to generate a file entity, thereby generating a unique file ID.
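Pulling the notes above together, the image migration might look roughly like the sketch below. The CSV path and the two URIs in constants are assumptions on my part, as is reusing the programs CSV as the source. The @ prefix refers to values computed earlier in the same process section.

```yaml
# Sketch of the program_image migration (source, destination, process only)
source:
  plugin: csv
  path: 'public://import/program/program.data.csv'    # assumed path
  header_row_count: 1
  keys:
    - Image file
  constants:
    file_source_uri: 'public://import/program/images'  # assumed
    file_dest_uri: 'public://program/images'           # assumed
destination:
  plugin: 'entity:file'
process:
  # file_source_uri . '/' . basename
  file_source:
    plugin: concat
    delimiter: /
    source:
      - constants/file_source_uri
      - Image file
  # file_dest_uri . '/' . basename
  file_dest:
    plugin: concat
    delimiter: /
    source:
      - constants/file_dest_uri
      - Image file
  filename: Image file
  # Copy the file; the resulting URI is stored on the file entity.
  uri:
    plugin: file_copy
    source:
      - '@file_source'
      - '@file_dest'
```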
Step 2, associate files:
Once the heavy lifting is done and we have our file entities, we need to put the files to use by associating them to academic programs. To do this, we add processing instructions for
file_image ( back in ):
Just like we did for taxonomy terms, we tell the migrate module that the Image file column contains a unique file name, which refers to a file entity created during the program_image migration. Hence, we write
plugin: migration and
migration: program_image. And it's done!
To make sure that image data is imported and available during the academic program migration, we specify the
program_image migration in the
migration_dependencies for the
program_data migration. Now, when you run these migrations, the image files get associated to the academic program nodes.
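In the program_data migration, the image association and the dependencies could be sketched as below. I'm assuming the image field's machine name is field_image and that both dependencies are declared as required; an optional list is also possible if you don't want a hard ordering.

```yaml
# Additions to migrate_plus.migration.program_data.yml (sketch)
migration_dependencies:
  required:
    - program_tags
    - program_image
process:
  field_image:                # assumed field machine name
    plugin: migration         # called migration_lookup in later releases
    migration: program_image
    source: Image file
```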
But wait! Drupal just has to have one last obstacle before you can enjoy your efforts, and here it is:
A broken image reference or failed migration copy. Even though we defined our output path to be:
Rolling over the image has the image referenced in
Easy enough. There are only two places that can refine image imports on a clean install, so let's check the obvious one.
Sure enough, it's in the default image field handling settings in Manage Display (admin/structure/types/manage/program/display), where all subtle evil can happen, such as rescaling and relocating files post-process. To resolve this for our basic example, change the dropdown "Image style" value to "Original image".
Tada! Pulling teeth for the day is over.
Fun is not over yet - delimiter switch
The CSV parser is not so hot at handling wrapped or inconsistently escaped quote characters in long sets of HTML. For my next trick on this D8 journey, I'll override the delimiter character, replacing the [,] with a [^].
One painful lesson you will quickly learn is that you have to REINSTALL both your module code AND data files before the changes are correctly installed and configured. The initial install scoots the datafiles over to the public file system, but you have to update all your files in the install directory, then fire off a:
drush config-import --partial --source=modules/custom/xyz_migrate/config/install -y && drush migrate-import --group=xyz --update -y
So back to the migrate_plus.migration.program_data.yml updates...
- academic program
# Under the source, we define the key "plugin" and other
# configurations specific to the plugin.
# We will be importing from a CSV file, so we will require
# the migrate_source_csv module which introduces the CSV
# source plugin with the identifier "csv".
# Specify the path to the CSV data source.
# Column delimiter. Comma (,) by default.
# Number of rows at the beginning which are not actual data.
Pardon the color-happy formatting; I'm just pointing out the red delimiter override definition within the custom YAML.
That's that. We just need to create an output set, or modify the existing data set, with a file-wide [search][replace] of [,] with [^].
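With the key lines filled back in, the source section with the override might read as follows. This is a sketch: the path is assumed, and the only change from the default behaviour is the delimiter line.

```yaml
source:
  plugin: csv
  # Specify the path to the CSV data source (assumed location).
  path: 'public://import/program/program.data.csv'
  # Column delimiter. Comma (,) by default; overridden to a caret here.
  delimiter: '^'
  # Number of rows at the beginning which are not actual data.
  header_row_count: 1
  keys:
    - ID
```

Remember to re-import the configuration and refresh the data files in public:// before re-running the migration, otherwise the old comma-delimited setup is still in play.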