JENS MALMGREN I create.

Porting my blog for the second time, walk the old data.

This is post #2 of my series about how I port this blog from Blogengine.NET 2.5 ASPX on a Windows Server 2003 to a Linux Ubuntu server, Apache2, MySQL and PHP. A so called LAMP. The introduction to this project can be found in this blog post https://www.malmgren.nl/post/Porting-my-blog-for-the-second-time-Project-can-start.aspx.

In my previous post I got started by writing a little Perl program that walks the directory of the old blog.

#!/usr/bin/perl
use strict;
use warnings;

Walk('/usr/local/bin/jensblog');

# http://www.jens.malmgren.nl/post/Porting-my-blog-for-the-second-time-walk-the-old-data.aspx
sub Walk
{
	my ($directory) = @_;
	
	opendir my $dirHandle, $directory or die "Failed to open $directory: $!";
	my @dirItems = readdir $dirHandle;
	
	for my $dirItem (@dirItems)
	{
		if ($dirItem =~ /^(\.|\.\.)$/)
		{
			next;
		}
		
		my $path = $directory . "/" . $dirItem;
		
		if (-f $path)
		{
			print "f $path\n";
		}
		if (-d $path)
		{
			print "d $path\n";
			Walk($path);
		}
	}
}

This is just the beginning of a program. Currently this walks the directory structure and prints f for file and d for a directory and that is all there is. From here I can start filtering directories I am not interested etc. It is expecially the xml files I am interested in and specifically categories.xml. That is the file I will start working on. For a moment I stopped working on the program walking the directory structure and started on a program working on the categories.xml file.

This type of project when everything is new, the operating system is new to me and the development tools are new it is a little like a holiday to a foreign country before you started the jurney. I use to say that the most pleasent part of the jurney is planning it. At that stage everything is fun and adorable simply fantastic. When you get there you have all sorts of issues to deal with. With this jurney I had a great struggle to load an XML parser into Perl. I searched for XML and Perl and find out that the parser I want is XML::LibXML. How do I get it? In Perl you get it via a package manager. So I found out that I had to use CPAN. So I happily burned some hours using CPAN. It said a lot to me but inbetween it also said that it could not compile and make and what not.

At some point I came to the conclusion it was something with Ubunto. Found this article: http://twiki.org/cgi-bin/view/TWiki/HowToInstallCpanModules

It turns out that on Ubunto (Wich is a Debian flavor) there you use apt-get, not CPAN. AHA! This command: apt-get install libxml-libxml-perl to get XML::LibXML so I had to add Lib in front of what I needed and replace :: with a dash and end with dash perl. It worked, wohoo! And then I fell into the next pithole "how is this LibXML thing working?".

≺?xml version="1.0" encoding="utf-8" standalone="yes"?≻
≺categories≻
    ≺category id="6e20e883-bcec-4f3a-80e5-a3ffabd86453" description="1980" parent=""≻1980≺/category≻
    ≺category id="4d30923f-e7e6-47af-bc56-ce7764bb1114" description="1982" parent=""≻1982≺/category≻
    ≺category id="b7e42612-9919-4adb-ae04-f2ec91e0440f" description="1983" parent=""≻1983≺/category≻
    ≺category id="0c8ce848-1f55-4308-b2d2-f9344531d7ec" description="1984" parent=""≻1984≺/category≻

Here above is an example of the categories XML file. These long codes in the id attribute are GUIDs, Global Unique Identifiers. I could keep them but to me the GUIDs has no added value when I use a database. I would rather use an integer as ID.

With small steps I start to get control over the categories file.

#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML; # apt-get install libxml-libxml-perl

my $filename = "/usr/local/bin/jensblog/App_Data/categories.xml";
my $parser = XML::LibXML-≻new();
my $xmldoc = $parser-≻parse_file($filename);

for my $category_node ($xmldoc-≻findnodes("//category"))
{
	my $strGuidId = $category_node -≻ findnodes('@id') -≻ to_literal;
	my $strCategoryName = $category_node-≻textContent;
	print $strGuidId . " " . $strCategoryName. "\n";
}

This program here prints a list of GUID and categories from the XML file. It looks like this:

6e20e883-bcec-4f3a-80e5-a3ffabd86453 1980
4d30923f-e7e6-47af-bc56-ce7764bb1114 1982
b7e42612-9919-4adb-ae04-f2ec91e0440f 1983
0c8ce848-1f55-4308-b2d2-f9344531d7ec 1984

My perl program will need to convert from the GUIDs into my own IDs. So here comes the moment to set up a new database for my blog and create a table category. It have two fields: ID and Name.

mysql≻ create database jensblog;
Query OK, 1 row affected (0.00 sec)

mysql≻ use jensblog;
Database changed

mysql≻ create table category ( ID int auto_increment primary key, name varchar(80) not null);
Query OK, 0 rows affected (0.10 sec)

mysql≻ desc category;
+-------+-------------+------+-----+---------+----------------+
| Field | Type        | Null | Key | Default | Extra          |
+-------+-------------+------+-----+---------+----------------+
| ID    | int(11)     | NO   | PRI | NULL    | auto_increment |
| name  | varchar(80) | NO   |     | NULL    |                |
+-------+-------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)

Here I created my database and table. The ID field is auto_increment wich means it will get its value by the database engine rather than that I tell MySql what number I want. Now on to the next challenge: Convince Perl that it can connect to the database and upload data to it! I already had my lesson, the hard way, that on Ubunto you use apt-get to get perl packages so I went for the search 'perl mysql ubuntu' and found this page: http://www.microhowto.info/howto/connect_to_a_mysql_database_using_perl_dbi.html pretty much explained all the things I had to know.

My previous script reading the categories now learned how to connect to the database. And it also creates an empty hash table, "dictionary", that can convert from GUID to new database ID. Then I let the loop add the categories and fill my conversion dictionary. The program looks like this:

#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML; # apt-get install libxml-libxml-perl
use DBI; # apt-get install libdbd-mysql-perl

my $dbh = DBI-≻connect('dbi:mysql:database=jensblog;#host=localhost','user','supersecretpasswordbutnotthis',{AutoCommit=≻1,RaiseError=≻1,PrintError=≻0});

my $filename = "/usr/local/bin/jensblog/App_Data/categories.xml";
my $parser = XML::LibXML-≻new();
my $xmldoc = $parser-≻parse_file($filename);

my %dictGUIDToID = ();

for my $category_node ($xmldoc-≻findnodes("//category"))
{
	my $strGuidId = $category_node -≻ findnodes('@id') -≻ to_literal;
	my $strCategoryName = $category_node-≻textContent;
	
	$dbh-≻do('INSERT INTO category (name) VALUES (?)', undef, $strCategoryName);
	my $intId = $dbh-≻{mysql_insertid};
	
	$dictGUIDToID{ $strGuidId } = $intId;
	print $strGuidId . " " . $strCategoryName. " " . $intId . "\n";
}

And the result when running the perl script is this:

6e20e883-bcec-4f3a-80e5-a3ffabd86453 1980 3
4d30923f-e7e6-47af-bc56-ce7764bb1114 1982 4
b7e42612-9919-4adb-ae04-f2ec91e0440f 1983 5
0c8ce848-1f55-4308-b2d2-f9344531d7ec 1984 6

And in the database it looks like this:

mysql≻ select * from category;
+----+----------------------+
| ID | name                 |
+----+----------------------+
|  1 | 1980                 |
|  2 | 1980                 |
|  3 | 1980                 |
|  4 | 1982                 |
|  5 | 1983                 |
|  6 | 1984                 |
|  7 | 1988                 |

You can see that my program has been running for a couple of times adding 1980 three times. That is not good when I run it for real. But this is how far I came for now. Next time I will come back to walking the directories of XML files again. I will learn how load the XML files with posts and insert the data of the posts into the database. And I will fix this 1980 problem.


I moved from Sweden to The Netherlands in 1995.

Here on this site, you find my creations because that is what I do. I create.