Porting my blog for the second time, images part 4 |
Porting my blog for the second time, images part 6 |
Porting my blog for the second time, images part 5
This is post #12 of my series about how I port this blog from Blogengine.NET 2.5 ASPX on a Windows Server 2003 to a Linux Ubuntu server, Apache2, MySQL and PHP. A so called LAMP. The introduction to this project can be found in this blog post https://www.malmgren.nl/post/Porting-my-blog-for-the-second-time-Project-can-start.aspx.
I can download the images and put them in a local directory. Now it is time to store the URL data in the database.
But first I need to improve the routine deleting the data from the tables. Since I will be loading and loading the data again and again I want to load from scratch every time. The routine I made earlier (https://www.malmgren.nl/post/Porting-my-blog-for-the-second-time-tags.aspx) deleting the data in the table it would not reset the auto increment counters so that IDs would increase every time. Now I changed my delete routine to reset the auto_increment counters.
And then I had decided to go on like a pro but MySQL smacked me right over my fingers and gave me something called Schrödingers Table. It goes like this. You enter show tables; and MySQL tells you what tables you got. One of mine was missing which was good. So I could create it? No because it already existed! So it existed and was missing at the same time. So here I started searching for solutions and at one point I had corrupted my entire database by deleting ibdata files. I got it back. When I had it back I backed up my other database sunneras with mysqldump before trying anything else. Then I started flushing tables and finally I deleted my entire jensblog database. It did not help. The table still existed.
Then I got a little idea. Suppose the name of the database was bad. It was named Comment. That is also a reserved word. So then I called it Feedback. Then I tried to Synchronize again and this time a got message 121 that something with a foreign key was wrong. So then I fixed that of my Feedback table it worked again! Then I got the idea that maybe it was that foreign key all along that was the error. So I tried to rename it back to Comment. Failed.
Executing SQL script in server ERROR: Error 1050: Table './jensblog/Comment' already exists SQL Code: ALTER TABLE `jensblog`.`Feedback` RENAME TO `jensblog`.`Comment` SQL script execution finished: statements: 5 succeeded, 1 failed Fetching back view definitions in final form. Nothing to fetch
Grmbl... From now on it is going to be called Feedback. Obviously it is called Comment in Blogengine.NET but here it is going to be Feedback. Changed my perl program to say Feedback. I don't like it but I need to continue.
This is a little bit like computers rule the world these days. For example on my laptop before the domain controller breakdown earlier this year I could use my index finger to log in. After the breakdown that is not possible anymore. A reset is not provided for finger prints. Luckily I have 10 fingers so 9 more to go.
Another example. My bank card broke. So I went to my bank and complained. They told me to log in via Internet Banking to order a new card. Then I asked them "HOW?". The lady behind the counter then took the same story one more time and promptly I asked the lady "HOW?". That went on for two more times. I am not making this up, this happened. That ABN-AMRO lady was programmed to say the same things hundreds of times per day and she simply malfunctioned outside her scripts. So I told her that her reasoning was flawed instead of asking "HOW?". Then she solved the situation by redirecting me to another counter.
So now after the Schrödingers Table Incident it was time again to fix the routine deleting the data from the tables. Now it looks like this:
# Delete data from the tables # http://www.jens.malmgren.nl/post/Porting-my-blog-for-the-second-time-images-part-5.aspx # https://www.malmgren.nl/post/Porting-my-blog-for-the-second-time-tags.aspx if ($deleteDataBase) { $dbh-≻do('delete from PostTag'); $dbh-≻do('delete from PostCategory'); $dbh-≻do('delete from PostURL'); $dbh-≻do('delete from URL'); $dbh-≻do('delete from Tag'); $dbh-≻do('delete from Category'); $dbh-≻do('delete from Feedback'); $dbh-≻do('delete from Post'); $dbh-≻do('ALTER TABLE Post AUTO_INCREMENT = 1'); $dbh-≻do('ALTER TABLE PostTag AUTO_INCREMENT = 1'); $dbh-≻do('ALTER TABLE PostURL AUTO_INCREMENT = 1'); $dbh-≻do('ALTER TABLE PostCategory AUTO_INCREMENT = 1'); $dbh-≻do('ALTER TABLE URL AUTO_INCREMENT = 1'); $dbh-≻do('ALTER TABLE Tag AUTO_INCREMENT = 1'); $dbh-≻do('ALTER TABLE Category AUTO_INCREMENT = 1'); $dbh-≻do('ALTER TABLE Feedback AUTO_INCREMENT = 1'); }
I added the routines for inserting the URL data. It is done in two steps. First I do an insert without Target and FileName and later I update these two values.
# Insert the URL record. Get the ID of it for replacing URLs with place holders such as {URL :1} # http://www.jens.malmgren.nl/post/Porting-my-blog-for-the-second-time-images-part-5.aspx $dbh-≻do( 'INSERT INTO URL (TagName, Width, Height, IsLocalURL, Style, IsFileContent ) ' . 'VALUES (?, ?, ?, ?, ?, ? )' , undef, $strTagName,$strWidth,$strHeight,$bIsLocalURL,$strStyle,$bIsFileContent); my $iDatabaseIDofThisURLentry = $dbh-≻{mysql_insertid};
Formatting the field names, values and the variables under each other to see that they add up. After creating the place holder the FileName and Target values are stored.
# Update the FileName and Target of the current URL, # Connect the Post with this URL by creating the PostURL record. # http://www.jens.malmgren.nl/post/Porting-my-blog-for-the-second-time-images-part-5.aspx $dbh-≻do('UPDATE URL SET Target = ?, Filename = ? WHERE ID = ?' , undef, $strTarget, $strFileName , $iDatabaseIDofThisURLentry); $dbh-≻do( 'INSERT INTO PostURL (URLID, PostID )' . 'VALUES (?, ? )' , undef, $iDatabaseIDofThisURLentry,$postID);
And finally when the entire post has been processed, all URLs are replaced with place holders then the Content field is updated.
# Here all URLs of the content replaced with place holders. Now it can be updated to the database. # http://www.jens.malmgren.nl/post/Porting-my-blog-for-the-second-time-images-part-5.aspx $dbh-≻do('UPDATE Post SET Content = ? WHERE ID = ?' , undef, $contentResult , $postID);
So far this works really well for one test post. Later on I will start processing more posts but before doing that I want to make it possible to detect already present target URLs. Suppose an image is already in use in a post. Then it should not be downloaded again and there should not be a duplicate URL record. To solve this I will need to rebuild the program slightly. There are two ways of doing this. Either i can query the target in the database before inserting it. I could also use a hash table. I think the easiest is to use a hash table. That means that the program cannot run halfway through an existing database. Either it runs on an empty database or it is not running at all. I could try to make the Target column unique but it is 1024 characters wide and max is 746 bytes. Other than that I could try making FileName unique, I made that to 246 characters. But if I make it unique then it cannot really be null. So then I would have to shuffle around considerably to make it to work. No I will use a hash table.
Hold your horses!, it is not that simple. A URL can be used in an IMG tag or in an A tag and that gives different URL records. Oh no..., but wait. We can make the key in the table into a combination of target and tagname. By doing so we also take size into account for Blogger etc. So I decide to declare a new global %dictTargetAndTagNameToURLID hash table at the beginning of the program.
For this to work I need to check the existence of the target + tagname before I insert the URL. If it already exists then we just take the URL record ID from the %dictTargetAndTagNameToURLID hash table. If it did not exist then we need to insert the URL record and... store it in the %dictTargetAndTagNameToURLID hash as well.
# Replace an URL with an URL place holder. Store the URL in dictURLToDatabaseID so that # later on it is possible to search for URLs a second time. # http://www.jens.malmgren.nl/post/Porting-my-blog-for-the-second-time-images-part-3.aspx my $iDatabaseIDofThisURLentry = -1; if ($tag =~ /(.*?)(http|https)(:\/\/.+?)[\"']/i) { my $strParsedURL = $2.$3; $strTarget = uri_decode($strParsedURL); if (exists $dictTargetAndTagNameToURLID{$strTarget . "|" . $strTagName}) { # URL record already available for this URL $iDatabaseIDofThisURLentry = $dictTargetAndTagNameToURLID{$strTarget . "|" . $strTagName}; $bIsAlreadyLoadedInAnotherPost = 1; } else { # Insert the URL record. Get the ID of it for replacing URLs with place holders such as {URL :1} # http://www.jens.malmgren.nl/post/Porting-my-blog-for-the-second-time-images-part-5.aspx $dbh-≻do( 'INSERT INTO URL (TagName, Width, Height, IsLocalURL, Style, IsFileContent ) ' . 'VALUES (?, ?, ?, ?, ?, ? )' , undef, $strTagName,$strWidth,$strHeight,$bIsLocalURL,$strStyle,$bIsFileContent); $dictTargetAndTagNameToURLID{$strTarget . "|" . $strTagName} = $iDatabaseIDofThisURLentry = $dbh-≻{mysql_insertid}; } $dictURLToDatabaseID{$strTarget} = $iDatabaseIDofThisURLentry; $tag =~ s/^(.*?)($strParsedURL)(.*?)$/$1\{URL:$iDatabaseIDofThisURLentry\}$3/; }
How can I test if this is going to work? Well by creating an URL record and store a reference to it in the table. Like so in the beginning of the program:
$dbh-≻do( 'INSERT INTO URL (TagName, Width, Height, IsLocalURL, Style, IsFileContent ) ' . 'VALUES (?, ?, ?, ?, ?, ? )' , undef, "test","w","h",1,"s","1"); $dictTargetAndTagNameToURLID{"http://www.jens.malmgren.nl/Images/20140211_075637_tn.jpg|img"} = $dbh-≻{mysql_insertid};
Then I tested my program, concluded it worked by checking the content of Post ID 1 and that the URL table had my test entry and then I removed the test code.
So that is it for this time. Bye bye Schrödinger, I hope not to meet again.
I moved from Sweden to The Netherlands in 1995.
Here on this site, you find my creations because that is what I do. I create.