Showing posts with label php. Show all posts

Friday, March 25, 2011

PHP-Azure Migration - Part 4

By the end of the previous post, we had migrated all existing data and files. What about the new files being uploaded from the CMS?

PHP
We are using a very old version of FCKeditor as our rich text editor. Because FCKeditor supports uploading files from the editor, we need to change the code in editor/filemanager/connectors/php/command.php. Here is an example of getting the folders and files for the file browser.
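function GetFoldersAndFiles( $resourceType, $currentFolder ) {
    // Assumes the blob client and container name are set up as globals elsewhere in the connector.
    global $blobStorageClient, $ContainerName;

    $sServerDir = MapAzureFolder( $resourceType, $currentFolder );
    // Using '/' as the delimiter makes the listing return derived folders as prefix entries.
    $blobs = $blobStorageClient->listBlobs( $ContainerName, $sServerDir, '/' );
    $aFolders = array();
    $aFiles = array();
    // Start with the folders created in this session that do not contain any blob yet.
    $folders = GetSessionFolders( $resourceType, $currentFolder );
    foreach ( $blobs as $blob ) {
        $name = GetName( $blob->Name );
        if ( $blob->IsPrefix ) {
            $folders[$name] = 0;
        } else {
            // Report the size in KB; any non-empty file counts as at least 1 KB.
            $iFileSize = $blob->Size;
            if ( !$iFileSize ) {
                $iFileSize = 0;
            }
            if ( $iFileSize > 0 ) {
                $iFileSize = round( $iFileSize / 1024 );
                if ( $iFileSize < 1 ) $iFileSize = 1;
            }
            $aFiles[] = '<File name="' . ConvertToXmlAttribute( $name ) . '" size="' . $iFileSize . '" />';
        }
    }
    foreach ( $folders as $name => $value ) {
        $aFolders[] = '<Folder name="' . ConvertToXmlAttribute( $name ) . '" />';
    }
    // The excerpt ends here; writing $aFolders and $aFiles to the response is not shown.
}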
The GetSessionFolders() function is worth mentioning. Azure Storage has no concept of a folder. All folders are derived from the paths of the files. So, when listing the blobs, we can use $blob->IsPrefix to distinguish the derived folders from the files.

If a user wants to create a folder and then upload a file to it, the folder does not exist in Azure Storage until the first file arrives, so we need to keep the folder in the session. That is where the GetSessionFolders() function comes from.
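The idea of deriving folders from blob paths and merging in session-only folders can be sketched in plain PHP. This is only an illustration: deriveFolders() and the sample blob names are hypothetical, and the real connector gets its listing from the blob storage client rather than from an array.

```php
<?php
// Hypothetical helper: derive the immediate sub-folder names under $prefix
// from a flat list of blob names, then merge in folders that exist only in
// the session (created by the user but still empty).
function deriveFolders(array $blobNames, $prefix, array $sessionFolders = array())
{
    $folders = array();
    foreach ($blobNames as $name) {
        // Skip blobs that are not under the current folder.
        if ($prefix !== '' && strpos($name, $prefix) !== 0) {
            continue;
        }
        $rest = (string) substr($name, strlen($prefix));
        $slash = strpos($rest, '/');
        // A remaining "/" means the blob lives in a sub-folder.
        if ($slash !== false && $slash > 0) {
            $folders[] = substr($rest, 0, $slash);
        }
    }
    foreach ($sessionFolders as $name) {
        $folders[] = $name;
    }
    // Remove duplicates and return the names in a stable order.
    $folders = array_values(array_unique($folders));
    sort($folders, SORT_STRING);
    return $folders;
}

$blobs = array(
    'uploadedfiles/image/logo.png',
    'uploadedfiles/image/banners/spring.jpg',
    'uploadedfiles/image/banners/summer.jpg',
    'uploadedfiles/flash/intro.swf',
);
print_r(deriveFolders($blobs, 'uploadedfiles/image/', array('drafts')));
```

Here "banners" is derived from two blob paths, while "drafts" exists only because the session remembers it.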

When uploading a file, we have to handle the Flash format. Modern browsers seem able to recognize image files such as JPEG, GIF, and PNG without a content type, but a Flash file does not play properly unless the content type of the blob is set.
function FileUpload( $resourceType, $currentFolder, $sCommand ) {
...
 $sFileName = 'uploadedfiles/'. strtolower($resourceType).$currentFolder.$sFileName;
 $result = $blobStorageClient->putBlob($ContainerName, $sFileName,  $oFile['tmp_name']);

 if(strtolower($resourceType) == 'flash') {
  $contentType = 'application/x-shockwave-flash';
  $blobStorageClient->setBlobProperties($ContainerName,  $sFileName, null, array('x-ms-blob-content-type' => $contentType));
 }
 $sFileUrl = $currentFolder.$sFileName;
...
}
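If more file types than Flash need an explicit content type, the choice could be driven by the file extension instead of the resource type. The following is a minimal sketch under that assumption; guessContentType() and its lookup table are hypothetical and not part of FCKeditor or the Azure SDK.

```php
<?php
// Hypothetical lookup table; extend it to the file types the CMS accepts.
function guessContentType($fileName, $default = 'application/octet-stream')
{
    $types = array(
        'swf'  => 'application/x-shockwave-flash',
        'jpg'  => 'image/jpeg',
        'jpeg' => 'image/jpeg',
        'gif'  => 'image/gif',
        'png'  => 'image/png',
        'pdf'  => 'application/pdf',
    );
    // Compare extensions case-insensitively.
    $ext = strtolower(pathinfo($fileName, PATHINFO_EXTENSION));
    return isset($types[$ext]) ? $types[$ext] : $default;
}

echo guessContentType('uploadedfiles/flash/Intro.SWF'), "\n";
```

The returned value would then be passed as the x-ms-blob-content-type header, in the same way as the setBlobProperties() call above.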

ASP.NET
For the ASP.NET portal, we have to do the same. In /editor/filemanager/connectors/aspx/connector.aspx, we changed the first line to:
<%@ Page Language="c#" Trace="false" Inherits="FCKAzureAdapter.Connector" AutoEventWireup="false" %>

The connector simply extends FredCK.FCKeditorV2.FileBrowser.Connector and overrides the GetFiles(), GetFolders(), and CreateFolder() methods to use the Azure API. Here is an example of GetFolders():
protected override void GetFolders(System.Xml.XmlNode connectorNode, string resourceType, string currentFolder) {
    string sServerDir = this.ServerMapFolder(resourceType, currentFolder);

    // Create the "Folders" node.
    XmlNode oFoldersNode = XmlUtil.AppendElement(connectorNode, "Folders");

    // containerPath is assumed to be defined elsewhere in the adapter class.
    CloudBlobContainer container = GetCloudBlobContainer(containerPath);
    // Assumes sServerDir is the blob prefix of the current folder.
    var blobDir = container.GetDirectoryReference(sServerDir);

    var items = blobDir.ListBlobs();
    List<string> dirList = new List<string>();
    foreach (var x in items.OfType<CloudBlobDirectory>())
    {
        dirList.Add(x.Uri.ToString());
    }
    // Keep only the last path segment of each directory URI (the URI ends with '/').
    var dirs = dirList.Select(o => o.Substring(o.LastIndexOf('/', o.Length - 2)).Replace("/", "")).ToArray();

    Dictionary<string, List<string>> folderList = 
            HttpContext.Current.Session["Folders"] as Dictionary<string, List<string>>;
    foreach (var dirName in dirs)
    {
        // Create the "Folder" node.
        XmlNode oFolderNode = XmlUtil.AppendElement(oFoldersNode, "Folder");
        XmlUtil.SetAttribute(oFolderNode, "name", dirName);
        if (folderList != null && 
            folderList.ContainsKey(resourceType) 
            && folderList[resourceType].Contains(dirName))
        {
            folderList[resourceType].Remove(dirName);
        }
    }

    if (folderList != null && folderList.ContainsKey(resourceType))
    {
        foreach (var folder in folderList[resourceType].Where(o => o != currentFolder && o.StartsWith(currentFolder)))
        {
            var folderName = folder.Substring(currentFolder.Length).Trim('/');
            if (!folderName.Contains('/')) {
                // Create the "Folder" node.
                XmlNode oFolderNode = XmlUtil.AppendElement(oFoldersNode, "Folder");
                XmlUtil.SetAttribute(oFolderNode, "name", folderName);
            }
        }
    }
}
Then, we do the same for editor/filemanager/connectors/aspx/upload.aspx.

Next time, we will discuss the other problems we encountered.

Thursday, March 03, 2011

PHP-Azure Migration - Part 3

In the previous post, we fixed all database queries so that the web site works with SQL Server. The next task is handling file I/O without a persistent file system: the local file system is not permanent storage in Windows Azure. We need to copy all resource files (file attachments of articles) from the local file system to Azure Storage.

File Migration
The migration seemed quite straightforward: we just copy the files to Azure Storage. However, Microsoft provides no FTP access or batch utility for uploading. Given that we have 4GB of files in hierarchical directories, it is infeasible to upload the files one by one. We used Cloud Storage Studio for the migration.

URL Redirection
The web site and the resource files are now on different servers, so we can no longer use relative paths. However, the database stores the relative paths of the resource files. We can either change the data (as well as its semantics) to store the absolute URIs of the resource files, or redirect the browser to get the files from the correct URLs.

Because changing the data would require a lot of testing of the business logic, we decided to create an ASP.NET module to redirect the requests. This works because:
  1. IIS (as well as Windows Azure) allows us to call an ASP.NET module in any web site (including a PHP web site)
  2. All files were stored in a particular folder, so we could derive the redirect pattern easily
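The redirect pattern itself is simple enough to sketch. The actual redirection was done in an ASP.NET module; the PHP below only illustrates the mapping, and the folder prefix, the storage account host, and mapToBlobUrl() are hypothetical values chosen for the example.

```php
<?php
// Hypothetical values: the folder that held the resource files and the
// public base URL of the blob container.
define('RESOURCE_PREFIX', '/uploadedfiles/');
define('BLOB_BASE_URL', 'http://example.blob.core.windows.net/files');

// Returns the absolute blob URL for a request path, or null when the path
// is not a resource file and should be served by the web site itself.
function mapToBlobUrl($requestPath)
{
    if (strpos($requestPath, RESOURCE_PREFIX) !== 0) {
        return null;
    }
    return BLOB_BASE_URL . $requestPath;
}

$target = mapToBlobUrl('/uploadedfiles/image/logo.png');
if ($target !== null) {
    // A real handler would send: header('Location: ' . $target, true, 302);
    echo $target, "\n";
}
```

Because the relative paths stored in the database stay untouched, only this one mapping has to be tested.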
Now all existing files have been migrated, and requests for them are redirected to the new URLs. Next time, let's look at how to deal with newly uploaded files.

Wednesday, March 02, 2011

PHP-Azure Migration - Part 2

In the previous post, I discussed the database migration from MySQL to SQL Azure. Data migration is only the first part of the work. The next step is to modify the queries in the web site because of the syntax differences between the two database systems.

Driver Adaptation
In the ASP.NET frontend, we only need to replace the MySQL ADO.NET driver with the SQL Server driver and it is all done. In the PHP CMS, however, we need to do some adaptation, since the CMS calls the driver-specific PHP functions rather than a unified database API.

Luckily, it is not much work because there is a driver layer in the CMS. There were only three functions we needed to change: query, fetch, and total, which correspond to mysql_query, mysql_fetch_array, and mysql_num_rows in the MySQL driver and to sqlsrv_query, sqlsrv_fetch_array, and sqlsrv_num_rows in the SQL Server driver. Note that sqlsrv_num_rows only works when the statement uses a static or keyset cursor (the Scrollable option of sqlsrv_query).

Query Modification
The major differences can be summarized as follows:
  1. Getting last inserted record identity
  2. Paging
  3. Scalar functions
  4. Aggregation functions
  5. Escaping strings
There is a DAO layer in the CMS that stores all queries. So, all we did was go through all the queries in the DAOs and modify them.

1. Getting last inserted record identity
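The MySQL PHP driver provides mysql_insert_id() to get the auto-increment column value of the last inserted record. There is no equivalent function in the SQL Server PHP driver, but the same value can be obtained by querying @@IDENTITY in SQL Server. (SCOPE_IDENTITY() is usually the safer choice, because @@IDENTITY can also return an identity value generated by a trigger.)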

2. Paging
While paging in MySQL only takes "LIMIT n OFFSET skip", SQL Server needs nested TOP queries:
SELECT * FROM (
    SELECT TOP n * FROM (
        SELECT TOP z columns      -- (where z = n + skip)
        FROM table_name
        ORDER BY key ASC
    ) AS a ORDER BY key DESC
) AS b ORDER BY key ASC
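Since the same pattern is repeated in every paged query, the DAO could build it with a small helper. The sketch below assumes the table, column list, and key column are trusted identifiers written by the developer; buildPagedQuery() and the sample table name are hypothetical.

```php
<?php
// Builds the nested TOP query for one page. $table, $columns and $key are
// assumed to be trusted identifiers from the DAO, never user input, and
// $columns must include $key so that the outer queries can order by it.
function buildPagedQuery($table, $columns, $key, $n, $skip)
{
    $n = max(1, (int) $n);
    $skip = max(0, (int) $skip);
    $z = $n + $skip; // rows the innermost query must read
    return "SELECT * FROM ("
         . "SELECT TOP $n * FROM ("
         . "SELECT TOP $z $columns FROM $table ORDER BY $key ASC"
         . ") AS a ORDER BY $key DESC"
         . ") AS b ORDER BY $key ASC";
}

echo buildPagedQuery('articles', 'id, title', 'id', 10, 20), "\n";
```

One caveat of this pattern: when fewer than z rows exist, the last page still returns up to n rows, so it repeats rows from the previous page unless the caller trims the result using the total count.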

3. Scalar functions
MySQL uses NOW() to get the current timestamp. SQL Server uses GETDATE().
MySQL provides RAND(), so we can order rows randomly with ORDER BY RAND(). In SQL Server, we order by NEWID() instead. Please note that NEWID() generates a GUID, which is why the resulting order is random. It does not mean that NEWID() is the same as RAND() in MySQL.

4. Aggregation functions
MySQL provides many convenient aggregate functions such as GROUP_CONCAT(). Queries that use them have to be rewritten for SQL Server, or we need to create the corresponding user-defined aggregate functions.

5. Escaping strings
Many developers use \' to escape a single quote. In fact, both MySQL and SQL Server use two consecutive single quotes to escape a single quote. MySQL merely accepts the backslash form as well; SQL Server does not.
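A portable escaping helper for the driver layer could look like the sketch below. escapeSqlString() is a hypothetical name, and the example table is made up. Where the driver supports it, parameterized queries (sqlsrv_query accepts a parameter array for ? placeholders) are generally a safer choice than escaping by hand.

```php
<?php
// Escapes a value for use inside a single-quoted SQL string literal by
// doubling each single quote, which both MySQL and SQL Server accept.
function escapeSqlString($value)
{
    return str_replace("'", "''", $value);
}

echo "SELECT * FROM authors WHERE name = '" . escapeSqlString("O'Brien") . "'\n";
```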

Up to this point, we have only discussed the data layer (if you have a data layer in PHP). There is no code change in the business layer yet. This is also an example of how a program can be modularized when the layers are partitioned clearly, regardless of which platform we are using.

Tuesday, March 01, 2011

PHP-Azure Migration - Part 1

Azure is Microsoft's cloud platform. Its working model differs from traditional hosting in a few ways. We migrated a web site to Azure. The web site consists of a PHP CMS, an ASP.NET frontend, and a MySQL database.

Database Migration
There was some discussion among the team about whether to use MySQL in a worker role or SQL Azure. As the final decision was SQL Azure, the first task was migrating the MySQL database to SQL Azure. The most obvious tool to use is Microsoft SQL Server Migration Assistant (SSMA) for MySQL.

SSMA worked fine with SQL Server. It reconstructed the equivalent SQL Server schema from MySQL and copied the data from MySQL to SQL Server. However, there are a few catches when migrating to SQL Azure:
1. Tables without primary keys
For some reason, some tables in the web site did not have primary keys. The problem is that a table in SQL Azure needs a clustered index, and no primary key means no clustered index (as far as SSMA is concerned). We used a development SQL Server to re-create the schema first. Then we looked for the tables without primary keys and added the primary keys back.

2. Large records
The CMS designer's idea was that all data should be kept in the database as part of the backup strategy. So everything, including image files, Flash video files, and video files, was stored in attachment tables as well. SSMA could handle most of them but not all: when a record is large (say 10MB), the copy eventually fails and interrupts the whole migration process. It is a classic timeout problem, and we were dealing with two timeouts:
  1. Command timeout for SQL Server
  2. Idle timeout for MySQL
Since we had no control over the timeout configuration or the error handling strategy, we skipped those tables during the migration. We dumped the attachments out to the file system. After some tuning of the upload and download scripts to keep the CMS working, we created a PHP script to migrate only the metadata columns of the attachment tables.

To conclude, the migration was performed in three steps:
  1. Execute the SQL script to create the tables
  2. Run SSMA to copy most of the data
  3. Execute the PHP script to migrate the attachment metadata
The performance was quite satisfactory.

Wednesday, December 17, 2008

Slow Performance of PHP in Windows IIS

There are a few reasons why PHP runs slowly on Windows. The request delegation from IIS to PHP via FastCGI or CGI is not one of them. In fact, the delegation is quite robust with FastCGI.

A TCP/IP connection problem between the database and the web server may be one reason. Some Windows-specific behaviors may be another. One of them is the invocation of the gethostbyaddr() function.

Here is the high-level behavior of the gethostbyaddr() function that makes the invocation slow:
Original
gethostbyaddr() on windows waits for WINS resolution timeout even when you disable WINS on the server.

This means that your DNS server may return "No hostname found" and windows gethostbyaddr() will just sit there, never having asked a WINS server anything, for 3 - 4 seconds. I watched it do just that with a sniffer.
Here is the root cause:
Original
Difference between gethostbyaddr and getnameinfo.

Originally posted by: Gregor The Eye

First at all, sorry for my english. It's not my native language, and, unfortunately, i havn't time to verify all text i enter.

Now, to subject. Computer may have many "names", and two main name groups of names computers have is a "DNS" names and a "NetBIOS" names.

The "NetBIOS" name is aname you assign to your computer using system->network identification->network ID. This name
is displayed in you "network neighborhood" and generally used in local networks to identify computers.

The "DNS" name is a name given to machine in internet. This name is a synonim to machine IP (becouse IP is\s not huma-friendly). "DNS" name is stored in special internet services (so-called "DNS" servers), not at the computer itself. Generally, computer can be shut off, but it's DNS name will be available and produce correct IP.

For example, if I install OS to new computer and call it "EYE", it will have a NetBIOS name "EYE", and, connecting it to my local network, i can access it in "network neighborhood". But, from the internet, i can't access to it using "EYE" name, and instead i must use it's IP - for example - 113.54.25.14. But if i need over users to access it from internet using a human-friendly name, i pay amount of money to sertain organization, and they register a name for it - for example - "www.GregorTheEye.com". Then, user can access my machine using this name, fopr example, writing:

ping www.GregorTheEye.com

or

telnet www.GregorTheEye.com

So, we have 2 names for machine - NetBIOS name and DNS name.
In my local network, following commands will be valid:

ping 113.54.25.14
ping EYE
ping www.GregorTheEye.com

Third command will be valid ONLY if my computer is conected to internet.

And from the internet, valid commands are:

ping 113.54.14
ping www.GregorTheEye.com

As you can see, NetBIOS name is not available from internet.

The main difference between NetBIOS name and DNS name is that DNS name will be available only if computer is connected to the internet and has i's name registered on it. NetBIOS name will be always available to computers that directly connected to target one.

To get DNS name, you must send a request to DNS server (it's IP is written in the system registry if you computer is connected to internet). If DNS server is unavailable, it will take default timeout time to discover this. If it's available, DNS server will return you human-friendly name of target machine if it's exist in database.

To obtain a NetBIOS name, you must send UDP packe to target machine and wait for responce. Becouse UDP is not a guaranteed-to-deliver protocol, responce may ot came, came corrupted, came many times etc. Your program must patiently wait for responce (it's not from 137 port - you must SEND UDP packet to 137 port of target machine, but responce will be returned to the port you specify. Only Win95 no OSR2 has a bug there responce always returns to port 137).

And, finally to Win32 functions -

gethostbyaddr() will first try to connect DNS server, and, if it's unavailable, will try to get NetBIOS name. So, it will get machine name one method or another in most cases.

getnameinfo() will ONLY try a DNS name, no attempts to get a NetBIOS name(). That's the difference.

If you write a program that must get names of local machines as well as names of internet machines (for example, network or port scanner), you MUST use gethostbyaddr(). If you write a program that will operate only with names of remote machines (IRC client, for example) - you MUST use getnameinfo() becouse Microsoft instructs so in lates MSDN releases.

If you have any questions - feel free to contact me. Also, if i write something wrony, i will be very thankfull if you send ma a hint.

Best regards, Gregor The Eye.
However, there is no getnameinfo() function in PHP. One of the nice features of the PHP documentation is the user comments. A commenter contributed a gethostbyaddr_timeout() function that only queries the DNS server. You need to pass the DNS server address to the function (instead of using the DNS server configured in the OS), though.
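For reference, a DNS-only reverse lookup asks for the PTR record of a specially formed name. The sketch below only builds that name; it is not the commenter's gethostbyaddr_timeout() function, and reverseLookupName() is a hypothetical helper.

```php
<?php
// Builds the in-addr.arpa name used for a reverse (PTR) lookup of an IPv4
// address; returns null for anything that is not a valid IPv4 address.
function reverseLookupName($ip)
{
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        return null;
    }
    // Reverse the four octets and append the reverse-lookup zone.
    return implode('.', array_reverse(explode('.', $ip))) . '.in-addr.arpa';
}

echo reverseLookupName('113.54.25.14'), "\n";
```

The resulting name could then be handed to dns_get_record($name, DNS_PTR), which uses the resolver configured in the OS and offers no timeout parameter. Whether that avoids the delay on a given Windows setup is something to measure rather than assume.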