Wednesday, May 16, 2012

Using System.IO.Packaging Package class to zip a file

Hello World,

I believe all of us who work with software that writes reports have run into a scenario where a huge report, e.g. one month of sales data, had to be compressed to make it portable. I was faced with one such requirement, and as always I started researching what my options were.
A Google search came up with options like the popular 7-Zip library, which is unmanaged code, and DotNetZip, an open source library hosted on CodePlex that seemed fit for the purpose. My mandate though (from the powers that be in an IT organization :)) was to try to avoid using a third-party DLL.
Now, as a .NET programmer, I would love it if the framework itself provided services to package and compress files rather than having to look for a third-party library, and the framework does not disappoint in this regard. There are a couple of classes that provide packaging and compression:
  1. The System.IO.Compression GZipStream class
  2. The System.IO.Packaging Package class 
A requirement for the solution I was trying to devise was that the compression should produce a file that does not require additional software to decompress (unzip) it. Though the GZipStream class offers an elegant and efficient way to compress data, its output can only be opened by software that understands the gzip compression scheme. Such programs are freely available, but require downloading and installing nevertheless! That ruled out GZipStream (I'll try to post sample code for that in another post).
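
For comparison, here is a rough sketch of what the GZipStream route might look like. The class and method names (GZipSketch, CompressFile, DecompressFile) are my own placeholders, not framework names, and the sketch assumes .NET 4 or later for Stream.CopyTo. Note that it compresses exactly one stream into one .gz file; there is no container holding several files.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical helper class, for illustration only.
public static class GZipSketch
{
    // Compresses a single file to "<original name>.gz" and returns the output path.
    public static string CompressFile(string fileToCompress)
    {
        string outputFileName = fileToCompress + ".gz";
        using (FileStream source = File.OpenRead(fileToCompress))
        using (FileStream target = File.Create(outputFileName))
        using (var gzip = new GZipStream(target, CompressionMode.Compress))
        {
            // Everything written to the GZipStream lands compressed in the target file.
            source.CopyTo(gzip);
        }
        return outputFileName;
    }

    // Reverses CompressFile: reads a .gz file and writes the original bytes to outputFile.
    public static void DecompressFile(string gzFile, string outputFile)
    {
        using (FileStream source = File.OpenRead(gzFile))
        using (var gzip = new GZipStream(source, CompressionMode.Decompress))
        using (FileStream target = File.Create(outputFile))
        {
            gzip.CopyTo(target);
        }
    }
}
```

The recipient of the .gz file still needs a gzip-aware tool (or code like DecompressFile) to get the data back, which is exactly the constraint I was trying to avoid.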

Notice how I'm referring to "package" and "compression" as the two steps involved here. A "package" is a container that holds several objects in it. Consider a directory with several files in it: that is a container, although not a very portable one (it is not too straightforward to move across the network from one machine to another). I know you'll say "I can use xcopy to copy over the directory anywhere I like!" You very well can, but wouldn't it be nice if there was some way of treating the directory as a single object that holds a relationship with its content? Unix-based systems have the "tar" utility (short for tape archive) that can package several resources (files) into a single file-like object that can be treated thus. This object, then, is a "package".

Now, MSDN has an example showing how to use the Package class, but then, if all samples did everything you wanted, you wouldn't have a job :). If you run the sample (from the downloaded solution), it packages everything nicely, but does not compress anything!

So here is how I achieved the task of packaging and compressing the file. First, the class that calls the methods on the Package class:



    using System;
    using System.IO;
    using System.IO.Packaging;

    /// <summary>
    /// Packages a single file into a compressed zip file.
    /// </summary>
    public class ZipPackageCompressor
    {
        //  -------------------------- CreatePackage --------------------------
        /// <summary>
        /// Creates a package zip file containing the specified file.
        /// </summary>
        /// <param name="fileToZip">
        ///    The path to the file to zip. <example>E.g. C:\myfolder\thefileToZip.csv
        ///    </example></param>
        /// <param name="pathToOutput">
        ///    The path where the output (zipped file) will be placed.
        ///    <example>E.g. D:\output</example>
        /// </param>
        public void CreatePackage(string fileToZip, string pathToOutput)
        {
            // Convert the file name to a part URI and build the output file path.
            string fileName = Path.GetFileName(fileToZip);
            Uri partUriDocument = PackUriHelper.CreatePartUri(new Uri(fileName, UriKind.Relative));
            string outputFileName = Path.Combine(pathToOutput, fileName + ".zip");

            // Create the Package.
            // If the package file already exists, FileMode.Create overwrites it.
            using (Package package = Package.Open(outputFileName, FileMode.Create, FileAccess.ReadWrite))
            {
                // Add the Document part to the Package, asking for maximum compression.
                PackagePart packagePartDocument = package.CreatePart(
                    partUriDocument,
                    System.Net.Mime.MediaTypeNames.Text.Plain,
                    CompressionOption.Maximum);

                // Copy the data to the Document Part.
                using (FileStream fileStream = new FileStream(fileToZip, FileMode.Open, FileAccess.Read))
                using (Stream partStream = packagePartDocument.GetStream())
                {
                    CopyStream(fileStream, partStream);
                }
            }
        }

        /// <summary>
        /// Copies data from a source stream to a target stream.
        /// </summary>
        /// <param name="source">The source stream to copy from.</param>
        /// <param name="target">The destination stream to copy to.</param>
        private void CopyStream(Stream source, Stream target)
        {
            const int bufSize = 0x1000;
            byte[] buf = new byte[bufSize];
            int bytesRead = 0;
            // Read returns 0 once the end of the source stream is reached.
            while ((bytesRead = source.Read(buf, 0, bufSize)) > 0)
            {
                target.Write(buf, 0, bytesRead);
            }
        }
    }

Notice the call to package.CreatePart, whose third parameter is CompressionOption.Maximum: this is what actually compresses the file. I believe the comments in the code make it pretty straightforward to understand what is going on.
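
Since a package is a container, the same calls extend naturally to several files. Below is a minimal sketch of that idea, not something from my actual solution: DirectoryPackager and CreatePackageFromDirectory are hypothetical names, the generic "application/octet-stream" content type is an assumption (pick whatever suits your files), and Stream.CopyTo assumes .NET 4 or later.

```csharp
using System;
using System.IO;
using System.IO.Packaging;
using System.Net.Mime;

// Hypothetical helper class, for illustration only.
public static class DirectoryPackager
{
    // Packages every file directly inside sourceDirectory (no sub-directories)
    // into one compressed zip package. The output file should live outside
    // sourceDirectory, otherwise the loop would try to add the open zip to itself.
    public static void CreatePackageFromDirectory(string sourceDirectory, string outputZipFile)
    {
        using (Package package = Package.Open(outputZipFile, FileMode.Create, FileAccess.ReadWrite))
        {
            foreach (string file in Directory.GetFiles(sourceDirectory))
            {
                // Escape the file name so that spaces and similar characters
                // still form a valid relative URI.
                string escapedName = Uri.EscapeDataString(Path.GetFileName(file));
                Uri partUri = PackUriHelper.CreatePartUri(new Uri(escapedName, UriKind.Relative));

                // One part per file, each compressed individually.
                PackagePart part = package.CreatePart(
                    partUri,
                    MediaTypeNames.Application.Octet,
                    CompressionOption.Maximum);

                using (FileStream source = File.OpenRead(file))
                using (Stream target = part.GetStream())
                {
                    source.CopyTo(target);
                }
            }
        }
    }
}
```

Each file becomes its own part inside the single zip, which is the "directory as one object" idea described earlier.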

To work with the System.IO.Packaging namespace, you need to add a reference to the WindowsBase assembly to your project.


To wind up, here is a test program that calls the CreatePackage method above:


    using System.IO;
    using System.Reflection;

    public class Program
    {
        public static void Main()
        {
            // The zip file is written to the directory the EXE is running from.
            string outputDirectory = AssemblyDirectory;
            var zipPackageCompressor = new ZipPackageCompressor();
            zipPackageCompressor.CreatePackage(
                @"D:\Sudhanshu\Projects\FileZipper\FileZipper\Contents\MyFile_bcce1c37.csv",
                outputDirectory);
        }

        /// <summary>
        /// Gets the directory from where the (EXE) assembly is executing.
        /// </summary>
        public static string AssemblyDirectory
        {
            get
            {
                string assemblyPath = Path.GetDirectoryName(Assembly.GetEntryAssembly().Location);
                return assemblyPath;
            }
        }
    }

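
To check what ended up inside the zip, the package can be opened again with the same Package class. This is only a sketch under a few assumptions: ZipPackageReader and ExtractPackage are names I made up for illustration, nested part paths are flattened to their file name, and Stream.CopyTo needs .NET 4 or later.

```csharp
using System;
using System.IO;
using System.IO.Packaging;

// Hypothetical helper class, for illustration only.
public static class ZipPackageReader
{
    // Extracts every part of the package into targetDirectory.
    public static void ExtractPackage(string packageFile, string targetDirectory)
    {
        Directory.CreateDirectory(targetDirectory);
        using (Package package = Package.Open(packageFile, FileMode.Open, FileAccess.Read))
        {
            foreach (PackagePart part in package.GetParts())
            {
                // Part URIs look like "/MyFile.csv": drop the leading slash
                // and undo any URI escaping to get the file name back.
                string relativeName = Uri.UnescapeDataString(part.Uri.OriginalString.TrimStart('/'));
                string targetFile = Path.Combine(targetDirectory, Path.GetFileName(relativeName));

                // Reading the part stream decompresses the data transparently.
                using (Stream source = part.GetStream(FileMode.Open, FileAccess.Read))
                using (FileStream target = File.Create(targetFile))
                {
                    source.CopyTo(target);
                }
            }
        }
    }
}
```

If you open the zip in Windows Explorer instead, you will also see a [Content_Types].xml entry next to your file. The Package class adds it to record the content type of each part, and GetParts does not return it.
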
Happy Coding!