Windows Azure – Storage
Now that we have covered the compute side, we will dive into another big advantage of Azure: storage. This should not be confused with the hosting space provided by many hosting vendors. Unlike the disk space offered by most vendors, Azure storage lets you store any data in a structured and organized way. Above all, Azure provides a set of REST APIs over HTTP to communicate with your storage account.
Why Storage
This should be the first question any consumer or developer asks before enrolling for a storage account. Though the advantages vary by situation, let us discuss a few that make a strong case for the storage account.
As we have already seen in the previous parts, web role servers are not advised for use as temporary storage or as flat-file storage for anything other than what is part of the published project; the storage account is a good option to overcome this shortcoming. If you don't remember the exact reason, let me reiterate: the hosting machines are virtual and can change at any point of time. Machine M1 might be one of your web role servers now, but in another 15 minutes M9 could be serving instead of M1, and since this is totally concealed from the end user, in-proc session state and custom files written to local storage (files which are not part of the deployment) can be lost without warning.
Though disk space is far cheaper than in the olden days, it is still a challenge to get premier support along with an SDK and facilities to store and retrieve data in an organized way. Azure provides Blobs, Tables and Queues for exactly this. Above all, as of today Azure provides 100 TB of space per storage account. Yes, you heard it right: 100 terabytes.
Replication of your valuable data is supported out of the box, and you can choose the geographical location where your data lives. In addition, blobs can be served through the Content Delivery Network (CDN), which caches your content at locations closer to your users.
Blob Containers
Blobs are one of the most credible features you get with Azure. You can create any number of blob containers in your storage account, and inside every container you can upload or download any number of files through the Azure API. Now you may ask, what's new about this? Well, as the name (Binary Large OBject) suggests, the service is meant for bigger files; above all, files can be split into chunks and read / written in pieces, giving you more reliability while working with larger files. A minimal example of creating a container and uploading a blob is shown below, and after that let's talk about the two types of blobs supported by Azure.
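Here is a minimal sketch using the .NET StorageClient library that ships with the Windows Azure SDK. It targets the local storage emulator; the container name and content are assumptions for illustration, so swap in your own connection string and names.

```csharp
using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class BlobUploadSample
{
    static void Main()
    {
        // Uses the local development storage emulator; replace with your
        // real account connection string when deploying.
        var account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");

        var blobClient = account.CreateCloudBlobClient();

        // Container names must be lower case.
        var container = blobClient.GetContainerReference("documents");
        container.CreateIfNotExist();

        // Upload some text as a blob and read it back.
        var blob = container.GetBlockBlobReference("readme.txt");
        blob.UploadText("Hello from Windows Azure Storage!");
        Console.WriteLine(blob.DownloadText());
    }
}
```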
Block Blob
A block blob is comprised of smaller blocks, each identified uniquely by a "Block Id". Blocks can be read or written in any order. If you are looking at streaming scenarios, block blobs are the best option to go with.
The video below shows how to upload a file as a blob.
Let me list down a few points to remember about block blobs.
- A file has to be split into blocks when it crosses 64 MB (smaller files can be uploaded in a single operation).
- Every block can be a maximum of 4 MB in size.
- The maximum size of a blob / file is 200 GB, or 50,000 blocks.
- The uploaded blocks are not committed until the final API call "PutBlockList" is made.
Block blobs, though they offer a very good and effective way of working with bigger files, fall short in certain places.
- The maximum size of a file can be no larger than 200 GB.
- It takes at least two API calls to write a blob uploaded as blocks: PutBlock to upload a block with its block id and PutBlockList to commit all the blocks (see the sketch after this list).
- Nothing uploaded is committed until the final call "PutBlockList" is made.
- Reading or updating an arbitrary byte range independent of the block boundaries is not supported; you work at the granularity of the blocks you uploaded.
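As a sketch of that two-call pattern, the following splits a local file into 4 MB blocks and commits them with PutBlockList. The file path, blob name and container are assumptions; the emulator connection string is used again.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class BlockUploadSample
{
    const int BlockSize = 4 * 1024 * 1024; // 4 MB per block

    static void Main()
    {
        var account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
        var container = account.CreateCloudBlobClient().GetContainerReference("documents");
        container.CreateIfNotExist();

        var blob = container.GetBlockBlobReference("bigfile.dat");
        var blockIds = new List<string>();

        using (var file = File.OpenRead(@"C:\temp\bigfile.dat"))
        {
            var buffer = new byte[BlockSize];
            int read, blockNumber = 0;
            while ((read = file.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Block ids must be Base64 strings of equal length within a blob.
                string blockId = Convert.ToBase64String(BitConverter.GetBytes(blockNumber++));

                // First call: upload the block (still uncommitted at this point).
                blob.PutBlock(blockId, new MemoryStream(buffer, 0, read), null);
                blockIds.Add(blockId);
            }
        }

        // Second call: commit the block list; only now does the blob become visible.
        blob.PutBlockList(blockIds);
    }
}
```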
Page Blob
Page blobs focus on overcoming a few shortcomings of block blobs. Though both blob types support reads and writes, page blobs are preferred over block blobs when it comes to random access of the file content.
Below are a few points which advocate the statement above.
- The maximum size of a page blob is 1 TB.
- The smallest page is 512 bytes, and a single write can carry data in multiples of 512 bytes up to 4 MB.
- As soon as data is uploaded, it is written to the cloud / disk; there is no separate commit step.
- All the pages inside a page blob are indexed to allow faster reads / writes.
- Windows Azure Drive (discussed later in this article) is backed by a page blob.
- A page can be treated as an individual unit, and read / write operations can be performed on it (see the sketch after this list).
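Here is a small sketch of writing a single 512-byte page with the StorageClient library; the blob name, size and content are assumptions.

```csharp
using System.IO;
using System.Text;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class PageBlobSample
{
    static void Main()
    {
        var account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
        var container = account.CreateCloudBlobClient().GetContainerReference("documents");
        container.CreateIfNotExist();

        // Create a 1 MB page blob; the total size must be a multiple of 512 bytes.
        var pageBlob = container.GetPageBlobReference("pages.dat");
        pageBlob.Create(1024 * 1024);

        // Prepare exactly one 512-byte page of data.
        var page = new byte[512];
        Encoding.UTF8.GetBytes("hello page blob").CopyTo(page, 0);

        // Write the page at offset 0; the write lands on the blob immediately.
        using (var stream = new MemoryStream(page))
        {
            pageBlob.WritePages(stream, 0);
        }
    }
}
```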
Notes & Limitations
- Blobs can be uploaded / read as single files without using the block- or page-level operations.
- One of the biggest limitations of blobs is that containers cannot have sub-containers, so you cannot physically create a folder structure inside blob storage. However, this can be worked around by using "/" inside blob names: unlike file names on a local disk, blob names accept the "/" character, so names can be chosen to look as if they sit in a folder structure (see the snippet after this list).
- Container names must be lower case (blob names themselves are case-sensitive and may contain upper case).
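A quick illustration of the "/" convention; the container, names and file path are assumptions.

```csharp
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class VirtualFolderSample
{
    static void Main()
    {
        var account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
        var container = account.CreateCloudBlobClient().GetContainerReference("images");
        container.CreateIfNotExist();

        // "images" is the real container; the "/" in the name only simulates folders.
        var blob = container.GetBlockBlobReference("2011/logos/azure.png");
        blob.UploadFile(@"C:\temp\azure.png");
    }
}
```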
Tables
Tables are collections of entities (objects) stored on disk. They should not be compared to or confused with the tables available in an RDBMS like MS SQL Server, Oracle, etc.
Below are a few data points about tables.
- There is no limit on the number of tables / collections or on the number of rows.
- Every entity can have up to 255 properties (including the PartitionKey, RowKey and Timestamp properties).
- Every entity must have a "RowKey" property and a "PartitionKey" property. The row key uniquely identifies the row within its partition, and the partition key is generally a property that groups rows for partitioning (e.g., DepartmentId in an Employee entity).
- The limitation of not being an RDBMS is easily and effectively offset by LINQ support on tables; a LINQ query can be written and executed against a table (see the sketch after this list).
- Every table is replicated three times within Azure, and reliability and availability are maintained automatically.
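Below is a sketch of an entity plus a LINQ query, using the TableServiceContext support in the StorageClient library; the table name, entity shape and values are assumptions.

```csharp
using System;
using System.Linq;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

// PartitionKey, RowKey and Timestamp come from the TableServiceEntity base class.
public class Employee : TableServiceEntity
{
    public string Name { get; set; }

    public Employee() { }

    public Employee(string departmentId, string employeeId)
        : base(departmentId, employeeId) { }
}

class TableSample
{
    static void Main()
    {
        var account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");

        var tableClient = account.CreateCloudTableClient();
        tableClient.CreateTableIfNotExist("Employees");

        // Insert an entity: DepartmentId is the partition key, EmployeeId the row key.
        var context = tableClient.GetDataServiceContext();
        context.AddObject("Employees", new Employee("HR", "E001") { Name = "John Doe" });
        context.SaveChanges();

        // LINQ query against the table, filtered by partition key.
        var hrEmployees = from e in context.CreateQuery<Employee>("Employees")
                          where e.PartitionKey == "HR"
                          select e;

        foreach (var e in hrEmployees)
            Console.WriteLine("{0} / {1}: {2}", e.PartitionKey, e.RowKey, e.Name);
    }
}
```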
Queues
Yet another great facility available in the Azure storage account is the queue. As the name says it all, it is a queue sitting in the cloud where a program can push or pop messages.
- Every queue has a valid DNS URI.
- A message in a queue is limited to 8 KB in size.
- A queue has no limit on the number of messages it can contain.
- Messages do not carry individual destinations; any consumer reading the queue can receive them.
- A message is consumed by acquiring a time-limited token: once a message is retrieved it is locked / hidden from every other process until the timeout expires or the message is deleted. If the timeout expires, the message becomes visible again and is handed to the next waiting consumer. The detailed process is explained in the animation shown below.
- Messages can also be retrieved and deleted immediately, marking them as processed without waiting for an expiry (see the sketch after the animation).
The animation below walks through the queue process with a sample scenario.
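A minimal sketch of the push / get / delete cycle with the StorageClient library; the queue name and message text are assumptions, and the default visibility timeout is used.

```csharp
using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class QueueSample
{
    static void Main()
    {
        var account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");

        var queueClient = account.CreateCloudQueueClient();
        var queue = queueClient.GetQueueReference("orders");
        queue.CreateIfNotExist();

        // Push a message onto the queue (up to 8 KB of content).
        queue.AddMessage(new CloudQueueMessage("process order #42"));

        // Get a message; it becomes invisible to other consumers for the
        // visibility timeout, but it is not removed yet.
        CloudQueueMessage message = queue.GetMessage();
        if (message != null)
        {
            Console.WriteLine("Processing: " + message.AsString);

            // Delete the message once processed; otherwise it reappears
            // when the visibility timeout expires.
            queue.DeleteMessage(message);
        }
    }
}
```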
Windows Azure Drive / X Drive
An Azure drive / X drive is a virtual drive mounted from a VHD image stored as a page blob. The biggest advantage of using a drive is the help it gives in migrating existing web applications that use physical drives or folders as part of their execution. Since Azure does not guarantee durable local drive or folder access, an Azure drive can be programmed to take the place of local storage with a minimal amount of changes to the existing code.
Let us look at a few data points about the Azure drive (a mounting sketch follows the list).
- Only a page blob (containing a VHD) can be used for mounting an Azure drive.
- A drive mount expires when the lease expires or when the VM on which it is mounted crashes.
- Once a drive is mounted and the lease acquired, the lease lasts about 3 minutes before it expires if it is not renewed.
- Currently you can create up to 8 Azure drives with a storage account.
- Only one instance of the application can use a single drive for writing at a time. If another process needs the same drive, it has to wait for the current process to release the lock or for the lease to expire.
- A snapshot can be created from a drive and mounted read-only by any number of instances.
- Caching is supported out of the box by keeping a local cache of the drive you create. Any write operation is applied both to the cache and to the drive, ensuring fast reads / writes against the drive.
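Below is a sketch of mounting a drive with the CloudDrive API from the Windows Azure SDK. It assumes the code runs inside a role (for example, called from a worker role's OnStart, since drives cannot be mounted outside the Azure or dev fabric), that a local storage resource named "DriveCache" is declared in the service definition, and that the container and VHD names are just examples.

```csharp
using System.IO;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.StorageClient;

public class DriveMounter
{
    // Intended to be called from a role's OnStart with the role's storage account.
    public static string MountDataDrive(CloudStorageAccount account)
    {
        // Local cache declared as <LocalStorage name="DriveCache" ... /> (assumed name).
        LocalResource cache = RoleEnvironment.GetLocalResource("DriveCache");
        CloudDrive.InitializeCache(cache.RootPath, cache.MaximumSizeInMegabytes);

        // The drive is backed by a page blob holding a VHD (assumed container / name).
        var container = account.CreateCloudBlobClient().GetContainerReference("drives");
        container.CreateIfNotExist();
        string vhdUri = container.GetPageBlobReference("appdata.vhd").Uri.ToString();

        CloudDrive drive = account.CreateCloudDrive(vhdUri);

        // Create a 64 MB VHD if it does not already exist.
        try { drive.Create(64); }
        catch (CloudDriveException) { /* the VHD already exists */ }

        // Mount the drive with a local read cache and return its path (e.g. "a:\").
        string drivePath = drive.Mount(cache.MaximumSizeInMegabytes, DriveMountOptions.None);

        // Ordinary file APIs now work against the mounted drive.
        File.WriteAllText(Path.Combine(drivePath, "started.txt"), "drive mounted");
        return drivePath;
    }
}
```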
What’s Next
Well, if you are sad that the tutorial about the storage account is over, I would not like to disappoint you: we will continue with working samples of the features we have discussed here in the next part.