Pega file listener processing Azure storage files
In this article, we will see how a Pega file listener can process a file uploaded into an Azure storage repository.
We already have a dedicated article on file listener processing.
In this article, we will concentrate more on the part about accessing the file location.
We know how a file listener works – The listener can easily access and process any file on the server path location. Now, what if the file is in a different location, say in cloud storage?
In this tutorial, we will discuss two solutions:
- Mounting the file system – for versions below Pega 8.4
- Using a Repository rule – Pega 8.4+ versions
This tutorial was implemented using Pega Personal Edition 8.5, but the core concepts remain the same in higher versions as well.
Before getting into the solution, let us talk about storage accounts in Azure.
To get a basic understanding of Azure cloud computing, I would recommend going through a few docs on Azure fundamentals –
https://docs.microsoft.com/en-us/learn/certifications/azure-fundamentals
Anyway, I will try to explain a few Azure terms throughout the tutorial.
Azure portal – the GUI where you can manage your Azure resources.
Business scenario – Let’s say organization ABC started using Azure cloud computing, and all its IT applications started moving to the cloud. For now, let’s assume our Pega application is still running on-premises. A Java application produces a CSV file daily with the loan arrear details and places it in our Pega app server via an FTP protocol. Pega uses a file listener to process the CSV file. All is well. Now the Java application has migrated to Azure, and it creates the CSV file and stores it in Azure storage. So, we have to find a solution to make our file listener process the file from Azure storage.
First, let’s see where the files are stored in Azure.
What are the pre-requisites for this tutorial?
1. A Valid Azure subscription
You can use the free subscription for a month (you need to provide credit card details!!, but you will be charged only when you cross a certain limit, and you can have control over that).
https://azure.microsoft.com/en-in/free/
If you registered for training, you may have an Azure Pass subscription, which you can use as well.
You definitely need a subscription before proceeding.
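If you already have the Azure CLI installed, you can quickly verify that your subscription is active before proceeding. This is an optional sketch; the tutorial itself only uses the Azure portal.

```shell
# Sign in to Azure interactively (opens a browser window)
az login

# Show the subscription currently in use - name, ID and tenant
az account show --output table
```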
2. A file listener in Pega
To change the source location and test different scenarios.
Let’s start with the configurations
Create Storage account resource in Azure
Launch the Azure portal and add a new resource- storage account.
Just like other resources, you need to enter some basic details like subscription, location, networking, etc.
Also, the type of storage account defines what type of services you can use.
There are a few services – Blob, File, Queue, Table, etc.
If you select the Account kind as BlobStorage, it is specific to the Blob service only. The General-purpose kind supports all types of services.
Look at this link for more details on it –
When creating the storage account, you can decide what type of protocol you will use for the file share.
Mostly it is SMB or NFS.
What is the difference between SMB and NFS?
– Server Message Block (SMB) is the native file-sharing protocol for Windows systems, and Network File System (NFS) is used in Linux systems for file sharing.
– With additional configuration, we can enable NFS in Windows, and similarly SMB in Linux.
Actually, I wanted to try mounting blob storage using the NFS protocol, but there were a lot of restrictions with Windows Home edition. I had to give up at one point and start with the SMB protocol for File Shares.
Azure has nice documentation on blob storage with the NFS protocol, in preview/beta mode for both Linux and Windows –
You can also mount blob storage as a filesystem using blobfuse –
https://docs.microsoft.com/en-us/azure/storage/blobs/storage-how-to-mount-container-linux
In the first part of the tutorial, I will mount the file share using the SMB protocol.
For SMB configuration, you don’t need to worry much during storage account creation.
But with NFS, you will find a handful of restrictions – restricted to a certain account kind, location, networking (full public access is not allowed!), etc.
Click create to complete the storage account creation.
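For reference, the same storage account can be created from the Azure CLI. The resource group name and account name below are assumptions for this tutorial; substitute your own.

```shell
# Create a resource group to hold the storage account (name/location are assumptions)
az group create --name pega-demo-rg --location eastus

# General-purpose v2 (StorageV2) supports all four services: Blob, File, Table, Queue
az storage account create \
  --name filelistenerdemo \
  --resource-group pega-demo-rg \
  --location eastus \
  --sku Standard_LRS \
  --kind StorageV2
```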
Like every resource, in the left panel you will see useful settings to manage and monitor the resource.
You see the four main services – Blob, File, Table and Queue.
I will give a one-line description for each service.
Blob – unstructured data at huge scale.
Files – a normal file system.
Tables – structured NoSQL data.
Queues – asynchronous message processing, like events.
Azure has nice documentation for each of these terms –
https://docs.microsoft.com/en-us/azure/storage/common/storage-introduction
Here in this tutorial, I am going to use Azure Files. In real projects you will mostly see Azure Blob storage. In the repository solution that follows, I will be using Azure Blob storage.
One more thing is the networking.
You can define your firewall or virtual network if needed (in real projects you will mostly set this up).
For this tutorial, we will not use any virtual network resource; instead, allow access from all networks.
Let’s start with mounting Azure Files using the SMB protocol.
Solution 1: Mount your file system
Step 1: Configure File shares
Click on File shares on the left panel.
You can create a new file share.
I am not going to specify any quota limit.
Tiers – this defines the price and performance of the file share. I am going to accept the default and create.
Here, I created a new file share.
Step 2: Create the file structure inside the File share.
You can use Upload / Add directory to define your file system. I am creating a directory – loans – and uploading a CSV file inside it.
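The same file share, directory and upload can be done from the Azure CLI. The file name arrears.csv is an assumption; the share and directory names are the ones used in this tutorial.

```shell
# Account name is from this tutorial; paste your key from the portal's Access keys page
ACCOUNT=filelistenerdemo
KEY="<storage-account-key>"

# Create the file share
az storage share create --name filelistenerdemo \
  --account-name "$ACCOUNT" --account-key "$KEY"

# Create the loans directory inside the share
az storage directory create --share-name filelistenerdemo --name loans \
  --account-name "$ACCOUNT" --account-key "$KEY"

# Upload a local CSV into the loans directory
az storage file upload --share-name filelistenerdemo --path loans/arrears.csv \
  --source ./arrears.csv \
  --account-name "$ACCOUNT" --account-key "$KEY"
```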
Now I need to mount it in my Windows system, where I run my Pega personal edition.
You can mount it in 2 ways.
- Using powershell
- Using file explorer.
We will do both ways.
Step 3: Prepare your Windows system to support SMB protocol.
In Windows, you can search for “Turn Windows features on or off”, then enable SMB support.
You need to restart your system at this point.
Step 4: Get the connect command from your FileShare service.
In the FileShare screen where you added directories and files, you can also use connect option.
Click on connect.
On the right panel, you will get a few configuration options and the command to execute.
I am going to mount it on the P drive using the storage account keys.
On selecting these two options, the command is dynamically rendered based on our input.
You see the storage account username and password are passed through.
You can find the same keys under the left panel of your storage account.
You can use either key 1 or key 2.
Step 5: execute the command from Powershell and verify the mounted drive.
So what the command does is:
– It first checks whether TCP port 445 is reachable.
(In an organization, because of network restrictions, this port may not be open. Please check the troubleshooting page for more details –)
– Saves the password, so that the drive will persist on reboot.
– Mounts into the P drive.
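The steps above can be sketched as a PowerShell sequence like the one the portal generates. This is a simplified sketch, not the portal’s exact script; the storage account and share names are the ones used in this tutorial, and the key placeholder must be replaced.

```shell
# 1. Check that TCP port 445 is reachable from this machine
Test-NetConnection -ComputerName filelistenerdemo.file.core.windows.net -Port 445

# 2. Save the credentials in Windows Credential Manager so the mount survives reboots
cmdkey /add:filelistenerdemo.file.core.windows.net `
  /user:AZURE\filelistenerdemo /pass:<storage-account-key>

# 3. Mount the file share as the persistent P drive
New-PSDrive -Name P -PSProvider FileSystem `
  -Root "\\filelistenerdemo.file.core.windows.net\filelistenerdemo" -Persist
```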
If you see the console output, you will find everything went well.
Now switch to the file explorer and you will find the newly mounted P drive.
The other drive you see in the picture is from an earlier test.
In the file explorer, you can also see the directory and file.
Now you can play in this P drive and check the File Share from Azure. You will see both always stay in sync; of course, it is just a mount :).
Step 6: Mount the network drive from File Explorer.
In the file explorer, you have an option to mount the network drive.
In the folder field, specify your file share location – \\filelistenerdemo.file.core.windows.net\filelistenerdemo
Check the “Connect using different credentials” checkbox.
Click finish.
You will get a popup to enter the password. Use the storage account access keys.
It will take a couple of seconds to mount the network drive.
You see, the R drive is also mounted now.
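The File Explorer mapping has a command-line equivalent as well, using the classic net use command. The drive letter and names match this tutorial; the key placeholder must be replaced with your storage account key.

```shell
# Map the Azure file share to the R drive, persisting across reboots
net use R: \\filelistenerdemo.file.core.windows.net\filelistenerdemo /user:AZURE\filelistenerdemo <storage-account-key> /persistent:yes
```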
In a real-world scenario: most organizations maintain on-prem storage called Network Attached Storage (NAS). Let’s say your Pega application has 4 servers, and 2 servers are dedicated to background processing. In your infrastructure, you can have NAS storage connected to the same network as your Pega servers. You can then mount the cloud storage file system into the NAS file system using the NFS or SMB protocols, so it can be accessed by the background processing nodes.
You can read about NAS in the redhat site –
https://www.redhat.com/en/topics/data-storage/network-attached-storage
Step 1: Configure the listener location –
Here I will use the mounted location – P:\loans
Step 2: Start the listener from admin studio.
Step 3: Verify the file processed successfully with new folders created automatically.
You can see the same in FileShare.
So, this is one solution: you mount the Azure storage file shares into the machine running Pega, and hence the file listener can easily access the file.
One drawback is that you need to maintain your own storage mount on your on-premises system.
Solution 2: Using the repository rule.
Let’s see the second solution which can be used in Pega 8.4 + versions.
What is a repository rule?
– Repositories usually act as centralized storage for documents and artifacts, and can also provide versioning support.
– Repositories can connect to different external sources.
In Pega, we mostly use repositories for two things:
1. To store deployment artifacts. E.g., PDM uses the repository rule to store the generated zip in different repositories.
2. To save attachments during case processing.
Create a new repository rule.
Important note: Pega uses two access roles to manage the creation and use of repositories:
PegaRules:RepositoryAdministrator – can create, manage and delete instances.
PegaRules:RepositoryUser – cannot create or delete, but can view, browse, and fetch instances.
Records -> Sysadmin -> Repository -> Create new
Here you get a wizard to connect to different repository types – JFrog, Amazon S3, File system, Azure, etc.
In our scenario, we will use Azure.
There are 3 main configurations –
Container – you need to specify the container name.
Authentication profile – maintains the storage account name and key.
Root path – the root path for your repository. In our scenario, we can use the loans directory.
We will create a container in azure storage account and then come back and fill the details.
Step 1: Create a blob container service in azure storage account.
In the left panel, click on Blob service – Containers
Click + Container, to create a new container.
I am creating one named dev; this will be my container name.
Step 2: Configure the directory in Blob container.
Let’s use the storage explorer to connect, then add a folder and a file.
Note: You can also download the Azure Storage Explorer tool and connect to your existing storage account outside the Azure portal to manage the stored artifacts. Follow the download link – https://azure.microsoft.com/en-us/features/storage-explorer/
Now I have created a container and a root directory, and I have the storage account keys.
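For completeness, the container and blob can also be created from the Azure CLI. Blob storage has a flat namespace, so the loans/ part of the blob name acts as the folder; the file name arrears.csv is an assumption.

```shell
# Create the "dev" blob container
az storage container create --name dev \
  --account-name filelistenerdemo --account-key "<storage-account-key>"

# Upload a CSV under the loans/ prefix - this is what appears as the loans folder
az storage blob upload --container-name dev --name loans/arrears.csv \
  --file ./arrears.csv \
  --account-name filelistenerdemo --account-key "<storage-account-key>"
```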
Step 3: Configure the repository rule in Pega.
Step 3.1: Click on the authentication profile pointer to create a new instance.
The Azure authentication profile type will be selected automatically.
Click create & open.
There you can give the storage account name and the key copied from the Access keys link in the storage account’s left panel.
Save the authentication profile.
Save the repository instance.
Step 4: Update the file listener to look into the container root directory.
The format looks like a relative path reference. I used – file://<Azure repository rule name>:
Save the listener.
Step 5: Restart the listener and check the blob container.
I restarted the listener from admin studio.
There you see the file was processed, and the respective folders were created automatically.
Just open the reports folder – you will see an empty blob file created.
The reason is that Azure Blob storage does not support creating empty folders, so an empty placeholder blob is created instead.
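You can see the placeholder for yourself by listing the blobs under the reports/ prefix with the Azure CLI; the zero-byte placeholder shows up as an ordinary blob. The folder name reports matches what the listener created in this tutorial.

```shell
# List all blobs under the reports/ prefix - the empty placeholder blob appears here
az storage blob list --container-name dev --prefix reports/ \
  --account-name filelistenerdemo --account-key "<storage-account-key>" \
  --output table
```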
Think: how are these folders created by Pega? Everything is done through the repository APIs.
You will see a list of APIs like:
- Create folder API
- Create file API
- Get file API
- List files API
- Delete API
You have nice documentation for each API.
I will save it for my next post on uploading case attachments to Azure Storage.
With this solution, you DON’T need to manage any extra storage to mount the Azure storage file shares or containers. You can directly manage cloud storage using the repository APIs.
As a summary,
– File listeners can process files stored in cloud storage.
– You can mount the cloud storage file system to the container or VM file system using NFS, SMB or other applicable ways. File listeners can easily process the mounted files.
– With Pega 8.4+ versions, you don’t need to manage the mounted storage volume; instead, you can perform the actions – listening, reading, writing – using a repository instance and APIs.
I enjoy writing posts on Azure 🙂