File Listener Usage and Configurations in Pega
Here comes the missing piece in the file processing series: the FILE LISTENER.
As a recap, here are my previous articles related to file processing:
Service File
Parse delimited
So far, we have seen how to parse data using parse rules and how to process the entire file using service file rules.
Now let's check on configuring the file listener rule.
We will stick with the same requirement we used in the previous articles.
Business requirement: An external system sends us an input file with a list of customer details. For each record in the file, you need to create a new Sales case with the details provided.
Below is the sample file.
We implemented Part 2 using service file and parse rules and tested it by manually providing the input file to the service file rule.
Note: For testing, you can always run the service file manually by providing the input file.
But in real time, the incoming file will come from a different system to a designated location.
Let's concentrate on Part 1 – the external system sends us the input file with the list of customer details.
Think of it like a mailbox.
Okay, now imagine an agreement is made with the external system.
The external system will always place the input file at a certain location and forget about it. Our Pega system should process the incoming file.
Pega provides a rule called File Listener to listen for and process the file (via a service file).
Before getting into the file listener configuration, ask yourself: what basic details do you need to listen for a file?
1. At which location should I keep on checking the file?
2. Should I process only specific file types?
3. Do I need to perform some parallel processing to improve the performance?
4. How frequently should I keep on checking the location?
5. Do I have access to the source location?
Note: Platforms like Windows and Linux provide different permission levels for different folders.
Permissions for the location/directory are configured at the platform level. We will see more about this later in the post.
6. What should I do, once I process the file successfully?
7. How to perform error handling?
8. Do I need to log and report?
9. Do I need to perform any post-processing activity?
10. After picking/listening to the file, how you will process it? – You know the answer to this question 🙂 Using service file rules!!
These are the basic details that need to be configured in the file listener rule.
Note: Much of the technical complexity is safely buried in the underlying Java code, so there is no need to worry about it.
Configure and complete the file listener rule form
What is a file listener rule?
You know the definition already. A file listener answers all those 10 questions 😉
Where do you reference a file listener rule?
The file listener is the starting point for file processing. So, no other rule refers to the file listener rule!
How to create a new file listener rule?
A file listener can be created in 2 ways:
a) Using the service wizard
b) You can always create it manually.
Let’s create it manually
Step 1: Records -> Integration resources -> File listener -> Create new
Provide a valid listener name and click Create and open.
Step 2: You will see three main tabs in file listener rule.
- Properties
- Process
- Error
What are the configuration points in the file listener rule?
I am going to try a different way. Let’s configure the file listener rule by answering the 10 questions 🙂
1. At which location should I keep on checking the file? – Source location
Normally, we specify a location that is accessible on the server.
We have to decide on the location. Okay, let's find a good location for our file listener.
Here, I have installed my Personal Edition on Windows 10.
Note: Most enterprise Pega applications run on the Linux operating system.
This is the path where the Pega web application is hosted: C:/PRPCPersonalEdition/tomcat/webapps
You can see the different web applications (help, SMA) hosted at this location.
It is a good practice to have a common directory (location) for batch processing.
So, I am creating two new folders inside prweb – batch and CustomerData – in a parent-child hierarchy.
Below is my directory location, where I expect the customer data file.
Now I know my source location:
C:/PRPCPersonalEdition/tomcat/webapps/prweb/batch/CustomerData
Let’s configure it in the file listener rule form
The file listener keeps polling this location for new input files.
We need to specify the file location/directory in the Source location field, under the Properties tab.
Here I did the worst kind of development by hardcoding the path!! In real projects, never hardcode this location. This field supports dynamic referencing, where you can populate the value from a data page.
Use the syntax =<Pagename>.<PropertyName> to refer to the location dynamically – for example, =D_EnvironmentSettings.SourceFilePath (a hypothetical data page and property, just to show the pattern).
Next question
2. Should I process only specific file types?
We know that Pega can process different file types (CSV, PDF, etc.). We may also get multiple files in the same location, and some of them might not be of interest. You may need some filtering to process only the right file type!
In such a case, you can use a mask on the source name.
The agreement states that the external system will always send a CSV file. The name format of the file will be
CustomerData_<DD_MM_YY>.csv
So, daily I get a new file with only changing date values as file name (CustomerData_29_12_2019.csv).
Here my masking would be CustomerData_*.csv
* – refers to a wildcard match!
You need to specify the source name mask under source properties in the Properties tab.
Disable case sensitivity – You can use this checkbox to disable case sensitivity for both fields in source properties – Source location and Source name mask.
Note: If you want to listen for any file that arrives at the specified location, leave the source name mask empty or go all out with the wildcard *.*
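To see how such a mask behaves, here is a minimal Java sketch (my own illustration, not Pega's internal implementation) that matches file names against an equivalent glob pattern:

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class SourceMaskDemo {
    public static void main(String[] args) {
        // Glob equivalent of the source name mask CustomerData_*.csv
        PathMatcher mask = FileSystems.getDefault()
                .getPathMatcher("glob:CustomerData_*.csv");

        System.out.println(mask.matches(Paths.get("CustomerData_29_12_2019.csv"))); // true
        System.out.println(mask.matches(Paths.get("Orders_29_12_2019.csv")));       // false: filtered out
    }
}
```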
I am going straight to the last question! You know it’s an easy question
10. After picking up the file, how will you process it?
You know the answer to this question 🙂 Using service file rules!!
You can specify the service file rule under Listener properties in the Properties tab.
You need to select the service file rule's key parts in 3 different dropdowns:
Service package – Service class – Service method
I am using the same service file I created in the service file article. You can always visit that article for more info!!
In brief, the service file parses the customer records and creates a Sales case 🙂
Save the rule form and click Test connectivity. You should see the status Good.
I would say these 3 – source location, source name mask, and the service file reference – are the minimum viable configurations for the file listener rule.
For the remaining fields, accept the default configuration. Let's test it once 🙂
Step 1: For now, just trace-open the service file rule.
Step 2: Place a sample file (the one I used in my previous post) in the source location.
Below is the data
Note: Remember to follow the file naming pattern (CustomerData_*.csv).
As soon as you place the file, the listener will pick it up.
I see Tracer ran well 🙂
You can verify the sales cases!
Note: Updated by is null because this is batch processing. You can initialize the service context with default values.
Important note: If you see that your listener is not picking up the file, check SMA or Designer Studio -> System -> Operations -> Listener management and start the listener if it is not running!
Note: You can always start and stop listeners from the Listener management landing page as well as from the SMA.
If you don't find any active listeners in the grid, you can manually start the listener from the screen highlighted above.
We have successfully listened and processed a file.
Wait, the post is not ending here. You still have a looongg road to walk beside me.
Go back and check the magic that happened in the source location!!
You can see the file has disappeared (picked up by the file listener) and two new folders have been created – report and work_<listener name>. But currently, both folders are empty!!
Why are they empty? What is the use of these folders? – I will reveal the reason soon in the coming questions.
Let's go back to the list of questions.
3. Do I need to perform some parallel processing to improve the performance?
Listener nodes
In a real production environment, you may have multiple nodes in a cluster.
Some projects have specific nodes dedicated to batch processing to improve performance; other projects may have only one node.
You can decide on which node you need to run this listener. You can make the configurations in the properties tab.
Block startup – Always keep this unselected. If selected, the listener is blocked and will run again only when this is unchecked.
Startup option – you get 4 options
a) Run on all nodes – you know what it means :). The listener runs on all nodes in the cluster.
b) Node-based startup – Here you can specify that the listener starts only on specific nodes, using node IDs.
You can specify multiple node IDs in the array.
c) Host-based startup – Similar to node-based startup, we can start the listener only on specific hosts, using hostnames.
Here you can specify the hostnames. In situations where multiple nodes share the same hostname, you can also specify a count.
d) Node classification –
What is node classification?
– This is a new concept introduced in Pega 7.3.
– You can classify nodes based on their usage.
Imagine your application has some long-running batch processing. If the batch or agent runs on the same node where the end users log in and work, it MAY have some impact on performance, since batches can eat up the system's resources. There were already some ways to make a batch or agent run on specific nodes.
With node classification, you can distinguish the nodes based on their usage.
There are certain ways to configure node types. I am not going to explain them here. Please check out the links on the Pega Community site for more details.
So with the node classification, you can map the agents and listeners to run on a specific node type.
Note: Prefer the node classification option over node-based and host-based startup.
There is a button to reset the startup. You can use it when you want to change the assigned nodes.
There is one more way to improve the performance:
Multithreading
In the listener properties block, you have the option to specify concurrent threads.
Usually, the listener uses a Java thread to process a file. If, for example, you need to process more than one file at the same time, you can go for multithreading.
Note: Please make sure you have sufficient JVM memory before going for multithreading.
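As a rough analogy (plain Java, not the listener's actual code), setting concurrent threads to 3 behaves like handing files to a small worker pool:

```java
import java.io.File;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ConcurrentProcessingDemo {
    public static void main(String[] args) {
        // Three workers, analogous to setting "concurrent threads" to 3
        ExecutorService pool = Executors.newFixedThreadPool(3);

        File source = new File("batch/CustomerData"); // hypothetical source location
        File[] files = source.listFiles((dir, name) -> name.endsWith(".csv"));
        if (files != null) {
            for (File f : files) {
                // Each file is processed on whichever worker thread is free
                pool.submit(() -> System.out.println(
                        Thread.currentThread().getName() + " processing " + f.getName()));
            }
        }
        pool.shutdown();
    }
}
```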
4. How frequently should I keep on checking the location?
Based on the agreement, you can decide the polling frequency. The default value is 60, meaning that every 60 seconds a new Java thread is initiated to poll the location and check for a new file.
You need to configure the polling frequency in the process tab.
I left it at the default of 60 seconds.
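Conceptually, the polling works like the Java sketch below (my own illustration, not Pega's actual implementation): a scheduled task wakes up every 60 seconds and scans the source location for matching files.

```java
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PollingDemo {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        File source = new File("batch/CustomerData"); // hypothetical source location

        // Wake up every 60 seconds, like the default polling interval
        scheduler.scheduleAtFixedRate(() -> {
            File[] found = source.listFiles((dir, name) -> name.matches("CustomerData_.*\\.csv"));
            if (found != null) {
                for (File f : found) {
                    System.out.println("New file detected: " + f.getName());
                }
            }
        }, 0, 60, TimeUnit.SECONDS);
    }
}
```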
Let’s see a few more general configurations
Lock listener's temporary file – This is useful only when the listener runs on multiple nodes.
Don't think that the same file will be processed by multiple listeners :). Never. Actually, the listener creates some temporary files (rpt, err, dup) that follow a naming convention, so using this option helps avoid name collisions.
Note: Using this option will have an impact on performance!
Process empty file –
By default, the listener will not process empty files, so this checkbox is always unchecked by default. If you really, really need to process them!!!, select this checkbox!
Next question, please 🙂
5. Do I have access to the source location?
Platforms like Windows and Linux assign permission levels to folders and directories. If the Pega user on the platform (the OS user, not the Pega operator ID) does not have access to that folder/directory, the file will never get picked up for processing!
Usually, these settings and configurations are maintained by the system administrators. We don't configure this access within Pega, because the configuration lives with the operating system.
I explored the permission levels on a Linux box; I can show something similar in Windows as well.
How to change the permission level in Windows?
Step 1: Go to the source location folder – CustomerData
Right-click and click on properties.
Step 2: Go to the security tab and click edit.
Step 3: In the popup, you can add or remove users/groups, or deny access for specific operations.
Do not apply it!
If you really want to try it, you can. Once you change the permissions to deny, try to stop and start the listener – you cannot!
All of this applies to Windows machines.
Similarly, you can change the permission levels on files and directories on the Linux platform using the command-line terminal.
You can also search Google for more on Linux permission levels.
For now, make sure the source location directory has the right access
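If you want to sanity-check the access from the Java side, a quick sketch like this (using standard java.nio, not any Pega API) tells you whether the OS user running the JVM can read and write the directory:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class AccessCheckDemo {
    public static void main(String[] args) {
        // Source location used earlier in this post
        Path source = Paths.get("C:/PRPCPersonalEdition/tomcat/webapps/prweb/batch/CustomerData");

        // Read access is needed to pick up files; write access to move or delete them
        System.out.println("Exists:   " + Files.exists(source));
        System.out.println("Readable: " + Files.isReadable(source));
        System.out.println("Writable: " + Files.isWritable(source));
    }
}
```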
6. What should I do, once I process the file successfully?
Now, first let’s go check the folders created in the source location.
You see two folders.
a) report – I will explain this under another question!
b) Work_ProcessCustomerRecordsListener
Click and open the work_<listenername> folder.
You will see a folder called completed.
Once the file gets picked up from the source location and processed (success or failure), it can be moved to the completed directory.
This is the default folder structure behind the file listener architecture.
If you look at the completed folder, you will see it is empty! Why??? – because of the default configuration.
Switch to the Process tab, Cleanup block.
Delete – deletes the input file after it gets processed.
Keep – never removes the file, regardless of success or failure.
Again, this decision should be made wisely. For debugging, you can always keep the file, accepting the maintenance overhead!
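In plain Java terms, the Keep/Delete choice boils down to something like this sketch (my illustration of the behaviour, not the listener's actual code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CleanupDemo {
    // Mirrors the Keep/Delete radio buttons in the Cleanup block
    static void cleanUp(Path processedFile, Path completedDir, boolean keep) throws IOException {
        if (keep) {
            // Keep: archive the file in the work_<listener>/completed folder
            Files.createDirectories(completedDir);
            Files.move(processedFile,
                    completedDir.resolve(processedFile.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        } else {
            // Delete: remove the input file after processing
            Files.delete(processedFile);
        }
    }
}
```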
Time to test cleanup
Step 1: Change the configuration to Keep. Save the file listener.
Step 2: Place an input file in the source location: CustomerData_30_12_2018.csv
Step 3: You can verify the sales cases created.
Step 4: You can see that the successfully processed file has been backed up in the completed folder.
Now, you know what to do when the processing is successful!
Obviously, the next question is: what do we do in case of an error?
7. How to perform error handling?
We have a separate tab for error handling 🙂
Error recovery – This is simple.
Attempt recovery – On selecting this checkbox, the listener will retry processing the errored file.
If not selected, no retry attempt will be made.
Max recovery attempts – You can specify how many times you want to retry.
There is one more familiar block – Cleanup.
Just like in the Process tab, we get options for what to do when file processing ends in an error.
You can rename the file with a different extension – the default is .err.
Or you can delete the file.
It is wise to use the rename option and keep the file so that it can be used for debugging the error.
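The retry behaviour can be pictured with this small Java sketch (a conceptual illustration; the parse step and the exact rename semantics are my assumptions, not the listener's real code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RetryDemo {
    static final int MAX_RECOVERY_ATTEMPTS = 3;

    static void process(Path file) throws IOException {
        for (int attempt = 1; attempt <= MAX_RECOVERY_ATTEMPTS; attempt++) {
            try {
                parse(file);  // hypothetical parse/processing step
                return;       // success: no further retries
            } catch (Exception e) {
                System.out.println("Attempt " + attempt + " failed: " + e.getMessage());
            }
        }
        // All attempts exhausted: keep the file, renamed with the .err extension
        Files.move(file, file.resolveSibling(file.getFileName() + ".err"));
    }

    static void parse(Path file) throws Exception {
        // parsing logic would go here
    }
}
```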
Let’s test this error handling with the retry option. Below is my configuration.
Step 1: Save the file listener rule form with the error tab configured.
Step 2: Trace-open the service file rule.
Step 3: Prepare a .csv file with junk data.
Step 4: Save the file and place it in the source location, where our file listener polls every 60 seconds.
You will see the error in the tracer. Wait for 3 attempts – 180 seconds. Until then, you can see the file in the source location.
You can see it retried 3 times, and the file disappeared after the third retry attempt.
Step 5: Move to the completed folder and check the error file.
Cool!!
What happens if you get the same file again and again – a duplicate file?
You don’t want to create multiple sales cases for the same customer.
You get a configuration point for this:
You can ignore duplicate file names.
You may wonder how Pega identifies a duplicate file name. Where will Pega look to identify the duplicate file?
They are all just data instances!!
Go to the App Explorer and search for the class Log-Service-File. Click on the class to view its instances.
You will see a lot of instances there with a file name column. You can see I tried uploading the file 13 times
(I always try multiple times before documenting things in the blog :D).
Now you know how the duplicate search works. So what happens after a duplicate file is identified?
The listener skips processing and saves the file with a .dup extension in the completed folder (just like the .err extension).
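Conceptually, the duplicate check works like this little Java sketch, where a set of previously seen file names stands in for the Log-Service-File instances (an analogy, not Pega's actual lookup):

```java
import java.util.HashSet;
import java.util.Set;

public class DuplicateCheckDemo {
    // Stand-in for the Log-Service-File instances keyed by file name
    private final Set<String> processedFileNames = new HashSet<>();

    boolean isDuplicate(String fileName) {
        // add() returns false when the name was already present
        return !processedFileNames.add(fileName);
    }

    public static void main(String[] args) {
        DuplicateCheckDemo log = new DuplicateCheckDemo();
        System.out.println(log.isDuplicate("CustomerData_29_12_2019.csv")); // false: first time, process it
        System.out.println(log.isDuplicate("CustomerData_29_12_2019.csv")); // true: skip and save as .dup
    }
}
```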
You can test it on your own.
That's it for error handling within the listener rule form.
8. Do I need to log and report?
I hope you remember the specific report folder in the source location directory.
First, let’s check the configuration in file listener rule form.
You know Pega handles reporting very well, so it's better to leave the default configuration as it is 🙂
It uses a default page – LogServiceFile – generates the entire page content ($XML) as the source property, and then saves the report file as .rpt.
Note: You can always customize this by using your own page and property for the report.
Generate report file? – A report file will be created under the report folder only when this is checked.
Persist Log-Service-File instances? – I really don't know the use of this checkbox!!! We persist the Log-Service-File instances by default anyway. I am going to ignore it!
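To picture what ends up in the report folder, here is a rough Java sketch that writes file-processing metadata as XML to a .rpt file (the element names are my own invention; the real content comes from the LogServiceFile page):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;

public class ReportDemo {
    static void writeReport(Path reportDir, String fileName, String status) throws IOException {
        // Illustrative .rpt content; the real listener serializes the LogServiceFile page as XML
        String xml = "<LogServiceFile>"
                + "<FileName>" + fileName + "</FileName>"
                + "<Status>" + status + "</Status>"
                + "<ProcessedAt>" + Instant.now() + "</ProcessedAt>"
                + "</LogServiceFile>";
        Files.createDirectories(reportDir);
        Files.write(reportDir.resolve(fileName + ".rpt"), xml.getBytes(StandardCharsets.UTF_8));
    }
}
```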
Let’s test by generating the report file
Step 1: Follow the configuration below in the reporting block.
Step 2: Place a file in the source location.
Step 3: Wait for processing and then check on the report folder.
Step 4: You can open the file and see the XML data.
Now you know how to report on the files.
Okay, last question!
9. Do I need to perform any post-processing activity?
I have never used one myself so far, but Pega provides an option to configure an activity.
You can specify an activity to run after file processing is completed.
Note: You can always do some post-processing in the service file rule.
We are at the end of the file listener article.
The configuration of the file listener lies in answering the 10 questions!!
This is one of my favourite articles! Hope you liked it.