The automation targets of RPA products fall roughly into three types: desktop applications, browser applications, and others (mainframe, Java, and so on). This article analyzes how RPA and similar automation tools automate desktop applications.
Whatever technology is used, automation must provide at least the following three elements:
Find: locate elements, controls, input fields, and so on
Read: get the text, state, position, checked state, and so on
Operate: click, double-click, type text, open drop-downs, and other control actions
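The three elements above can be pictured as a very small interface. The following is only an illustrative in-memory sketch, not the API of any RPA product; the names `Element`, `find`, `read`, and `click` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    """Hypothetical stand-in for a UI element (button, input field, ...)."""
    name: str
    text: str = ""
    children: List["Element"] = field(default_factory=list)
    clicks: int = 0


def find(root: Element, name: str) -> Optional[Element]:
    """Find: depth-first search for the first element with the given name."""
    if root.name == name:
        return root
    for child in root.children:
        hit = find(child, name)
        if hit is not None:
            return hit
    return None


def read(element: Element) -> str:
    """Read: return the element's current text."""
    return element.text


def click(element: Element) -> None:
    """Operate: record a click on the element (a real tool would act on the UI)."""
    element.clicks += 1


# Usage: a toy "calculator" window with a single key.
window = Element("Calculator", children=[Element("Nine", text="9")])
key = find(window, "Nine")
if key is not None:
    click(key)
    print(read(key), key.clicks)
```

Every technology discussed in this article (Win32, MSAA, UIA) is, in effect, a different way of implementing these three functions against real windows.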
The Windows API is the set of core application programming interfaces (APIs) that Microsoft provides in the Windows operating system.
The Windows API (Win32) is focused mainly on the C programming language, in that its exposed functions and data structures are described in C in recent versions of its documentation. However, the API can be used by any programming language compiler or assembler that can handle the low-level data structures and the calling conventions prescribed for calls and callbacks. Similarly, the internal implementations of the API functions have historically been developed in several languages. Although C is not an object-oriented programming language, both the Windows API and Windows have historically been described as object-oriented. There are also many wrapper classes and extensions for object-oriented languages that make this object-oriented structure more explicit (Microsoft Foundation Class Library (MFC), Visual Component Library (VCL), GDI+, and so on). For example, Windows 8 provides both the Windows API and the WinRT API; the latter is implemented in C++ and is object-oriented by design.
Whether the framework is MFC, VCL, or VB6, the Win32 SDK lies underneath it. In the end, what we deal with are window handles (HWND) and window messages.
【This article uses the Windows 10 calculator (calc.exe) as the example】
The information available from Spy++ is very limited
Spy++ can identify only the calculator's most basic window information, such as the window name, position, and HWND. It cannot identify the details of the individual buttons, so the Win32 API alone cannot fully automate this kind of application.
Information is hard to find in Spy++
Spy++ lists hundreds of windows of all kinds, which makes searching inefficient. It is better suited to older applications, and it works well when used together with MSAA and UIA.
The Win32 API works directly against the low-level OS interfaces and is used to simulate keyboard and mouse input; for some special windows it works surprisingly well.
The information Blue Prism obtains in Win32 mode is consistent with Spy++
Blue Prism lets you switch manually between the Win32, AA, and UIA pickup modes.
AutoIt is a very well-known automation tool built on the Win32 API
AutoIt v3 is a freeware BASIC-like scripting language designed for automating the Windows GUI and general scripting. It uses a combination of simulated keystrokes, mouse movement and window/control manipulation in order to automate tasks in a way not possible or reliable with other languages (e.g. VBScript and SendKeys). AutoIt is also very small, self-contained and will run on all versions of Windows out-of-the-box with no annoying “runtimes” required!
【Win32 API disadvantages:】
Complex to use, with a high implementation cost. Using the Win32 API requires attention to many details: some Win32 APIs do not work across processes, and some window messages can only be sent to windows created by the current thread. A small oversight makes the automation program unstable.
Too low level and inconvenient to use. To make calls convenient, the API usually has to be wrapped, which adds implementation cost. For example, to look up a window with the built-in FindWindow and FindWindowEx functions, you must obtain the HWND level by level, which is particularly inconvenient in UI programs with deeply nested windows. So you need to wrap some practical helper functions and classes.
Different development tools, such as MFC, VCL, and the .NET Framework, differ in many details of how they handle the Win32 API internally. For example, .NET WinForms maintains the Win32 HWND of a control dynamically, so the HWND of the same WinForms control may change during the lifetime of the program. This is a fatal problem for Win32-based automation that relies on the HWND as a unique identity.
The contents of windows built with many newer frameworks cannot be recognized. The calculator used in this article is one example: the Win32 API is almost useless for such applications.
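The level-by-level lookup described above can be wrapped in a small helper. The sketch below is one possible wrapper, not how any particular tool does it. `find_by_class_path` and `win32_find_child` are hypothetical names; the only real API used is `FindWindowExW` from user32, called through `ctypes` on Windows. The class names in the usage example ("Notepad", "Edit") are an assumption that holds for the classic Notepad and may differ on your system.

```python
import sys
from typing import Callable, Optional, Sequence

# find_child(parent_handle, class_name) -> child handle, or 0 if none.
FindChild = Callable[[int, str], int]


def win32_find_child(parent: int, class_name: str) -> int:
    """First child of `parent` with the given window class (Windows only)."""
    import ctypes

    user32 = ctypes.windll.user32
    user32.FindWindowExW.restype = ctypes.c_void_p
    user32.FindWindowExW.argtypes = [
        ctypes.c_void_p,   # hWndParent
        ctypes.c_void_p,   # hWndChildAfter
        ctypes.c_wchar_p,  # lpszClass
        ctypes.c_wchar_p,  # lpszWindow
    ]
    return user32.FindWindowExW(parent, None, class_name, None) or 0


def find_by_class_path(root: int, class_path: Sequence[str],
                       find_child: FindChild) -> Optional[int]:
    """Descend one level per class name; return the final handle or None.

    Only the first matching child is tried at each level (no backtracking),
    which keeps the sketch short but can miss windows in ambiguous trees.
    """
    handle = root
    for class_name in class_path:
        handle = find_child(handle, class_name)
        if not handle:
            return None
    return handle


if sys.platform == "win32":
    # A root of 0 makes FindWindowEx search the children of the desktop.
    print(find_by_class_path(0, ["Notepad", "Edit"], win32_find_child))
```

Passing the lookup function as a parameter keeps the traversal logic separate from the Win32 call, so the helper can be exercised with a fake window tree on any platform.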
Microsoft Active Accessibility (MSAA) is an application programming interface (API) for user interface accessibility. MSAA was introduced in 1997 as a platform add-on to Microsoft Windows 95. It is designed to help assistive technology (AT) products interact with the standard and custom user interface (UI) elements of an application (or the operating system), and to access, identify, and manipulate those UI elements. AT products work with MSAA-enabled applications to provide better access for people with physical or cognitive difficulties, impairments, or disabilities. Examples of AT products include screen readers for people with limited sight, on-screen keyboards for people with limited physical access, and narrators for people with limited hearing. MSAA can also be used by automated testing tools and computer-based training applications.
AccExplorer recognizes the number 9
The AccExplorer tool responds very quickly, but it omits many attributes that are important for automation, such as AutomationId and Control Type, so it is inconvenient in practice.
MSAA-based pickup works with a wide range of Windows desktop applications. Although UIA is the successor to MSAA, there are still plenty of scenarios where MSAA handles element automation well.
UiBot (a local RPA tool) recognizes the number 9 by default
Identifying calculator key 9 the MSAA way:
Control Type of the “numeric keypad”: 50026
Control Type of “nine”: 50000
The other attributes also match.
Thus, it can be determined that the default method UiBot uses to recognize Windows application windows is MSAA, with the “Control Type” property renamed to “cid”.
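For reference, the two Control Type values listed above coincide with control type identifiers defined in the UI Automation headers, where 50000 is the button control type and 50026 is the group control type. The lookup below is a small sketch covering only a handful of identifiers; the dictionary is deliberately incomplete, and `control_type_name` is a hypothetical helper name.

```python
# A few UI Automation control type identifiers (not the full list).
UIA_CONTROL_TYPES = {
    50000: "Button",
    50002: "CheckBox",
    50003: "ComboBox",
    50004: "Edit",
    50020: "Text",
    50026: "Group",
    50032: "Window",
    50033: "Pane",
}


def control_type_name(cid: int) -> str:
    """Translate a numeric control type id into a readable name."""
    return UIA_CONTROL_TYPES.get(cid, f"Unknown({cid})")


# The two values observed for the calculator's keypad and its "nine" key.
for cid in (50026, 50000):
    print(cid, control_type_name(cid))
```

A mapping like this makes raw picker output easier to read when comparing what different tools report for the same element.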
UiBot identifies and acts on elements through MSAA very quickly, so it appears to be well optimized.
【MSAA disadvantages:】
MSAA is based on COM technology, but IAccessible is not a standard COM interface. For example, users do not need to call CoInitialize to use it, nor can they obtain further custom interfaces through QueryInterface. This limits what MSAA can offer.
The definition of the IAccessible interface is flawed. Many of its methods are optional, yet some key methods needed for UI automation are missing. For example, it provides accSelect to select controls, but there is no method such as accExpand to expand tree controls.
Locating elements in complex windows is inefficient.
Microsoft UI Automation (UIA) is an application programming interface (API) that allows one application to access, identify, and manipulate the user interface (UI) elements of another application. UIA is aimed at providing UI accessibility and is the successor to Microsoft Active Accessibility. It also facilitates GUI test automation and is the engine on which many test automation tools are based. Many RPA tools also use it to automate desktop applications in business processes.
UI Automation (UIA) is a newer UI automation technology that Microsoft introduced starting with Windows Vista. In the latest Windows SDK, UIA is released together with other components that support UI automation, such as MSAA, under the name Windows Automation API. UIA’s property providers support both Win32 and .NET programs.
UiPath picks up key 9 by default
The Selector property in the UiPath picker shows that UiPath picked up the element in the image above through UIA.
RPAPlus SPY+ clearly shows the tree structure and the UIA attributes
[UIA application example 2: WeChat desktop version]
UiPath picking up the WeChat desktop version: the entire window is highlighted
Is the WeChat desktop version impossible to pick up? Not really.
A scan with RPAPlus SPY+ reveals the reason immediately
WeChat desktop version: a popupshadow window is added on top of the main window
The WeChat desktop version adds a shadow-like mask window on top of the application, which blocks detection by UIA-based recognition tools. (Wow~ ~ ~)
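Why an overlay window defeats point-based pickup can be shown with a toy model. The sketch below is purely illustrative and does not describe how UiPath, RPAPlus SPY+, or WeChat work internally; `Win`, `window_at`, the class names, and the rectangles are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class Win:
    """Hypothetical top-level window: class name plus screen rectangle."""
    class_name: str
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: int, y: int) -> bool:
        return self.left <= x < self.right and self.top <= y < self.bottom


def window_at(z_order: Sequence[Win], x: int, y: int,
              ignore_classes: FrozenSet[str] = frozenset()) -> Optional[Win]:
    """Topmost window under (x, y); z_order lists windows from top to bottom."""
    for win in z_order:
        if win.class_name in ignore_classes:
            continue  # skip known overlay or shadow windows
        if win.contains(x, y):
            return win
    return None


# A shadow overlay sitting above the real application window.
overlay = Win("popupshadow", 0, 0, 800, 600)
main = Win("MainWindow", 10, 10, 790, 590)
windows = [overlay, main]

print(window_at(windows, 100, 100).class_name)
print(window_at(windows, 100, 100, frozenset({"popupshadow"})).class_name)
```

Without the ignore list, the picker lands on the overlay and can only highlight the whole window; skipping the overlay exposes the window underneath, which is why a tree-based scan can still find the real elements.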
[UIA vs. MSAA:] from the official Microsoft documentation
Basic Design Principles
Although Microsoft Active Accessibility and UI Automation are two different technologies, the basic design principles are similar. The purpose of both technologies is to expose rich information about the UI elements used in Windows applications. Developers of accessibility tools can use this information to create software that makes applications running on Windows more accessible to people with vision, hearing, or motion disabilities.
Both Microsoft Active Accessibility and UI Automation expose the UI object model as a hierarchical tree, rooted at the desktop. Microsoft Active Accessibility represents individual UI elements as accessible objects, and UI Automation represents them as automation elements. Both refer to the accessibility tool or software automation program as the client. However, Microsoft Active Accessibility refers to the application or control offering the UI for accessibility as the server, while UI Automation refers to this as the provider.
Properties and Control Patterns
Microsoft Active Accessibility offers a single Component Object Model (COM) interface with a fixed, small set of properties. UI Automation offers a richer set of properties, as well as a set of extended interfaces called control patterns to manipulate accessible objects in ways Microsoft Active Accessibility cannot.
For more information, see UI Automation Properties Overview and UI Automation Control Patterns Overview.
MSAA Roles and UI Automation Control Patterns
Microsoft designed the Microsoft Active Accessibility object model at about the same time as Windows 95 was released. The model is based on “roles” defined a decade ago, and you cannot support new UI behaviors or merge two or more roles together. There is no text object model, for example, to help assistive technologies deal with complex web content. UI Automation overcomes these limitations by introducing control patterns that enable objects to support more than one role, and the UI Automation Text control pattern offers a full-fledged text object model.
Object Model Navigation
Another limitation of Microsoft Active Accessibility involves navigating the object model. Microsoft Active Accessibility represents the UI as a hierarchy of accessible objects. Clients navigate from one accessible object to another using interfaces and methods available from the accessible object. Servers can expose the children of an accessible object with properties of the IAccessible interface, or with the standard IEnumVARIANT COM interface. Clients, however, must be able to deal with both approaches for any server. This ambiguity means extra work for client implementers, and broken accessible object models for server implementers.
UI Automation represents the UI as a hierarchical tree of automation elements, and provides a single interface for navigating the tree. Clients can customize the view of elements in the tree by scoping and filtering.
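The idea of scoping and filtering a tree of automation elements can be sketched with a plain in-memory model. This is not the UI Automation client API; `Node`, `walk`, and `find_all` are hypothetical names, and the scope strings only mirror the concepts of element, children, descendants, and subtree scopes.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class Node:
    """Hypothetical automation element with a name and a control type."""
    name: str
    control_type: str
    children: List["Node"] = field(default_factory=list)


def walk(node: Node, scope: str) -> Iterator[Node]:
    """Yield the elements selected by the scope, in depth-first order."""
    if scope == "element":
        yield node
    elif scope == "children":
        yield from node.children
    elif scope == "descendants":
        for child in node.children:
            yield child
            yield from walk(child, "descendants")
    elif scope == "subtree":
        yield node
        yield from walk(node, "descendants")
    else:
        raise ValueError(f"unknown scope: {scope}")


def find_all(root: Node, scope: str,
             condition: Callable[[Node], bool]) -> List[Node]:
    """Scope first, then filter: keep the scoped elements that match."""
    return [n for n in walk(root, scope) if condition(n)]


# Usage: a toy calculator tree.
calc = Node("Calculator", "Window", [
    Node("Number pad", "Group", [Node("Nine", "Button"),
                                 Node("Eight", "Button")]),
    Node("Equals", "Button"),
])
is_button = lambda n: n.control_type == "Button"
print([n.name for n in find_all(calc, "children", is_button)])
print([n.name for n in find_all(calc, "descendants", is_button)])
```

Separating the scope from the condition is what lets a client ask narrow questions of a large tree instead of walking every element by hand.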
Object Model Extensibility
Microsoft Active Accessibility properties and functions cannot be extended without breaking or changing the IAccessible COM interface specification. The result is that new control behavior cannot be exposed through the object model; it tends to be static.
With UI Automation, as new UI elements are created, application developers can introduce custom properties, control patterns, and events to describe the new elements. For more information, see Custom Properties, Events, and Control Patterns.
Transitioning from MSAA
The Windows Automation API framework provides support for transitioning from Microsoft Active Accessibility servers to UI Automation providers. The IAccessibleEx interface enables support for specific UI Automation properties and control patterns to be added to legacy Microsoft Active Accessibility servers without needing to rewrite the entire implementation. The IAccessibleEx interface also allows in-process Microsoft Active Accessibility clients to access UI Automation provider interfaces directly, rather than through UI Automation client interfaces. For more information, see The IAccessibleEx Interface.
Choosing Microsoft Active Accessibility, UI Automation, or IAccessibleEx
This section helps you determine which Windows Automation API solution to use to implement an assistive technology product or to make your application accessible to assistive technology products.
New Applications and Controls
If you are developing a new application or control, Microsoft recommends using UI Automation. Although Microsoft Active Accessibility can be easier to implement in the short term, the limitations inherent in this technology, such as its aging object model and inability to support new UI behaviors or merge roles, makes it more difficult and costly over the long term. These limitations become especially apparent when introducing new controls.
The UI Automation object model is easier to use and is more flexible than that of Microsoft Active Accessibility. The UI automation elements reflect the evolution of user interfaces, and developers can define custom UI Automation control patterns, properties, and events.
Microsoft Active Accessibility tends to run slowly for clients that run out of process. To improve performance, developers of accessibility tool programs often choose to hook into and run their programs in the target application process: an extremely difficult and risky approach. UI Automation is much easier to implement for out-of-process clients, and offers much better performance and reliability.
Existing Microsoft Active Accessibility Implementations
If you are updating an existing application or control that is based on Microsoft Active Accessibility, consider adding support for UI Automation by implementing the IAccessibleEx interface. First, ensure that your application or control meets the following requirements:
- The baseline Microsoft Active Accessibility server’s hierarchy of accessible objects must be well organized and error free. IAccessibleEx cannot fix problems with existing accessible object hierarchies.
- Your IAccessibleEx implementation must comply with both the Microsoft Active Accessibility specification, and the UI Automation Specification. Microsoft provides a set of tools for validating compliance with both specifications. For more information, see Testing Tools.
If either of these requirements is not met, consider implementing UI Automation natively. You can keep legacy Microsoft Active Accessibility server implementations for backward compatibility if it is necessary. From a UI Automation client perspective, there is no difference between UI Automation providers and Microsoft Active Accessibility servers that implement IAccessibleEx correctly.
These are the most standard and reliable techniques for desktop automation, but there are many more options beyond the standard approaches:
[Coordinate-based] / [color-based] / [CV-based] approaches add more possibilities and endless room for imagination to desktop automation. The end goal is that whatever a person can operate, a robot can operate too.
Of all automation approaches, the coordinate-based one is probably the least reliable. The main reason is that it lacks "read", one of the three elements (find, read, operate) listed at the beginning of this article, which lowers the accuracy of the automated process. A mature automation engineer should avoid coordinate-based methods wherever possible.
In the future, we will share more in-depth material on other types of automation, such as web automation.
This includes Chrome/IE automation, automation through the Java Access Bridge, mainframe emulator automation, and more.
This official account article cannot be updated after publication, but the version on the official website will be revised and extended over time.