Stealing focus from Firefox in Linux
This documentation previously located on the wiki
This page describes an essential component of the native events implementation on Linux - focus maintaining. In order for native events to be processed in Firefox, it must always retain focus. In case the user decides to switch to another window (a thing which could be understood), Firefox must not know it lost focus.
Solution overview
Basic idea
The basic idea is to get between the XLib (X-Windows client library) layer and the application. X-Windows notifies the application of events (user input, windows being destroyed, mouse movements) by asynchronous events. The events that indicate loss of focus FocusOut are discarded. The idea is based on Jordan Sissel’s implementation of a pre-loaded library that over-rides XNextEvent - see http://www.semicomplete.com/blog/geekery/xsendevent-xdotool-and-ld_preload.html.
Extension
This simple implementation works well as long as there’s one browser window. When multiple windows are involved, several challenges arise:
- Even though new windows may be opened, native events must continue to flow to the active window. However, most window managers will give focus to newly-opened windows.
- Window Switching: When wishing to switch to another window, the focus has to be moved. This requires cooperation between WebDriver’s Firefox extension and this component.
- Closing windows: When a window is closed, focus must move to another window. By design, WebDriver does not guarantee anything if the active window is closed - until a new window is being switched to. In this situation, special care must be taken.
Interaction with other components
The basic idea requires no interaction with other components of WebDriver. However, when multiple windows are involved - creating, switching or destroying, this component should be aware of it. New window creation cannot be tracked - as it may happen as a side effect of many operations. Switching and closing can be tracked.
Involved technologies
To understand this solution, one should be familiar with X-Windows and its events. Knowledge of the GDK event processing loop is also useful.
Implementation Details
All of this describes the code in firefox/src/cpp/linux-specific/x_ignore_nofocus.c
.
The shared library
Hijacking the events is done by over-riding XNextEvent.
A shared library containing a modified implementation of XNextEvent
is loaded using LD_PRELOAD
.
The modified function opens /usr/lib/libX11.so.6
and invokes the real function.
Then the event that the real function returns (i.e. the real event) is inspected.
Identifying events
Under the basic idea, FocusOut
events will be simply discarded. However, window switching complicates matters.
Data structure
There’s a global data structure that remembers the following information:
- The active window ID (if there is such one at the moment)
- The ID of a new window that’s being created (again, if exists)
- If window switching is in progress.
- If window closing is in progress.
- Was the focus given to another window and should be stolen back to the active one?
- Was a
FocusIn
event already received by the active window? - Did we set the currently active window as a result of a close operation?
Firefox starts up
FocusIn
event arrives and the active window ID is 0. A new active window is set. Note that during the creation of the main window, another sub-window is created and a FocusOut
event is sent to the active window. Fortunately, this FocusOut
event indicates that the focus is going to move to a sub-window (identified by NotifyInferior
) so it is allowed.
The user has switched to another window
This is indicated by a FocusOut
event with a detail field that is neither NotifyAncestor
nor NotifyInferior
. This event is simply discarded and replaced with a KeymapNotify
event, which is promptly discarded by GDK.
A new window is being created
This condition is identified by a ReparentNotify
event. When this happens, the new_window field will be set to the ID of the newly created window. Subsequent FocusOut
events will be allowed - during the new window creation events will flow as usual (FocusOut
event from the active window, FocusIn
event to the new window, FocusOut
to the new window and FocusIn
to a sub-window of the new window). After the sub-window of the new window receives FocusIn
, a call to XSetInputFocus
will be issued to return the focus to the active window.
A window switch occurs
During a window switch events will flow as normal. A window switch is considered done when the sub-window of a window receives the FocusIn
event. A window switch starts by identifying the file /tmp/switch_window_started
. In this file, a switch:
string following a window ID is written (the ID is just for debugging purpose). This will change the active window ID to 0 and the state to “during switch”. During a switch (or when there’s no active window) no events are discarded.
A window is being closed
Very similar to window switching (also identified by reading the file). However, it is indicated that the window is being closed - in case it was closed, no focus stealing will take place. In addition, the DestroyNotify
event is being identified to find out when the active window is being closed (explicitly by the user or implicitly by some other operation that is not an explicit call to close). In this case, the active window ID will be set to 0 as well.
Important Links
- Jordan Sissel’s original XSendEvent hack
- XLib events and the XLib programming manual
- The X programming manual / specification