Why PESSRAL is not PESS
The lift industry is quite old-fashioned in electric/electronic/programmable electronic (E/E/PE) safety: it has used the electric safety chain for more than 30 years. However, since the EN 81-1/2 A1:2005 amendment, the standard has allowed programmable electronics to be used in safety systems (PESS). In addition, the standards committee decided to incorporate a subset of the leading standard (IEC 61508) into EN 81 in order to reduce the difficulty and speed up implementation: PESSRAL (Programmable Electronic System in Safety Related Applications for Lifts) was born. However, because of cherry picking and skipping the basics, both the old code and even the newest code (EN 81-20/50) make it possible to create unsafe systems. Where are the potential risks?
IEC 61508 itself consists of seven parts with a total of more than 500 pages. It describes the complete path to follow when creating an E/E/PE safety device. It contains calculations, assumptions, design strategies, risk analyses and descriptions of quality systems. The result is a SIL (Safety Integrity Level), a number expressing the safety of the system. All of this documentation is needed to end up with a safe system. In contrast, EN 81-20/50 uses eleven pages and claims to be a complete package.
The entire process flow for making a PESS is described in a separate part of the standard, IEC 61508-1. A clear way of working and proper project management aim to minimize systematic failures in a system. The requirements are explicit, and meeting them results in an SC (Systematic Capability) value. Techniques that can be used include project management, documentation, structured design and modularization. Neither these techniques nor the SC are demanded or described in EN 81-20. Projects without proper management can contain major mistakes, and these are hard to spot.
For safety software, SIL is used to measure safety. For example, SIL 3 corresponds to an average probability of dangerous failure between 10^-8 and 10^-7 per hour in high demand mode, or between 10^-4 and 10^-3 per demand in low demand mode. Normally you have to perform a risk analysis to determine the required SIL. EN 81-1/2+A3 and EN 81-20/50 have already performed this risk analysis and simply prescribe the SIL ratings. This removes the need for a risk analysis and creates uniformity between the systems of competitors. However, a risk analysis gives insight into the project and influences the design. It is mandatory in the IEC 61508 procedure, but not in EN 81-1/2 and EN 81-20.
So a SIL is available, but the standard does not make clear whether we are working in high or low demand mode. The target failure measures of the two modes differ by exactly a factor of 10,000. Low demand mode is defined in IEC 61508-4 as “where the safety function is only performed on demand, in order to transfer the EUC (Equipment Under Control) into a specified safe state, and where the frequency of demands is no greater than one per year”.
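To make the two demand modes concrete, the sketch below transcribes the SIL target bands from IEC 61508-1 (the low demand and the high demand/continuous tables) together with the one-demand-per-year boundary quoted above. The band values should be checked against the current edition of the standard; the helper names are hypothetical. Note that the two modes use different units (probability per demand versus dangerous failures per hour), so the factor of 10,000 is a ratio between the numerical bands rather than between like quantities.

```python
# Target failure measures per SIL, transcribed from IEC 61508-1 for
# illustration: SIL -> (lower bound, upper bound).
LOW_DEMAND_PFD = {4: (1e-5, 1e-4), 3: (1e-4, 1e-3), 2: (1e-3, 1e-2), 1: (1e-2, 1e-1)}
HIGH_DEMAND_PFH = {4: (1e-9, 1e-8), 3: (1e-8, 1e-7), 2: (1e-7, 1e-6), 1: (1e-6, 1e-5)}


def demand_mode(demands_per_year):
    """Low demand if the safety function is demanded at most once per year."""
    return "low" if demands_per_year <= 1 else "high"


def sil_band(sil, mode):
    """Band of the average probability of failure on demand (low demand)
    or of the dangerous failure frequency per hour (high demand)."""
    return LOW_DEMAND_PFD[sil] if mode == "low" else HIGH_DEMAND_PFH[sil]


low, high = sil_band(3, "low"), sil_band(3, "high")
print(f"SIL 3, low demand:  {low[0]:.0e} to {low[1]:.0e} per demand")
print(f"SIL 3, high demand: {high[0]:.0e} to {high[1]:.0e} per hour")
print(f"ratio of the upper bounds: {low[1] / high[1]:.0f}")  # 10000
```

On the demand-frequency definition alone, a safety function that is triggered less than once a year, such as the overspeed governor, comes out as low demand; the choice of table then shifts the target by four orders of magnitude.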
For a lift, we do not use the overspeed governor more than once a year, so is it then low demand? This is necessary to know, because it changes the calculated safety by a factor of 10,000. It is not stated plainly in the standard. However, IEC 62061 states that machines shall fulfil high demand, and most certification bodies follow this guideline. Unfortunately it is not set plainly in the
SAFE FAILURE FRACTION
When building a SIL 3 system, the relevant tables in EN 81-1/2+A3 and EN 81-50 make a double channel system mandatory. The main idea is “when one channel fails, the other channel will put the system in a safe state”. IEC 61508 has the same principles, but there are some major discrepancies. IEC 61508 uses the model of SFF (Safe Failure Fraction): the fraction of all failures that are either safe or dangerous but detected by diagnostics.
For components whose failure modes cannot be predicted (like CPUs and other complex components), the requirements are stricter. Finally, diagnostic software also increases the SFF. Because EN 81-20/50 demands a two-channel system for SIL 3, it excludes a completely fail-safe (SFF = 100%) single-channel system, and it makes it possible to build a fail-unsafe system (SFF below 90%). If every possible fault in a channel is directly dangerous (SFF = 0%) and the first fault remains undetected, a second fault makes the system unsafe. In this way, PESSRAL solutions can be less safe than the fault tree analyses in EN 81-20 suggest.
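The SFF and its interplay with the number of channels can be sketched in a few lines. This assumes the IEC 61508-2 definition SFF = (λS + λDD) / (λS + λD), with the detected share λDD = DC × λD, and the architectural constraints for type B (complex) subsystems under Route 1H. The table is transcribed for illustration and should be verified against the standard; HFT (hardware fault tolerance) 0 means one channel, HFT 1 two channels. The function names are hypothetical.

```python
def safe_failure_fraction(lambda_s, lambda_d, dc):
    """SFF = (lambda_S + lambda_DD) / (lambda_S + lambda_D).

    lambda_s: safe failure rate, lambda_d: dangerous failure rate (same unit,
    e.g. FIT); dc: diagnostic coverage, the detected share of lambda_d.
    """
    if not 0.0 <= dc <= 1.0:
        raise ValueError("dc must be between 0 and 1")
    total = lambda_s + lambda_d
    if total <= 0:
        raise ValueError("total failure rate must be positive")
    lambda_dd = dc * lambda_d  # dangerous failures caught by diagnostics
    return (lambda_s + lambda_dd) / total


# Maximum SIL for a type B subsystem: (lowest SFF of the band, SIL for HFT 0, 1, 2).
# None means the combination is not allowed at all.
TYPE_B_MAX_SIL = [
    (0.99, (3, 4, 4)),
    (0.90, (2, 3, 4)),
    (0.60, (1, 2, 3)),
    (0.00, (None, 1, 2)),
]


def max_sil_type_b(sff, hft):
    """Look up the highest SIL a type B subsystem may claim."""
    if not 0.0 <= sff <= 1.0:
        raise ValueError("sff must be between 0 and 1")
    for lower, sils in TYPE_B_MAX_SIL:
        if sff >= lower:
            return sils[min(hft, 2)]


# Two channels where every fault is dangerous and undetected (SFF = 0%).
print(max_sil_type_b(safe_failure_fraction(0, 100, 0.0), hft=1))    # 1
# One channel, half the failures safe, 99% of the dangerous ones detected.
print(max_sil_type_b(safe_failure_fraction(500, 500, 0.99), hft=0))  # 3
```

Under these tables, the two-channel design with SFF = 0% that EN 81 would accept for SIL 3 is capped at SIL 1, while a single well-diagnosed channel with an SFF of 99.5% reaches SIL 3.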
Skipping the risk analysis and demanding two channels for SIL 3 creates a new difficulty. By demanding two channels without any further specification, it becomes possible to build two identical channels. Identical channels run the risk of failing at the same time due to the same error (common cause). Typical causes are a supply voltage that is slightly or very low, design faults inside a CPU, or temperature. When working with multiple channels, common cause failures make up the largest part of the total. I will demonstrate this with an example.
You can compare it with throwing a die: if you throw a “1”, you lose, so your chance of losing is exactly 1/6. To decrease this chance you can add a second die; now you need two ones to lose the game. The chance of losing becomes 1/6*1/6 = 1/36. Now we introduce a common cause fault into this “system”: a fault which influences both channels (both dice). Opposite sides of a die always add up to seven, so the “6” is on the side opposite the “1”, and painting six dots takes slightly more paint. More paint means more weight, so the “6” side tends to land face down. Due to this faulty design, the chance of throwing a “1” is bigger than that of the other numbers, and the chance of a double “1” is bigger than that of any other double. If 5% of the single-die chance of a “1” turns into a double “1” through this shared fault, we need to add 0.05*1/6 = 1/120 to the 1/36.
For this system the impact is relatively small. However, the failure chance of a PESS channel is much smaller: for example 10^-9. Doing the same calculation, the two-channel system has a chance of failing of 10^-9*10^-9 = 10^-18. Now we add the 5% common cause: 0.05*10^-9 = 5*10^-11. Clearly, the common cause part is far bigger than the contribution of independent channel faults. The smaller the failure chance of each channel, the more the common cause dominates the safety calculation, and the real safety. EN 81 does not tackle this problem: no techniques for common cause avoidance are described or calculated.
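The arithmetic in the dice and PESS examples is the simplest form of the beta-factor model for common cause failures: a share β of a single channel's failure chance is assumed to defeat both channels at once. IEC 61508-6 uses this model with additional terms for test and repair intervals, which are deliberately left out of this sketch. The function name is hypothetical.

```python
from fractions import Fraction


def two_channel_failure(p_channel, beta):
    """Return (independent, common_cause) failure chances of two identical
    channels, using a bare beta-factor model without time-dependent terms."""
    independent = p_channel * p_channel  # both channels fail on their own
    common_cause = beta * p_channel      # one shared cause defeats both channels
    return independent, common_cause


# The dice example, with exact fractions: beta = 5%.
ind, ccf = two_channel_failure(Fraction(1, 6), Fraction(5, 100))
print(ind, ccf, ind + ccf)  # 1/36 1/120 13/360

# A PESS channel with a failure chance of 1e-9 and the same 5% common cause.
ind, ccf = two_channel_failure(1e-9, 0.05)
print(f"independent {ind:.1e}, common cause {ccf:.1e}, ratio {ccf / ind:.1e}")
```

For the dice, the common cause term raises the losing chance from 1/36 to 13/360; for the PESS channel, it is about 5*10^7 times larger than the independent double fault.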
EN 81-20 cherry picks a number of techniques and declares them mandatory. No calculation is needed anymore (EN 81-50 states that IEC 61508-6, which explains the calculations, is not needed for understanding). IEC 61508 offers a large number of options, so the most suitable technique can be chosen for the system. With EN 81 it can happen that completely irrelevant techniques are demanded, while other techniques would be much more useful. For example, there are no demands for sensors in the lift standard, but when we use a CPLD (Complex Programmable Logic Device) there are still demands for RAM checks and watchdogs; according to IEC 61508 this is not right. Finally, we cannot check whether our diagnostics are good enough. Normally the DC (Diagnostic Coverage) has a direct influence on the SFF, and therefore on the entire safety calculation of the system.
The backbone of IEC 61508 is its underlying calculations. By looking at the FIT (Failures In Time) rates of all components and at the design, the chance of failure can be calculated. The calculated numbers must be in line with the required SIL. An FMEA of the components, combined with diagnostic coverage to improve the SFF, results in a safer system. IEC 61508 sets requirements on the SFF that must be met.
The calculation is the theoretical basis: it gives insight into the weakest points of the system and proves that the system is safe enough. This calculation is not needed for EN 81; once you fulfil all demands, you are done. These demands only describe techniques and do not give any numbers. There is no check whether the system is “safe enough”, so it is possible to end up with a mathematically unsafe system.
For example, I can use two really bad relays in parallel. If each fails once in every 10 operations, both will fail at the same time once in every 100 operations (excluding common cause!). This still fulfils EN 81-20 (double channel with diagnostics): I can detect that both relays have failed.
However, I can no longer act on it. When we calculate the failure rates of the system with IEC 61508, we immediately find out that the relays are not good enough for this system: their FIT values are devastating for the PFH (average frequency of dangerous failure per hour). The calculation filters out bad components.
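The kind of calculation meant here can be sketched for a single channel. The sketch assumes the simplified IEC 61508-6 result for a one-channel (1oo1) architecture, in which the PFH is approximately the sum of the dangerous undetected failure rates λDU of its components, with 1 FIT = 10^-9 failures per hour. The component list and FIT values are hypothetical placeholders, not datasheet data, and the helper names are illustrative only.

```python
FIT = 1e-9  # 1 FIT = one failure per 10^9 device-hours

# Hypothetical channel: (component, dangerous undetected rate in FIT).
# Placeholder numbers for illustration only, not datasheet values.
channel = [
    ("microcontroller", 20.0),
    ("supply monitor", 5.0),
    ("output relay", 300.0),
]

SIL3_PFH_LIMIT = 1e-7  # upper bound of the SIL 3 high demand band, per hour


def channel_pfh(components):
    """Approximate 1oo1 PFH as the sum of lambda_DU, converted to per hour."""
    return sum(fit for _, fit in components) * FIT


def weakest(components):
    """Component with the largest dangerous undetected failure rate."""
    return max(components, key=lambda c: c[1])[0]


pfh = channel_pfh(channel)
print(f"PFH = {pfh:.2e} per hour, meets SIL 3 band: {pfh < SIL3_PFH_LIMIT}")
print("weakest component:", weakest(channel))
```

With these placeholder numbers the relay alone pushes the channel above the SIL 3 limit; the calculation points straight at it, whereas a checklist of techniques would not.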
Every system needs testing after development: there are always unforeseen problems which are filtered out during the test phase.
Of course a PESSRAL system will be tested, but which test strategy is the proper one? Keep in mind that most of the industry has no practical experience with safety software, and that no test strategies are mandated or even mentioned in the standard.
The most commonly known test methods are black-box and white-box testing: a basic way of screening a system, usable for electrical and mechanical systems. For a PESS, the system is effectively a black box; IEC 61508 can therefore also ask for traceability of the requirements, full modelling, software simulation and performance testing. In addition, the lift standard contains no test procedure for, or awareness of, common cause faults.
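Requirements traceability, one of the techniques mentioned above, comes down to checking that every safety requirement is covered by at least one test and that no test refers to a requirement that does not exist. A minimal sketch; the requirement and test identifiers and texts are hypothetical.

```python
# Hypothetical requirement and test identifiers, for illustration only.
requirements = {
    "REQ-01": "Stop the car when overspeed is detected",
    "REQ-02": "Go to the safe state on supply undervoltage",
    "REQ-03": "Detect disagreement between the two channels",
}
tests = {
    "TST-10": ["REQ-01"],
    "TST-11": ["REQ-01", "REQ-03"],
}


def untested(requirements, tests):
    """Requirements not covered by any test."""
    covered = {req for reqs in tests.values() for req in reqs}
    return sorted(set(requirements) - covered)


def dangling(requirements, tests):
    """Tests that refer to a requirement that does not exist."""
    return sorted(t for t, reqs in tests.items()
                  if any(r not in requirements for r in reqs))


print("requirements without a test:", untested(requirements, tests))  # ['REQ-02']
print("tests with unknown requirements:", dangling(requirements, tests))  # []
```

Here the undervoltage requirement, one of the common cause triggers, has no test: the kind of gap that traceability is meant to expose before the system reaches the field.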
PROOF TEST INTERVAL
Finally, the lifetime of a system is not considered. Because periodic inspection of PESS systems is almost impossible, a lifetime must be specified.
Diagnostics cannot detect every possible fault either: the DC is always below 100%. Normally a PESS has a “proof test interval”. The purpose of this proof test is to detect the faults that normally remain undetected. EN 81 does not require this. This allows a system to accumulate an unlimited number of undetected faults, which can eventually lead to a dangerous failure.
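Why a missing proof test matters can be illustrated for one channel with a constant dangerous undetected failure rate λDU: the chance that an undetected dangerous fault is present grows as 1 - exp(-λDU·t) until a proof test resets it. With a proof test interval T1, the simplified IEC 61508-6 average for a one-channel system is approximately λDU·T1/2, assuming λDU·T1 is small and the repair time is negligible. The failure rate below is a hypothetical value.

```python
import math

HOURS_PER_YEAR = 8760


def p_undetected(lambda_du, hours):
    """Chance that at least one dangerous undetected fault is present after
    `hours` without a proof test, for a constant failure rate lambda_du (1/h)."""
    return 1.0 - math.exp(-lambda_du * hours)


def pfd_avg_1oo1(lambda_du, proof_test_interval_h):
    """Simplified one-channel average lambda_DU * T1 / 2 (lambda_DU * T1 << 1,
    repair time neglected)."""
    return lambda_du * proof_test_interval_h / 2


lambda_du = 1e-6  # hypothetical: 1000 FIT of dangerous undetected failures

for years in (1, 10, 25):
    p = p_undetected(lambda_du, years * HOURS_PER_YEAR)
    print(f"{years:>2} years without proof test: {p:.3f}")

print(f"average with a yearly proof test: {pfd_avg_1oo1(lambda_du, HOURS_PER_YEAR):.1e}")
```

Without a proof test the probability keeps growing over the lifetime of the lift, whereas a periodic proof test keeps its average bounded.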
At the moment, only a small number of lifts use PESS, and for those that do, no major failures have occurred yet. PESS has been possible since the first amendment of EN 81-1/2 in 2005. We do not know how many installations are in the field today, so we cannot determine why there were no failures.
There are several possible explanations for the fact that there have been no accidents:
When making something revolutionary, a company must be absolutely sure that it is safe; otherwise the product will not be accepted in the market by the customer. For PESSRAL, most lift companies want to be absolutely sure that it still works after several years, so endurance tests will probably be done. This is a powerful testing method.
There are not many PESSRAL systems in the world: most lifts have a long lifetime, and their controls are not replaced often. Also, the development of PESSRAL has only just started: there are not many PESSRAL systems on the market, and most of them are still in development.
The major certification bodies also perform tests on PESS systems. They have their own demands for testing, or will ask for a calculation. Certification bodies also want safe systems, and most of them know how to perform the tests properly.
There is no guideline for reporting crashes, and we cannot be sure that we will hear about every crash in the world, including its cause.
The biggest problem with these possible explanations is that none of them is mandatory: there are no requirements on test duration, and notified bodies are not required to have experience with PESS. Also, no worldwide information exists about lift accidents related to this topic.
PESSRAL is not PESS, and this is not only due to the absence of much of the background information. The entire mathematical backbone is gone: we cannot calculate whether the chance of failure of the system is acceptable. This has a huge impact on common cause faults.
These are the most dangerous faults for a double channel system. In addition, the channels themselves can be built from unsafe components. The only way to check the system now is by testing, but testing strategies are not described. At the time of writing, there have been no fatal accidents yet. However, we cannot explain why they did not happen, or predict that they will not happen. In the end, it is possible to build unsafe systems with the rules of PESSRAL. For now we can only hope that lifts will stay safe; for the future, we need EN 81-20 to change as quickly as possible.
Tijmen Molema is a product specialist for certification at Liftinstituut, specializing in software and electronics. He studied Electronic Engineering and Design at the Hogeschool Utrecht. He started in 2014 as a lift inspector, but quickly became a product specialist for all kinds of electronic challenges. His personal goal is to help the lift industry move away from the “old” relay systems toward a new and progressive market.
By Tijmen Molema