Good morning, everyone. Thank you to the organisers for inviting me here today. I'm going to be discussing our approach to custom hardware-in-the-loop tests to assure our defence of Linux as a feasible real-time operating system. I'm a software engineer at Codethink, having joined in June of 2024, and in my spare time I'm a student at the UK's Open University studying combined science, technology, engineering and maths.

So what we're going to go through today: what it is exactly we're trying to test and measure, how we measure it, the results we've come across, and how we communicate those results to our colleagues with the Trustable Software Framework from Eclipse.

So let's go to the starting point. The starting point of what we're trying to measure is that we're considering Linux as a component in a safety-related system, running on a multi-threaded, multi-core processor. In typical safety applications you want good real-time guarantees, and those are much easier to generate on a microcontroller running something like a single-threaded, time-triggered architecture than they are on a multi-core system running something as complex as Linux. So for Linux to be feasible, we have to evidence that it can meet the same hard real-time constraints as a microcontroller running something much simpler, and we're working towards certifying that under IEC 61508 as a safety function of Linux. This is a very important property that the system needs to have, and we need to measure and assure it for every iteration, every build, of our operating system.

So how does Linux offer real-time capabilities? Well, since kernel 3.14 it has provided a scheduling policy called SCHED_DEADLINE. Scheduling policies are how the kernel balances time on the CPU between different competing threads, and SCHED_DEADLINE is one of only two scheduling policies in the kernel that provide real-time guarantees. It means that the requirements for the runtime and period of processes are specified in nanoseconds, rather than as arbitrary nice values which merely give processes priority relative to one another.

To say that we want a particular thread to have these real-time constraints and to use the SCHED_DEADLINE policy, we use a syscall called sched_setattr: when we're spawning the thread, we call sched_setattr and specify the period, the deadline and the runtime for that thread. The period says how often I should be on the CPU.
The deadline says, within each period, by what point I should have finished my execution, and the runtime says what my worst-case total execution time is. If we exceed that total execution time, the kernel will send us a SIGXCPU signal and kick us off the CPU.

So the kernel has to balance the demands of multiple threads all calling sched_setattr and all providing these real-time constraints. To balance those, it performs the admission test. The admission test runs every time you call sched_setattr: the kernel calculates whether it can balance this thread with all the other threads already on the system, and if it can't accommodate the real-time requirements of the new thread, it will refuse to admit it; the sched_setattr call fails altogether.

This diagram shows what I mean by that. CPU time runs along the x-axis and is divided into periods. Rusty Worker, which I'll come to later, is our process under investigation: it gets its block of runtime, shown in green, in every period, and it gets it before the deadline in every period. It doesn't matter where within the deadline the runtime happens; as long as it completes before the deadline, that's satisfactory.

So how are we going about measuring deadline scheduling? Well, we're measuring it in the context of ControlOS. ControlOS is a Linux distribution based on the latest upstream kernel, with a userland from the freedesktop SDK project, and we're measuring scheduling performance on two hardware platforms with two different architectures: an Intel NUC for the AMD64 architecture and a Rock 5B for the AArch64 architecture. These are representative target hardware platforms; obviously a client would bring their own hardware into play.

Within our ControlOS images we have a Rust program called Rusty Worker. Rusty Worker is a program to exercise the deadline scheduler. It has two threads: a monitor thread, which starts up and then spawns a worker thread by calling sched_setattr. The critical thing is that it allows users to specify the period, deadline and runtime that we want the process to have, so it lets us customise what is going into sched_setattr to represent a critical process.
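As a rough illustration of that call, here is a minimal sketch, not our actual Rusty Worker code, of admitting the calling thread to SCHED_DEADLINE from Rust. It assumes the libc crate; the struct layout mirrors the kernel's struct sched_attr, and the policy and flag constants are the values from the kernel's uapi headers.

```rust
// Minimal sketch (assuming the libc crate): admit the calling thread to
// SCHED_DEADLINE via the raw sched_setattr syscall, which has no glibc wrapper.
use libc::{syscall, SYS_sched_setattr};

const SCHED_DEADLINE: u32 = 6; // policy number from <linux/sched.h>
const SCHED_FLAG_DL_OVERRUN: u64 = 0x04; // ask for SIGXCPU if we overrun our runtime

// Field layout mirrors the kernel's struct sched_attr; all times are nanoseconds.
#[repr(C)]
#[derive(Default)]
struct SchedAttr {
    size: u32,
    sched_policy: u32,
    sched_flags: u64,
    sched_nice: i32,
    sched_priority: u32,
    sched_runtime: u64,
    sched_deadline: u64,
    sched_period: u64,
}

fn enter_deadline(runtime_ns: u64, deadline_ns: u64, period_ns: u64) -> std::io::Result<()> {
    let attr = SchedAttr {
        size: std::mem::size_of::<SchedAttr>() as u32,
        sched_policy: SCHED_DEADLINE,
        sched_flags: SCHED_FLAG_DL_OVERRUN,
        sched_runtime: runtime_ns,   // worst-case execution time per period
        sched_deadline: deadline_ns, // must finish this far into each period
        sched_period: period_ns,     // how often we should be on the CPU
        ..Default::default()
    };
    // pid 0 = the calling thread, flags 0. Fails with EBUSY if the kernel's
    // admission test cannot fit this thread alongside existing deadline tasks.
    let ret = unsafe { syscall(SYS_sched_setattr, 0, &attr as *const SchedAttr, 0) };
    if ret == 0 {
        Ok(())
    } else {
        Err(std::io::Error::last_os_error())
    }
}

fn main() {
    // e.g. 2 ms of runtime, due within a 10 ms deadline, every 10 ms period.
    enter_deadline(2_000_000, 10_000_000, 10_000_000).expect("sched_setattr failed");
    // ... real-time work happens here ...
}
```

Running this needs CAP_SYS_NICE or root, the kernel requires runtime ≤ deadline ≤ period, and the admission test is, roughly, that the summed utilisation (runtime divided by period) of all deadline tasks has to fit within the available CPU capacity; otherwise the call fails with EBUSY.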
The worker thread, in the meantime, spins doing some simple numerical calculations on the CPU. The important thing is measuring how it comes on and off the CPU: it records when it is scheduled on and off, measures its own runtime, and sends those measured periods and runtimes to an onboard safety monitor. That's another Rust application, designed to trigger an external mitigation if the scheduling falls outside of specified parameters, which might indicate a fault with the system that might require it to be rebooted. We also log all the periods and runtimes, for hours and hours as the system runs, through the systemd journal, so that they can be downloaded from the system and analysed later.

But the problem we came across was that when Rusty Worker is measuring its own period and runtime, it's using the same clock that the kernel uses to constrain the period and runtime. There's a common cause of failure here: if the kernel were lying, it would be lying both to itself and to Rusty Worker, and we wouldn't know the difference. So we thought we needed a way to get a signal out of the system during scheduling, so that an external clock can measure those periods and runtimes as well and we can determine whether the kernel is lying to us or not.

So this is what that looks like. Take the Rusty Worker runtimes, the time on the CPU for our mock critical process, and customise that critical process such that when it starts running it sends a signal and when it stops running it sends another signal, and it keeps doing that all the time. We can then measure the runtime, which is the time difference between the "started" signal and the "completed" signal, but we can also get a measure of the period, which is the time difference between one "started" signal and the next. So all the while that the critical process is measuring itself and logging to a journal, we can have a piece of hardware listening in and taking a second set of measurements of the scheduling performance.
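To make that concrete, here is a minimal sketch of the idea rather than the real Rusty Worker; the device path, the single-byte "S"/"C" markers and the busy-work loop are stand-ins invented for illustration. Each cycle, the worker takes a monotonic timestamp and writes a marker byte to the serial port when it starts and when it completes, so whatever is listening on the other end of the wire can reconstruct the same runtime and period figures the process reports about itself.

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::time::{Duration, Instant};

fn main() -> std::io::Result<()> {
    // Hypothetical serial device node; the real board, wiring and port setup differ.
    let mut uart = OpenOptions::new().write(true).open("/dev/ttyS0")?;
    let mut last_start: Option<Instant> = None;

    loop {
        // "Started" marker: one byte out of the UART, plus our own timestamp.
        let started = Instant::now();
        uart.write_all(b"S")?;

        busy_work(); // stand-in for the simple numerical work on the CPU

        // "Completed" marker and the self-measured figures.
        let completed = Instant::now();
        uart.write_all(b"C")?;

        let runtime = completed - started;
        // Period = gap between consecutive "started" markers.
        if let Some(prev) = last_start {
            let period = started - prev;
            println!("runtime={runtime:?} period={period:?}");
        }
        last_start = Some(started);

        // Under SCHED_DEADLINE the kernel decides when we next run; this
        // sleep only stands in for that when the sketch is run unscheduled.
        std::thread::sleep(Duration::from_millis(10));
    }
}

fn busy_work() {
    // Simple numerical work so the thread actually consumes CPU time.
    let mut x = 0u64;
    for i in 0..200_000u64 {
        x = x.wrapping_mul(31).wrapping_add(i);
    }
    std::hint::black_box(x);
}
```

Serial-port configuration (the baud rate and so on) is omitted here, although in practice it matters, as the later discussion of UART placement and timing shows.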
So, to do that listening, we went about designing some custom hardware. We used KiCad for that, and we based our hardware around the Raspberry Pi Pico. This is the PCB that we use; you'll notice it has two Raspberry Pi Pico microcontrollers. The one on the right is the one that actually listens to those signals and performs arithmetic with its own clock to determine periods and runtimes, and the one on the left is responsible for flashing firmware to the one on the right at the start of every CI job, so that we have good assurance of where our measurements are coming from: we have traceability of the firmware that was used to generate those measurements.

As I say, the one on the right is the verifier. It has two UART inputs over which the signal is sent: the top one is at RS-232 voltage levels and the bottom one is at TTL logic levels, simply because the Rock 5B and the NUC, our two target hardware platforms, have different voltage levels on their serial ports. Every ten seconds it computes the mean, maximum and minimum period and runtime based on the signals received on the UART, and it sends those out as YAML over USB CDC.

For the probe, we use the open-source debug probe firmware that Raspberry Pi produces, which receives binaries from the test runner and flashes them at the start of each test over the Serial Wire Debug protocol. It turns out that the Raspberry Pi Debug Probe you can buy runs that open-source firmware and has essentially identical hardware to a Raspberry Pi Pico, so we thought the two could be integrated onto one PCB, which aids maintainability.

So, the firmware: the Pico on the right, the one doing the measurements, needs firmware on it, and we chose to write that in Rust, using the embedded-hal and rp2040-hal crates. We architected the firmware such that as soon as a signal is received on the UART
from Rusty Worker on the device under test, it triggers an interrupt, immediately takes a monotonic timestamp from the Pico's clock and queues it into a buffer. Then, in the main loop of the program, when it's not being interrupted, it dequeues all those timestamps, performs the computations on them, and produces a YAML output with our measurements down to microsecond precision: for the runtime and the period we report how many samples we had and the minimum, maximum and mean.

The reason we chose Rust over the alternative, which is C, is that Rust requires far more explicit peripheral-initialisation code, at least when you use the rp2040-hal crate, and that means we have a better understanding of exactly what the code is doing and of the timings and precision of our measurements. For example, in our Rust code we have to declare up front the frequency of the clock derived from the Raspberry Pi Pico's crystal oscillator, which comes out at 125 MHz. Writing that in code makes us realise, taking the reciprocal, that if we were ever trying to measure a critical process with a period on the order of hundreds of nanoseconds, the measurement would be inappropriate, because that's only a few clock cycles of the microcontroller.

The other benefit of Rust is that you get compile-time guarantees of type and memory safety, and thanks to the type system and the way the crates have been constructed, invalid hardware configurations are rejected as type errors at build time, which lets you catch problems with the firmware a lot more easily. Lastly, the embedded Rust toolchain is far, far easier to handle than an embedded C toolchain: it doesn't go and download things from the internet at build time, you can use the native Rust compiler, and there are only two lines in the configuration you need to change to specify that you want to target the thumbv6m architecture of the Raspberry Pi Pico for cross-compilation.
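Staying with the firmware for a moment, the reduction the verifier performs is simple enough to show as a host-runnable sketch; the timestamp pairing, the field names and the sample values below are illustrative only, not our exact wire format.

```rust
/// Summary statistics over a set of measurements, in microseconds.
struct Stats { min: u64, max: u64, mean: u64, samples: usize }

fn stats(xs: &[u64]) -> Option<Stats> {
    if xs.is_empty() { return None; }
    let sum: u64 = xs.iter().sum();
    Some(Stats {
        min: *xs.iter().min().unwrap(),
        max: *xs.iter().max().unwrap(),
        mean: sum / xs.len() as u64,
        samples: xs.len(),
    })
}

fn main() {
    // Captured timestamps (µs) drained from the interrupt-fed buffer:
    // each pair is (started, completed) for one scheduling period.
    let events: &[(u64, u64)] = &[
        (1_000, 3_450),
        (11_020, 13_400),
        (21_010, 23_470),
    ];

    // Runtime = completed - started within a period.
    let runtimes: Vec<u64> = events.iter().map(|(s, c)| c - s).collect();
    // Period = gap between consecutive "started" timestamps.
    let periods: Vec<u64> = events.windows(2).map(|w| w[1].0 - w[0].0).collect();

    for (name, xs) in [("runtime", runtimes), ("period", periods)] {
        if let Some(s) = stats(&xs) {
            // YAML-style block, one per quantity, every reporting interval.
            println!("{name}:");
            println!("  samples: {}", s.samples);
            println!("  min_us: {}", s.min);
            println!("  mean_us: {}", s.mean);
            println!("  max_us: {}", s.max);
        }
    }
}
```

In the real firmware the timestamps come from the Pico's monotonic timer via the UART interrupt handler, and a summary like this is emitted over USB CDC every ten seconds.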
Now, about building that firmware: one of the things we assert as part of the quality assurance of our project is that all the testing tools have to be built with the same strict guarantees about reproducibility and provenance as the operating system itself. For that purpose we use BuildStream, which I won't go into in detail, but the point is that all our builds, the operating system builds and the firmware builds, are declarative. We specify which toolchain we want to use, we specify which linker we want to use, we specify where the source code is coming from and all the crates we want to use, and then we simply specify an install command, and the result goes straight into the Docker image that our test runner uses. A benefit of this, which one of my colleagues will be speaking about later today, is that because we're using a standard build system you get a software bill of materials for free, and that aids your traceability story.

So, to recap the test architecture: we have our two Picos; Rusty Worker sends a signal to the verifier Pico, which takes the measurements, and through a Python test library these end up in an OpenSearch data lake to be analysed later. On the second path, coming along the bottom, Rusty Worker measures its own scheduling and also reports that to the test library through the systemd journal, and that too ends up in the data lake. So in our data lake, for the same scheduling events, we have two measurements: one made by the process itself and one made by our external hardware.

So what do our results say? We've been running this for about nine months now, we have hundreds of thousands of measurements in our data lake, and according to our external measurements the kernel isn't lying. We're sure that our self-reported measurements are accurate: on the NUC we can say they're accurate to within plus or minus a hundred microseconds, and on the Rock 5B to within plus or minus ten microseconds. That difference comes down to the UART on the Rock 5B being a peripheral on the same chip as the SoC, whereas on the NUC the UART is a PCIe peripheral elsewhere on the motherboard.

We've used Jupyter notebooks each week to analyse the previous week's tests and check our performance. The graph on the right, which hasn't come across perfectly on the projector, shows the distribution in orange, which is the self-measured distribution for the mean runtime of a critical process, and the distribution in blue, which is from our external hardware; you can see they line up very, very neatly.

Some other interesting results we've had from this: as I said, we found that the external measurements are more accurate on the Rock 5B than on the NUC, and we understand the hardware reasons for that, namely where the UART sits in the system and how close it is to the CPU. But we also saw, during the development of ControlOS, that my colleague Mateo introduced the PREEMPT_RT kernel option, which means that kernel threads are allowed to be interrupted as they run. This is supposed to give better real-time performance, yet we actually found our external measurements got less accurate when we enabled PREEMPT_RT. We understand the reason for that: kernel threads are involved in writing to serial ports, and if
they're getting interrupted, then the time taken for a signal to get out of the system is no longer deterministic.

So, finally, I'm going to come to how we communicate these results to stakeholders, and we use the Eclipse Trustable Software Framework to do this on ControlOS. The idea of the Trustable Software Framework, in brief, is that it gives you some very high-level assertions about what's required for a quality software project, and then it gives you tools to link the measurements you make of your own system against those assertions. Each assertion is given a score between zero and one, a decimal number expressing how confident we are that the assertion is true, and we aggregate the scores for low-level evidence items to get an overall trustable score for the project.

For each CI run that uses this external hardware and puts data in OpenSearch, we want to make assertions for that exact commit of ControlOS, because we're building a commit, testing a commit and then generating a report for a commit. We can use Python code that automatically checks OpenSearch, computes the results through the same computations we do in our Jupyter notebooks, and provides a confidence score between zero and one. For example, here we are asserting that the absolute error, according to the data we have in our data lake for this particular build of ControlOS on the NUC, is less than 100 microseconds, and you can see it has a score of one because the validator has checked the data and produced that score. We then get an HTML report, visible to stakeholders and engineers, for each iteration.
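Our validators are Python scripts querying OpenSearch directly, so the following is just a sketch of the shape of such a check, written in Rust for consistency with the other examples here; the field names and sample values are invented for illustration. It compares the self-reported and externally measured values for each event against the 100-microsecond bound and emits a score of one or zero for the assertion.

```rust
/// One scheduling event with both measurements, in microseconds.
struct Sample { self_reported_us: f64, external_us: f64 }

/// Score an assertion of the form "absolute error below `limit_us`":
/// 1.0 if every sample satisfies it, 0.0 otherwise.
fn score_abs_error(samples: &[Sample], limit_us: f64) -> f64 {
    let ok = samples
        .iter()
        .all(|s| (s.self_reported_us - s.external_us).abs() < limit_us);
    if ok { 1.0 } else { 0.0 }
}

fn main() {
    // Illustrative samples only; the real data comes from the OpenSearch data lake.
    let samples = [
        Sample { self_reported_us: 2_400.0, external_us: 2_430.0 },
        Sample { self_reported_us: 2_390.0, external_us: 2_405.0 },
    ];
    println!("score: {}", score_abs_error(&samples, 100.0));
}
```

Scores of this kind are what get aggregated up the argumentation graph into the overall trustable score.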
So what does our argumentation look like, then? At the very top are two of those very high-level objectives that, as I said, the Trustable Software Framework dictates for software projects. On the left-hand side we have argumentation to say that all the tools we use for our tests are reproducible, and they're reproducible because we're using the embedded Rust toolchain in BuildStream. There is intermediate argumentation beneath that: fundamentally we're looking at a CI job that tests the reproducibility of the image, a validator checks whether it passed or failed, and it produces a score that flows up through the intermediate argumentation, along with other items, to provide a score for the high-level assertion that the build and test environments are constructed from controlled resources. On the right-hand side, the part of the graph concerned with the quality of the software, we look at our OpenSearch data, we have intermediate argumentation, and we feed that into an STPA analysis and risk analysis of the project to support the final goal: that we've identified a risk, and that risk was that the kernel might be lying to us, and we've provided evidence to show that we're actually happy with that risk, because we have evidence that the kernel isn't lying.

So, in conclusion, I'd say external measurements have given us great confidence in our analysis of the Linux deadline scheduler, and they've meant we have a deeper understanding of the kernel and a deeper understanding of the hardware we're using to make those measurements, which means our risk analysis will be better informed. You can see all the different pieces of open-source software and hardware that we've orchestrated together so that we can do complex hardware-in-the-loop tests to show that an open-source kernel like Linux on a multi-core system is suitable for applications with hard real-time constraints in industry.

And lastly, the point of the Trustable Software Framework is that it allows you to communicate those results both to stakeholders and to engineers: as engineers we're looking at the bottom of the graph, at the statements that say the difference was less than 100 microseconds or whatever, while stakeholders are looking at the top of the graph, at the assertion that yes, we've done a proper analysis of the risks in our product. So thank you all for your attention, thank you to the organisers; that's all I have to say today. If there are any questions, please.

[Session host:] Just on behalf of Codethink: last year we had indicated that we were contributing and opening the project up, and we did. Unfortunately there were some infrastructure issues that we had to work through over the last twelve months, and that happened after the slides were submitted, but under the Eclipse Foundation you will now find the Trustable project fully available. The full breakdown of the changes from the last year is now available in the open, and contributors are coming in and helping us advance the project.

[Audience member:] So you were using separate clocks, and you can confirm that they're not affecting each other, that they're measured separately. Do you have any initial stage where you check whether there is significant drift between them, and then take that into account in your statistics? I would imagine it would improve the accuracy of your results.
[Speaker:] So the question was about whether we can measure the drift between those clocks, the clock of the Pico and the clock of Linux, beforehand, to improve the accuracy of the test. I'd say the answer is that we've not found it necessary as such. This external measurement is only there as a sanity check, and differences on the order of tens of microseconds are acceptable enough that we can attribute them to the hardware, to the UART, to the actual transit time; with the baud rate we're using we're getting quite close to the time it actually takes for the bits to travel along the wire. But it's something we could do; I believe we have an issue open for, for example, connecting these to something more accurate and precise so that we can calibrate our external measurements. Thank you very much, thank you.