WEBVTT 00:00.000 --> 00:15.280 So please welcome devgianlu. He's a software engineer, and it's his first time at FOSDEM, 00:15.280 --> 00:20.960 so it's a big occasion. It's not his first time reverse engineering, but it's the first 00:20.960 --> 00:33.960 time doing something that big, so a big round of applause for him. 00:33.960 --> 00:35.960 Thank you very much. 00:35.960 --> 00:41.560 So today we will be talking about reverse engineering the world's largest music streaming 00:41.560 --> 00:49.120 platform, which you may guess is Spotify. So I want to ask if there's anyone from Spotify 00:49.120 --> 00:53.120 in the room. 00:53.120 --> 01:04.120 Okay, actually no one, so I shouldn't get arrested for doing this. 01:04.120 --> 01:10.720 So a bit on who I am: I'm a software developer, and I'm a big contributor to open 01:10.720 --> 01:17.720 source, to various projects and also to my own projects. 01:17.720 --> 01:22.720 I enjoy reverse engineering and a bit of software security. 01:22.720 --> 01:30.720 For that reason, I'm also interested in IoT stuff. I have a couple of CVEs to my name and I have fun with 01:30.720 --> 01:33.720 other stuff that isn't necessarily public. 01:33.720 --> 01:40.720 I'm also a CTF player, so for those that don't know, CTFs are capture the flag competitions, 01:40.720 --> 01:42.720 which are cybersecurity related. 01:42.720 --> 01:49.720 I am a former TeamItaly member, which is the team that competes in the European Cybersecurity Challenge. 01:49.720 --> 01:57.720 With my old team, I was an organizer for ECSC 2024, which is the European Cybersecurity Challenge, 01:57.720 --> 02:04.720 and also a platform provider for ICC, which is the International Cybersecurity Challenge. 02:04.720 --> 02:09.720 And I've also been a DEF CON finalist with mhackeroni. 02:09.720 --> 02:13.720 Before we start, there's a big disclaimer.
02:13.720 --> 02:20.720 This talk presents only my opinion, not that of my current, previous or future employer, 02:20.720 --> 02:25.720 and not that of my friends, my colleagues or people I met online. 02:25.720 --> 02:33.720 For the same reason, this talk includes only things that I learned and discovered personally, 02:34.720 --> 02:37.720 and no work from other people. 02:37.720 --> 02:42.720 The work presented in these slides is meant to enhance Spotify, not undermine it. 02:42.720 --> 02:49.720 We don't do piracy, and I have nothing to do with Anna's Archive. 02:49.720 --> 02:55.720 This talk was decided well before that happened. 02:55.720 --> 03:00.720 And to Spotify: please don't DMCA me again, please don't sue me. 03:00.720 --> 03:02.720 Let's have a chat first. 03:02.720 --> 03:05.720 So now we begin. 03:05.720 --> 03:10.720 We will talk about Spotify, but we will also talk about librespot, 03:10.720 --> 03:17.720 which is the actual open-source project that I and other people work on. 03:17.720 --> 03:22.720 When I talk about librespot, I'm actually talking about many projects, 03:22.720 --> 03:26.720 because there are many projects in different languages, 03:26.720 --> 03:31.720 but all of them share the fact that they are client libraries for Spotify, 03:31.720 --> 03:34.720 capable of playback through various backends. 03:34.720 --> 03:40.720 They are completely headless, so they do not require a desktop environment, 03:40.720 --> 03:43.720 and they are entirely based on reverse engineering. 03:43.720 --> 03:47.720 They are fully capable as Spotify Connect endpoints, 03:47.720 --> 03:51.720 and they are an alternative to the old, deprecated libspotify, 03:51.720 --> 03:55.720 which I never even got to know. 03:55.720 --> 04:00.720 This project is not a bulk downloader; don't use it for that, please.
04:00.720 --> 04:04.720 It's not a way to bypass ads, we do not support free accounts, 04:04.720 --> 04:09.720 it's not a wrapper around public APIs, it's not a control library, 04:09.720 --> 04:13.720 and it's not an alternative GUI or client. 04:13.720 --> 04:17.720 So why does librespot exist? 04:17.720 --> 04:22.720 You can find your own reasons, but these are some of the most common. 04:22.720 --> 04:27.720 You can turn any device into a Spotify Connect endpoint. 04:27.720 --> 04:30.720 It is headless, it does not require a desktop environment, 04:30.720 --> 04:33.720 it has very low resource usage, 04:33.720 --> 04:35.720 and many people use it for that reason. 04:35.720 --> 04:38.720 If you're really into open source, 04:38.720 --> 04:41.720 you may want to know exactly the code that you're running, 04:41.720 --> 04:45.720 and all the librespot projects allow you to do that. 04:45.720 --> 04:48.720 And also you may be an audio nerd, 04:48.720 --> 04:51.720 so you may want to build your own audio pipeline, 04:51.720 --> 04:55.720 your multi-room setup, your DIY DAC 04:55.720 --> 04:59.720 and streamer setup, all those kinds of things. 04:59.720 --> 05:03.720 So I was saying that librespot is a family, 05:03.720 --> 05:07.720 because there are many projects, depending mainly on the language 05:07.720 --> 05:10.720 and on the platform support. 05:10.720 --> 05:13.720 We have a Python version, we have a C version, 05:13.720 --> 05:16.720 which runs on the ESP32, for example, 05:16.720 --> 05:20.720 then we have what I'm currently working on, 05:20.720 --> 05:23.720 which is go-librespot, written in Go, obviously. 05:23.720 --> 05:26.720 There's an older version in Go, 05:26.720 --> 05:31.720 which is not developed anymore. 05:31.720 --> 05:33.720 Then there's another project of mine, 05:33.720 --> 05:37.720 which was not the original librespot, 05:37.720 --> 05:40.720 but my original project, in Java.
05:40.720 --> 05:43.720 And then we have the original librespot, 05:43.720 --> 05:48.720 which is in Rust, and is the longest standing of them all. 05:48.720 --> 05:52.720 The Java version is actually deprecated, 05:52.720 --> 05:55.720 because, yeah. 05:55.720 --> 06:01.720 So we'll have a look into the Spotify infrastructure 06:01.720 --> 06:06.720 a little bit, or at least at what we need to get 06:06.720 --> 06:09.720 a Spotify Connect endpoint working. 06:09.720 --> 06:13.720 This is the state of things after 2019, 06:13.720 --> 06:16.720 because there was a major change in the infrastructure. 06:16.720 --> 06:19.720 And also this is what we, 06:19.720 --> 06:21.720 as reverse engineers, 06:21.720 --> 06:25.720 discovered about the infrastructure. 06:25.720 --> 06:27.720 If there was someone from Spotify here, 06:27.720 --> 06:31.720 they could probably tell me I'm wrong. 06:31.720 --> 06:35.720 So piece by piece, we have AP resolve, 06:35.720 --> 06:38.720 which is a service that returns a list 06:38.720 --> 06:41.720 of endpoints for other services. 06:41.720 --> 06:44.720 Then we have the access point, 06:44.720 --> 06:47.720 which was the thing 06:47.720 --> 06:50.720 that was heavily used up until 2019. 06:50.720 --> 06:52.720 It's a custom protocol 06:52.720 --> 06:54.720 over a TCP connection 06:54.720 --> 06:56.720 that does a Diffie-Hellman key exchange, 06:56.720 --> 06:59.720 then encrypts using a strange cipher 06:59.720 --> 07:01.720 called Shannon, 07:01.720 --> 07:03.720 and transports simple data packets, 07:03.720 --> 07:05.720 which are composed of just a packet type, 07:05.720 --> 07:07.720 packet length and packet data.
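The access point framing just described (type, length, data) can be sketched as below. This is a minimal illustration under assumptions: a 1-byte packet type followed by a 2-byte big-endian length, which is consistent with the 3-byte header seen later in the GDB session (398 bytes = 3 + 395), but the byte order and all names here are ours, not Spotify's. It covers only the plaintext frame, before Shannon encryption.

```python
import struct
from typing import NamedTuple

# Assumed layout: 1-byte packet type, 2-byte big-endian payload length.
HEADER = struct.Struct(">BH")


class ApPacket(NamedTuple):
    cmd: int
    payload: bytes


def encode_packet(cmd: int, payload: bytes) -> bytes:
    """Frame a payload as [type][length][data] (plaintext, before encryption)."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload too large for a 16-bit length field")
    return HEADER.pack(cmd, len(payload)) + payload


def decode_packet(buf: bytes) -> ApPacket:
    """Parse one plaintext frame and check the declared length."""
    if len(buf) < HEADER.size:
        raise ValueError("buffer shorter than the 3-byte header")
    cmd, length = HEADER.unpack_from(buf)
    payload = buf[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError(f"declared length {length}, got {len(payload)} bytes")
    return ApPacket(cmd, payload)
```

For example, a 0xAB (login) frame carrying 395 bytes of data is 398 bytes long, matching the buffer dumped from the encrypt function later in the talk.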
07:07.720 --> 07:09.720 From this connection, 07:09.720 --> 07:11.720 we get stored credentials: 07:11.720 --> 07:13.720 we log in with the access point, 07:13.720 --> 07:16.720 and we get a set of stored credentials, 07:16.720 --> 07:20.720 so we can use them to authenticate other sessions, 07:20.720 --> 07:26.720 for example when you close and reopen the client. 07:26.720 --> 07:30.720 Then we have the login5 service, 07:30.720 --> 07:34.720 which is a new authentication service 07:34.720 --> 07:38.720 that is used for the newer services in the infrastructure, 07:38.720 --> 07:41.720 and we have to authenticate to it using the credentials 07:41.720 --> 07:43.720 from the access point. 07:43.720 --> 07:46.720 And the most useful thing that we still need 07:46.720 --> 07:50.720 the access point for is retrieving the encryption keys 07:50.720 --> 07:52.720 for the audio files, 07:52.720 --> 07:56.720 because Spotify serves only encrypted audio files, 07:56.720 --> 07:59.720 and you will need the key to decrypt them. 07:59.720 --> 08:02.720 On to the next piece; this is new stuff. 08:02.720 --> 08:04.720 As I said, login5 provides authentication 08:04.720 --> 08:06.720 for other services. 08:06.720 --> 08:11.720 It provides a bearer token to authenticate to spclient, 08:11.720 --> 08:14.720 which is just a bunch of REST APIs. 08:14.720 --> 08:18.720 The most interesting ones are those related to 08:18.720 --> 08:20.720 the Connect state, 08:20.720 --> 08:22.720 which makes up the Connect cluster. 08:22.720 --> 08:25.720 So every Spotify Connect endpoint 08:25.720 --> 08:28.720 publishes its state to the server, 08:28.720 --> 08:30.720 which builds the Connect cluster 08:30.720 --> 08:34.720 and orchestrates all the endpoints. 08:34.720 --> 08:37.720 This part of the API, 08:37.720 --> 08:40.720 it's just a REST API, 08:40.720 --> 08:44.720 and Spotify uses Protobuf for all the communication 08:44.720 --> 08:46.720 on that API.
08:46.720 --> 08:51.720 Lastly, we have the event part of things, 08:51.720 --> 08:56.720 the dealer, as the WebSocket API is called. 08:56.720 --> 08:58.720 It is authenticated with login5, 08:58.720 --> 09:02.720 and it uses Protobuf messages wrapped in JSON, 09:02.720 --> 09:04.720 don't ask me why. 09:05.720 --> 09:09.720 And it publishes to the client all the events required 09:09.720 --> 09:13.720 for the client to work. 09:13.720 --> 09:17.720 It also contains a special token, 09:17.720 --> 09:21.720 which is used to synchronize the dealer and spclient, 09:21.720 --> 09:26.720 so that they work on the same stuff. 09:26.720 --> 09:28.720 Now, we're going to have a look 09:28.720 --> 09:30.720 at some of the technical challenges 09:30.720 --> 09:33.720 involved with reverse engineering Spotify. 09:33.720 --> 09:35.720 We'll start with the easiest one, 09:35.720 --> 09:38.720 which is intercepting HTTPS traffic. 09:38.720 --> 09:39.720 Traffic is encrypted, 09:39.720 --> 09:43.720 so there's really no way to intercept it passively. 09:43.720 --> 09:46.720 We will need to do a man-in-the-middle. 09:46.720 --> 09:50.720 Luckily, Spotify doesn't do certificate pinning, 09:50.720 --> 09:52.720 so it's even easier. 09:52.720 --> 09:56.720 We can just pull up something like mitmproxy, 09:56.720 --> 09:59.720 get its certificate authority, 09:59.720 --> 10:02.720 install it as a trusted certificate authority 10:03.720 --> 10:05.720 for the system, and also for Chrome, 10:05.720 --> 10:09.720 because Spotify uses the Chromium Embedded Framework, 10:09.720 --> 10:12.720 which reads the CAs from Chrome. 10:12.720 --> 10:15.720 We set the proxy URL in the desktop client, 10:15.720 --> 10:20.720 and voilà, we can see all the traffic unencrypted.
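Since the dealer wraps Protobuf messages in JSON, a client first has to unwrap the JSON frame to get the binary payloads it then decodes with the generated classes. The sketch below assumes a frame shape of `{"type": "message", "uri": ..., "headers": {...}, "payloads": ["<base64>", ...]}` and an optional gzip `Transfer-Encoding` header; these field names are our assumption for illustration, not a documented Spotify format, and should be checked against captured traffic.

```python
import base64
import gzip
import json
from typing import List


def unwrap_dealer_message(raw: str) -> List[bytes]:
    """Return the binary (Protobuf) payloads carried by one dealer JSON frame.

    Assumed (hypothetical) frame shape:
    {"type": "message", "uri": "...", "headers": {...}, "payloads": ["<base64>", ...]}
    """
    msg = json.loads(raw)
    if msg.get("type") != "message":
        return []  # pings, requests, etc. would be handled elsewhere
    headers = msg.get("headers", {})
    out = []
    for item in msg.get("payloads", []):
        data = base64.b64decode(item)
        # Assumption: a gzip Transfer-Encoding header marks compressed payloads.
        if headers.get("Transfer-Encoding") == "gzip":
            data = gzip.decompress(data)
        out.append(data)
    return out
```

Each returned byte string would then be parsed with the Protobuf message type associated with the frame's `uri`.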
10:20.720 --> 10:25.720 You can see that the client talks to all the services 10:25.720 --> 10:27.720 I was telling you about: 10:27.720 --> 10:30.720 first we have AP resolve, 10:30.720 --> 10:31.720 then it does 10:31.720 --> 10:33.720 login5 authentication, 10:33.720 --> 10:35.720 it connects to the access point, 10:35.720 --> 10:38.720 it pushes its state to the server, 10:38.720 --> 10:42.720 and then connects to the dealer. 10:42.720 --> 10:45.720 We can see the access point connection here, 10:45.720 --> 10:48.720 but it's not actually decrypted. 10:48.720 --> 10:49.720 If we look into it, 10:49.720 --> 10:51.720 it would still be encrypted, 10:51.720 --> 10:54.720 and we'll solve this problem later. 10:54.720 --> 10:56.720 This was easy. 10:56.720 --> 10:59.720 Now we get onto the interesting stuff. 11:00.720 --> 11:06.720 So we are going to recover C++ Protobuf classes in Ghidra. 11:06.720 --> 11:08.720 What does that mean? 11:08.720 --> 11:10.720 C++, 11:10.720 --> 11:13.720 we know that. Protobuf and Ghidra, 11:13.720 --> 11:15.720 what are those? 11:15.720 --> 11:20.720 Protobuf is a mechanism for serializing structured data. 11:20.720 --> 11:23.720 It's just like XML or JSON, 11:23.720 --> 11:25.720 but way smaller. 11:25.720 --> 11:27.720 It's maintained by Google. 11:27.720 --> 11:29.720 It has a bunch of other features, 11:29.720 --> 11:34.720 but the interesting part about it is that it's very small. 11:34.720 --> 11:38.720 And for that reason, it requires code generation. 11:38.720 --> 11:42.720 So you cannot simply decode it like you do with JSON, 11:42.720 --> 11:47.720 because something is missing from the serialized format 11:47.720 --> 11:51.720 that you only have in the generated code for your messages. 11:51.720 --> 11:54.720 Spotify has been using it for a while, 11:54.720 --> 11:57.720 at least since librespot was born. 11:57.720 --> 11:59.720 So how does Protobuf work?
11:59.720 --> 12:03.720 As I said, the wire format, the serialization, is entirely binary. 12:03.720 --> 12:06.720 As you can see in the example, 12:06.720 --> 12:08.720 we have a message called User, 12:08.720 --> 12:10.720 which has three fields, name, favorite number, 12:10.720 --> 12:13.720 and hobbies, which are respectively one, two, and three 12:13.720 --> 12:16.720 as field numbers. On the right, 12:16.720 --> 12:20.720 you see the serialized version of that message. 12:20.720 --> 12:24.720 And you see there's never any mention of the field names: 12:24.720 --> 12:30.720 "name", "favorite number" and "hobbies" never appear. 12:30.720 --> 12:35.720 You only see the field number, which is called the field tag in the slide. 12:35.720 --> 12:38.720 That allows it to be small. 12:38.720 --> 12:42.720 But the problem is that you cannot recover the names 12:42.720 --> 12:45.720 unless you have the message type definition. 12:45.720 --> 12:48.720 Spotify uses a lot of Protobuf, 12:48.720 --> 12:52.720 so we may want to recover those definitions. 12:52.720 --> 12:56.720 Luckily, there's a special message called FileDescriptorProto, 12:56.720 --> 12:59.720 which describes a .proto file. 12:59.720 --> 13:03.720 And that message is serialized 13:03.720 --> 13:10.720 and included in the C++ source code that is generated for your messages. 13:10.720 --> 13:13.720 So you can use a tool like Protobuf Toolkit (pbtk). 13:13.720 --> 13:16.720 We run it against the desktop client binary, 13:16.720 --> 13:21.720 and we get 900 files and 2,400 messages, which is quite a lot. 13:21.720 --> 13:26.720 And it's also quite fun, because sometimes you see stuff that hasn't been published yet, 13:26.720 --> 13:28.720 because apparently they ship 13:28.720 --> 13:34.720 non-production Protobuf definitions inside the binary. 13:34.720 --> 13:36.720 This is what it looks like.
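To make the wire format concrete, here is a small schema-less decoder, a sketch covering the common wire types (varint, fixed 64-bit, length-delimited, fixed 32-bit). It recovers only field numbers and raw values, which is exactly the limitation described above: without the .proto definition, names like "name" or "hobbies" are gone. The sample bytes encode a hypothetical User message like the one on the slide; the concrete values are made up for illustration.

```python
from typing import List, Tuple, Union

Value = Union[int, bytes]


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def decode_fields(buf: bytes) -> List[Tuple[int, int, Value]]:
    """Split a message into (field_number, wire_type, value) without a schema."""
    fields, pos = [], 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint: ints, bools, enums
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited: strings, bytes, nested messages
            length, pos = read_varint(buf, pos)
            if pos + length > len(buf):
                raise ValueError("truncated length-delimited field")
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 1:  # fixed 64-bit, kept as raw bytes
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 5:  # fixed 32-bit, kept as raw bytes
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_number, wire_type, value))
    return fields


# Hypothetical User{name=1, favorite_number=2, hobbies=3 (repeated)}.
SAMPLE = b"\x0a\x06Martin\x10\xb9\x0a\x1a\x0bdaydreaming\x1a\x07hacking"

if __name__ == "__main__":
    print(decode_fields(SAMPLE))
    # [(1, 2, b'Martin'), (2, 0, 1337), (3, 2, b'daydreaming'), (3, 2, b'hacking')]
```

Note that field 3 appears twice: a repeated field is just the same tag emitted once per element, and nothing in the bytes says it is called "hobbies".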
13:36.720 --> 13:40.720 On the top left, you see the FileDescriptorProto message; 13:40.720 --> 13:43.720 I've removed the stuff that we are not interested in. 13:43.720 --> 13:46.720 In the bottom left, you see an example message I've taken, 13:46.720 --> 13:50.720 and will use for other examples, which is the Any message, 13:50.720 --> 13:58.720 which contains only two fields: the type URL and the value. 13:58.720 --> 14:03.720 On the right, you see what the serialized FileDescriptorProto 14:03.720 --> 14:07.720 for the any.proto file looks like. 14:07.720 --> 14:11.720 And you can see there are many similarities, 14:11.720 --> 14:16.720 and you can see how you could reconstruct the original .proto file 14:16.720 --> 14:19.720 from what you see on the right. 14:19.720 --> 14:26.720 And that is what the Protobuf Toolkit I showed you does. 14:26.720 --> 14:30.720 The next piece of tooling we will need is Ghidra; 14:30.720 --> 14:32.720 many of you may know it. 14:32.720 --> 14:35.720 It's a reverse engineering framework created 14:35.720 --> 14:39.720 and maintained by the NSA Research Directorate. 14:39.720 --> 14:43.720 We will need it mainly for its decompilation and scripting, 14:43.720 --> 14:46.720 but it does a bunch of other things. 14:46.720 --> 14:51.720 It helps us understand what is going on inside the Spotify 14:51.720 --> 14:54.720 desktop client, which is written in C++, 14:54.720 --> 15:00.720 and tries to transform it into actually readable code for us humans. 15:00.720 --> 15:04.720 But if we do that, we get 6 million lines of code. 15:04.720 --> 15:09.720 Of course, the binary does not have debug symbols, 15:09.720 --> 15:14.720 so we get no variable names, no function names, nothing like that. 15:14.720 --> 15:19.720 So it's simply 6 million lines of meaningless code. 15:19.720 --> 15:23.720 So how do we find what we are interested in? 15:23.720 --> 15:25.720 There are multiple ways. 15:25.720 --> 15:27.720 You can look for strings.
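The serialized FileDescriptorProto blobs mentioned earlier can be located in a binary with a simple heuristic, because field 1 of FileDescriptorProto is the file `name` (a string, so its tag byte is `(1 << 3) | 2 = 0x0a`). The sketch below looks for 0x0a, a one-byte length, and a printable path ending in ".proto" whose length matches. It is a rough approximation of the kind of scan such tools perform, not pbtk's actual code, and it assumes file names shorter than 128 bytes (single-byte varint length).

```python
import re
from typing import List, Tuple

# Tag 0x0a (field 1 `name`, length-delimited), a one-byte length,
# then a printable path ending in ".proto".
CANDIDATE = re.compile(rb"\x0a([\x01-\x7f])([\x20-\x7e]+?\.proto)")


def find_descriptor_names(blob: bytes) -> List[Tuple[int, str]]:
    """Return (offset, proto file name) for likely serialized FileDescriptorProtos."""
    hits = []
    for m in CANDIDATE.finditer(blob):
        declared = m.group(1)[0]
        name = m.group(2)
        if declared == len(name):  # the length byte must cover the name exactly
            hits.append((m.start(), name.decode("ascii")))
    return hits
```

Each hit is only the start of a candidate; the tool then has to parse the full FileDescriptorProto from that offset (for instance with the official `descriptor.proto` definitions) and rebuild the .proto text from it.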
15:27.720 --> 15:31.720 You can do some fancy code flow analysis. 15:31.720 --> 15:35.720 What I did, and what turned out to be useful for me, 15:35.720 --> 15:41.720 is figuring out where the Protobuf classes generated 15:41.720 --> 15:45.720 by the Protobuf compiler are being used in the compiled code. 15:45.720 --> 15:50.720 For that, we will make use of the fact that those classes 15:50.720 --> 15:55.720 are C++ classes which extend the virtual 15:55.720 --> 15:59.720 google::protobuf::Message class. 15:59.720 --> 16:02.720 It's very important that the class is virtual, 16:02.720 --> 16:04.720 because virtual classes have virtual tables, 16:04.720 --> 16:09.720 which are basically tables of addresses inside the binary, 16:09.720 --> 16:14.720 and we can use those to trace back to the constructors 16:14.720 --> 16:17.720 and destructors of said classes. 16:17.720 --> 16:19.720 We do that, 16:19.720 --> 16:22.720 and we need to do it deterministically and automatically, 16:22.720 --> 16:26.720 because there are a lot of messages inside Spotify, 16:26.720 --> 16:29.720 so we can't do it by hand. 16:29.720 --> 16:31.720 So we'll have a look at an example 16:31.720 --> 16:35.720 of how, inside the generated C++ code, 16:35.720 --> 16:40.720 we can trace back the FileDescriptorProto message 16:40.720 --> 16:42.720 which I showed you earlier, 16:42.720 --> 16:46.720 and which we know works because other tools use it. 16:46.720 --> 16:51.720 How can we trace this message back to the C++ class? 16:52.720 --> 16:55.720 To do that, we look at the generated code. 16:55.720 --> 16:59.720 We see that this message is referenced inside 16:59.720 --> 17:03.720 an internal structure called the descriptor table. 17:03.720 --> 17:08.720 Then we see that this is referenced in a strange class, 17:08.720 --> 17:10.720 whose purpose I don't really know, 17:10.720 --> 17:13.720 but it surely has something to do with compile time.
17:13.720 --> 17:16.720 Luckily for us, this class, 17:16.720 --> 17:21.720 this function, is finally used inside the actual C++ class 17:21.720 --> 17:24.720 we are interested in, which is, for example, 17:24.720 --> 17:26.720 let's say, Any. 17:26.720 --> 17:30.720 This GetMetadata method is also virtual. 17:30.720 --> 17:34.720 And as I said, the virtual methods of virtual classes 17:34.720 --> 17:37.720 end up in the vtable. 17:37.720 --> 17:43.720 So we can go from the FileDescriptorProto back 17:43.720 --> 17:46.720 to the vtable of that class, 17:46.720 --> 17:50.720 and from there we can get the constructors and destructors. 17:50.720 --> 17:53.720 Can we automate this? Of course. 17:53.720 --> 17:55.720 All the code for the 17:55.720 --> 17:59.720 Ghidra script is available on my GitHub. 17:59.720 --> 18:02.720 It's quite old, but it still works against 18:02.720 --> 18:04.720 Spotify, I guess. 18:04.720 --> 18:09.720 It will not work with the latest version of the Protobuf generator, 18:09.720 --> 18:12.720 because they changed the generated code quite a bit. 18:13.720 --> 18:15.720 The end result is this. 18:15.720 --> 18:17.720 On the left, you can see that the script 18:17.720 --> 18:22.720 has recovered where the vtables for some messages are. 18:22.720 --> 18:25.720 I filtered it for just one. 18:25.720 --> 18:28.720 You can see it has recognized where the vtables are. 18:28.720 --> 18:32.720 It has renamed the structures. 18:32.720 --> 18:37.720 And on the right, you see what the internal 18:37.720 --> 18:41.720 Protobuf descriptor table structure looks like. 18:41.720 --> 18:44.720 You can see it has, for example, the file name 18:44.720 --> 18:49.720 and the pointer to the descriptor. 18:49.720 --> 18:51.720 So we're done with that. 18:51.720 --> 18:53.720 We did that. 18:53.720 --> 18:55.720 Now onto the last part. 18:55.720 --> 18:59.720 As I mentioned, we have the HTTPS traffic.
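The chain above (FileDescriptorProto, then descriptor table, then GetMetadata, then vtable) is followed by repeatedly asking "which pointer-sized slots hold this address?". Ghidra's reference analysis answers that inside the script; the sketch below only shows the underlying idea on raw bytes. It assumes a little-endian 64-bit binary whose data section already contains absolute, relocated pointers, which is not always the case for position-independent executables, and the section name in the docstring is just an example.

```python
import struct
from typing import List


def find_pointer_refs(image: bytes, base: int, target: int, align: int = 8) -> List[int]:
    """Return virtual addresses of aligned 8-byte little-endian slots equal to target.

    image: raw bytes of a data section (for example .data.rel.ro) loaded at `base`.
    """
    needle = struct.pack("<Q", target)
    hits = []
    pos = image.find(needle)
    while pos != -1:
        if pos % align == 0:  # vtable slots are pointer-aligned
            hits.append(base + pos)
        pos = image.find(needle, pos + 1)
    return hits
```

Applied to the address of a class's GetMetadata implementation, a hit marks a slot inside that class's vtable; the same question asked about the vtable itself then leads to the constructors and destructors that store the vtable pointer into new objects.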
18:59.720 --> 19:05.720 We know which classes are being used to generate that HTTPS traffic, 19:05.720 --> 19:09.720 because the API mainly uses Protobuf. 19:09.720 --> 19:14.720 If we see an API call, we can look into Ghidra, 19:14.720 --> 19:18.720 find where the message for that call is being created, 19:18.720 --> 19:23.720 look at what the code does and try to figure out 19:23.720 --> 19:27.720 how to do the same in our code. 19:27.720 --> 19:32.720 The last piece we're missing is the access point I mentioned 19:32.720 --> 19:33.720 at the beginning. 19:33.720 --> 19:38.720 We still cannot see what traffic happens there, 19:38.720 --> 19:41.720 and there might be some interesting stuff. 19:41.720 --> 19:47.720 So now we're going to see how we can log that traffic. 19:47.720 --> 19:52.720 And for that, we will need some more tools. 19:52.720 --> 19:54.720 One of those is GDB. 19:54.720 --> 19:59.720 You may know GDB if you've written any C or C++ code. 19:59.720 --> 20:05.720 It's the standard tool for dynamic analysis 20:05.720 --> 20:08.720 and bug finding in C and C++ programs; 20:08.720 --> 20:11.720 so far we have done only static analysis. 20:11.720 --> 20:17.720 But you may be used to using GDB with debug symbols, 20:17.720 --> 20:22.720 because you may use it on code you have written yourself. 20:22.720 --> 20:28.720 Here we don't have that, because Spotify doesn't ship debug symbols, 20:28.720 --> 20:29.720 obviously. 20:29.720 --> 20:33.720 So we will use an extension to GDB called pwndbg, 20:34.720 --> 20:38.720 which facilitates a lot of the reverse engineering process. 20:38.720 --> 20:43.720 You can see the staggering difference between running GDB 20:43.720 --> 20:51.720 on a hello world program without and with pwndbg. 20:51.720 --> 20:54.720 There's a lot more info, and there are also a lot more commands 20:54.720 --> 20:58.720 you can use to move faster.
20:58.720 --> 21:01.720 Another tool that we'll use is Frida. 21:02.720 --> 21:05.720 Frida is a dynamic code instrumentation toolkit. 21:05.720 --> 21:10.720 It's really cool because it works on basically any platform. 21:10.720 --> 21:20.720 And it allows you to write some JavaScript code to inject inside the process you want to look into. 21:20.720 --> 21:28.720 So what we will typically do is use GDB to manually verify what we are looking at, 21:28.720 --> 21:30.720 whether our assumptions are correct, 21:30.720 --> 21:34.720 and then switch to something that is more automated. 21:34.720 --> 21:39.720 You can also automate GDB, but we will do it with Frida. 21:39.720 --> 21:47.720 So you write some JavaScript code and you basically have a script that, for our use case, logs all the traffic. 21:47.720 --> 21:51.720 Last piece: what is the cipher 21:51.720 --> 21:53.720 I was talking about at the beginning? 21:53.720 --> 21:56.720 It's called the Shannon cipher. 21:56.720 --> 21:59.720 It's part of the SOBER family of stream ciphers. 21:59.720 --> 22:04.720 It was developed by Qualcomm Australia in 1997. 22:04.720 --> 22:08.720 The original implementation is available only through the Wayback Machine. 22:08.720 --> 22:12.720 I have no idea why Spotify used it in the first place. 22:12.720 --> 22:22.720 The only advantage is that it does encryption and message authentication simultaneously, which is, I guess, handy. 22:22.720 --> 22:25.720 The reference implementation is very simple. 22:25.720 --> 22:35.720 It has a method to set the key, a method to set the nonce, methods to encrypt and to decrypt, and one to finish the encryption or decryption process 22:35.720 --> 22:43.720 and generate the message authentication code for what we have encrypted so far. 22:44.720 --> 22:53.720 How do we find those functions inside Ghidra? We need them because we want to log what goes through the encryption and decryption functions.
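The call sequence described above (set key, set nonce, encrypt or decrypt, finish to get a MAC) can be illustrated per packet. To keep the sketch runnable without reproducing Shannon, it uses a clearly labeled stand-in: SHA-256 in counter mode as a keystream and a truncated HMAC-SHA256 as the MAC. This is NOT Shannon and not secure as designed; it only mirrors the API shape. The per-packet counter nonce and the 4-byte trailing MAC are assumptions for illustration.

```python
import hashlib
import hmac
import struct

MAC_SIZE = 4  # assumption: a 4-byte MAC trails every encrypted packet


class StandInCipher:
    """Toy with Shannon's call shape (key, nonce, encrypt/decrypt, finish). NOT Shannon."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._nonce = b""
        self._mac = hmac.new(key, digestmod=hashlib.sha256)

    def nonce(self, counter: int) -> None:
        # Assumption: the nonce is a per-direction packet counter, 32-bit big-endian.
        self._nonce = struct.pack(">I", counter)
        self._mac = hmac.new(self._key, self._nonce, hashlib.sha256)

    def _xor(self, data: bytes) -> bytes:
        # SHA-256 in counter mode as keystream; the demo makes one call per nonce.
        stream, block = b"", 0
        while len(stream) < len(data):
            stream += hashlib.sha256(self._key + self._nonce + struct.pack(">I", block)).digest()
            block += 1
        return bytes(a ^ b for a, b in zip(data, stream))

    def encrypt(self, plain: bytes) -> bytes:
        self._mac.update(plain)  # this toy authenticates the plaintext
        return self._xor(plain)

    def decrypt(self, cipher: bytes) -> bytes:
        plain = self._xor(cipher)
        self._mac.update(plain)
        return plain

    def finish(self) -> bytes:
        return self._mac.digest()[:MAC_SIZE]


def seal_packet(c: StandInCipher, counter: int, cmd: int, payload: bytes) -> bytes:
    """Frame [type][len][data], encrypt it, append the MAC."""
    c.nonce(counter)
    frame = struct.pack(">BH", cmd, len(payload)) + payload
    return c.encrypt(frame) + c.finish()


def open_packet(c: StandInCipher, counter: int, sealed: bytes) -> bytes:
    """Decrypt a sealed frame and verify its MAC."""
    c.nonce(counter)
    frame = c.decrypt(sealed[:-MAC_SIZE])
    if not hmac.compare_digest(c.finish(), sealed[-MAC_SIZE:]):
        raise ValueError("MAC mismatch: wrong key, wrong nonce, or tampered packet")
    return frame
```

A real reader would typically decrypt the 3-byte header first to learn the payload length before reading the rest from the socket; the sketch decrypts the whole frame at once for brevity.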
22:53.720 --> 22:58.720 Well, one very useful trick is to look for constants. 22:58.720 --> 23:05.720 For example, this constant is used in the original source code for the Shannon cipher. 23:05.720 --> 23:09.720 We look for the same constant inside the Spotify binary. 23:09.720 --> 23:18.720 You get some hits, just like in the original source code. 23:18.720 --> 23:23.720 So by looking around, with some trial and error, 23:23.720 --> 23:29.720 we figure out where the functions are defined in Ghidra, in the compiled code. 23:29.720 --> 23:36.720 At that point, we can verify our assumptions with GDB. 23:36.720 --> 23:40.720 So far, by finding the functions, we've done only static analysis. 23:40.720 --> 23:44.720 Now we do the dynamic analysis. 23:44.720 --> 23:52.720 So for example, if we look at the Shannon encrypt function, we can set a breakpoint where we think the function is. 23:52.720 --> 23:54.720 It has just three parameters. 23:54.720 --> 23:57.720 We are not interested in the first one. 23:57.720 --> 24:04.720 But if we look at the second one, which is the data buffer that should contain the data being encrypted, 24:04.720 --> 24:12.720 that is in the RSI register, and the content of RSI looks like a byte pointer. 24:12.720 --> 24:15.720 So we're probably in the right place. 24:15.720 --> 24:19.720 We look at the other parameter, which is in EDX. 24:19.720 --> 24:21.720 It contains a reasonable number. 24:21.720 --> 24:31.720 And if we print the content of RSI for 398 bytes, which is the value of EDX, 24:31.720 --> 24:35.720 we get something that looks like an access point packet. 24:35.720 --> 24:39.720 So we have the packet type, which is 0xAB. 24:39.720 --> 24:41.720 We have the packet length and then the data. 24:41.720 --> 24:43.720 The packet length is exactly 395, 24:43.720 --> 24:47.720 so it's 398 minus the three initial bytes. 24:47.720 --> 24:51.720 So we are definitely in the right place.
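The constant-search trick can be sketched as a plain byte search: on x86-64, an instruction like `mov eax, imm32` stores the 32-bit immediate in little-endian order, so the constant's bytes appear verbatim in the code. The value below, 0x6996C53A, is what we believe the Shannon reference code calls INITKONST; treat it as an assumption and check it against the reference source. The talk does not say which constant was used.

```python
import struct
from typing import List

# Assumption: INITKONST from the Shannon reference implementation (verify!).
SHANNON_INITKONST = 0x6996C53A


def find_u32(image: bytes, value: int) -> List[int]:
    """Return every offset where `value` appears as a little-endian 32-bit word."""
    needle = struct.pack("<I", value)
    hits: List[int] = []
    pos = image.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = image.find(needle, pos + 1)
    return hits
```

Usage is as simple as reading the client binary with `open(path, "rb").read()` and calling `find_u32(data, SHANNON_INITKONST)`; each file offset can then be mapped to an address in Ghidra, and the surrounding function is a candidate for the key or nonce setup.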
24:51.720 --> 24:59.720 Now that we know the address of the encrypt function, we can set up Frida. 24:59.720 --> 25:05.720 I run all my tests in a virtual machine, because I can restore it: 25:05.720 --> 25:09.720 I can go back, stop it and do whatever. 25:09.720 --> 25:15.720 Luckily, Frida supports this use case, because it has frida-server, 25:15.720 --> 25:22.720 which I can launch on the VM and then connect to from the host through a TCP connection. 25:22.720 --> 25:28.720 So you can see here in the screenshot, for example, that I'm connecting to my VM, launching 25:28.720 --> 25:37.720 the Spotify binary and getting the base address where the binary code has been loaded. 25:37.720 --> 25:39.720 We can do better. 25:39.720 --> 25:45.720 This is still very manual, because we are in the Frida REPL. 25:45.720 --> 25:52.720 And so we finally write the Frida script that I was talking about all along. 25:52.720 --> 26:00.720 We get the base address, and then we use the Interceptor module to hook into the two functions, 26:00.720 --> 26:05.720 which are the encryption and the decryption functions. 26:05.720 --> 26:10.720 For the encryption function, we can do exactly what I just mentioned: 26:10.720 --> 26:13.720 we just get the values of the registers, 26:13.720 --> 26:21.720 we read memory for the length that is set in the register, 26:21.720 --> 26:24.720 and we just log it. 26:24.720 --> 26:31.720 For the decryption function, it's not that simple, because when we enter the decryption function, 26:31.720 --> 26:34.720 obviously the data is still encrypted. 26:34.720 --> 26:39.720 We want to look at the data when we exit from the decryption function. 26:39.720 --> 26:47.720 And for that, we'll use the built-in Frida callbacks, onEnter and onLeave. 26:47.720 --> 26:50.720 So we save the values of the registers when we enter.
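The logging setup described above can be sketched with Frida's Python bindings driving a small injected agent. Assumptions, all hypothetical and to be confirmed in GDB first: the module name, the function offsets (found in Ghidra), the VM address, and that both the encrypt and decrypt functions take `(ctx, buf, len)` and work in place, as observed for encrypt. The agent dumps the buffer in `onEnter` for encryption and in `onLeave` for decryption, exactly as explained above.

```python
import json
from string import Template

# Injected JavaScript. On x86-64 System V, args[1] is RSI and args[2] is RDX/EDX.
AGENT = Template("""
const base = Process.getModuleByName($module).base;

function hook(offset, direction, dumpOnEnter) {
  Interceptor.attach(base.add(offset), {
    onEnter(args) {
      this.buf = args[1];            // data buffer
      this.len = args[2].toInt32();  // buffer length
      if (dumpOnEnter) send({ direction }, this.buf.readByteArray(this.len));
    },
    onLeave(retval) {
      // For decryption the plaintext only exists once the function returns.
      if (!dumpOnEnter) send({ direction }, this.buf.readByteArray(this.len));
    },
  });
}

hook($encrypt, "send", true);
hook($decrypt, "recv", false);
""")


def build_agent(module: str, encrypt_offset: int, decrypt_offset: int) -> str:
    """Fill in the (hypothetical) module name and function offsets."""
    return AGENT.substitute(
        module=json.dumps(module), encrypt=hex(encrypt_offset), decrypt=hex(decrypt_offset)
    )


def run(host: str, argv: list, module: str, encrypt_offset: int, decrypt_offset: int) -> None:
    """Spawn the client via frida-server on `host` and log plaintext AP traffic."""
    import frida  # third-party; only needed when actually attaching

    device = frida.get_device_manager().add_remote_device(host)
    pid = device.spawn(argv)
    session = device.attach(pid)
    script = session.create_script(build_agent(module, encrypt_offset, decrypt_offset))

    def on_message(message, data):
        if message["type"] == "send" and data:
            print(message["payload"]["direction"], len(data), data[:8].hex())
        elif message["type"] == "error":
            print(message.get("stack", message))

    script.on("message", on_message)
    script.load()
    device.resume(pid)
    input("Logging, press Enter to stop\n")
    session.detach()
```

A call would look like `run("VM_IP:27042", ["/path/to/spotify"], "spotify", ENCRYPT_OFFSET, DECRYPT_OFFSET)`, where every argument is a placeholder for your own setup. Spawning rather than attaching lets the hooks be in place before the client connects to the access point.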
26:50.720 --> 26:54.720 And then when we leave, we dump the content of the memory, 26:54.720 --> 26:57.720 and we have the decrypted content. 26:57.720 --> 27:00.720 This is what the script's output looks like. 27:00.720 --> 27:07.720 So we see that the client sends the 0xAB packet, which is the login packet. 27:07.720 --> 27:12.720 Then it receives an APWelcome (access point welcome) packet, which 27:12.720 --> 27:16.720 signifies that the login was successful. 27:16.720 --> 27:21.720 Then it receives a country code packet and a product info packet. 27:21.720 --> 27:27.720 And then it sends a Mercury request packet, which is another thing. 27:27.720 --> 27:38.720 The Mercury protocol is something they use to build essentially HTTP over their own custom protocol, 27:38.720 --> 27:41.720 and I have no idea why they did that in the first place. 27:41.720 --> 27:53.720 Then we receive a Mercury event, and we receive another Mercury request packet, which is actually the response to the original Mercury request. 27:53.720 --> 27:57.720 With that, we are done with the technical challenges. 27:57.720 --> 28:04.720 Actually, there are many more technical challenges, but those are, I think, the most interesting ones. 28:04.720 --> 28:07.720 And this was the fun part. 28:07.720 --> 28:16.720 This is, for me at least, the fun part of reverse engineering, the fun part of maintaining such a project. 28:16.720 --> 28:23.720 But then come the legal challenges involved with maintaining reverse engineering projects. 28:23.720 --> 28:31.720 I suppose many of you have an open source project, and you probably don't have to deal with legal problems, 28:31.720 --> 28:36.720 because it's your own code, it's stuff you have written yourself. 28:36.720 --> 28:44.720 No one can tell you that you cannot have that code. 28:44.720 --> 28:51.720 That is not true for Spotify and, in general, for reverse engineering projects.
28:51.720 --> 29:00.720 So what happens is that we have a constant problem, which is not making Spotify angry. 29:00.720 --> 29:09.720 Because if we do, they have the legal power, the legal strength, to take us down, and we don't. 29:09.720 --> 29:24.720 We, as personal contributors to open source projects, don't have the power or the willingness to answer back if Spotify ever were to, and at times they did. 29:24.720 --> 29:34.720 We don't have the power to answer back, and so we essentially have to comply. 29:34.720 --> 29:45.720 I said they did: Spotify has filed many DMCAs on GitHub, and it has emailed many people, even me and other maintainers, about things. 29:45.720 --> 29:55.720 I have never received an actual legal inquiry from them, but some people, for example, have. 29:55.720 --> 30:08.720 For this reason, we will not implement many features, some of which are heavily requested, and others are kind of essential and sad not to have. 30:08.720 --> 30:18.720 Those are listener reporting, which we will talk about, lossless playback, which is a recent one, ad playback, and support for free accounts. 30:18.720 --> 30:37.720 So what is listener reporting? When you use Spotify, all your listens are reported to the server. That is to build your list of recently played tracks, it is also to influence the algorithm, 30:37.720 --> 30:44.720 and it is also to credit the artists. 30:44.720 --> 30:50.720 One downside of using the librespot projects is that this does not happen. 30:50.720 --> 30:59.720 So your listens will not be accounted for, and effectively the artists you listen to will not be credited. 30:59.720 --> 31:05.720 That is very sad, but we really can't do it. 31:05.720 --> 31:16.720 Not because of the reverse engineering process, but because they rely on an entirely different system for logging playback.
31:16.720 --> 31:26.720 So it does not have anything to do with the Connect state, and it does not have anything to do with downloading from the CDN. 31:26.720 --> 31:37.720 It's an entirely separate system, and reverse engineering it and making it public 31:37.720 --> 31:47.720 would bring a lot of risk to the project, because that is something many people look into to create so-called listen bots. 31:47.720 --> 32:02.720 These are services that you can buy as an artist to get a boost to your tracks, not a free one but a paid one, and essentially get into the algorithm. 32:02.720 --> 32:06.720 This is something that Spotify tries to combat. 32:06.720 --> 32:20.720 If we make that code open source, then suddenly there will be a lot more of those listen bots, and Spotify will come to us and say: what the fuck, don't do that. 32:20.720 --> 32:24.720 And that is not something that we want to happen. 32:24.720 --> 32:35.720 Another one is lossless playback. This is a recent one, because Spotify launched support for FLAC files quite recently, a couple of months ago. 32:35.720 --> 32:56.720 Two or three years after the protobufs for FLAC playback started appearing, they actually released it. That is the thing I was talking about: in the protobuf files, you see things before they are released. 32:56.720 --> 33:05.720 They joined the game of hi-fi streaming providers, because the quality they usually served was quite low. 33:05.720 --> 33:24.720 And for that reason, lossless playback very quickly became a heavily requested feature, because many audiophiles have DIY setups where they want to get the best out of their DIY audio pipeline, 33:24.720 --> 33:35.720 or something like that. Or they are like me: I'm not an audio nerd, but I want to get the best out of my expensive subscription.
33:35.720 --> 33:53.720 I pay for it, so I want to use all the features, essentially. But sadly, we received quite an explicit message from Spotify that told us: don't do that. 33:53.720 --> 34:06.720 If you keep going, you will be in trouble, because those FLAC files are protected by a new DRM, which we will call stopstop. 34:06.720 --> 34:13.720 It's not actually called stopstop, but it's kind of a meme. You can probably guess what its real name is. 34:13.720 --> 34:23.720 And for context, other hi-fi streaming providers do not have DRM to protect their high-quality FLAC files. 34:23.720 --> 34:31.720 So what is this DRM, which is the big problem with supporting FLAC files? 34:31.720 --> 34:41.720 I said it in the beginning: right now, and originally, we can just get the encryption keys for the audio files from the access point. 34:41.720 --> 34:50.720 So there's some back-end service that returns the key, and it returns the key as is. So you can take the key, decrypt the data, and everything is fine. 34:50.720 --> 35:06.720 But recently, recently as in the past year, but also recently as in the Anna's Archive dump thing, they started cracking down on the usage of this API, first for free accounts. 35:06.720 --> 35:16.720 So free accounts started to become heavily limited in what they could do through this API, the old API. 35:17.720 --> 35:33.720 The fun thing is that they started killing their own products. Many Spotify partner products were essentially broken. You could not use them anymore, and that is still the case for many accounts. 35:33.720 --> 35:50.720 There's an old blog post, not really a blog post but a forum post, with people who are really, really angry because they cannot use their hi-fi streamers for some reason. 35:50.720 --> 36:06.720 And Spotify doesn't really seem to care, honestly. That also broke things for some librespot users, because they are targeting this old API for the encryption keys.
36:06.720 --> 36:23.720 The new DRM is something entirely different. I call it the new DRM, but essentially there was no DRM before. The new DRM does not serve the decryption key as is; it serves an obfuscated decryption key. 36:23.720 --> 36:38.720 And you need to deobfuscate it, which is pretty straightforward, but unluckily the deobfuscation code contains some constants and some procedures over which they can claim intellectual property infringement. 36:38.720 --> 36:50.720 So we cannot include that code in our public repositories, or they will finally have a reason to take us down. 36:50.720 --> 37:10.720 The last thing is ads playback and free accounts. This is another one we chose not to support, for many reasons. One of them is that Spotify doesn't care until you hit them in the pocket. 37:10.720 --> 37:29.720 Supporting free accounts would mean that you are potentially stealing revenue. Even if we implemented all the limitations that come with free accounts, like ads, and not being able to listen to your playlists in order but having to shuffle them, 37:29.720 --> 37:44.720 even if we did that correctly, they would still have a lever to say: well, yeah, correctly, but there was one thing wrong, so it's all broken, and you have stolen some revenue. 37:44.720 --> 37:59.720 Then, it's not hard to reverse engineer, but they tried to hide it, so it's harder than the other stuff, and we would have to waste a little bit more time on it. 37:59.720 --> 38:13.720 The logic changes frequently, depending on the business side of things, and this is what modders want to know about. 38:13.720 --> 38:42.720 Whether they make advanced or simple mods to the Spotify Android app, starting to support those kinds of things would give modders insight into how things work, and then we would get contributions from modders, who are people Spotify tries to combat. Nothing wrong with them, but Spotify surely tries to combat modders.
38:42.720 --> 38:49.720 And we do not want to proceed with them, you know. 38:49.720 --> 39:09.720 So, how I got temporarily banned from Spotify. That was bound to happen, honestly. On the 29th of October, we created a private, but not so private, Discord server with some of the other contributors to work on stopstop. 39:09.720 --> 39:16.720 By the third of November, we already had a working implementation of stopstop. 39:16.720 --> 39:23.720 So, on the fourth of November: let's go, we have the implementation, let's do this. 39:23.720 --> 39:38.720 So, in go-librespot, I first implemented a FLAC decoder, which it did not have before, with one pull request, and with another pull request, I implemented support for stopstop. 39:38.720 --> 39:58.720 That did not include any of the deobfuscation code. It only included the API calls required to get the obfuscated key, and then you would have to provide your own deobfuscation code to plug into the project at compile time, 39:58.720 --> 40:02.720 so that you could effectively use it. 40:02.720 --> 40:12.720 And I started using it. Because the code was not public, the code was running on my machine. I was happy with it; the FLAC files were working flawlessly. 40:12.720 --> 40:18.720 I could not hear the difference, but that doesn't matter. I wanted to use it anyway. 40:19.720 --> 40:30.720 The day after, the fifth of November, we, as the main maintainers of the librespot projects, received an email from Spotify. 40:30.720 --> 40:37.720 Actually, a support ticket they opened towards us, which is quite funny for another reason. 40:37.720 --> 40:59.720 It said that they had seen what we were doing with the DRM and that we should stop doing it, because they needed to preserve the integrity of their platform, shareholder value, and stuff like that. 40:59.720 --> 41:11.720 So, essentially, we stopped talking about it, some maintainers deleted some public information, and it all ended there.
41:11.720 --> 41:19.720 Nothing in that email said that I could not use it myself, so I kept using it. 41:19.720 --> 41:31.720 On the fifth of November, or rather the 15th of November, if I recall correctly, I was logged out of all my devices, and I quickly understood that I was banned. 41:31.720 --> 41:45.720 Luckily, I could appeal my suspension, and just 48 hours later, I got my account back, but I have not used stopstop since. 41:45.720 --> 41:57.720 How do we interpret what happened? Most likely, it was a warning. I'm pretty sure it was not an automated system, because, yeah. 41:57.720 --> 42:15.720 We think it was a warning to me, since, as far as I know, I was the only one using stopstop on my own setup, telling me to stop doing it. 42:15.720 --> 42:33.720 As I said, the fact that they opened a support ticket towards us was quite funny, because when we tried to email them back, they never responded to us. 42:33.720 --> 42:39.720 It's not the first time they have written us an email, but they never respond when we ask for clarification. 42:39.720 --> 42:53.720 But some days later, when they apparently closed the ticket on their side, we received the original title of the ticket, which was "Time-sensitive email", sent from support@spotify.com. 42:53.720 --> 42:59.720 And we were asked to rate our interaction with customer service. 42:59.720 --> 43:25.720 So, to close, how can you help? Of course, as with all open source projects, if you have not tried it, use it. Maybe it's not for you, but use it. Spotify will know if you use it, but that doesn't mean you will get banned. 43:25.720 --> 43:39.720 There are many people across the world using librespot clients, and no one has ever reported being banned, but Spotify will know, because we do not try to hide. 43:39.720 --> 43:44.120 We explicitly say who we are: we are
43:44.120 --> 43:47.560 go-librespot, we are librespot in Rust, and stuff like that.
43:47.560 --> 43:51.000 So that they can, for example, filter us out of
43:51.000 --> 43:53.320 their analytics.
43:53.320 --> 43:57.720 Contribute: bug reports and feature requests are always welcome,
43:57.720 --> 44:01.720 and we try our best to make those things happen.
44:01.720 --> 44:04.840 But you may want to make things happen on your own.
44:04.840 --> 44:09.360 So write some code, fix a bug, implement a new feature.
44:09.360 --> 44:14.880 If you work at Spotify, which apparently no one here does,
44:14.880 --> 44:18.320 get in touch with us, the librespot maintainers;
44:18.320 --> 44:20.240 you already have our emails.
44:20.240 --> 44:24.160 So you can clearly get in touch.
44:24.160 --> 44:28.720 We know you can, you just don't want to. And please don't ban me again,
44:28.720 --> 44:31.120 because I actually use it.
44:31.120 --> 44:32.120 Thank you.
44:32.120 --> 44:45.240 Thank you very much for a very interesting presentation,
44:45.240 --> 44:46.600 which is obviously very popular.
44:46.600 --> 44:49.960 We literally have three minutes for questions,
44:49.960 --> 44:51.960 so are you happy to take them?
44:51.960 --> 44:54.120 Yep, okay.
44:54.120 --> 44:55.720 Questions?
45:02.760 --> 45:09.400 Yes, I'm a little bit concerned about the reporting back
45:09.400 --> 45:12.600 and the artists not getting revenue for it.
45:12.600 --> 45:16.760 Are there any changes or communications?
45:16.760 --> 45:20.120 Because I want to support the artists, not Spotify.
45:20.120 --> 45:24.360 The limitation is getting the code public.
45:24.360 --> 45:25.400 I have it.
45:25.400 --> 45:27.000 I have the code that does that.
45:27.000 --> 45:29.720 I use it, and it works.
45:29.720 --> 45:33.480 The problem is that I can share it with the people I know,
45:33.480 --> 45:36.920 people I know personally, who have contributed to the project,
45:36.920 --> 45:39.880 and whom I trust, but I cannot share it publicly,
45:39.880 --> 45:43.640 and I cannot share it with people I don't know.
45:43.640 --> 45:48.360 That's the main problem, because I would get in trouble, essentially.
45:49.400 --> 45:51.400 So yeah, that's the downside.
45:51.400 --> 45:57.640 And that's the reason why we continuously try to get in touch with Spotify
45:57.720 --> 46:03.080 and tell them: give us a way to do things correctly,
46:03.080 --> 46:04.840 and not like right now.
46:07.960 --> 46:12.600 I have a question, yeah.
46:12.600 --> 46:17.400 I wonder, how did you know that the encryption is Shannon?
46:21.880 --> 46:27.000 You can reverse engineer some of the original code
46:27.000 --> 46:31.000 from way back in time, which is a lot simpler,
46:31.000 --> 46:33.320 because it's a lot less code.
46:33.320 --> 46:38.200 And what you do, essentially, when you reverse engineer cryptographic stuff,
46:38.200 --> 46:40.760 is that you look for constants.
46:40.760 --> 46:43.320 So you find some magic numbers in the code.
46:43.320 --> 46:46.200 You Google them, and you find out what cipher it is,
46:46.200 --> 46:49.720 and that is essentially what led us to understand that it was Shannon.
46:53.160 --> 46:56.120 We have one more question here, but are you willing to take more questions
46:56.120 --> 46:57.080 outside?
46:57.080 --> 46:57.880 Yeah, yeah, of course.
46:57.880 --> 47:01.800 Yeah, so one more question here, and then just outside
47:01.800 --> 47:04.440 you can kindly carry on the conversation.
47:04.440 --> 47:07.560 Hi, have you looked into Deezer?
47:08.760 --> 47:10.200 Oh, no, no, no, no, no, no.
47:10.200 --> 47:11.400 I only use Spotify.
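The constant-hunting approach from the answer about Shannon can be sketched as a small scanner that searches a binary for known magic numbers. This is a minimal illustration, not the speaker's tooling. The SHA-256 initial hash word, the MD5/SHA-1 initial state word and the first bytes of the AES S-box are standard values. The Shannon `INITKONST` value is our recollection of the Shannon reference implementation and is labeled as an assumption. `find_constants` is a hypothetical helper name.

```python
import struct
from typing import Dict, List, Tuple

# 32-bit constants, searched in both byte orders, because the binary stores
# them in the target's endianness (little-endian on x86 and most ARM builds).
WORD_CONSTANTS: Dict[str, int] = {
    "SHA-256 initial hash H0": 0x6A09E667,
    "MD5 / SHA-1 initial state word": 0x67452301,
    # Assumption: INITKONST as recalled from the Shannon reference code; verify it.
    "Shannon INITKONST (assumed)": 0x6996C53A,
}

# Byte tables, searched verbatim.
TABLE_CONSTANTS: Dict[str, bytes] = {
    "AES S-box prefix": bytes([0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5]),
}


def find_constants(blob: bytes) -> List[Tuple[str, int]]:
    """Return (name, offset) for every occurrence of a known constant, sorted by offset."""
    needles: List[Tuple[str, bytes]] = []
    for name, value in WORD_CONSTANTS.items():
        needles.append((name + " [LE]", struct.pack("<I", value)))
        needles.append((name + " [BE]", struct.pack(">I", value)))
    needles.extend(TABLE_CONSTANTS.items())

    hits: List[Tuple[str, int]] = []
    for name, needle in needles:
        pos = blob.find(needle)
        while pos != -1:
            hits.append((name, pos))
            pos = blob.find(needle, pos + 1)
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        for name, offset in find_constants(f.read()):
            print(f"0x{offset:08x}  {name}")
```

A hit is a lead to investigate in a disassembler, not proof. Compilers can also split constants across instruction immediates or compute them at runtime, in which case a plain byte search misses them.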