[Applause]

All right. So, hi everyone, and welcome. My name is Son and I'm from Hugging Face, and today I'm going to talk about my subject, which is multimodal support in llama.cpp.

So, my plan for this talk: I will quickly have a self-introduction, and then look back at the history of multimodal support. Then I'll talk about my work and some future directions. And lastly, there will be some tips for you, if you want to contribute to the project. And I will try to reserve maybe five minutes for you to ask some questions, so I hope we will have time for questions.

So, let's start. My name is Xuan-Son, and I'm a software engineer at Hugging Face. I'm one of the core maintainers of llama.cpp, and you can see my website and my GitHub profile here. My slogan is that I'm doing it for fun, not for profit. And by fun, I mean science, like machine learning and AI.
And if you are curious about my work, you can visit my GitHub profile; I do a bunch of work on llama.cpp.

So, let's move on to the subject. What is the progress of multimodal support in llama.cpp? So, from quite early in the project, we had initial support for a model called LLaVA. And then we also had that support in the llama.cpp server, which was nice. And then what happened is, unfortunately, we actually had to remove it, because the code was just too ugly. In the meantime, there were also new models, because back then multimodal, and especially vision models, were quite a hot thing. And the user experience was not very good, because for each new model, you had to use its own dedicated command-line binary.

So last year, I tried to work on this: to improve, at the same time, the user experience and the developer experience, by introducing a new thing which I call libmtmd.
And that way, I was able to bring multimodal support back to llama.cpp and to llama-server.

So, before my work on libmtmd, what the situation looked like is that we had three big pieces inside llama.cpp. We had libllama, which is the main library. And then we had clip.cpp, which is an implementation of the vision transformer. And then llava.cpp, which was something specifically for the LLaVA model. And when people wanted to add a new model, they tried to hack these libraries to add their new model inside. So it was not very nice, and it was very costly to maintain that much infrastructure.

So, let's go to libmtmd. What is mtmd? It is not a model name; it stands for multimodal. Initially, I proposed this as libllava2. Then I realized I cannot use a name that is too model-specific, so let's get rid of that name already.
So, the idea is that now I abstract everything away, and the deliverable is one library. The core architecture is that we used to have clip.cpp, which contains the vision model, the vision transformer. And then the, let's say, pre-processing step is done by another submodule. So it's almost, I would say, almost transparent to the end user and to the developer.

So libmtmd does more than just encapsulate the core library, clip.cpp. I also aim to bring truly multimodal support, because we started with vision, and now we have audio, and then also video, which is just images plus audio among other things, by the way. And so, along the way, I also provide a submodule for audio input, or audio pre-processing.
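The layered design described above, a front-end that owns per-modality sub-modules and hides them from the caller, could be sketched roughly like this. All names here are made-up stand-ins for illustration; this is not the actual llama.cpp or libmtmd API:

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical sketch: one multimodal context owning per-modality
// sub-modules, so callers never touch vision or audio code directly.
enum class MediaType { IMAGE, AUDIO };

struct VisionEncoder { std::string preprocess() { return "image patches"; } };
struct AudioEncoder  { std::string preprocess() { return "mel spectrogram"; } };

class MultimodalContext {
public:
    MultimodalContext(bool with_vision, bool with_audio) {
        if (with_vision) vision = std::make_unique<VisionEncoder>();
        if (with_audio)  audio  = std::make_unique<AudioEncoder>();
    }
    // Dispatch to whichever sub-module handles this media type.
    std::string preprocess(MediaType t) {
        if (t == MediaType::IMAGE && vision) return vision->preprocess();
        if (t == MediaType::AUDIO && audio)  return audio->preprocess();
        throw std::runtime_error("modality not enabled in this context");
    }
private:
    std::unique_ptr<VisionEncoder> vision;
    std::unique_ptr<AudioEncoder>  audio;
};
```

The point of the indirection is that adding a new modality means adding one sub-module and one dispatch branch, while user-facing code stays unchanged.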
And I also aim for it to be very extensible, because it's now more modular.

So, here is an example of one of the APIs that I added to the project: it's called mtmd_tokenize. And the way it works is that you can see I have an input text. So, inside the input text, you can specify the media marker, and then you can have text, with the marker in between. And then we can pass in the media: you can enter a bitmap, which is an image or an audio, or maybe something else in the future, and it will be substituted at the exact position of the marker.

So, let's see a little demo.
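Before the demo, here is a rough sketch of that marker-splitting idea in isolation. The `<__media__>` marker string and the chunk layout are simplifications for illustration, not the real mtmd_tokenize signature:

```cpp
#include <string>
#include <variant>
#include <vector>

// Hypothetical sketch of marker-based multimodal tokenization: a prompt
// is split into text chunks and media chunks wherever the marker appears.
struct MediaChunk { int bitmap_index; };     // refers to the n-th bitmap passed in
using Chunk = std::variant<std::string, MediaChunk>;

std::vector<Chunk> split_prompt(const std::string &prompt, const std::string &marker) {
    std::vector<Chunk> chunks;
    std::size_t pos = 0;
    int media_idx = 0;
    while (true) {
        std::size_t hit = prompt.find(marker, pos);
        if (hit == std::string::npos) break;
        if (hit > pos) chunks.emplace_back(prompt.substr(pos, hit - pos));
        // Each marker occurrence consumes the next bitmap in order.
        chunks.emplace_back(MediaChunk{media_idx++});
        pos = hit + marker.size();
    }
    if (pos < prompt.size()) chunks.emplace_back(prompt.substr(pos));
    return chunks;
}
```

For example, splitting "What is in <__media__>?" yields a text chunk, a media chunk pointing at the first bitmap, and a trailing text chunk; the real library would then tokenize the text chunks and encode the media chunks.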
So, I'm not going to go into the developer side; I will only show you the user experience here. So, let's start the demo. The model I'm going to try is Gemma 3, the four-billion-parameter one. So, right now I'm using the CLI, the command-line interface, and the command I'm typing is just llama-server. So, now I attach an image, let's try this image, and then I ask what it is: what is this?

Yes, there we go, it recognizes it. It's this piece of paper, by the way. Yeah. All right.

So, that was a quick demo of multimodal support in llama.cpp; that's what you get with the web interface of llama-server.
So, let's talk about the future directions I plan. So far, I plan to support these two big things. Let's dive into the first one: multimodal output.

So, recently we have a lot of image generation models; I don't remember the names exactly. But the main idea of how it works is that you have libllama, which is the main library, and that will produce some embedding output, and then we have to decode that embedding output into audio or an image. So the overall idea looks like this. It's a little bit complicated, and the reason why is that some models right now generate text and multimodal output interleaved.
So, that is relatively complicated, because you can imagine that the model actually first generates some text, saying "here is the image that I generated for you", or maybe it does some reasoning steps. And then it emits some kind of token, like a "start generation" token. And when I get this token, I have to switch to using libmtmd to generate the image, for example. And then, at some point, there will be another token that says "stop generation", and I have to switch back. So, that is one of the complicated parts of this system.

The second complicated part is the actual implementation under the hood. So, for the audio decoder, there are multiple choices here.
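Coming back to the interleaved generation for a moment, that switching behaviour can be sketched as a small token router. The control-token names below are invented for illustration; real models define their own special tokens:

```cpp
#include <string>
#include <vector>

enum class Mode { TEXT, MEDIA };

struct RoutedOutput {
    std::string text;                              // plain text to show the user
    std::vector<std::vector<std::string>> media;   // token runs handed to a media decoder
};

// Walk the generated token stream, switching modes on control tokens:
// text goes straight to the output, media tokens are collected per run.
RoutedOutput route_tokens(const std::vector<std::string> &tokens) {
    RoutedOutput out;
    Mode mode = Mode::TEXT;
    std::vector<std::string> current;
    for (const std::string &tok : tokens) {
        if (tok == "<start_generation>") {          // hypothetical control token
            mode = Mode::MEDIA;
            current.clear();
        } else if (tok == "<stop_generation>") {    // hypothetical control token
            mode = Mode::TEXT;
            out.media.push_back(current);
        } else if (mode == Mode::TEXT) {
            out.text += tok;
        } else {
            current.push_back(tok);
        }
    }
    return out;
}
```

In a real pipeline, each collected media run would be fed to the image or audio decoder instead of being stored as strings.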
We can use a transformer-based one, or we can use a diffusion-based one, which gives higher quality. And for the vision decoder, so far I only know about diffusion-based models, and we also have something called stable-diffusion.cpp. But unfortunately, it's not that easy to integrate it into llama.cpp. It's not like I can just copy the code over; that's not how it works, unfortunately. So yes, multimodal generation will be a long way to go, I think.

Another thing is that I also plan to have video input. So, video is just images plus audio, like an animation plus audio.
So, the naive way to think about it is that I can just extract a bunch of images from a video file. But what happens if you have a video that plays, like, one hour long? It would take a lot of memory. So, I'm "inventing", quote unquote, a streaming API, in the sense that I don't want to process just a single frame; I want to process some frames over time, with batching and stuff.
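That streaming idea, sample frames and feed them to the encoder in fixed-size batches instead of decoding the whole video into memory, can be sketched as a planning step. The names and numbers here are illustrative only:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct FrameBatch {
    std::size_t first_frame;  // index of the first sampled frame in the batch
    std::size_t count;        // how many sampled frames this batch holds
};

// Pick every `sample_every`-th frame, then group the sampled frames into
// batches of at most `batch_size`. Only one batch of decoded frames ever
// needs to be resident in memory at a time.
std::vector<FrameBatch> plan_batches(std::size_t total_frames,
                                     std::size_t sample_every,
                                     std::size_t batch_size) {
    std::vector<FrameBatch> plan;
    if (sample_every == 0 || batch_size == 0) return plan;
    std::vector<std::size_t> sampled;
    for (std::size_t f = 0; f < total_frames; f += sample_every)
        sampled.push_back(f);
    for (std::size_t i = 0; i < sampled.size(); i += batch_size) {
        std::size_t n = std::min(batch_size, sampled.size() - i);
        plan.push_back({sampled[i], n});
    }
    return plan;
}
```

A one-hour video then costs only one batch of frames in memory per step, rather than every extracted frame at once.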
Another thing is that some of the newer models support multiple frames, not just a single frame, so I'm designing this streaming infrastructure with that in mind. And another question is which library to use to decode the video in the first place, because video codecs are kind of a long story. I'll drop the link here for you to learn more about my plans for the future.

Okay, so now let's talk about some more important things: how you can get involved with the project. So yes, we are looking for contributors and maintainers, developers for mtmd, and not just mtmd, but llama.cpp as a whole project. So if you want to get involved, I have three tips to give to you. The first tip is: whenever you want to contribute something new, something that you find interesting to do, first look around the code base and try to reuse what is already there.
So, for example, for most of the multimodal models, most of the vision models and even some audio models, there is a function called build_vit, ViT as in vision transformer. So I already abstracted most of the vision transformer into one simple function that you can use. So a lot of models are just a call to build_vit, and that's all; you see, many of these models are really just a vision transformer inside. And if you need to add something that is not just about models, something that you think is going to be a big change, try to first open a discussion or an issue to discuss it with us.

The second tip is, at least for mtmd, and I think for most of the code base of llama.cpp: you can use AI, yes, of course, to discover things. But I don't think, at least for mtmd, that the code base is mature enough for AI to work on, because I already saw some PRs here that used AI under the hood to work on the code. And in one PR in particular, they tried to re-implement matrix multiplication inside, which is not nice, because we already have a lot of matrix multiplication inside llama.cpp and GGML, and for some reason there was still a need to reinvent their own method of matrix multiplication. So, to come back to this point: you can, and you should, use AI to do my first point, which is to look at the code base and see what's already there.
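To make the reuse point concrete, here is a toy version of the pattern: one shared vision-transformer builder that every model calls with its own small config, instead of each model forking the graph code. The struct fields and layer descriptions are simplified stand-ins, not the actual clip.cpp builder:

```cpp
#include <string>
#include <vector>

// Per-model configuration: a new model contributes values, not new code.
struct VitConfig {
    int n_layers;
    int n_embd;
    int patch_size;
};

struct Layer { std::string kind; };

// One builder, many models: the "graph" is just a list of layer
// descriptions here, standing in for a real compute graph.
std::vector<Layer> build_vit(const VitConfig &cfg) {
    std::vector<Layer> graph;
    graph.push_back({"patch_embed/" + std::to_string(cfg.patch_size)});
    for (int i = 0; i < cfg.n_layers; ++i) {
        graph.push_back({"attn/" + std::to_string(cfg.n_embd)});
        graph.push_back({"mlp/" + std::to_string(cfg.n_embd)});
    }
    return graph;
}

// A "new model" then becomes just a config, not a fork of the builder.
std::vector<Layer> build_model_a() { return build_vit({12, 768, 14}); }
std::vector<Layer> build_model_b() { return build_vit({24, 1024, 16}); }
```

Adding a model this way keeps the diff small and reviewable, and avoids the per-model code duplication the talk warns about.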
And you can also discuss back and forth with the AI, you know: whether what you're going to add would mean adding a lot of code or not, and whether it would align with the direction of the project, and which points should be brought up for discussion. So AI is very good for that kind of thing.

And last but not least: let's keep things simple. So, by simple, I mean the code base is quite young. So I'm not trying to add exact support for all of the models. I know that for some of the models we have a somewhat working thing, not exactly the same implementation as at the expected level. Yes, the code base is young, so I just want to have simple things first. So whenever you push a change, let's try not to be too model-specific, and let's try not to break other models. Of course, that's what we don't want you to do. But, by the way, when you let AI write the code, it tends to break other models, which is not very nice.

Yeah, so I think that's all for my talk. Thank you for listening, and now it's question time for everyone.

[Applause]

[Moderator] We've got time for one question. OK. I don't see any hands. Oh, sorry, yeah.

[Audience] How long until llama.cpp will have the ability to generate multimodal output from the model, at a somewhat working level?
[Speaker] Yeah, so I'll just repeat the question here, because we don't have a microphone over there. So the question was: how long, in terms of time, until we have the first working, or somewhat working, version of multimodal output. I don't have a concrete timeline. I'm planning a demo for text-to-speech right now, so I think in one or two months we're going to get that running. But for image generation, it's going to take more time. I don't know.

[Moderator] Maybe another question, we have time. Yes, sorry.

[Audience] I just wanted to ask how to configure it for multiple users.

[Speaker] Yeah, so the question is how to configure llama.cpp to serve multiple users on only one GPU or one server. It actually depends, I think, for now. The best thing you can do is to tinker with the number of parallel requests, and at the same time try to balance it against the context size. So, that's about all I can tell you.

[Moderator] Do we have more time for that? Yeah. Oh, sorry. Yeah.

[Audience] (inaudible question about the choice of FFmpeg)

[Speaker] So, FFmpeg, yes, it's very famous. The FFmpeg library, why did I choose it? Just because it's so famous. It's nice.
[Speaker] It's at the top of the list. Of course, there were some other considerations. Yeah.

[Audience] Did you consider making it an optional dependency?

[Speaker] Sorry? Ah, yeah, optional. So, I haven't considered it yet. I don't have a clear plan on how to do it, but I'll try to make it optional.

[Moderator] No more questions then. And just, everyone, because all of this is getting recorded: please do not stand up and walk away until the speaker is done, otherwise the video would look horrible. So nobody stands up and leaves until the speaker is done. There will be a three-to-five-minute break in between the speakers; that's when it's allowed, not during the questions. Yeah. Do we have time for one more question here? Yeah.

[Audience] So my understanding is that right now, audio input support in llama.cpp is working and good enough. What would be the most powerful model right now to get a wow effect out of trying the audio input support?

[Speaker] Yeah. So, we have support for a Mistral model that accepts audio input. That model is quite big, I think. There's another model, Qwen2-Audio, which is also very good, so you can try it.
And, by the way, there's Qwen2.5-Omni, which allows you to input both images and audio in the same prompt. Yeah.

[Moderator] Thank you.