{"id":2067,"date":"2024-06-30T22:01:29","date_gmt":"2024-06-30T21:01:29","guid":{"rendered":"https:\/\/www.hutsky.cz\/blog\/?p=2067"},"modified":"2026-02-25T08:59:59","modified_gmt":"2026-02-25T07:59:59","slug":"llama-cpp-on-fedora-40-with-cuda-support","status":"publish","type":"post","link":"https:\/\/www.hutsky.cz\/blog\/2024\/06\/llama-cpp-on-fedora-40-with-cuda-support\/","title":{"rendered":"Llama.cpp on Fedora 40 with CUDA support"},"content":{"rendered":"\n<p class=\"has-text-align-left\"><img loading=\"lazy\" decoding=\"async\" width=\"856\" height=\"372\" class=\"wp-image-2083\" style=\"width: 372px; float: right;\" src=\"https:\/\/www.hutsky.cz\/blog\/wp-content\/uploads\/llamacpp.png\" alt=\"llama.cpp logo\" srcset=\"https:\/\/www.hutsky.cz\/blog\/wp-content\/uploads\/llamacpp.png 856w, https:\/\/www.hutsky.cz\/blog\/wp-content\/uploads\/llamacpp-300x130.png 300w, https:\/\/www.hutsky.cz\/blog\/wp-content\/uploads\/llamacpp-768x334.png 768w, https:\/\/www.hutsky.cz\/blog\/wp-content\/uploads\/llamacpp-100x43.png 100w, https:\/\/www.hutsky.cz\/blog\/wp-content\/uploads\/llamacpp-150x65.png 150w, https:\/\/www.hutsky.cz\/blog\/wp-content\/uploads\/llamacpp-200x87.png 200w, https:\/\/www.hutsky.cz\/blog\/wp-content\/uploads\/llamacpp-450x196.png 450w, https:\/\/www.hutsky.cz\/blog\/wp-content\/uploads\/llamacpp-600x261.png 600w\" sizes=\"auto, (max-width: 856px) 100vw, 856px\" \/>Not long ago, I bought a laptop with an Nvidia card. Partly because I wanted to check out some newer, fancy games (but in reality, when I finally do have some free time, I usually end up playing DOS games anyway). 
The second, more important reason for choosing Nvidia over the more well-trodden AMD\/Radeon path (my understanding, at least, is that Nvidia is still more of a hassle on Fedora) was that I wanted to give CUDA a try for educational purposes.<\/p>\n\n\n\n<p>I have to say that using a laptop that <a href=\"https:\/\/rpmfusion.org\/Howto\/NVIDIA#Optimus\">combines an integrated Intel GPU with an Nvidia GPU<\/a> can be pretty daunting sometimes, especially when HDMI comes into play (because I use an external monitor when at home). But so far, it has always been possible to sort things out somehow, and I&#8217;m using the Nvidia drivers from RPM Fusion without any major issues.<\/p>\n\n\n\n<p>When I bought this laptop, I put Fedora 39 on it and managed to get CUDA working while experimenting with some LLMs using llama.cpp. However, when I upgraded to Fedora 40 the other day, I had to remove some previously installed stuff to get the Nvidia drivers working again. And now that I wanted to fiddle with a local LLM again, I had to go through the whole process once more, so I&#8217;m writing it down here in case I need the steps in the future (it may be futile, though, as everything in this area moves very fast). Anyway, first I installed the CUDA development packages. They had to come from Nvidia&#8217;s repo intended for Fedora 39 (at the time of writing, there was nothing available for Fedora 40 yet):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># dnf config-manager --add-repo https:\/\/developer.download.nvidia.com\/compute\/cuda\/repos\/fedora39\/x86_64\/cuda-fedora39.repo\n\n# dnf install cuda<\/code><\/pre>\n\n\n\n<p>I already had the <a href=\"https:\/\/github.com\/ggerganov\/llama.cpp\">llama.cpp<\/a> repository checked out locally, so I just updated it to the latest version. However, when I tried to build the binaries with CUDA support, the build complained about <em>gcc<\/em> being too new. 
Fedora 40 ships gcc 14 and the llama.cpp build said it needed version 12 at most. Now, it&#8217;s possible to build an older version of gcc, but it seemed easier to simply use one from an older Fedora release. At the time of writing, the Fedora 37 repositories, which contained gcc 12, were still available, so I just installed the needed packages into a temporary location using installroot:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># dnf install --releasever=37 --installroot=\/tmp\/just_a_temp_dir gcc gcc-c++<\/code><\/pre>\n\n\n\n<p>Once that was done, I went to the directory containing the CUDA binaries and created symlinks there so that the older version of gcc would be used:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># cd \/usr\/local\/cuda\/bin\/\n# ln -s \/tmp\/just_a_temp_dir\/bin\/gcc gcc\n# ln -s \/tmp\/just_a_temp_dir\/bin\/c++ c++\n# ln -s \/tmp\/just_a_temp_dir\/bin\/g++ g++<\/code><\/pre>\n\n\n\n<p>Having done this, I was finally able to build llama.cpp with CUDA support:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>cd llama.cpp\/\nmake -B GGML_CUDA=1<\/code><\/pre>\n\n\n\n<p>This is what simple usage from the CLI can look like:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.hutsky.cz\/blog\/wp-content\/uploads\/llamacpp5.gif\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"547\" src=\"https:\/\/www.hutsky.cz\/blog\/wp-content\/uploads\/llamacpp5-1024x547.gif\" alt=\"\" class=\"wp-image-2078\" srcset=\"https:\/\/www.hutsky.cz\/blog\/wp-content\/uploads\/llamacpp5-1024x547.gif 1024w, https:\/\/www.hutsky.cz\/blog\/wp-content\/uploads\/llamacpp5-300x160.gif 300w, https:\/\/www.hutsky.cz\/blog\/wp-content\/uploads\/llamacpp5-768x410.gif 768w, https:\/\/www.hutsky.cz\/blog\/wp-content\/uploads\/llamacpp5-100x53.gif 100w, https:\/\/www.hutsky.cz\/blog\/wp-content\/uploads\/llamacpp5-150x80.gif 150w, https:\/\/www.hutsky.cz\/blog\/wp-content\/uploads\/llamacpp5-200x107.gif 200w, 
https:\/\/www.hutsky.cz\/blog\/wp-content\/uploads\/llamacpp5-450x240.gif 450w, https:\/\/www.hutsky.cz\/blog\/wp-content\/uploads\/llamacpp5-600x321.gif 600w, https:\/\/www.hutsky.cz\/blog\/wp-content\/uploads\/llamacpp5-900x481.gif 900w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<p>But of course, it&#8217;s more useful to tap into the llama.cpp API using some other tool, be it the Web UI that comes with llama-server, VS Code or something entirely different. In any case, I went ahead and asked the local Llama model to summarize this very post:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Can you summarize this text in bullets?<\/p>\n<\/blockquote>\n\n\n\n<p>and this is what I got:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Llama<\/strong> I&#8217;d be happy to help you summarize the text in bullets!<br><br>Here&#8217;s a summary of the text:<br><br>\u2022 The user bought an Nvidia laptop for educational purposes (CUDA) and wanted to try it out on Fedora 39.<br>\u2022 They installed CUDA development packages from Nvidia&#8217;s repo intended for Fedora 39, as there was no version available for Fedora 40 at that time.<br>\u2022 To build llama.cpp with CUDA support, the user needed a specific older version of GCC (12 or less), which is not compatible with Fedora 40&#8217;s default GCC 14.<br>\u2022 The solution was to install an older version of GCC from Fedora 37 in a temporary directory using installroot and create symlinks to use this older version instead of the system&#8217;s default GCC.<br><br>Let me know if you have any further requests!<\/p>\n<\/blockquote>\n\n\n\n<p>Well, there&#8217;s still some room for improvement, but it&#8217;s pretty neat for a local model running on my laptop.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Not long ago, I bought a laptop with an Nvidia card. 
Partly because I wanted to check out some newer, fancy games (but in reality, when I finally do have some free time, I usually end up playing DOS games anyway). &hellip;<\/p>\n<p class=\"read-more\"> <a class=\"more-link\" href=\"https:\/\/www.hutsky.cz\/blog\/2024\/06\/llama-cpp-on-fedora-40-with-cuda-support\/\"> <span class=\"screen-reader-text\">Llama.cpp on Fedora 40 with CUDA support<\/span> Read More &raquo;<\/a><\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[52,8],"tags":[62,48,18,64],"class_list":["post-2067","post","type-post","status-publish","format-standard","hentry","category-hardware","category-operatingsystems","tag-fedora","tag-hardware","tag-linux","tag-llama-cpp"],"_links":{"self":[{"href":"https:\/\/www.hutsky.cz\/blog\/wp-json\/wp\/v2\/posts\/2067","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.hutsky.cz\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.hutsky.cz\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.hutsky.cz\/blog\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/www.hutsky.cz\/blog\/wp-json\/wp\/v2\/comments?post=2067"}],"version-history":[{"count":24,"href":"https:\/\/www.hutsky.cz\/blog\/wp-json\/wp\/v2\/posts\/2067\/revisions"}],"predecessor-version":[{"id":2096,"href":"https:\/\/www.hutsky.cz\/blog\/wp-json\/wp\/v2\/posts\/2067\/revisions\/2096"}],"wp:attachment":[{"href":"https:\/\/www.hutsky.cz\/blog\/wp-json\/wp\/v2\/media?parent=2067"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.hutsky.cz\/blog\/wp-json\/wp\/v2\/categories?post=2067"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.hutsky.cz\/blog\/wp-json\/wp\/v2\/tags?post=2067"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}