While mainstream large language models (LLMs) certainly have their allure, in their current form they nearly all raise enough ethical concerns to call their use into question: they breach copyright law in most jurisdictions, pose both immediate and potential threats to data confidentiality and privacy, and demand disproportionate amounts of electricity and water to power and cool the servers that host them. In this context, local LLMs are emerging as a feasible alternative. Though these small models lack the capabilities of their larger counterparts, they are advancing rapidly and are now at the point where professional use is increasingly viable.
Why local LLMs?
The technology behind large language models (LLMs), the form of artificial intelligence that linguists are most likely to interact with, holds undeniable potential. Equally undeniable is the fact that at the time of writing — April 2025 — none of the current mainstream offerings simultaneously meet the criteria of being ethical (defined as adhering to the law, causing no harm, and being transparent, fair and accountable), secure (ensuring both current and future data confidentiality and privacy) and sustainable (in terms of energy and water consumption).
Away from the spotlight, however, several organisations are developing open-source LLMs trained on permissively licensed, copyright-free or synthetic content and designed to run locally on consumer-grade hardware. Initiatives of this type are under way at Pleias (LLMs for government, law and science), openGPT-X (predominantly non-English content covering all 24 European languages), the ALEA Institute (principally for law) and IBM (LLMs for enterprise). While deploying the Pleias, Teuken and KL3M models built by the first three organisations, respectively, at present requires IT expertise, the IBM Granite family can now be installed and operated with third-party open-source software in just a few clicks.
What are local LLMs’ constraints and benefits?
The utility and performance of an LLM running on a consumer device are determined by three factors: the type of content used to train it, the number of parameters (for small models, currently 1–8 billion) and the local hardware. Ethical AI use addresses these apparent constraints and turns them into strengths.
Training data
Ethical training data generally spans material permissively licensed under a Creative Commons licence or similar, out-of-copyright text (the Berne Convention sets the minimum term of protection at 50 years after the author's death) and synthetic data. The latter tends to come in the form of machine-generated pairs of questions and answers and, though unlikely to provide a wealth of linguistic variety, adapts LLMs to particular fields and improves their ability to follow user instructions and perform pre-defined tasks.
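For illustration only, a synthetic training example of the kind described typically pairs a machine-generated instruction with a machine-generated answer. The structure and field names below are invented, as real datasets vary considerably.

```python
# An invented illustration of a synthetic instruction-answer training
# pair; real datasets vary in structure and field names.
synthetic_pair = {
    "instruction": "Explain the term 'force majeure' in one sentence.",
    "response": (
        "Force majeure is a contractual clause that frees both parties "
        "from liability when an extraordinary event prevents performance."
    ),
}
```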
Model size
The number of parameters influences both model performance and size, making a trade-off inevitable: more parameters mean larger files, and larger files demand more RAM, which is usually limited on consumer hardware. This is one of the areas where small LLMs are advancing most rapidly and, despite the perennially overstated claims about AI’s capabilities, the latest 8B-parameter models do indeed perform on a par with previous larger iterations. While 7–8B parameters is currently the minimum an LLM needs to carry out a broad range of linguistic operations effectively, models in the 1–3B range can still be used for short, simple tasks. Moreover, aware of the limitations inherent in small LLMs, developers create most with retrieval-augmented generation (RAG) and similar approaches in mind, as these allow users to supplement the baseline content — for example by adding glossaries, style guides, reference material and parallel texts — and so tailor the output to a particular field or content type.
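As a rough guide to the size side of that trade-off, a quantised model's file size, and hence its approximate baseline RAM footprint, can be estimated as parameters multiplied by bits per weight, divided by eight. The sketch below assumes 4-bit quantisation, a common choice for local models; context windows and runtime overhead come on top.

```python
# Back-of-envelope estimate of a quantised model's file size, and hence
# its approximate baseline RAM footprint. Assumes 4-bit quantisation;
# context and runtime overhead are extra.
def approx_size_gb(params_billion: float, bits_per_weight: float = 4.0) -> float:
    return params_billion * bits_per_weight / 8  # the 1e9 in params cancels 1 GB

for size in (1, 3, 8):
    print(f"{size}B parameters at 4-bit ≈ {approx_size_gb(size):.1f} GB")
# Prints roughly: 0.5 GB, 1.5 GB and 4.0 GB respectively
```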
Local hardware
Local hardware is the final part of the performance equation, and the more powerful the device, the better in terms of functional model size and response speed. For instance, a 64-bit Intel i5 CPU and 16 GB of RAM are about the minimum needed to run the 1B-parameter version of the IBM Granite 3.1 model at a speed viable for professional use (20 tokens/second). In contrast, a 64-bit Intel i7 CPU and 16 GB of RAM will achieve twice that response speed with the 1B-parameter model and will generate 20 tokens/second with the 3B-parameter version. Although both those laptops will run the 8B-parameter IBM Granite 3.1 model, the drop in response speed makes using it for interactive tasks impractical. Newer CPUs, more RAM and, ideally, a GPU or similar (at least 8 GB) will all produce faster generation speeds and are factors worth considering when selecting or upgrading hardware.
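For readers who want to gauge their own hardware, a rough tokens-per-second check can be run with the open-source gpt4all Python bindings. This is a minimal sketch; the model file name is an assumption, so substitute whichever GGUF model is actually installed.

```python
# A minimal sketch of a local generation-speed check using the gpt4all
# Python bindings (pip install gpt4all). The model file name below is
# an assumption: replace it with a GGUF model you have downloaded.
import time

from gpt4all import GPT4All

model = GPT4All("granite-3.1-1b-instruct.gguf")  # hypothetical file name

prompt = "Translate into English: 'La junta aprobó las cuentas anuales.'"

start = time.time()
token_count = 0
# streaming=True yields the response token by token, so we can count them
for _ in model.generate(prompt, max_tokens=200, streaming=True):
    token_count += 1
elapsed = time.time() - start

print(f"{token_count} tokens in {elapsed:.1f} s "
      f"= {token_count / elapsed:.1f} tokens/second")
```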
Protection of confidential and personal data
Data confidentiality is one of local LLMs’ strengths. Since nothing leaves the device, no professional or personal data are compromised. This covers not only the source text, but also any reference material submitted to the LLM when using techniques like RAG to steer its response. While the mainstream online vendors assure users that their data will not be retained for training or other purposes when they connect under a paid API key (usually via a separate application), at the time of writing that guarantee often does not extend to data submitted through the chatbot interface, even on subscription plans. Data leaks and hacks are other major security threats and, of course, corporate circumstances, policies and terms of use can change overnight.
Sustainability
Environmental impact is the third area where mainstream LLMs present significant challenges. Definitive figures on the amounts of energy and water needed to train the big foundation models or generate each chatbot response are as yet unavailable, but the fact that power demand is great enough to warrant government and industry planning and building dedicated plants to serve AI data centres is telling. Given that ethical LLM developers necessarily work with significantly smaller datasets, the energy and water consumed in training them is many orders of magnitude lower. Nevertheless, it is in deployment that local models’ sustainability really comes to the fore, since running a small LLM on a laptop consumes only marginally more energy than typical daily work usage.
How do we run LLMs locally?
Having established local LLMs’ credentials, how do we use them? In parallel with small models’ steady advance, various organisations have developed open-source applications to run them. Worth mentioning in this field are Ollama, LM Studio, Jan and Nomic GPT4All. Since Ollama targets developers comfortable with the command line, and LM Studio and Jan are designed for power users with high-end hardware, GPT4All is likely the best place to begin. Although the following paragraphs refer to GPT4All, the features described are common to all four applications.
Getting started with GPT4All involves installing the application and downloading a suitable model. Since the LLMs and associated data are stored locally, they require substantial disk space, and if we plan to test multiple models and customise their capabilities, an external SSD can soon become a necessity.
GPT4All currently offers two model download sources within the application: a selection specifically configured for GPT4All (which does not yet include the Granite family) and HuggingFace (which includes a much wider choice of models, including the Granite LLMs). It likewise features remote providers and a custom option allowing users to connect to private models hosted on cloud services (e.g. a larger ethical model on a secure account). Outside the application, users can also download GGUF-format LLMs directly to the local model folder from HuggingFace or the developer’s website.
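By way of illustration, the snippet below fetches a GGUF file directly with the open-source huggingface_hub library. The repository and file names are assumptions (copy the real ones from the relevant model card), and local_dir should point at the folder where the chosen application looks for models.

```python
# A sketch of downloading a GGUF model file from Hugging Face with the
# huggingface_hub library (pip install huggingface_hub). The repo_id and
# filename below are assumptions; check the model card for the real ones.
import os

from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="ibm-granite/granite-3.1-2b-instruct-GGUF",  # hypothetical repo id
    filename="granite-3.1-2b-instruct-Q4_K_M.gguf",      # hypothetical file name
    local_dir=os.path.expanduser("~/gpt4all-models"),    # your local model folder
)
print("Model saved to", path)
```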
As is the case with the big online LLMs, linguists will soon find that although the model’s baseline outputs appear superficially acceptable, closer examination reveals numerous errors and shortcomings. This is where RAG — in the form of the LocalDocs feature in GPT4All — helps. Users can compile a collection of monolingual and bilingual reference material and, once indexed and converted into a vector database (which can take time), that input influences the LLM’s output.
In practice, this means that adding a domain-specific glossary, relevant translation samples and a style guide to a LocalDocs collection increases the chance of the model including that terminology and wording in its response. It should be emphasised, however, that because the LLM is a probabilistic model, there is no guarantee of how it will use the reference material in its output, or whether it will use it at all.
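For the curious, the retrieval step that LocalDocs automates can be sketched in a few lines using the open-source sentence-transformers library. GPT4All's internal implementation differs in detail; this only illustrates the principle of finding the most relevant reference chunk and prepending it to the prompt.

```python
# A minimal sketch of the retrieval step behind RAG features like
# LocalDocs, using sentence-transformers (pip install sentence-transformers).
# GPT4All's own implementation differs; this shows the principle only.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# In practice these would be chunks of glossaries, style guides, etc.
reference_chunks = [
    "cuentas anuales = annual accounts (not 'annual bills')",
    "House style: use British English spelling throughout.",
]
chunk_vectors = embedder.encode(reference_chunks, convert_to_tensor=True)

query = "Translate: 'La junta aprobó las cuentas anuales.'"
query_vector = embedder.encode(query, convert_to_tensor=True)

# Retrieve the most relevant chunk and prepend it to the prompt
scores = util.cos_sim(query_vector, chunk_vectors)[0]
best = reference_chunks[int(scores.argmax())]
prompt = f"Use this reference material:\n{best}\n\nTask: {query}"
print(prompt)
```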
So far, everything described has been achieved through GPT4All’s chatbot interface. As with its online counterparts, this approach is cumbersome to use alongside other software because of the constant need to copy and paste between windows. As an alternative, GPT4All offers a local server mode. Thus, if our main translation application allows customised LLM connections — not widespread at present, but likely to become more common — we can integrate a local LLM into our usual workflow in the same way we would any other provider.
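For anyone wiring this up, GPT4All's local server follows the OpenAI chat-completions convention, so any tool that speaks that API can talk to it. The port and model name below are assumptions to verify in the application's settings, where the server must first be enabled.

```python
# A sketch of querying GPT4All's local server mode, which exposes an
# OpenAI-compatible API (port 4891 by default; enable the server and
# confirm the port in the application's settings first).
import requests

response = requests.post(
    "http://localhost:4891/v1/chat/completions",
    json={
        "model": "Granite 3.1 3B Instruct",  # hypothetical: a model loaded in GPT4All
        "messages": [
            {"role": "user",
             "content": "Suggest three English renderings of 'puesta en valor'."},
        ],
        "max_tokens": 200,
    },
    timeout=120,
)
print(response.json()["choices"][0]["message"]["content"])
```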
Conclusions
In closing, it is worth stressing three points:
- Firstly, new users will take time to find the right balance between hardware, local model size, application settings, task, reference material and prompt. There are no shortcuts, as each case is unique; the only way to achieve that balance is through painstaking trial and error.
- Secondly, LLMs are merely another tool. They can provide useful suggestions, but they still require constant and active intervention, which is always more time-consuming than first thought. This fact should not be overlooked, since whether supporting drafting, editing or quality assurance, all productivity aids are ultimately constrained by the speed at which our brains can analyse the issue before us, determine a response and apply the relevant expertise.
- Thirdly, ethical AI is still very much in its infancy, as evidenced by the scarcity of models, and applications capable of running local LLMs are only slightly more mature. Both situations are likely to improve as more ethical models are created, the models themselves become faster and more powerful, users find effective ways of harnessing them and upgrade to more powerful hardware, and translation software developers add features that place experts firmly at the helm.
Running ethical, secure, sustainable and highly customisable AI on a consumer laptop is increasingly feasible, making now the time to start understanding how the technology works, exploring its potential and finding practical professional applications for it.
Andrew Steel
Andrew Steel (MBA, DipTransIoLET, BA Hons) translates financial, institutional and corporate content from Spanish to English. He is co-founder and managing partner of Veritas Traducción y Comunicación, SL, a quality-certified (ISO 9001 and ISO 17100) boutique translation practice. He follows technology applicable to translation closely and actively explores avenues that empower linguists to produce their best work. He is a long-time member of Asetrad and MET in Spain and of the Chartered Institute of Linguists (MCIL) and the Institute of Translation and Interpreting (MITI) in the UK.