Ever-greater convergence between linguistics and data science is creating new opportunities for language professionals comfortable at the intersection of these two disciplines. In this interview, translation graduate and data scientist Rubén Rodríguez de la Fuente outlines a career path he embarked on more than a decade ago, shares what instigated the transition, and indicates the areas where he believes linguists will be most needed.
Before we start, could you briefly outline your career path from T&I graduate to data scientist?
After a couple of years freelancing, I got hired as an in-house translator at an agency specializing in software localization. That got me interested in the technical side of things. Then I did project management for 2–3 years. Although it was an interesting experience, I decided to go back to my roots, taking an in-house linguist position with my current employer. Then in 2011 I was involved as an evaluator in a machine translation pilot and things slowly began to change. I was given the chance to work full time as a machine translation engineer, which I did for 7 years, and from there I moved to my current role as a data scientist.
Would you say that transition was a series of fairly natural steps, or did you completely change field at some point?
It felt natural to me. When working as an MT engineer, I was responsible for building customized engines and that felt a bit like an extension of my work as a translator, sort of like teaching computers how to translate. I think being a translator actually gave me an edge over people from a software engineering background, as I knew where computers would struggle and could think proactively about how to handle those challenges.
Working on MT served as an entry point to two disciplines: natural language processing in the broadest sense (to handle tasks other than MT) and analytics (to be able to systematically assess and improve the performance of the MT engines I built).

What were the main tasks you performed in each role? What do you do now as a data scientist?
As an MT engineer, I was responsible for putting together and maintaining the linguistic assets (corpora and glossaries) used for MT training, for training and retraining the engines, and for measuring and improving their performance over time.
When I moved to a data science role, my focus switched to customer analytics. These days I perform tasks like detecting topics in customer feedback (e.g. identifying recurring themes), preparing data visualizations for different reporting services, and doing statistical analysis. Lately, I’ve been exploring how LLMs can be used to enhance data science work.
What first led you to computational linguistics? How much of it was innate interest and how much was in response to external forces?
I wasn’t a particularly technical or STEM-oriented person until I started working on software localization. I took Latin and Greek in high school and I didn’t own a computer until I was 21. But then I remember seeing my first localized interface and it sort of struck a chord and I became more interested in learning about programming and automation. I suppose the interest was there, but pretty dormant.
How did you learn what you needed to know? How much of that knowledge did you gain on the job and how necessary was it to take additional qualifications?
It was a mix of learning on the job, self-learning, and online classes. It’s hard to give percentages here, but certainly the on-the-job part is critical because it’s how the knowledge you’re picking up becomes relevant. I prefer short courses that give me the basics and allow me to start testing things in my job quickly, as opposed to more formal programs. Also, I need to point out that I’ve been lucky to have colleagues who were very generous with their time and knowledge. I feel that’s the right approach, as it allows your team to raise its overall performance, as opposed to letting you shine as an individual.

In your experience in this segment, is your profile as linguist and data scientist common or a rarity? Why would you say that is?
My feeling is that it’s not super common, but it’s less rare than you’d expect. I know quite a few people who come from a linguistic background and are working on computational linguistics or AI. A few years back I was invited to attend a seminar on MT at the UCM and I was pleasantly surprised to hear that a significant share of AI PhD students were Translation or Philology graduates. So, to recap, I think it’s more common than we would expect, but there’s probably still a fear factor around technical stuff that is putting people off.
What advice would you give to linguists keen to work in natural language processing? What would you say are the keys to entering the field, and what do you see as the biggest obstacles for T&I graduates?
I remember taking a data science class that, referring to basketball, made the argument that to get started you only need to learn to pass, catch and shoot. We often think we need a PhD before we can do data science or machine learning and that’s not the case. You can learn a few fundamentals and that is going to take you a long way. My first machine learning projects were done with Weka, open-source point-and-click software that lets you get started with ML without writing code.
So, start small, be patient, and keep at it, working bit by bit. Don’t set grandiose goals, just pick something you’re interested in (for instance, coding in Python, regular expressions, or data visualization), start learning, and start applying it to your job as much as you can to build on it. Despite many linguists’ fears, there are still hugely exciting opportunities ahead for us, but it does mean we need to adapt quickly and be proactive, as opposed to letting other stakeholders drag us in unwanted directions.

To close, what do you see happening in the future as developments in machine learning bring ever-greater convergence between linguistics and data science? What are the likely roles for linguists in that future?
It’s an interesting time in that large language models seem to be almighty and are putting a lot of pressure on the labor market. But there are two areas where linguistic expertise is really relevant. One is evaluation: despite the LLM-as-a-judge technique (where a more capable LLM evaluates the output of a less capable one), human-in-the-loop remains critical, especially for sensitive tasks. The other is the training part: over time, we’ve seen that a well-curated dataset beats brute-forcing with more data. Interestingly, and closely related to training, the term context engineering (making sure the LLM has the right context to answer the request) has become popular recently. And really, is there any other profession better equipped to deal with context than translators?

Rubén Rodríguez de la Fuente
Rubén Rodríguez de la Fuente is a translator turned data scientist based in Madrid with a background that bridges languages and technology. Trained at the Universidad de Granada and the European Commission Translation Service, he began his career in translation and localization before moving into data science. Rubén has worked extensively on topics like statistical machine translation and post-editing, and has shared insights through publications and conference talks (e.g. ATA, GALA, LocWorld), aiming to make complex tools more accessible to translators and teams. His perspective emphasizes the value of human expertise in localization while recognizing the role of technology as a practical aid rather than a substitute.

