Farrokh Mehryary
Project Researcher, Data analytics
Senior Researcher, Data analytics
PhD in Computer Science
Natural Language Processing, Text mining, Deep learning, BioNLP, Bioinformatics

Areas of expertise

Development and optimization of deep learning and LLM-based methods for building large-scale and trustworthy NLP/text mining applications. Specialty in Biomedical Natural Language Processing (BioNLP), in text mining in low-resource setups (where no or minimal training data exists), and in bioinformatics application development (protein function/structure prediction).

Biography

As an NLP and text mining specialist, I have been an active member of the TurkuNLP lab since 2013 and a Silo AI employee since 2020. I have over a decade of experience in academic research and publication (with 3000+ citations), university teaching, and international collaborations, as well as over 20 years of experience in software engineering (project management, system analysis and design, and software development). I specialize in developing and optimizing deep learning and LLM-based methods for large-scale NLP and text mining applications, with a particular focus on (1) low-resource setups (where no or minimal training data exists) and (2) the biomedical domain (BioNLP). I also have strong expertise in bioinformatics, specializing in protein function/structure prediction.

As a senior researcher at the university, and as part of an international collaboration between TurkuNLP and various research groups across Europe, I have worked for the last four years on the “Deep learning for next-generation biomedical text mining” project. There I have designed, optimized, and run an information extraction pipeline for the STRING database that extracts information from millions of PubMed abstracts and PubMed Central full-text articles. As a result, I have extensive experience working with very large datasets and with fine-tuning, optimizing, and running LLMs on hundreds of GPUs simultaneously.

As a Senior AI Scientist (LLM, NLP, and text mining specialist) at Silo AI, I have worked on several LLM projects. These include building RAG systems and information extraction systems, generating synthetic text, optimizing LLM-based systems with DSPy, and extracting information and tables from multilingual PDF documents (using MS Document Intelligence, prompt engineering, and GPT models). In addition, I have helped design a GenAI/LLM course that Silo AI will offer and teach to the employees of an industrial corporation. Finally, I provide extensive sales support at Silo AI. I attend various pre-sales client meetings as an LLM/NLP/text mining expert to understand clients' business requirements and translate them into practical AI solutions.

Whenever possible, I have worked in academia and industry simultaneously. This lets me carry state-of-the-art knowledge and experience back and forth between the university and the company, and apply it in both academic and company/client projects. Personally, I love this approach, since it has allowed me to get the best of both worlds and to grow rapidly in the field.

Teaching


I was the responsible teacher for the course Algorithms in Bioinformatics at the University of Turku from 2015 to 2020. I have also helped teach other NLP courses, including Text mining and Deep Learning in Language Technology, at the Department of Computing, University of Turku.


Research

Farrokh has a strong publication record. He has achieved high ranks in several international text mining and machine learning competitions and has obtained state-of-the-art results on several important datasets. He specializes in deep learning-based methods for Biomedical Natural Language Processing (BioNLP) and text mining, and his research has focused on low-resource setups, where minimal training data is available.

During 2021, Farrokh worked as an AI scientist at Silo AI, developing text mining systems for clients. In the same year he also worked as a researcher at the AI Academy, helping develop Massive Open Online Courses (MOOCs). In 2022, he received his PhD in Computer Science from the University of Turku with the thesis ‘Optimizing Text Mining Methods for Biomedical Natural Language Processing’. Farrokh currently holds a senior researcher position in the TurkuNLP group, working on biomedical natural language processing and text mining.

Publications

End-to-End System for Bacteria Habitat Extraction (2017)

Workshop on Biomedical Natural Language Processing
Farrokh Mehryary, Kai Hakala, Suwisa Kaewphan, Jari Björne, Tapio Salakoski, Filip Ginter
(Peer-reviewed article in conference proceedings (A4))

An expanded evaluation of protein function prediction methods shows an improvement in accuracy (2016)

Genome Biology
Jiang YX, Oron TR, Clark WT, Bankapur AR, D'Andrea D, Lepore R, Funk CS, Kahanda I, Verspoor KM, Ben-Hur A, Koo DCE, Penfold-Brown D, Shasha D, Youngs N, Bonneau R, Lin A, Sahraeian SME, Martelli PL, Profiti G, Casadio R, Cao RZ, Zhong Z, Cheng JL, Altenhoff A, Skunca N, Dessimoz C, Dogan T, Hakala K, Kaewphan S, Mehryary F, Salakoski T, Ginter F, Fang H, Smithers B, Oates M, Gough J, Toronen P, Koskinen P, Holm L, Chen CT, Hsu WL, Bryson K, Cozzetto D, Minneci F, Jones DT, Chapman S, Dukka BKC, Khan IK, Kihara D, Ofer D, Rappoport N, Stern A, Cibrian-Uhalte E, Denny P, Foulger RE, Hieta R, Legge D, Lovering RC, Magrane M, Melidoni AN, Mutowo-Meullenet P, Pichler K, Shypitsyna A, Li B, Zakeri P, ElShal S, Tranchevent LC, Das S, Dawson NL, Lee D, Lees JG, Sillitoe I, Bhat P, Nepusz T, Romero AE, Sasidharan R, Yang HX, Paccanaro A, Gillis J, Sedeno-Cortes AE, Pavlidis P, Feng S, Cejuela JM, Goldberg T, Hamp T, Richter L, Salamov A, Gabaldon T, Marcet-Houben M, Supek F, Gong QT, Ning W, Zhou YP, Tian WD, Falda M, Fontana P, Lavezzo E, Toppo S, Ferrari C, Giollo M, Piovesan D, Tosatto SCE, del Pozo A, Fernandez JM, Maietta P, Valencia A, Tress ML, Benso A, Di Carlo S, Politano G, Savino A, Rehman HU, Re M, Mesiti M, Valentini G, Bargsten JW, van Dijk ADJ, Gemovic B, Glisic S, Perovic V, Veljkovic V, Veljkovic N, Almeida-e-Silva DC, Vencio RZN, Sharan M, Vogel J, Kansakar L, Zhang S, Vucetic S, Wang Z, Sternberg MJE, Wass MN, Huntley RP, Martin MJ, O'Donovan C, Robinson PN, Moreau Y, Tramontano A, Babbitt PC, Brenner SE, Linial M, Orengo CA, Rost B, Greene CS, Mooney SD, Friedberg I, Radivojac P
(Peer-reviewed original article or data article in a scientific journal (A1))