Arun S. Maiya

arun [at] maiya [dot] net
CV | LinkedIn | GitHub

Basics

I am a computer scientist at the Institute for Defense Analyses (IDA), a federally funded think tank in the Washington, D.C. metro area. My research focuses on computational methods for extracting meaning from raw data and spans natural language processing, machine learning, data mining, computer vision, and network science (e.g., social network analysis). I like building tools that make machine learning easier to apply in new ways and new areas. Through my work, I have contributed to national-level strategic-planning activities and R&D roadmaps. I completed a Ph.D. in Computer Science at the Laboratory for Computational Population Biology, which is within the Department of Computer Science at the University of Illinois at Chicago (UIC). My CV is here.

Software

OnPrem.LLM is a privacy-conscious toolkit for generative AI that makes it easier to run large language models (LLMs) on your own machine using non-public data.
ktrain is a Python library that makes deep learning and AI more accessible and easier to apply. With support for many different data types including text, images, and graphs, ktrain has been used for a wide range of use cases in industry, government, and academia. Examples include analyses for the U.S. Economic Census, financial crime analytics at Big 4 accounting firms, intelligence analyses, and CoronaCentral.ai, a machine-learning-enhanced search engine for coronavirus publications at Stanford University.
CausalNLP is a practical toolkit for causal inference with text as treatment, outcome, or "controlled-for" variable.
IDATA is a suite of software capabilities designed to facilitate search, exploration, and analysis of very large document sets using state-of-the-art machine learning, NLP, and information retrieval. It has been used for a variety of applications in the DoD, including cyber damage assessments, biosurveillance, and policy analyses.

Publications

Generative AI for FFRDCs
A.S. Maiya
IEEE International Conference on Data Mining (ICDM '25). November 2025.

ktrain: A Low-Code Library for Augmented Machine Learning
A.S. Maiya
Journal of Machine Learning Research (JMLR). May 2022.

CausalNLP: A Practical Toolkit for Causal Inference with Text
A.S. Maiya
arXiv preprint arXiv:2106.08043. Jun 2021. [arXiv only]

A Framework for Comparing Groups of Documents
A.S. Maiya
Proc. 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP '15). Lisbon, Portugal. Sep 2015.

Mining Measured Information from Text
A.S. Maiya, D. Visser, and A. Wan
Proc. 38th Annual ACM SIGIR Conference (SIGIR '15). Santiago, Chile. Aug 2015.

Topic Similarity Networks: Visual Analytics for Large Document Sets
A.S. Maiya and R.M. Rolfe
Proc. 2014 IEEE International Conference on Big Data (IEEE BigData '14). Washington, D.C., Oct 2014.

Exploratory Analysis of Highly Heterogeneous Document Collections
A.S. Maiya, J.P. Thompson, F. Loaiza-Lemos, and R.M. Rolfe
Proc. 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '13). Chicago, IL, Aug 2013.

Expansion and Decentralized Search in Complex Networks
A.S. Maiya and T.Y. Berger-Wolf
Journal of Knowledge and Information Systems. First published online January 2013.

Supervised Learning in the Wild: Text Classification for Critical Technologies
A.S. Maiya, F. Loaiza-Lemos, and R.M. Rolfe
Proc. IEEE Military Communications Conference (MILCOM '12). Orlando, FL, Oct 2012.

Benefits of Bias: Towards Better Characterization of Network Sampling
A.S. Maiya and T.Y. Berger-Wolf
Proc. 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '11). San Diego, CA, Aug 2011.

Aggression, Grooming, and Group-level Cooperation in White-faced Capuchins: Insights from Social Networks
M.C. Crofoot, D.I. Rubenstein, A.S. Maiya, and T.Y. Berger-Wolf
American Journal of Primatology. First published online May 2011.

Sampling and Inference in Complex Networks
A.S. Maiya
Ph.D. Dissertation, University of Illinois at Chicago (UIC). Chicago, IL, Apr 2011.

Expansion and Search in Networks
A.S. Maiya and T.Y. Berger-Wolf
Proc. 19th ACM Intl. Conference on Information and Knowledge Management (CIKM '10). Toronto, Canada, Oct 2010.

Online Sampling of High Centrality Individuals in Social Networks
A.S. Maiya and T.Y. Berger-Wolf
Proc. 14th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD '10). Hyderabad, India, Jun 2010.

Sampling Community Structure
A.S. Maiya and T.Y. Berger-Wolf
Proc. 19th ACM Intl. Conference on the World Wide Web (WWW '10). Raleigh, NC, Apr 2010.

Inferring the Maximum Likelihood Hierarchy in Social Networks
A.S. Maiya and T.Y. Berger-Wolf
Proc. 12th IEEE Intl. Conference on Computational Science and Engineering (CSE '09). Vancouver, Canada, Aug 2009.

The Impact of Structural Changes on Predictions of Diffusion in Networks
M. Lahiri, A.S. Maiya, R. Sulo, Habiba, and T.Y. Berger-Wolf
ICDM '08 Workshop on Analysis of Dynamic Networks. Pisa, Italy, Dec 2008.

Honors and Awards

Goodpaster Award for Excellence in Research, Institute for Defense Analyses, 2021 [News Release]

This prize is named for Gen. Andrew J. Goodpaster (USA, retired) and is awarded to an individual demonstrating "research excellence, exceptional analytic achievement and intellectual leadership."

Welch Award for Best External Research, Institute for Defense Analyses, 2016

Named in honor of General Larry D. Welch (USAF, ret.), this award "honors individuals whose external research publications exemplify General Welch's high standards of analytic excellence and relevance."

AFEI Award for Excellence in Enterprise Information, NDIA (formerly Association for Enterprise Information), 2015

This award is to "recognize and reward the contributions and achievements of project teams that exemplify excellence in achieving integrated enterprises. Winning teams are models of the best applications of technology and leadership to improve enterprise performance."