Naga Harshita Marupaka

Machine Learning Engineer | AI/ML & Data Specialist
Los Angeles, US.

About

Machine Learning Engineer and Data Specialist with a track record of developing and deploying AI/ML solutions, optimizing data pipelines, and improving software systems. Expertise spans large language models (LLMs), generative AI, speech recognition, and data analytics. Uses Python, Java, and cloud platforms (AWS, GCP) to deliver measurable gains in system performance. Seeking challenging roles in AI/ML engineering and data science.

Work

In The Loop

Machine Learning Engineer, LLM

Summary

Developing and scaling advanced LLM-based solutions for apparel attribute prediction and multimodal inference services.

Highlights

Led the implementation of a post-inference mapping layer for apparel attribute prediction that translates model predictions into client-specific taxonomies, achieving over 95% label match accuracy across 3,000+ SKUs.

Engineered and scaled multimodal Large Language Model (LLM) inference services on Kubernetes, improving concurrent request handling efficiency by 20%.

Easley-Dunn Productions

Software Engineer, Data

Summary

Focused on integrating player telemetry, optimizing game performance, and enhancing player engagement through data-driven insights and A/B testing.

Highlights

Integrated advanced player telemetry and critical gameplay features for the Spurpunk game (Unity/C#), which reduced data tracking delays by 25% and accelerated performance tuning across production and test environments.

Conducted in-depth analysis of player behavior to pinpoint resource imbalances, then designed and executed targeted A/B tests that directly informed level design enhancements, resulting in a 15% improvement in player engagement.

Alzheimer's Therapeutic Research Institute

Software Engineer, ML

Summary

Developed and optimized AI/ML-driven solutions for speech recognition and data visualization, contributing to research initiatives.

Highlights

Authored a comparison report evaluating over 15 open-source speech recognition libraries, covering performance metrics, resource allocation, and benchmark analysis across multiple languages, to inform technology decisions.

Developed and optimized a video transcription pipeline with speaker distinction, reducing timestamp discrepancies by 20% and improving transcription accuracy.

Designed and implemented scalable REST APIs using Django REST Framework and an interactive LIMS dashboard with D3.js, following established software design patterns and test-driven development principles.

JoshTalks

Software Engineer

Summary

Engineered real-time communication features, optimized API performance, and streamlined deployment processes.

Highlights

Led the integration of speaker settings for a real-time group voice call feature utilizing Pub/Sub architecture, successfully piloting the solution with over 100 users.

Optimized API performance by integrating Redis caching, cutting latency by 60%, from 2 seconds to 0.8 seconds.

Automated CI/CD deployment pipelines using Cloud Build triggers, reducing release time by 20%.

Education

University of Southern California

Master's

Computer Science

Courses

Analysis of Algorithms

Machine Learning

Natural Language Processing

Multimedia Systems Design

Indian Institute of Information Technology

Bachelor's

Computer Science

Courses

Data Structures and Algorithms

Information Retrieval

Deep Learning

Distributed Computing

Skills

Programming Languages

Python, Java, C/C++, SQL, JavaScript, TypeScript, Go.

Machine Learning

PyTorch, scikit-learn, pandas, NumPy, LangChain, Hugging Face, OpenCV.

Generative AI & Models

LLMs, Speech (TTS/ASR), Image (ViT), Multimodal Models, GPT-4V, Gemini.

Web Frameworks & Libraries

FastAPI, Django, React.js, Next.js, Node.js, D3.js.

Cloud & DevOps

AWS (EC2, S3), GCP, Docker, Kubernetes, Firebase, Apache Airflow.

Projects

Named Entity Recognition (NER)

Summary

Trained a robust BiLSTM-CRF model for Named Entity Recognition, addressing class imbalance and achieving high accuracy.

Image Inpainting using Deep Learning

Summary

Explored and implemented advanced deep learning techniques for image inpainting and spatial reconstruction.

Publications

Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling

Published by

ACL 2025, SDP Workshop

Summary

Developed a multimodal Chain-of-Thought (CoT) reasoning pipeline for Scientific Visual QA, achieving high performance metrics and significantly accelerating inference speed.