Hi, I'm Harshini Murali

Aspiring PhD Researcher | AI & Large Language Models for Healthcare

Research interests: LLMs, Explainable AI, Causal ML, Health Informatics

View My Work

About Me & Research Focus

I am a Data Science graduate with a strong research orientation, seeking to pursue doctoral studies in Artificial Intelligence with a focus on Large Language Models and healthcare applications.

My academic work spans machine learning, deep learning, explainable AI, and generative models, with multiple research-oriented projects involving healthcare datasets, clinical prediction tasks, and model interpretability.

Through my MSc research and independent projects, I have developed experience in formulating research questions, designing experiments, evaluating models rigorously, and communicating findings clearly. I am particularly interested in responsible and privacy-aware AI systems for real-world healthcare use.

Research Interests

Education

MSc in Data Science

University of Greenwich

Bachelor of Computer Applications

SRM Arts and Science College

Class XII (CBSE)

NSN Memorial School

My Skills

Python

Expert in Python programming for data science and automation.

R

Proficient in R for statistical analysis, data manipulation, and visualization.

SQL

Skilled in SQL for querying and managing relational databases efficiently.

JavaScript

Proficient in JavaScript for web development and interactive interfaces.

Tableau

Experienced in creating dynamic visualizations and dashboards.

Power BI

Proficient in Power BI for business intelligence and data storytelling.

Data Storytelling

Crafting compelling narratives using data to drive business decisions.

Interactive Visualizations

Creating dynamic and interactive charts for impactful presentations.

Advanced Charting

Proficient in creating advanced visualizations such as heatmaps, scatter matrix plots, and multi-dimensional charts for better insights.

Supervised Learning

Developing classification and regression models using supervised techniques.

Unsupervised Learning

Expert in clustering techniques and dimensionality reduction methods.

Clustering

Skilled in clustering algorithms like K-Means, DBSCAN, and hierarchical clustering for grouping datasets.

Deep Learning

Proficient in deep learning using TensorFlow and PyTorch for building advanced neural networks.

Natural Language Processing (NLP)

Experienced in processing and analyzing text data using NLP techniques like tokenization, stemming, and sentiment analysis.

Model Optimization

Expert in optimizing machine learning models using hyperparameter tuning and regularization techniques.

Model Deployment

Deploying machine learning models using Heroku, Flask, Docker, and cloud services.

Research Projects & Theses

A selection of my research-oriented work focusing on healthcare AI, large language models, explainable machine learning, and generative modeling.

MSc Thesis — AI-based Knee Osteoarthritis Severity Classification

This dissertation investigated the application of deep learning models for automated severity classification of knee osteoarthritis from medical imaging data. The study emphasized explainable AI to support clinical interpretability and responsible deployment.

Transfer learning was applied using convolutional neural networks, and model decisions were visualized through Grad-CAM to highlight clinically relevant regions. The final system was deployed as an interactive web application.

Figure: Web-based deep learning system for knee osteoarthritis severity detection with confidence score.

  • Domain: Healthcare AI, Medical Imaging
  • Models: EfficientNetB3, ResNet50
  • Explainability: Grad-CAM
  • Deployment: Hugging Face, Gradio
View Full Thesis (PDF)

Multimodal Generative Modelling — GPT-2, GANs & Diffusion Models

This research project explored generative modelling techniques across text and image domains, comparing large language models with deep generative architectures.

GPT-2 was fine-tuned for text generation tasks, while GANs, diffusion models, and CTGAN were evaluated for synthetic data generation on benchmark datasets. The project examined model performance, diversity, and scalability across modalities.

Figure: Synthetic CIFAR-10-like images generated using a DCGAN trained on image data.

  • Focus: Generative AI & Large Language Models
  • Models: GPT-2, GANs, Diffusion Models, CTGAN
  • Datasets: MNIST, CIFAR-10, Tiny ImageNet
  • Tools: PyTorch, TorchVision
View Generative Modelling Report (PDF)

Explainable Machine Learning for Chronic Kidney Disease Prediction

This project applied machine learning techniques to analyze chronic kidney disease datasets, focusing on interpretability and clinical relevance.

Classification models were developed and analyzed using SHAP to explain feature importance and decision behavior, supporting transparent and trustworthy AI systems for healthcare.

Figure: SHAP waterfall plot illustrating feature-level contributions in chronic kidney disease prediction.

  • Domain: Healthcare Analytics
  • Methods: Supervised Learning, Feature Importance
  • Explainability: SHAP
  • Tools: Python, scikit-learn
View CKD Explainable ML Report (PDF)

Publications & Manuscripts

Publications are currently in preparation. Research outputs from my MSc thesis and ongoing projects are being developed for submission to peer-reviewed venues.

Applied & Exploratory Projects

Applied machine learning and exploratory projects demonstrating practical implementation of data science and NLP techniques.

Sentiment Analysis on Movie Reviews

This project leverages natural language processing (NLP) to classify sentiments in movie reviews as positive, neutral, or negative using machine learning techniques.

Key Features

  • Data Preprocessing (Tokenization, Stemming, Removal of Stopwords).
  • Supervised ML Training on Labeled Datasets.
  • Dynamic Visualizations for Sentiment Insights.
  • Deployment-Ready for Real-Time Analysis.
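The preprocessing steps listed above (tokenization, stopword removal, stemming) can be sketched with the Python standard library alone. This is an illustration, not the project's actual code: the stopword list, the crude suffix-stripping rule, and the function names are simplified assumptions standing in for a full stopword list and a real stemmer.

```python
import re

# Hypothetical, deliberately tiny stopword list (assumption for illustration).
STOPWORDS = {"a", "an", "the", "is", "was", "were", "and", "or", "of", "to", "in", "it", "this"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def simple_stem(token):
    """Crude suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "ly", "s"):
        # Only strip when at least three characters of the word remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stopwords, then stem the remaining tokens."""
    return [simple_stem(t) for t in tokenize(text) if t not in STOPWORDS]


if __name__ == "__main__":
    print(preprocess("The acting was amazing and the plot moved quickly"))
```

The resulting token list would then be turned into features (for example, word counts) before a supervised classifier is trained on labeled reviews.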

Technologies Used

  • Python
  • TextBlob
  • Scikit-learn
  • Matplotlib
  • Seaborn

Project Overview

This project analyzes the sentiment of movie reviews in real time using NLP techniques, presents the results through data visualizations, and packages the workflow as an easy-to-deploy pipeline.

View on GitHub

Product Classification Using Machine Learning

This project applies machine learning techniques to classify products into various categories based on their attributes. By analyzing product data, the system improves inventory management and customer experience.

Key Features

  • Data preprocessing to clean and normalize product data.
  • Feature engineering to derive meaningful insights.
  • Training models using supervised learning techniques.
  • Performance evaluation with precision, recall, and accuracy metrics.
  • Easy integration into e-commerce platforms.
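To make the evaluation metrics named above concrete, here is a minimal sketch of how precision, recall, and accuracy can be computed from true and predicted labels. It assumes a binary view in which one product category is treated as the "positive" class; the function name and the example labels are hypothetical and are not taken from the project.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return precision, recall, and accuracy for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy}


if __name__ == "__main__":
    # Made-up labels purely for illustration.
    y_true = [1, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 1, 0]
    print(classification_metrics(y_true, y_pred))
```

For a multi-class product catalogue, the same calculation would be repeated per category and then averaged.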

Technologies Used

  • Python
  • Pandas
  • Scikit-Learn
  • NumPy
  • Matplotlib

Project Overview

This project demonstrates the application of machine learning in product classification to enhance e-commerce functionalities. It includes preprocessing, model training, and deployment-ready pipelines.

View on GitHub

Customer Churn Prediction

This project aims to predict customer churn by analyzing historical data patterns. By leveraging machine learning algorithms, businesses can identify at-risk customers and implement strategies to improve retention.

Key Features

  • Data Preprocessing: Data cleaning and preparation for accurate analysis.
  • Exploratory Data Analysis: Uncover patterns and trends using visualizations.
  • Predictive Modeling: Classification models to predict churn probability.
  • Feature Importance: Identify key drivers of customer churn.

Technologies Used

  • Python
  • Power BI
  • XGBoost
  • Scikit-Learn
  • NumPy
  • Matplotlib
View on GitHub

Analyzing Chronic Kidney Disease using Clustering

This project focuses on exploring and analyzing Chronic Kidney Disease (CKD) data using unsupervised learning techniques. Clustering algorithms like K-Means and Hierarchical Clustering are applied to identify patterns and group similar cases, providing valuable insights for early diagnosis and targeted interventions.

Key Features

  • Data Preprocessing: Handled missing values and cleaned the dataset for accurate analysis.
  • Exploratory Data Analysis: Visualized correlations and trends in the data to uncover patterns.
  • Clustering Algorithms: Applied K-Means and Hierarchical Clustering for patient grouping.
  • Cluster Insights: Identified high-risk groups based on cluster analysis.
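As an illustration of the K-Means step mentioned above, the following is a minimal standard-library sketch of the assign-then-update loop. It is not the project's code: the function name, the toy two-dimensional points, and the choice to initialize centroids from the first k points are assumptions made for brevity. Because K-Means is sensitive to initialization, a real analysis would normally rely on a library implementation with a more robust initialization strategy.

```python
import math


def kmeans(points, k, iterations=10):
    """Very small K-Means: returns (labels, centroids) for a list of numeric tuples."""
    # Assumption: use the first k points as initial centroids (simple, deterministic).
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    # Toy data with two well-separated groups (made up for illustration).
    data = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.8), (0.9, 1.1), (4.9, 5.1)]
    labels, centroids = kmeans(data, k=2)
    print(labels, centroids)
```

On patient data, each tuple would hold scaled clinical features, and the resulting groups would be inspected to characterize higher-risk clusters.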

Technologies Used

  • Python
  • Clustering
  • Scikit-Learn
  • NumPy
  • Matplotlib
View on GitHub

Research Experience

Data Analysis

Performing detailed exploratory data analysis to uncover trends and insights. Cleaning and preprocessing data to ensure its accuracy and reliability for analysis. Conducting statistical analysis to evaluate relationships between variables. Creating insightful visualizations like histograms, scatter plots, and heatmaps. Using feature engineering techniques to prepare datasets for advanced modeling. Helping clients make informed decisions through in-depth data understanding.

Machine Learning Models

Designing and developing predictive models to extract actionable insights from complex datasets. Specializing in supervised and unsupervised learning techniques to solve classification, regression, and clustering problems. Expertise in libraries like TensorFlow, PyTorch, and Scikit-learn. Leveraging hyperparameter tuning and model optimization to improve performance. Implementing scalable ML pipelines for real-world applications in diverse domains. Delivering solutions tailored to meet client requirements and business goals.

Data Visualization

Crafting visually compelling dashboards and charts to represent complex data effectively. Utilizing tools like Matplotlib, Seaborn, and Tableau to uncover insights. Developing interactive visuals for better data storytelling and decision-making. Simplifying trends and relationships in data through meaningful visual summaries. Focusing on creating user-friendly and intuitive designs tailored to audience needs. Delivering actionable insights for stakeholders by making data accessible and comprehensible.

Product Recommendation

Building recommendation engines to enhance user engagement and retention. Leveraging collaborative and content-based filtering techniques for accurate predictions. Applying matrix factorization methods for handling sparse datasets efficiently. Tailoring algorithms to personalize product, content, or service recommendations. Measuring success through precision, recall, and other evaluation metrics. Delivering scalable solutions to meet user preferences across industries.

Contact

Have a question or want to work together? Reach out to me!

Email: harshinihachu6@gmail.com