Hi!

Welcome. I'm KSV Muralidhar

Data Scientist & Azure 3x Certified

LinkedIn       HuggingFace       Kaggle       Tableau Public       PyPI       BuiltIn       Docker Hub       Credly

About Me

I'm a data scientist with more than seven years of experience working with data. I love solving business problems and automating business processes by leveraging data and technology, and I enjoy sharing my data science knowledge on platforms like Kaggle, Medium, Tableau Public and GitHub. My interests lie in machine learning, deep learning, NLP, statistics, time series forecasting, data analysis, data visualization, data-driven decision making, web scraping, process automation and big data analytics with PySpark. Currently, I'm exploring advanced NLP, MLOps and big data engineering.

Projects

News Aggregator

News Aggregator is an AI-powered app that aggregates news from a selected set of RSS feeds. Its backend consists of the following services:
- News indexing service: Extracts news items from a list of RSS feeds, computes sentence embeddings using Sentence Transformers and stores them in the Milvus vector database to power content-based recommendations through semantic search.
- Embedding deletion service: Deletes articles and their embeddings older than one month from the Milvus vector database.
- ETL service: Extracts news items from a list of RSS feeds in parallel using multiprocessing and processes them. The category of each news item is predicted by a DistilBERT model that is fully fine-tuned on a news headline classification task and then quantized with TFLite. The top 5 similar news articles are retrieved from the Milvus vector database using cosine similarity and then reranked using cross-encoders. Articles are summarized using a BART model fine-tuned on CNN and Daily Mail news articles. The transformed data is loaded into MongoDB Atlas (MongoDB-as-a-service).

The backend services are triggered periodically by a cron job scheduled with GitHub Actions. The front-end service reads the data from MongoDB Atlas and renders it as HTML whenever a user opens the app.
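A minimal sketch of the ETL service's similarity step (top-5 retrieval by cosine similarity, then reranking) is shown below. It is an in-memory illustration under assumptions, not the app's code: NumPy stands in for the Milvus search, and `score_pair` is a hypothetical placeholder for the relevance score a cross-encoder would assign to a (query, article) pair.

```python
import numpy as np
from typing import Callable, List, Sequence


def top_k_cosine(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 5) -> np.ndarray:
    """Return the indices of the k rows of doc_vecs most cosine-similar to query_vec."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q  # cosine similarity of each stored article to the query
    return np.argsort(-sims)[: min(k, len(sims))]  # most similar first


def rerank(query: str, candidates: Sequence[str],
           score_pair: Callable[[str, str], float]) -> List[str]:
    """Re-order candidates by a pairwise (query, candidate) score, highest first.

    score_pair is a hypothetical stand-in for a cross-encoder's relevance score.
    """
    return sorted(candidates, key=lambda c: score_pair(query, c), reverse=True)
```

The two-stage design keeps the expensive pairwise scorer off the full collection: the cheap vector search narrows the candidates to k, and only those k pairs are rescored.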

  • Python
  • NLP
  • Deep Learning
  • ETL
  • Text Classification
  • Text Summarization
  • Content-based Recommendation
  • BeautifulSoup
  • Hugging Face
  • DistilBERT
  • BART
  • MongoDB
  • Milvus Vector DB
  • Redis
  • Quart
  • Docker
  • CRON Job
  • GitHub Actions
  • Multiprocessing

Front-end code       Architecture       Try it
(The app may be in sleep mode and can take some time to load. Your patience is appreciated.)
Income Range Predictor

This project demonstrates the development of a production-grade, end-to-end binary classification application. The development process is as follows:
* Data is ingested from a database, validated and cleaned.
* Data is preprocessed by removing ID variables and low-variance variables, winsorizing outliers, imputing missing values, removing correlated features, marking rare categories, one-hot encoding, ordinal encoding and scaling. The hyperparameters of the preprocessing steps are tuned with Hyperopt during training.
* New features are constructed in a feature construction step.
* Data is clustered and the cluster labels are added to the data as features. The best number of clusters (k) is chosen using the silhouette score.
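The clustering step in the last bullet can be sketched with scikit-learn as below. This is a minimal illustration under assumptions, not the project's code: k-means is assumed as the clustering algorithm (the description does not name one), the candidate range of k is arbitrary, and the helper names are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(X, k_values=range(2, 9), random_state=0):
    """Fit k-means for each candidate k; return the k with the highest silhouette score.

    Assumes k-means; the project does not name its clustering algorithm.
    """
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get), scores


def add_cluster_feature(X, k, random_state=0):
    """Fit k-means with the chosen k and append each row's cluster label as a new column."""
    km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
    # Keep the fitted model so new rows can be labelled later with km.predict(...).
    return np.column_stack([X, km.labels_]), km
```

Appending the raw integer label is the simplest option; one-hot encoding it would avoid implying an order between clusters.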

  • Python
  • Machine Learning
  • FastAPI
  • Flask
  • MLflow Tracking
  • Hyperopt
  • Bitbucket
  • Unit Testing
  • Pytest
  • CI/CD
  • Docker

API code       Front-end code       Data Drift Monitor       Try it
(The app may be in sleep mode and can take some time to load. Your patience is appreciated. The app runs on 512 MB of shared RAM and may run out of memory, so availability is not guaranteed.)
News Article Classifier

This project demonstrates the development of a production-grade, end-to-end multi-class text classification application built on a small, imbalanced dataset. News Article Classifier assigns a news article to one of five classes: sport, entertainment, politics, tech and business. The model is trained on 1,500 BBC news articles. The data is augmented with nlpaug, increasing the size of the dataset 35x. The text is vectorized using Google's Word2Vec, and its dimensionality is reduced using PCA. Multiple machine learning models are tuned automatically with Hyperopt to find the best-performing model. The model is served through an API built with FastAPI, and the front-end application calls this API when a user submits a request. Both the API and the front-end application are dockerized and deployed as separate web services on render.com.
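One common way to turn Word2Vec into a fixed-length vector per article is to average the article's word vectors and then reduce the dimensionality with PCA. The sketch below assumes that averaging scheme and a plain whitespace tokenizer; the description does not say how word vectors are combined, and the helper names are hypothetical. With gensim, `embeddings` could be the pretrained Google News `KeyedVectors` (300 dimensions).

```python
import numpy as np
from sklearn.decomposition import PCA


def doc_vector(tokens, embeddings, dim):
    """Average the vectors of in-vocabulary tokens; return zeros if no token is known.

    Averaging is an assumed aggregation scheme. `embeddings` can be any mapping
    supporting `in` and `[]`, such as a gensim KeyedVectors object.
    """
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)


def vectorize_corpus(docs, embeddings, dim, n_components):
    """Embed each document (naive lowercase whitespace tokenization), then reduce with PCA."""
    X = np.vstack([doc_vector(doc.lower().split(), embeddings, dim) for doc in docs])
    pca = PCA(n_components=n_components)
    return pca.fit_transform(X), pca  # reuse pca.transform(...) on new articles
```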

  • Python
  • NLP
  • Word2Vec
  • Gensim
  • nlpaug
  • FastAPI
  • Flask
  • Hyperopt
  • Bitbucket
  • Unit Testing
  • MLflow Tracking
  • Pytest
  • CI/CD
  • Docker

API code         Front-end code         Try it
(The app may be in sleep mode and can take some time to load. Your patience is appreciated. The app runs on 512 MB of shared RAM and may run out of memory, so availability is not guaranteed.)
Fine-Tuning HuggingFace Models

The project uses TensorFlow to fine-tune HuggingFace models for the following tasks:

- Text classification using DistilBERT.
- Text summarization using T5.
- Translation from English to French using T5.
- Translation from English to Hindi using ByT5.
- Text embedding extraction from DistilBERT and RoBERTa.
- Text classification using RoBERTa.
- Text classification using ALBERT, with explainability (XAI) via LIME.
- Hindi text summarization using mT5.
- Text summarization using BART.
- Named entity recognition using DistilBERT and DeBERTa.

  • Python
  • NLP
  • TensorFlow
  • HuggingFace
  • Fine-Tuning

View Project         View HF Spaces
Fine-Tuning YOLO For Object Detection

The project fine-tunes YOLO for object detection tasks such as:

- Vehicle Number Plate Recognition
- Credit Card Detection & OCR
- Face Detection

  • Python
  • Computer Vision
  • Object Detection
  • YOLO
  • EasyOCR
  • Fine-Tuning
Data Analyzer

Data Analyzer is an auto-EDA package for basic exploratory data analysis. It enables the user to summarize the data structure, summarize categorical and numeric attributes, plot a correlation matrix, run chi-square tests, generate plots between different attribute types, compute mutual information and identify multicollinearity.
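Multicollinearity is often quantified with the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing attribute j on all the other attributes. The NumPy sketch below assumes that standard definition; it is not the package's own code, and the function name is hypothetical. A common rule of thumb treats VIF values above 5 to 10 as a sign of problematic multicollinearity.

```python
import numpy as np
import pandas as pd


def variance_inflation_factors(df: pd.DataFrame) -> pd.Series:
    """VIF per numeric column, using the standard regression-based definition (assumed)."""
    X = df.to_numpy(dtype=float)
    n, p = X.shape
    vifs = {}
    for j in range(p):
        y = X[:, j]
        # Regress column j on the remaining columns plus an intercept.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        vifs[df.columns[j]] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return pd.Series(vifs, name="VIF")
```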

  • Python
  • Pandas
  • Matplotlib
  • Seaborn
  • Auto EDA

View Code         Example         View in pypi.org
Vector Databases

This project demonstrates how to convert images and text into vectors, insert those vectors into Milvus, an open-source vector database, and query the database to find similar images or text.

-   Insert and query images using vector database
-   Insert and query text using vector database
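The insert-then-query workflow can be illustrated without a running Milvus server by a toy in-memory store. The class and its methods below are hypothetical and do not mirror the Milvus client API; they only show the idea of storing normalized embeddings and ranking them by cosine similarity. In the project, the vectors would come from an image or text embedding model.

```python
import numpy as np


class InMemoryVectorStore:
    """Hypothetical toy stand-in for a vector-database collection (not the Milvus API)."""

    def __init__(self, dim: int):
        self.dim = dim
        self.ids: list = []
        self.vectors = np.empty((0, dim))

    def insert(self, ids, vectors):
        vectors = np.asarray(vectors, dtype=float)
        if vectors.ndim != 2 or vectors.shape[1] != self.dim:
            raise ValueError(f"expected an array of shape (n, {self.dim})")
        # Normalizing at insert time makes a plain inner product equal cosine similarity.
        vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
        self.vectors = np.vstack([self.vectors, vectors])
        self.ids.extend(ids)

    def search(self, query, limit=3):
        """Return up to `limit` (id, cosine similarity) pairs, most similar first."""
        q = np.asarray(query, dtype=float)
        q = q / np.linalg.norm(q)
        sims = self.vectors @ q
        order = np.argsort(-sims)[:limit]
        return [(self.ids[i], float(sims[i])) for i in order]
```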

  • Python
  • Vector database
  • Milvus
  • Text embeddings
  • Image embeddings
Machine Learning From Scratch

Machine learning algorithms, data preprocessing functions and cross-validation functions developed from scratch in Python.

-   ML algorithms from scratch
-   Data preprocessing functions from scratch
-   Cross-validation functions from scratch
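A from-scratch k-fold cross-validation routine, in the spirit of this project, could look like the NumPy sketch below. It is a hypothetical illustration rather than the repository's code; the nearest-centroid classifier is included only as a small from-scratch model to cross-validate.

```python
import numpy as np


def k_fold_indices(n_samples, k=5, shuffle=True, seed=0):
    """Yield (train_idx, test_idx) for each of k folds; fold sizes differ by at most one."""
    idx = np.arange(n_samples)
    if shuffle:
        np.random.default_rng(seed).shuffle(idx)
    folds = np.array_split(idx, k)
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]


def k_fold_scores(fit, predict, X, y, k=5, seed=0):
    """Per-fold accuracy for any model given as fit(X, y) -> model and predict(model, X)."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed=seed):
        model = fit(X[train_idx], y[train_idx])
        scores.append(np.mean(predict(model, X[test_idx]) == y[test_idx]))
    return np.array(scores)


# Example model written from scratch: a nearest-centroid classifier.
def centroid_fit(X, y):
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])


def centroid_predict(model, X):
    classes, centroids = model
    # Squared Euclidean distance from every row to every class centroid.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return classes[dists.argmin(axis=1)]
```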

  • Python
  • Machine Learning
  • Cross-validation
  • Data Preprocessing
Incremental Machine Learning

This project demonstrates how to preprocess a dataset with ~33.5M samples and 23 features (~9.5 GB) and incrementally train an SGD classifier on it. The EDA and data preprocessing tasks are parallelized using the multiprocessing package, and the dimensionality is reduced using IncrementalPCA.
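The out-of-core pattern behind this project can be sketched with scikit-learn's `partial_fit` API, which both `IncrementalPCA` and `SGDClassifier` provide. In this hypothetical sketch, in-memory arrays stand in for a file read in chunks, and the component count, chunk size and epoch count are arbitrary; the multiprocessing-based EDA is not shown.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA
from sklearn.linear_model import SGDClassifier


def iter_chunks(X, y, chunk_size):
    """Stand-in for streaming a large file, e.g. pandas.read_csv(path, chunksize=chunk_size)."""
    for start in range(0, len(X), chunk_size):
        yield X[start:start + chunk_size], y[start:start + chunk_size]


def train_incrementally(X, y, classes, n_components=8, chunk_size=1000, epochs=3):
    """Fit IncrementalPCA, then an SGDClassifier, one chunk at a time.

    Each chunk passed to IncrementalPCA.partial_fit must have at least n_components rows.
    """
    ipca = IncrementalPCA(n_components=n_components)
    for X_chunk, _ in iter_chunks(X, y, chunk_size):  # pass 1: learn the projection
        ipca.partial_fit(X_chunk)

    clf = SGDClassifier(random_state=0)
    for _ in range(epochs):  # later passes: update the classifier on projected chunks
        for X_chunk, y_chunk in iter_chunks(X, y, chunk_size):
            clf.partial_fit(ipca.transform(X_chunk), y_chunk, classes=classes)
    return ipca, clf
```

Only one chunk is processed at a time, which is what makes a dataset of this size tractable; the trade-off is an extra pass over the data to fit the PCA basis before the classifier sees any projected features.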

  • Python
  • Machine Learning
  • Multiprocessing
  • Pandas
  • EDA
  • IncrementalPCA

View Project
Vegetable Image Classifier

This project classifies uploaded color images of vegetables into 15 classes. The model is trained with transfer learning (MobileNet-v2) and quantized with TensorFlow Lite, which reduced the model size by 5x. The training dataset contains 15,000 vegetable images (1,000 per class) captured with a mobile phone camera.

  • Python
  • Deep Learning
  • Image Classification
  • CNN
  • TensorFlow
  • TensorFlow Lite
  • Distributed Training
  • Transfer Learning
  • MobileNet-v2

View Project         Try it
(The app may be in sleep mode and can take some time to load. Your patience is appreciated.)
Time Series Forecasting

This project forecasts hourly electricity consumption using an LSTM model.
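To train an LSTM, an hourly series is usually reframed as supervised pairs: a window of past hours as input and the following hour(s) as the target. The sketch below assumes a 24-hour lookback and a one-hour horizon, both hypothetical since the project's settings are not given; the resulting arrays are shaped for a Keras-style LSTM layer.

```python
import numpy as np


def make_windows(series, lookback=24, horizon=1):
    """Slice a 1-D series into supervised (input window, target) pairs for an LSTM.

    lookback=24 and horizon=1 are assumed defaults, not the project's settings.
    Returns X with shape (samples, lookback, 1) and y with shape (samples, horizon).
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - lookback - horizon + 1
    if n <= 0:
        raise ValueError("series is too short for this lookback and horizon")
    X = np.stack([series[i:i + lookback] for i in range(n)])
    y = np.stack([series[i + lookback:i + lookback + horizon] for i in range(n)])
    # Recurrent layers expect (batch, timesteps, features); here there is one feature.
    return X[..., np.newaxis], y
```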

  • Python
  • Deep Learning
  • Time Series Forecasting
  • LSTM
  • TensorFlow

View Project
Export Excel/CSV to MySQL

This package exports Excel or CSV files to MySQL. It automatically parses the data types, creates a new database with the specified name (or uses an existing one), creates a new table with the specified name (or uses an existing one), and inserts all the records from the specified CSV/Excel file into the table.
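The type-parsing and table-creation steps can be sketched with pandas as below. The dtype-to-MySQL mapping and the helper names are hypothetical simplifications, not the package's actual rules.

```python
import pandas as pd
from pandas.api import types as ptypes


def mysql_type(col: pd.Series) -> str:
    """Simplified, hypothetical mapping from a pandas column to a MySQL column type."""
    if ptypes.is_bool_dtype(col):
        return "BOOLEAN"
    if ptypes.is_integer_dtype(col):
        return "BIGINT"
    if ptypes.is_float_dtype(col):
        return "DOUBLE"
    if ptypes.is_datetime64_any_dtype(col):
        return "DATETIME"
    # Anything else is treated as text, sized by its longest value.
    max_len = int(col.astype(str).str.len().max()) if len(col) else 1
    return f"VARCHAR({max(max_len, 1)})" if max_len <= 255 else "TEXT"


def create_table_sql(df: pd.DataFrame, table: str) -> str:
    cols = ",\n  ".join(f"`{name}` {mysql_type(df[name])}" for name in df.columns)
    return f"CREATE TABLE IF NOT EXISTS `{table}` (\n  {cols}\n);"


def insert_sql(df: pd.DataFrame, table: str) -> str:
    """Parameterized INSERT using the %s placeholder style of PyMySQL / mysql-connector."""
    cols = ", ".join(f"`{name}`" for name in df.columns)
    placeholders = ", ".join(["%s"] * len(df.columns))
    return f"INSERT INTO `{table}` ({cols}) VALUES ({placeholders})"
```

With a DB-API cursor, the rows could then be inserted with `cursor.executemany(insert_sql(df, table), df.itertuples(index=False, name=None))`; values may first need converting to native Python types and missing values to `None`, and column names containing backticks are not escaped in this sketch.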

  • Python
  • Pandas
  • MySQL

View Code         View in pypi.org

Skills

Certifications