Case Study

Urdu Sentiment Analysis — FYP & Publication

CNN, RNN, Python, NLP, TensorFlow

The Challenge

Urdu is one of the world's most widely spoken languages, yet it is severely underrepresented in NLP research. Existing sentiment analysis tools perform poorly on Urdu text because labeled datasets are scarce, the language is morphologically complex, and its right-to-left script needs special handling.

The Solution

  • Hybrid CNN-RNN Architecture: Combined convolutional layers for local feature extraction with recurrent layers for sequential context. The hybrid outperformed either architecture alone on Urdu text classification.
  • Custom Preprocessing Pipeline: Handled Urdu-specific tokenization, normalization, and script encoding to address the unique challenges of right-to-left text.
  • Evaluation Rigor: Evaluated on a curated Urdu sentiment dataset; the results were accepted after peer review by the Journal on Artificial Intelligence, 2024.
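
The preprocessing and architecture bullets above can be illustrated with a minimal sketch. It is not the project's actual code: the function names, the character mappings (Arabic yeh and kaf folded to their Urdu forms), the choice of a bidirectional LSTM as the recurrent layer, and every layer size are assumptions made for illustration. TensorFlow is imported inside the model builder so that the text helpers run without it.

```python
import re
import unicodedata
from collections import Counter

# Marks dropped before tokenizing (an assumed convention): short vowels
# U+064B-U+0652, superscript alef U+0670, and tatweel U+0640.
_MARKS = re.compile("[\u064B-\u0652\u0670\u0640]")

# Arabic code points often typed in place of their Urdu counterparts.
_CHAR_MAP = str.maketrans({
    "\u064A": "\u06CC",  # Arabic yeh -> Farsi yeh
    "\u0643": "\u06A9",  # Arabic kaf -> keheh
})


def normalize_urdu(text):
    """Apply NFC, strip diacritics and tatweel, unify look-alike letters."""
    text = unicodedata.normalize("NFC", text)
    text = _MARKS.sub("", text)
    return text.translate(_CHAR_MAP)


def tokenize(text):
    """Split normalized text into word tokens; punctuation such as the
    Urdu full stop (U+06D4) is not a word character and is dropped."""
    return re.findall(r"\w+", normalize_urdu(text))


def build_vocab(token_lists, min_count=1):
    """Map tokens to integer ids; 0 is padding, 1 is unknown."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok, n in counts.most_common():
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab, max_len=64):
    """Convert tokens to a fixed-length list of ids (truncate, then pad)."""
    ids = [vocab.get(t, 1) for t in tokens[:max_len]]
    return ids + [0] * (max_len - len(ids))


def build_cnn_rnn(vocab_size, max_len=64, num_classes=2):
    """Hypothetical hybrid model: Conv1D for local n-gram features,
    then a bidirectional LSTM for sequential context."""
    import tensorflow as tf  # lazy import: only needed to build the model

    layers = tf.keras.layers
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(max_len,), dtype="int32"),
        layers.Embedding(vocab_size, 128),
        layers.Conv1D(128, 5, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Bidirectional(layers.LSTM(64)),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

With TensorFlow installed, `build_cnn_rnn(len(vocab))` returns a compiled model that accepts batches of `encode` output. Note that right-to-left display is a rendering concern: the characters are stored in logical (reading) order, so the model consumes them like any other sequence.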

Key Results

  • 96% F1 score on the Urdu sentiment classification benchmark.
  • Published in the Journal on Artificial Intelligence, 2024.
  • Final Year Project — Bachelor of Science in Software Engineering, City University Peshawar.
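
For readers unfamiliar with the metric quoted above, F1 is the harmonic mean of precision and recall. The sketch below computes it for a single positive class on toy labels; the labels are invented for illustration and are not project data, and the source does not say whether the reported score is binary, macro, or weighted F1.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels (1 = positive sentiment, 0 = negative), purely illustrative.
toy_true = [1, 1, 1, 0, 0, 1]
toy_pred = [1, 1, 0, 0, 1, 1]
toy_f1 = f1_score(toy_true, toy_pred)  # 3 TP, 1 FP, 1 FN
```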

Project Details

Category: Deep Learning / NLP
Role: Lead Developer