The data is in the following format:

Comment label

Intha padam vantha piragu yellarum Thala ya kondaduvanga positive
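
A minimal sketch of how a line in this format could be split into comment and label. It assumes the label is the last whitespace-separated token on the line, as in the example above; the actual delimiter of the distributed files is not stated here, and the function name `parse_labelled_lines` is hypothetical.

```python
from typing import Iterable, List, Tuple


def parse_labelled_lines(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Split each line into (comment, label).

    Assumption: the label is a single token at the end of the line,
    separated from the comment by whitespace (space or tab).
    """
    pairs = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        parts = line.rsplit(None, 1)  # split once, from the right
        if len(parts) != 2:
            continue  # a lone token has no comment/label pair
        comment, label = parts
        pairs.append((comment, label))
    return pairs


sample = ["Intha padam vantha piragu yellarum Thala ya kondaduvanga positive"]
rows = parse_labelled_lines(sample)
for comment, label in rows:
    print(label, "|", comment)
```

A header line such as "Comment label" would be parsed like any other row, so the caller should drop it if the file includes one.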

Tamil-English: 15,744 comments (Train: 11,335, Validation: 1,260, Test: 3,149)

Malayalam-English: 6,739 comments (Train: 4,851, Validation: 541, Test: 1,348)

We present Tamil-English and Malayalam-English datasets of YouTube video comments. The datasets contain all three types of code-mixed sentences: inter-sentential switching, intra-sentential switching and tag switching. Most comments are written in Roman script, using either Tamil/Malayalam grammar with English lexicon or English grammar with Tamil/Malayalam lexicon. Some comments are written in Tamil/Malayalam script with English expressions in between.

Malayalam trial data: https://drive.google.com/file/d/1a7oq6rUMsjIMbBzwsN2jQfCZYcfmhc6_/view?usp=sharing

Tamil trial data: https://drive.google.com/file/d/1XWCVWKGFEhdQQ1S87kAegkPL36NZIAV5/view?usp=sharing

To get the full data, register via the CodaLab link.

More details about the dataset can be found in the papers "A Sentiment Analysis Dataset for Code-Mixed Malayalam-English" and "Corpus Creation for Sentiment Analysis in Code-Mixed Tamil-English Text".