The data is in the following format:
Comment label
Intha padam vantha piragu yellarum Thala ya kondaduvanga positive
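As a minimal sketch of reading this format, the snippet below splits one line into its comment and label. It assumes the label is the final whitespace-delimited token on the line, as in the example above; the actual delimiter in the released files (for instance a tab) is not stated here. The helper name `parse_line` is hypothetical.

```python
from typing import Tuple


def parse_line(line: str) -> Tuple[str, str]:
    """Split one raw line into (comment, label).

    Assumption: the label is the last whitespace-delimited token.
    Raises ValueError if the line has fewer than two tokens.
    """
    comment, label = line.strip().rsplit(None, 1)
    return comment, label


example = "Intha padam vantha piragu yellarum Thala ya kondaduvanga positive"
comment, label = parse_line(example)
print(repr(comment), "->", label)
```

If the files turn out to be tab-separated, splitting on the tab character instead would be the safer choice, since it does not depend on the label being a single token.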
Tamil-English: Train: 35,657 Validation: 3,963 and Test: 4,403
Malayalam-English: Train: 15,889 Validation: 1,767 and Test: 1,963
Kannada-English: Train: 6,213 Validation: 692 and Test: 768
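For a quick sanity check of the split sizes listed above, the following sketch totals each language pair and reports the share of each split. The numbers are copied from the list; the names `SPLITS` and `split_summary` are hypothetical.

```python
# Split sizes as listed in this document.
SPLITS = {
    "Tamil-English": {"train": 35657, "validation": 3963, "test": 4403},
    "Malayalam-English": {"train": 15889, "validation": 1767, "test": 1963},
    "Kannada-English": {"train": 6213, "validation": 692, "test": 768},
}


def split_summary(sizes):
    """Return (total, {split_name: fraction_of_total}) for one language pair."""
    total = sum(sizes.values())
    shares = {name: count / total for name, count in sizes.items()}
    return total, shares


for pair, sizes in SPLITS.items():
    total, shares = split_summary(sizes)
    formatted = ", ".join(f"{name}: {share:.1%}" for name, share in shares.items())
    print(f"{pair}: total {total} ({formatted})")
```

For all three language pairs, the train, validation and test splits come out at roughly 81%, 9% and 10% of the total.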
We present a dataset of code-mixed YouTube video comments in Tamil-English, Kannada-English and Malayalam-English. The dataset contains all three types of code-mixed sentences: inter-sentential switching, intra-sentential switching and tag switching. Most comments were written in native script and Roman script, with either Tamil / Malayalam / Kannada grammar and English lexicon, or English grammar and Tamil / Malayalam / Kannada lexicon. Some comments were written in Tamil / Malayalam / Kannada script with English expressions in between.