Dataset

The data is in the following format:

Comment                                                       label
actor na pudar surendra bantwal.thulunaddha maryadi depwer    non-offensive

Languages

Language     Train     Development   Test     Total
Tamil        35,139    4,388         4,392    43,919
Malayalam    16,010    1,999         2,001    20,010
Kannada       6,217      777           778     7,772
Tulu          2,692      577           576     3,845

Evaluation Plan

Classification systems will be evaluated on macro-averaged precision, macro-averaged recall, and macro-averaged F-score across all classes. Participants are encouraged to evaluate their systems with Scikit-learn's classification report:

https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html
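To make the macro-averaged metrics concrete, here is a minimal pure-Python sketch of how they can be computed: each class gets its own precision, recall, and F-score, and the three are then averaged with equal weight per class. The function name macro_scores and the label "offensive" are illustrative assumptions, not part of the task definition, and scores with a zero denominator are assumed to count as 0. The result should correspond to the "macro avg" row of Scikit-learn's classification report.

```python
from collections import Counter


def macro_scores(gold, pred):
    """Return (precision, recall, f1), each averaged with equal weight per class.

    Hypothetical helper for illustration; not the organisers' scoring script.
    """
    # Score every class that appears in either the gold or the predicted labels.
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # class p was predicted, but wrongly
            fn[g] += 1  # gold class g was missed

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        # Assumption: an undefined score (zero denominator) is counted as 0.
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy example; "offensive" is a placeholder label, not taken from the data.
gold = ["non-offensive", "offensive", "non-offensive", "offensive"]
pred = ["non-offensive", "non-offensive", "non-offensive", "offensive"]
precision, recall, f1 = macro_scores(gold, pred)
print(f"macro P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

Because every class contributes equally to the average, rare classes weigh as much as frequent ones, which matters when the label distribution is skewed.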

Participants are required to submit their predictions as a single tab-separated file named predictions.tsv. The file should have two columns: Comment (text) and class label.
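One possible way to produce such a file is sketched below with Python's standard csv module. The helper name write_predictions and the exact header strings ("Comment" and "label") are assumptions made for illustration; check the official submission instructions for the required header spelling.

```python
import csv


def write_predictions(path, comments, labels):
    """Write one (comment, label) row per prediction to a tab-separated file.

    Hypothetical helper; the header names below are an assumption.
    """
    if len(comments) != len(labels):
        raise ValueError("comments and labels must have the same length")
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["Comment", "label"])  # assumed header row
        writer.writerows(zip(comments, labels))


if __name__ == "__main__":
    write_predictions(
        "predictions.tsv",
        ["actor na pudar surendra bantwal.thulunaddha maryadi depwer"],
        ["non-offensive"],
    )
```

Writing with an explicit UTF-8 encoding keeps comments in native scripts intact, and the csv module quotes any comment that itself contains a tab or a line break.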

To get the full data, register on CodaLab (link to be announced).