The data has two columns, Comment and label, with one comment and its class label per row. For example, the comment "actor na pudar surendra bantwal.thulunaddha maryadi depwer" is labeled non-offensive.
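A minimal sketch for loading data in this format. It assumes the released files are tab-separated (like the required predictions.tsv) with a header row reading Comment and label; the in-memory sample stands in for a hypothetical data file. Quoting is switched off so that quotation marks inside comments are kept as ordinary characters.

```python
import csv
import io
from collections import Counter


def read_comments(lines):
    """Parse tab-separated rows under a 'Comment' / 'label' header.

    Assumption: the data files are TSV with exactly this header; adjust the
    field names if the released files differ.
    """
    # QUOTE_NONE keeps any '"' characters inside comments as plain text.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [(row["Comment"], row["label"]) for row in reader]


# Hypothetical in-memory sample in place of a real data file.
sample = io.StringIO(
    "Comment\tlabel\n"
    "actor na pudar surendra bantwal.thulunaddha maryadi depwer\tnon-offensive\n"
)
rows = read_comments(sample)
print(rows[0][1])  # non-offensive
# Label distribution, useful for spotting class imbalance before training.
print(Counter(label for _, label in rows))
```

For a real file, pass an open handle instead, for example `open(path, encoding="utf-8", newline="")`.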
Dataset statistics (number of comments per split):

Language | Train | Development | Test | Total |
---|---|---|---|---|
Tamil | 35,139 | 4,388 | 4,392 | 43,919 |
Malayalam | 16,010 | 1,999 | 2,001 | 20,010 |
Kannada | 6,217 | 777 | 778 | 7,772 |
Tulu | 2,692 | 577 | 576 | 3,845 |
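As a quick consistency check, the sketch below (its dictionary simply copies the table) confirms that Train + Development + Test equals Total for every language and computes each split's share. The splits come out at roughly 80/10/10 for Tamil, Malayalam, and Kannada, and roughly 70/15/15 for Tulu.

```python
# Split sizes copied from the table: (train, development, test, total).
SPLITS = {
    "Tamil": (35_139, 4_388, 4_392, 43_919),
    "Malayalam": (16_010, 1_999, 2_001, 20_010),
    "Kannada": (6_217, 777, 778, 7_772),
    "Tulu": (2_692, 577, 576, 3_845),
}


def split_shares(train, dev, test, total):
    """Verify that the three splits sum to the total and return their shares."""
    if train + dev + test != total:
        raise ValueError("train + development + test does not equal total")
    # Shares rounded to three decimals for readability.
    return tuple(round(n / total, 3) for n in (train, dev, test))


for language, counts in SPLITS.items():
    print(language, split_shares(*counts))
```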
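The performance of the classification systems will be measured with macro-averaged precision, macro-averaged recall, and macro-averaged F-score across all classes. Macro averaging computes each metric per class and then takes the unweighted mean, so every class counts equally regardless of its size. Participants are encouraged to check their systems with scikit-learn's classification report: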
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html
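To make the metric concrete, here is a dependency-free sketch of macro-averaged precision, recall, and F-score. It is intended to mirror scikit-learn's default behaviour: the classes are the union of gold and predicted labels, a 0/0 ratio counts as 0, and the macro F-score is the mean of the per-class F-scores (not the F-score of the macro precision and recall). The label names in the example are illustrative only.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F-score)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one example")
    labels = sorted(set(y_true) | set(y_pred))  # union of gold and predicted classes
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_counts = Counter(y_pred)  # tp + fp per class
    true_counts = Counter(y_true)  # tp + fn per class

    precisions, recalls, f_scores = [], [], []
    for label in labels:
        p = tp[label] / pred_counts[label] if pred_counts[label] else 0.0
        r = tp[label] / true_counts[label] if true_counts[label] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f_scores.append(f)

    # Unweighted mean: every class counts equally, however rare it is.
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f_scores) / n


y_true = ["offensive", "offensive", "non-offensive", "non-offensive"]
y_pred = ["offensive", "non-offensive", "non-offensive", "non-offensive"]
print(tuple(round(x, 4) for x in macro_scores(y_true, y_pred)))  # (0.8333, 0.75, 0.7333)
```

With scikit-learn installed, `classification_report(y_true, y_pred, digits=4)` from `sklearn.metrics` prints the same three numbers in its "macro avg" row, which makes a handy cross-check.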
Participants must submit their predictions as a single tab-separated file named predictions.tsv. The file should have two columns: Comment, containing the comment text, and the predicted class label.
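A minimal sketch for producing the submission file. The header names Comment and label are an assumption borrowed from the data format, so confirm the exact column names before submitting. Tabs and line breaks inside a comment would break the two-column layout, so this sketch replaces them with spaces; whether the organisers expect the comment text byte-for-byte is also an assumption worth checking.

```python
from pathlib import Path


def _clean(field):
    # Replace characters that would shift or split a TSV row.
    return str(field).replace("\t", " ").replace("\r", " ").replace("\n", " ")


def format_predictions(comments, labels):
    """Return TSV text: a header row, then one (comment, label) row per comment."""
    if len(comments) != len(labels):
        raise ValueError("need exactly one predicted label per comment")
    rows = ["Comment\tlabel"]  # hypothetical header names; confirm before submitting
    rows += [f"{_clean(c)}\t{_clean(lab)}" for c, lab in zip(comments, labels)]
    return "\n".join(rows) + "\n"


def write_predictions(comments, labels, path="predictions.tsv"):
    """Write all predictions to a single UTF-8 file (predictions.tsv by default)."""
    Path(path).write_text(format_predictions(comments, labels), encoding="utf-8")


text = format_predictions(
    ["actor na pudar surendra bantwal.thulunaddha maryadi depwer", "line one\nline two"],
    ["non-offensive", "offensive"],  # illustrative labels
)
print(text)
```

Keeping the formatting in a pure function separate from the file write makes the output easy to inspect before calling `write_predictions`.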
To obtain the full data, register on CodaLab (link to be announced).