The data is in the following format:
| Comment | Label |
|---|---|
| actor na pudar surendra bantwal.thulunaddha maryadi depwer | non-offensive |
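
The task does not say how the files are delimited. The minimal loading sketch below assumes a tab-separated file whose header row names the columns "Comment" and "Label", as in the table above. The delimiter, the header names and the helper `load_comments` are all assumptions for illustration.

```python
import csv
import io
from typing import List, TextIO, Tuple


def load_comments(f: TextIO, delimiter: str = "\t") -> List[Tuple[str, str]]:
    """Read (comment, label) pairs from a delimited file with a header row.

    Assumption: the header names the columns "Comment" and "Label", as in the
    example table; the tab delimiter is a guess, not confirmed by the task.
    """
    # QUOTE_NONE keeps stray quote characters in social-media text as literal text.
    reader = csv.DictReader(f, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    return [(row["Comment"], row["Label"]) for row in reader]


# Demo on an in-memory copy of the example row from the table.
sample = (
    "Comment\tLabel\n"
    "actor na pudar surendra bantwal.thulunaddha maryadi depwer\tnon-offensive\n"
)
rows = load_comments(io.StringIO(sample))
print(rows)
```

For a real file, pass `open(path, newline="", encoding="utf-8")` instead of the in-memory buffer, and adjust `delimiter` if the released files turn out to be comma-separated.
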

The number of comments in each split is given below:

| Language | Train | Development | Test | Total |
|---|---|---|---|---|
| Tamil | 35,139 | 4,388 | 4,392 | 43,919 |
| Malayalam | 16,010 | 1,999 | 2,001 | 20,010 |
| Kannada | 6,217 | 777 | 778 | 7,772 |
| Tulu | 2,692 | 577 | 576 | 3,845 |
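
As a quick consistency check on the table, the sketch below verifies that each language's splits add up to its total and computes the train/development/test proportions. The counts are copied from the table; the helper name `split_shares` is illustrative.

```python
from typing import Dict, Tuple

# (train, development, test, total) comment counts copied from the table.
SPLITS: Dict[str, Tuple[int, int, int, int]] = {
    "Tamil": (35_139, 4_388, 4_392, 43_919),
    "Malayalam": (16_010, 1_999, 2_001, 20_010),
    "Kannada": (6_217, 777, 778, 7_772),
    "Tulu": (2_692, 577, 576, 3_845),
}


def split_shares(
    splits: Dict[str, Tuple[int, int, int, int]]
) -> Dict[str, Tuple[float, ...]]:
    """Check that the splits sum to the total and return their percentages."""
    shares = {}
    for language, (train, dev, test, total) in splits.items():
        if train + dev + test != total:
            raise ValueError(f"{language}: splits do not sum to {total}")
        # Percentage of the language's comments in each split, to one decimal place.
        shares[language] = tuple(round(100 * n / total, 1) for n in (train, dev, test))
    return shares


for language, (train_pct, dev_pct, test_pct) in split_shares(SPLITS).items():
    print(f"{language}: train {train_pct}%, dev {dev_pct}%, test {test_pct}%")
```

On these figures, the Tamil, Malayalam and Kannada splits come out at roughly 80/10/10, while Tulu is closer to 70/15/15.
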

System performance will be measured by macro-averaged precision, macro-averaged recall, and macro-averaged F-score across all classes. Participants are encouraged to check their systems with scikit-learn's classification report:
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html
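
Macro averaging computes each metric separately for every class and then takes the unweighted mean, so a rare class counts as much as a frequent one. The pure-Python sketch below illustrates this. It is intended to agree with scikit-learn's `precision_recall_fscore_support(y_true, y_pred, average="macro", zero_division=0)`, but it is not the official scorer. The "offensive" label and the toy predictions are hypothetical.

```python
from typing import Sequence, Tuple


def macro_scores(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> Tuple[float, float, float]:
    """Return (macro precision, macro recall, macro F-score).

    Metrics are computed per class over the union of true and predicted
    labels, then averaged with equal weight. Undefined ratios (for example,
    a class that is never predicted) are scored as 0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("no labels to score")
    precisions, recalls, f_scores = [], [], []
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # Per-class F-score is the harmonic mean of that class's precision and recall.
        f_score = (
            2 * precision * recall / (precision + recall) if precision + recall else 0.0
        )
        precisions.append(precision)
        recalls.append(recall)
        f_scores.append(f_score)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f_scores) / n


# Toy example with a hypothetical two-label set.
y_true = ["non-offensive", "non-offensive", "non-offensive", "offensive"]
y_pred = ["non-offensive", "non-offensive", "offensive", "offensive"]
p, r, f = macro_scores(y_true, y_pred)
print(f"macro P={p:.4f} R={r:.4f} F={f:.4f}")
```

Note that the macro F-score is the mean of the per-class F-scores, which is generally not the same as the harmonic mean of macro precision and macro recall.
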

Participants must submit their predictions as a single tab-separated file named predictions.csv. The file should contain an ID column (if the dataset provides one) and a column with the predicted class labels.
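
The following sketch writes predictions in that layout: tab-separated despite the .csv name, with an ID column only when IDs are available. The header names "ID" and "Label", the helper `write_predictions`, and the demo IDs and "offensive" label are hypothetical; check the exact column names on the competition page before submitting.

```python
import csv
import io
from typing import Optional, Sequence, TextIO


def write_predictions(
    f: TextIO, labels: Sequence[str], ids: Optional[Sequence[str]] = None
) -> None:
    """Write predicted labels as tab-separated rows with a header line.

    Hypothetical header names: "ID" and "Label".
    """
    writer = csv.writer(f, delimiter="\t", lineterminator="\n")
    if ids is None:
        # No ID column in the dataset: write the predicted labels only.
        writer.writerow(["Label"])
        writer.writerows([label] for label in labels)
        return
    if len(ids) != len(labels):
        raise ValueError("ids and labels must have the same length")
    writer.writerow(["ID", "Label"])
    writer.writerows(zip(ids, labels))


# Demo with made-up IDs; for a real submission, open "predictions.csv"
# with newline="" and encoding="utf-8" and pass that file object instead.
buffer = io.StringIO()
write_predictions(buffer, ["non-offensive", "offensive"], ids=["1", "2"])
print(buffer.getvalue(), end="")
```

Using the csv module rather than joining strings by hand ensures that a label or ID containing a tab is quoted instead of silently adding a column.
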

To obtain the full dataset, register at Codabench: https://www.codabench.org/competitions/8494/