News Category Dataset: identify the type of news from headlines and short descriptions
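This dataset contains around 200,000 news headlines from 2012 to 2018, obtained from HuffPost. A model trained on this dataset could be used to assign tags to untagged news articles or to identify the type of language used in different news articles.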
Content
Each news headline has a corresponding category. The categories and their article counts are as follows:
- POLITICS: 32739
- WELLNESS: 17827
- ENTERTAINMENT: 16058
- TRAVEL: 9887
- STYLE & BEAUTY: 9649
- PARENTING: 8677
- HEALTHY LIVING: 6694
- QUEER VOICES: 6314
- FOOD & DRINK: 6226
- BUSINESS: 5937
- COMEDY: 5175
- SPORTS: 4884
- BLACK VOICES: 4528
- HOME & LIVING: 4195
- PARENTS: 3955
- THE WORLDPOST: 3664
- WEDDINGS: 3651
- WOMEN: 3490
- IMPACT: 3459
- DIVORCE: 3426
- CRIME: 3405
- MEDIA: 2815
- WEIRD NEWS: 2670
- GREEN: 2622
- WORLDPOST: 2579
- RELIGION: 2556
- STYLE: 2254
- SCIENCE: 2178
- WORLD NEWS: 2177
- TASTE: 2096
- TECH: 2082
- MONEY: 1707
- ARTS: 1509
- FIFTY: 1401
- GOOD NEWS: 1398
- ARTS & CULTURE: 1339
- ENVIRONMENT: 1323
- COLLEGE: 1144
- LATINO VOICES: 1129
- CULTURE & ARTS: 1030
- EDUCATION: 1004
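
The category counts are heavily skewed, which matters when training a classifier on this data. The following is a minimal sketch, not part of the dataset itself, that summarizes the class balance from the counts listed above. The names `CATEGORY_COUNTS` and `summarize` are hypothetical and chosen only for illustration.

```python
# Article counts per category, copied from the list above.
CATEGORY_COUNTS = {
    "POLITICS": 32739, "WELLNESS": 17827, "ENTERTAINMENT": 16058,
    "TRAVEL": 9887, "STYLE & BEAUTY": 9649, "PARENTING": 8677,
    "HEALTHY LIVING": 6694, "QUEER VOICES": 6314, "FOOD & DRINK": 6226,
    "BUSINESS": 5937, "COMEDY": 5175, "SPORTS": 4884,
    "BLACK VOICES": 4528, "HOME & LIVING": 4195, "PARENTS": 3955,
    "THE WORLDPOST": 3664, "WEDDINGS": 3651, "WOMEN": 3490,
    "IMPACT": 3459, "DIVORCE": 3426, "CRIME": 3405,
    "MEDIA": 2815, "WEIRD NEWS": 2670, "GREEN": 2622,
    "WORLDPOST": 2579, "RELIGION": 2556, "STYLE": 2254,
    "SCIENCE": 2178, "WORLD NEWS": 2177, "TASTE": 2096,
    "TECH": 2082, "MONEY": 1707, "ARTS": 1509,
    "FIFTY": 1401, "GOOD NEWS": 1398, "ARTS & CULTURE": 1339,
    "ENVIRONMENT": 1323, "COLLEGE": 1144, "LATINO VOICES": 1129,
    "CULTURE & ARTS": 1030, "EDUCATION": 1004,
}


def summarize(counts):
    """Return basic class-balance statistics for a {category: count} mapping."""
    total = sum(counts.values())
    largest = max(counts, key=counts.get)
    smallest = min(counts, key=counts.get)
    return {
        "num_categories": len(counts),
        "total_articles": total,
        "largest": largest,
        # Fraction of all articles that fall in the largest category.
        "largest_share": counts[largest] / total,
        "smallest": smallest,
        # How many times larger the biggest class is than the smallest.
        "imbalance_ratio": counts[largest] / counts[smallest],
    }


summary = summarize(CATEGORY_COUNTS)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The listed counts cover 41 categories and sum to 200,853 articles, consistent with the "around 200,000" figure. POLITICS alone accounts for roughly 16% of the headlines, so a stratified split or class weighting may be worth considering when training.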