Credibility Corpus with several datasets (Twitter, Web database) in French and English

Name: Credibility Corpus with several datasets (Twitter, Web database) in French and English
Creator: nicolas turenne
License: http://www.opendefinition.org/licenses/cc-by

Description

Description of the corpora

These datasets were built to analyze information credibility in general
(rumor and disinformation in English and French documents)
as it occurs on the social web.
Targeted databases about rumors, hoaxes and disinformation helped to
collect clear cases of misinformation. Some topics (with keywords) helped us to build corpora from the microblogging
platform Twitter, a major source of rumors and disinformation.

1 corpus of texts from the web database about rumors and disinformation.
4 corpora from the social media platform Twitter about specific rumors (2 in English, 2 in French).
4 corpora from the social media platform Twitter, randomly built (2 in English, 2 in French).
4 corpora from the social media platform Twitter about specific events (2 in English, 2 in French).

Sizes of the corpora:

Social Web Rumorous corpus: 1,612

French Hollande Rumorous corpus (Twitter): 371
French Lemon Rumorous corpus (Twitter): 270
English Pin Rumorous corpus (Twitter): 679
English Swine Rumorous corpus (Twitter): 1024

French 1st Random corpus (Twitter): 1000
French 2nd Random corpus (Twitter): 1000
English 3rd Random corpus (Twitter): 1000
English 4th Random corpus (Twitter): 1000

French Rihanna Event corpus (Twitter): 543
English Rihanna Event corpus (Twitter): 1000
French Euro2016 Event corpus (Twitter): 1000
English Euro2016 Event corpus (Twitter): 1000

A matrix links tweets with the 50 most frequent words.
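As an illustration of what such a matrix looks like, here is a minimal sketch of building a binary word-presence matrix from (id, text) messages. The function name `build_presence_matrix`, the lowercase `\w+` tokenization, and the column order are assumptions made for this example; the page does not document how the published matrix was actually produced.

```python
import re
from collections import Counter


def build_presence_matrix(messages, labels, top_k=50):
    """Return (vocabulary, rows); each row is [id, label, flag_1, ..., flag_k].

    messages: list of (message_id, text) pairs
    labels:   list of rumor indicators (1 or -1), one per message
    """
    # Assumed tokenization: lowercase runs of word characters
    token_lists = [re.findall(r"\w+", text.lower()) for _, text in messages]

    # Keep the top_k words by total number of occurrences
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocabulary = [word for word, _ in counts.most_common(top_k)]

    rows = []
    for (msg_id, _), toks, label in zip(messages, token_lists, labels):
        present = set(toks)
        # 1 if the message contains the word, 0 otherwise
        flags = [1 if word in present else 0 for word in vocabulary]
        rows.append([msg_id, label] + flags)
    return vocabulary, rows
```

With `top_k=50` and a large enough vocabulary, each row has 2 + 50 = 52 entries: an id, a rumor indicator, and 50 word flags.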

Text data :

_id : message id
body text : string text data

Matrix data :

52 columns (the first column is the message id, the second is the rumor indicator, 1 or -1; each remaining column is a word, with value 1 if the message contains the word and 0 if it does not)
11,102 lines (each line is a message)

Hidalgo corpus: lines range 1:75
Lemon corpus : lines range 76:467
Pin rumor : lines range 468:656
swine : lines range 657:1311

random messages : lines range 1312:11103
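The line ranges above can be used to pull one corpus out of the matrix once its lines are in memory. The following is a sketch only: it assumes the ranges are 1-based and inclusive, that line 1 is the first line of the file, and that fields are whitespace-separated. The actual delimiter of the file inside the archive is not stated on this page, and `parse_row` and `rows_for` are hypothetical helper names.

```python
# Line ranges as listed above (assumed 1-based and inclusive)
CORPUS_RANGES = {
    "Hidalgo": (1, 75),
    "Lemon": (76, 467),
    "Pin": (468, 656),
    "swine": (657, 1311),
    "random": (1312, 11103),
}


def parse_row(line, sep=None):
    """Split one matrix line into (id, rumor label, word flags).

    sep=None splits on whitespace; pass another separator if the
    file turns out to use one (the real delimiter is an assumption).
    """
    fields = line.strip().split(sep)
    msg_id = fields[0]
    label = int(fields[1])                 # 1 or -1
    flags = [int(v) for v in fields[2:]]   # 1 = contains word, 0 = does not
    return msg_id, label, flags


def rows_for(corpus, rows):
    """Return the slice of rows belonging to the named corpus."""
    start, end = CORPUS_RANGES[corpus]
    return rows[start - 1:end]
```

For example, `rows_for("Pin", rows)` would return the lines listed for the Pin rumor, which can then be passed one at a time to `parse_row`.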

Sample contains :
French Pin Rumorous corpus (Twitter): 679
Matrix data :

52 columns (the first column is the message id, the second is the rumor indicator, 1 or -1; each remaining column is a word, with value 1 if the message contains the word and 0 if it does not)
189 lines (each line is a message)

Author

nicolas turenne

This dataset has been published on the initiative and under the responsibility of nicolas turenne
Published on December 1, 2016 and updated on December 1, 2016

Latest update

December 1, 2016

License

Creative Commons Attribution

Metadata quality

66.66666666666666/100

Update frequency not set

Spatial coverage not set

6 Main files

Matrix Twitter Rumorous Blogs - Words

Updated on December 1, 2016

rar (39.7KB)

105 downloads

URL: https://static.data.gouv.fr/resources/credibility-corpus-with-several-datasets-twitter-web-database-in-french-and-english/20161201-122408/matrix.rar
Permalink: https://www.data.gouv.fr/es/datasets/r/554da648-f51f-4418-91f1-1b7d2a7a7113
sha1: f314b136154b1c0082372a8dd3b8a6b4aed52b37
MIME Type: application/rar

Created on: December 1, 2016
Modified on: December 1, 2016

Size: 39.7KB
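The sha1 value listed for this file can be used to check that a downloaded copy is intact. Below is a minimal sketch using Python's standard `hashlib`; the helper names `sha1_of_file` and `is_intact` and the local path in the usage note are assumptions for illustration.

```python
import hashlib

# sha1 listed on this page for matrix.rar
EXPECTED_SHA1 = "f314b136154b1c0082372a8dd3b8a6b4aed52b37"


def sha1_of_file(path, chunk_size=65536):
    """Compute the hexadecimal SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected=EXPECTED_SHA1):
    """Return True if the file's SHA-1 matches the expected value."""
    return sha1_of_file(path) == expected.lower()
```

Usage, assuming the archive was saved locally as `matrix.rar`: `is_intact("matrix.rar")` returns `True` only if the digest matches. The same check applies to the other archives with their respective sha1 values.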

Description of the corpora

Matrix data :

52 columns (the first column is the message id, the second is the rumor indicator, 1 or -1; each remaining column is a word, with value 1 if the message contains the word and 0 if it does not)
11,102 lines (each line is a message)

Hidalgo corpus: lines range 1:75
Lemon corpus : lines range 76:467
Pin rumor : lines range 468:656
swine : lines range 657:1311

random messages : lines range 1312:11103

Event corpora (Twitter)

Updated on December 1, 2016

rar (75.3KB)

72 downloads

URL: https://static.data.gouv.fr/resources/credibility-corpus-with-several-datasets-twitter-web-database-in-french-and-english/20161201-121959/CorpusEventTwitter.rar
Permalink: https://www.data.gouv.fr/es/datasets/r/685b5bee-1215-4c38-b599-438657116522
sha1: 73309f436219a12ea969117401eb26cccbf255b3
MIME Type: application/rar

Created on: December 1, 2016
Modified on: December 1, 2016

Size: 75.3KB

Random corpora (Twitter)

Updated on December 1, 2016

rar (207.3KB)

92 downloads

URL: https://static.data.gouv.fr/resources/credibility-corpus-with-several-datasets-twitter-web-database-in-french-and-english/20161201-121924/CorpusRandomTwitter.rar
Permalink: https://www.data.gouv.fr/es/datasets/r/b2a188f6-e618-4c42-a5c5-59002f986d3f
sha1: 912830986f75b9ddf5cc6e9765b6102e5211c531
MIME Type: application/rar

Created on: December 1, 2016
Modified on: December 1, 2016

Size: 207.3KB

Credibility Rumorous corpora (Twitter)

Updated on December 1, 2016

rar (100.0KB)

87 downloads

URL: https://static.data.gouv.fr/resources/credibility-corpus-with-several-datasets-twitter-web-database-in-french-and-english/20161201-121850/CorpusRumorTwitter.rar
Permalink: https://www.data.gouv.fr/es/datasets/r/0cd1cc7e-5ea3-42f6-a7b6-a5df2790199e
sha1: b05dfc40f77dcc105678706c58d9bd839f9c471d
MIME Type: application/rar

Created on: December 1, 2016
Modified on: December 1, 2016

Size: 100.0KB

Credibility Web Database Rumorous corpus

Updated on December 1, 2016

rar (664.4KB)

71 downloads

URL: https://static.data.gouv.fr/resources/credibility-corpus-with-several-datasets-twitter-web-database-in-french-and-english/20161201-121756/CorpusRumorDatabase.rar
Permalink: https://www.data.gouv.fr/es/datasets/r/7f14efd9-0016-4b45-9ff5-519f70e4dcf9
sha1: eeabab9d06afe46ab8ef74d1bbfbbec94f420e93
MIME Type: application/rar

Created on: December 1, 2016
Modified on: December 1, 2016

Size: 664.4KB

Sample Corpus of credibility (Twitter)

Updated on December 1, 2016

rar (32.5KB)

70 downloads

URL: https://static.data.gouv.fr/resources/credibility-corpus-with-several-datasets-twitter-web-database-in-french-and-english/20161201-121642/Corpus-credibility.rar
Permalink: https://www.data.gouv.fr/es/datasets/r/e64044af-a3bf-4be5-8295-ac6811fd72a5
sha1: 1351be262594ec2ab0949017c62b1e83c309ae11
MIME Type: application/rar

Created on: December 1, 2016
Modified on: December 1, 2016

Size: 32.5KB

Description of the corpora

Sample contains :
French Pin Rumorous corpus (Twitter): 679
Matrix data :

52 columns (the first column is the message id, the second is the rumor indicator, 1 or -1; each remaining column is a word, with value 1 if the message contains the word and 0 if it does not)
189 lines (each line is a message)

Information

License

Creative Commons Attribution

ID

5840066288ee38426dc65bb3

Temporality

Creation

December 1, 2016

Frequency

Unknown

Temporal coverage

2006/01/01 to 2015/07/01

Latest update

December 1, 2016

Statistics for the year

Views

108 in Apr 2024

Downloads

220

14 in Apr 2024
