Open Biblio corpus for content analysis

This dataset has been published on the initiative and under the responsibility of Nicolas Turenne
Published on December 1, 2016 and updated on December 1, 2016

Nicolas Turenne

Information

License
Creative Commons Attribution
Temporal coverage
1994/01/01 to 2014/07/01
Frequency
Biannual
Creation date
December 1, 2016
Modification date
December 2, 2016
Latest resource update
December 1, 2016

Extras

ID
5840026288ee383a2cc65bb3
Creation date
December 1, 2016
Modification date
December 2, 2016

Description of the corpus

The corpus consists of full-text scientific publications (mathematics, computing, statistics) in LaTeX or TXT format.
They are published in open access.

The corpus is intended for two purposes:

  • information extraction (for instance, extracting all collocations around a target word, or extracting method names)
  • comparison of abstract and body text
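The two uses listed above can be illustrated with a minimal Python sketch. It is not part of the dataset and makes several assumptions: each publication is available as a plain-text string, words are split with a simple regular expression, a "collocation" is taken to mean any word within a fixed window of the target, and abstract/body comparison is reduced to Jaccard overlap of vocabularies. The function names `tokenize`, `collocations`, and `vocabulary_overlap` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a token is a run of ASCII letters, optionally hyphenated.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def collocations(text, target, window=2):
    """Count the words found within `window` tokens of each `target` occurrence."""
    tokens = tokenize(text)
    target = target.lower()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the target occurrence itself
                counts[tokens[j]] += 1
    return counts


def vocabulary_overlap(abstract, body):
    """Jaccard similarity between the word sets of an abstract and a body text."""
    a, b = set(tokenize(abstract)), set(tokenize(body))
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

Calling `collocations(body_text, "method", window=1).most_common(10)` would list the words most often adjacent to "method", which is one rough way to surface method names. For LaTeX sources, markup would likely need to be stripped before tokenizing.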

Size of publication corpus: 650,000
Size of publication sample: 20

Data: body text, as string data

Resources 2

8 downloads

Open Biblio corpus - Science publication (whole dataset)

Available
zip (28.6 MB)

Type
Main file
MIME Type
None
Created on
December 2, 2016
Modified on
December 2, 2016
Published on
December 1, 2016
1 download

Open Biblio corpus - scientific full-texts publications (sample)

Available
rar (14.6 MB)

Size of publication sample: 10

Data: body text, as string data

Sample corpus:
date range: 1981-1998
corpus type: scientific publications
publisher: Elsevier
domains: Statistics, Chemistry, Environment, Biology, Computing
format: pdf, txt
sizeSample: 10
PageSample: 177
Language: English

Type
Main file
MIME Type
application/rar
sha1
8a46f7b07b743f2578746b799c414096fb93c454
Created on
December 1, 2016
Modified on
December 2, 2016
Published on
December 1, 2016
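
The sha1 value listed for the sample archive can be used to check that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `sha1_of_file` and `matches_published_sha1` are hypothetical, and the local file path is whatever name the archive was saved under, which this page does not specify.

```python
import hashlib

# SHA-1 digest published for the sample archive in the listing above.
EXPECTED_SHA1 = "8a46f7b07b743f2578746b799c414096fb93c454"


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal SHA-1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_sha1(path, expected=EXPECTED_SHA1):
    """Compare a file's SHA-1 digest with the expected value, ignoring case."""
    return sha1_of_file(path) == expected.lower()
```

A mismatch would usually indicate an incomplete or corrupted download, or that the archive was replaced after the listing was last updated.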
