Open Biblio corpus for content analysis

Description

Description of the corpus

The corpus contains full-text scientific publications (mathematics, computing, statistics) in LaTeX or TXT format.
They are published in open access.

This corpus serves two purposes:

  • information extraction (for instance: extract all collocations around a target word, or extract method names)
  • comparison of abstract and body text
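
The first use case, extracting collocations around a target word, can be sketched roughly as follows. This is a minimal illustration rather than the method used by the dataset's author: the function names `tokenize` and `collocations`, the window size, and the sample sentence are all assumptions made for the example, and the regular-expression tokenizer is deliberately naive (it does not handle LaTeX markup).

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens.

    Hypothetical helper: a real pipeline would likely strip LaTeX
    commands before tokenizing.
    """
    return re.findall(r"[a-z]+", text.lower())


def collocations(text, target, window=2):
    """Count words found within `window` tokens of each occurrence of `target`."""
    tokens = tokenize(text)
    target = target.lower()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        # Clamp the context window to the bounds of the token list.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


# Invented sample sentence, not taken from the corpus.
sample = "We propose a bayesian method for clustering. The method converges quickly."
result = collocations(sample, "method", window=1)
print(result.most_common())
```

A fixed token window is the simplest choice here; sentence-bounded windows or association measures such as pointwise mutual information would be natural refinements on a corpus of this size.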

Size of the publication corpus: 650,000
Size of the publication sample: 20

Data:

body string text data

Author

This dataset has been published on the initiative and under the responsibility of Nicolas Turenne
Published on December 1, 2016 and updated on December 2, 2016

Latest update

October 12, 2023

License

Creative Commons Attribution

Metadata quality
77.78/100

Spatial coverage not set

Some files are unavailable

Information

ID

5840026288ee383a2cc65bb3

Temporality

Creation

December 1, 2016

Frequency

Biannual

Temporal coverage

1994/01/01 to 2014/07/01

Statistics for the year

Views

554

34 in Apr 2024

Downloads

37

1 in Apr 2024

Reuses of this dataset

0

Followers

0