The corpus describes damage caused by insects and diseases to crops (wheat, vines...).
The corpus contains about 41,000 documents. 17,000 of them, published between 1960 and 2000, have medium-quality text recognition.
Each file gives the level of risk for crops in a region of France. The texts are in French.

number of documents in the corpus: 40,899
number of documents in the sample: 37 (from different regions of France, with different crops)

size of the corpus (txt format): 457 MB
size of the corpus (pdf format): 37 GB

metadata for each file:

_id: name of the file
region: name of a French region (example: Alsace)
crops: list of crop names (example: wheat)
diseases: list of disease names (example: oidium)
insects: list of insect names (example: puceron noir)
risk: patterns of risk (example: "12% of fields")
town: list of cities (example: Dijon)
date: date of publication of the document
pesticides: list of pesticides (example: d.d.t.)
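
As a rough illustration of how one metadata record could be handled in code, here is a minimal Python sketch. It assumes each record is available as a dictionary keyed by the field names listed above; the class name `NewsletterRecord`, the helper `record_from_dict`, the storage of `risk` and `town` as lists of strings, and the date being kept as plain text are all assumptions, since the page does not document the actual file layout.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class NewsletterRecord:
    """One metadata record (hypothetical container, not part of the corpus)."""
    doc_id: str   # the `_id` field: name of the file
    region: str   # name of a French region
    date: str     # publication date, kept as text (format not documented)
    crops: List[str] = field(default_factory=list)
    diseases: List[str] = field(default_factory=list)
    insects: List[str] = field(default_factory=list)
    risk: List[str] = field(default_factory=list)   # assumed: list of risk patterns
    town: List[str] = field(default_factory=list)   # list of cities
    pesticides: List[str] = field(default_factory=list)


def record_from_dict(raw: Dict[str, Any]) -> NewsletterRecord:
    """Build a record from a dictionary, tolerating missing optional fields."""
    return NewsletterRecord(
        doc_id=raw["_id"],
        region=raw.get("region", ""),
        date=raw.get("date", ""),
        crops=list(raw.get("crops", [])),
        diseases=list(raw.get("diseases", [])),
        insects=list(raw.get("insects", [])),
        risk=list(raw.get("risk", [])),
        town=list(raw.get("town", [])),
        pesticides=list(raw.get("pesticides", [])),
    )


# Illustrative values only: the file name and date are made up,
# the other values are the examples given in the field list.
example = record_from_dict({
    "_id": "hypothetical_file.txt",
    "region": "Alsace",
    "date": "hypothetical date",
    "crops": ["wheat"],
    "diseases": ["oidium"],
    "insects": ["puceron noir"],
    "risk": ["12% of fields"],
    "town": ["Dijon"],
    "pesticides": ["d.d.t."],
})
```

Keeping `date` as text avoids guessing a date format that the page does not specify.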

The database contains:
cited areas: 27
cited insects: 389
cited diseases: 279
cited pesticides: 727
cited crops: 122
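
Counts of this kind could in principle be recomputed from the per-file metadata. The following Python sketch shows one way to do it, assuming the records are dictionaries keyed by the field names above and that "cited areas" corresponds to the `region` field; both points are assumptions, as is the case-insensitive comparison of names, and the function name `count_distinct` is hypothetical.

```python
from typing import Any, Dict, Iterable, Set

# Metadata fields that hold lists of names.
LIST_FIELDS = ["crops", "diseases", "insects", "pesticides"]


def count_distinct(records: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Count distinct names per field across all records."""
    seen: Dict[str, Set[str]] = {name: set() for name in LIST_FIELDS}
    regions: Set[str] = set()
    for rec in records:
        region = rec.get("region", "")
        if region:
            regions.add(region.strip().lower())
        for name in LIST_FIELDS:
            for value in rec.get(name, []):
                # Normalise case and whitespace so "Wheat" and "wheat" count once.
                seen[name].add(value.strip().lower())
    counts = {name: len(values) for name, values in seen.items()}
    counts["region"] = len(regions)
    return counts


# Illustrative records built from the example values in the field list.
sample_records = [
    {"region": "Alsace", "crops": ["wheat"], "insects": ["puceron noir"]},
    {"region": "Alsace", "crops": ["Wheat"], "diseases": ["oidium"],
     "pesticides": ["d.d.t."]},
]
totals = count_distinct(sample_records)
```

Whether the published figures were obtained with this kind of normalisation is not stated, so recomputed counts may differ from the ones listed above.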

Resources

Ecology Crop Disease Newsletter Corpus - PDF format


EcologySample.rar

The archive contains 37 documents (each in txt and pdf format) and one file of extracted entities per document.
