
The LASLA Latin corpus is published Open Access under a CC-BY-NC-SA 4.0 license. The published portion of the corpus comprises ca. 1.7 million tokens from works of the Classical period, manually annotated with the following information: lemma, Part-of-Speech, morphological features, partial syntactic information, and metadata. The LASLA has ongoing annotation projects, whose results will be uploaded to the Dataverses once they are finalised.

The corpus can be accessed in three Dataverses, each containing one specific format. We recommend using the “Tree View” to get an overview of the files available in each Dataverse.

  • DAT and APN files are published with detailed documentation on the codes used and on all the annotation choices implemented by the LASLA over the years. We hope this documentation helps external researchers make the best use of the data.
  • BPN files were previously shared with external partners under Data Transfer Agreements. Beyond serving documentation purposes, this Dataverse also provides the original version on which the CoNLL-U format was based (see below).

The LASLA files can be explored through free online interfaces: Opera Latina, which enables structured searches through the files, and HyperbaseWeb (Latin bases), which does not require an account and supports complex statistical queries. Documentation for HyperbaseWeb is available here and here.

The Data Transfer Agreement for the BPN files led to a close collaboration with the LiLa ERC project, which has produced the following outputs:

  • The LASLA corpus is linked to the LiLa Knowledge Base and can be queried, jointly with all the other linked resources, via the LiLa Interactive Search Platform and SPARQL endpoint. The triples produced by the linking are published openly here.
  • The LiLa team has converted the BPN files into CoNLL-U files, enriching the annotation with the URIs of tokens and lemmas as they are found in the LiLa Knowledge Base. This version of the corpus can be found on Zenodo and GitHub.
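
For readers who want to work with the CoNLL-U version directly, the following is a minimal sketch of how such a file could be read in Python, using only the ten tab-separated columns defined by the CoNLL-U specification. The sample sentence and its annotation values are hand-made for illustration and are not taken from the LASLA release; the helper names `parse_conllu` and `parse_key_values` are hypothetical. Where exactly the LiLa URIs are stored in the released files is not stated here, so please check the documentation that accompanies the Zenodo/GitHub release before relying on a particular column.

```python
from collections import Counter

# Column names defined by the CoNLL-U specification (10 tab-separated fields).
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text):
    """Yield each sentence as a list of token dicts (hypothetical helper)."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment/metadata line
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            continue  # skip lines that do not have exactly 10 columns
        token = dict(zip(FIELDS, cols))
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in token["id"] or "." in token["id"]:
            continue
        sentence.append(token)
    if sentence:
        yield sentence


def parse_key_values(field):
    """Turn a 'Key=Value|Key=Value' field (FEATS or MISC) into a dict."""
    if field == "_":
        return {}
    return dict(item.split("=", 1) for item in field.split("|") if "=" in item)


# Hand-made illustrative sample, NOT real LASLA data. Syntactic columns are
# left as "_" because the source describes the syntactic annotation as partial.
rows = [
    ("1", "Gallia", "Gallia", "PROPN", "_", "Case=Nom|Gender=Fem|Number=Sing", "_", "_", "_", "_"),
    ("2", "est", "sum", "AUX", "_", "Mood=Ind|Number=Sing|Person=3|Tense=Pres", "_", "_", "_", "_"),
    ("3", "omnis", "omnis", "DET", "_", "Case=Nom|Number=Sing", "_", "_", "_", "_"),
    ("4", "divisa", "divido", "VERB", "_", "Case=Nom|Number=Sing|VerbForm=Part", "_", "_", "_", "_"),
]
SAMPLE = "# text = Gallia est omnis divisa\n" + "\n".join("\t".join(r) for r in rows) + "\n\n"

sentences = list(parse_conllu(SAMPLE))
upos_counts = Counter(tok["upos"] for sent in sentences for tok in sent)

for tok in sentences[0]:
    print(tok["form"], tok["lemma"], tok["upos"], parse_key_values(tok["feats"]))
```

To process a downloaded file instead of the inline sample, the same function can be applied to the file contents, for example `parse_conllu(open(path, encoding="utf-8").read())`, where `path` points to a local copy of one of the CoNLL-U files.
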
Last modified on 05/04/2024
