The eurlex R package reduces the overhead associated with using the SPARQL and REST APIs made available by the Publications Office of the EU and other EU institutions. Compared to pure web-scraping, the package provides more efficient and transparent access to data on European Union laws and policies.
See the vignette for a basic walkthrough on how to use the package. Check the function documentation for the most up-to-date overview of features. Example use cases are shown in this paper.
You can use eurlex to create automatically updated overviews of EU decision-making activity, as shown here.
Install from CRAN via install.packages("eurlex").
The development version is available via remotes::install_github("michalovadek/eurlex").
Michal Ovádek (2021) Facilitating access to data on European Union laws, Political Research Exchange, 3:1, DOI: 10.1080/2474736X.2020.1870150
@article{ovadek2021facilitating,
  author  = {Ovádek, Michal},
  title   = {Facilitating access to data on European Union laws},
  year    = {2021},
  journal = {Political Research Exchange},
  volume  = {3},
  number  = {1},
  pages   = {Article No. 1870150},
  url     = {https://doi.org/10.1080/2474736X.2020.1870150}
}
The eurlex package currently envisions the typical use case as getting bulk information about EU legislation into R as fast as possible. The package contains three core functions to achieve that objective: elx_make_query() to create pre-defined or customized SPARQL queries; elx_run_query() to execute the pre-made or any other manually written query; and elx_fetch_data() to send GET requests to the REST API for certain metadata.
The function elx_make_query() takes as its first argument the type of resource to be retrieved (such as “directive” or “any”) from Cellar, the semantic database that powers Eur-Lex and other publications. If you are familiar with SPARQL, you can always write your own queries and execute them with elx_run_query().
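As a minimal sketch of the manual route, the query below is written by hand and stored as a single string that can be passed to elx_run_query(). It assumes the Cellar common data model properties cdm:work_has_resource-type and cdm:resource_legal_id_celex and the resource-type code DIR for directives (see the resource-type NAL); the queries generated by elx_make_query() add further filters, so the results of this sketch may differ from theirs.

```r
# Hand-written SPARQL query (a sketch; property names assumed from the
# Cellar common data model) returning the work and CELEX number of
# up to 10 directives
my_query <- paste(
  "PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>",
  "SELECT DISTINCT ?work ?celex WHERE {",
  "  ?work cdm:work_has_resource-type",
  "        <http://publications.europa.eu/resource/authority/resource-type/DIR> .",
  "  ?work cdm:resource_legal_id_celex ?celex .",
  "}",
  "LIMIT 10",
  sep = "\n"
)

# Sending the query needs the eurlex package and an internet connection,
# so it only runs in an interactive session here
if (interactive() && requireNamespace("eurlex", quietly = TRUE)) {
  directives <- eurlex::elx_run_query(my_query)
  head(directives)
}
```

Building the query with paste(sep = "\n") keeps each triple pattern on its own line, which makes hand-written queries easier to debug in the query builder before running them from R.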
elx_run_query() executes SPARQL queries on a pre-specified endpoint of the Publications Office of the EU. It outputs a data.frame where each column corresponds to one of the requested variables, while the rows accumulate observations of the resource type satisfying the query criteria. The more data is to be returned, the longer the execution time, which can vary from a few seconds to several hours depending on the query and your connection. The first column always contains the unique URI of a “work” (usually a legislative act or court judgment), which identifies each resource in Cellar; by default, only the identifier part of the URI is kept. Several human-readable identifiers are normally associated with each “work”, but the most useful one tends to be the CELEX number, retrieved by default.
# load library
library(eurlex)

# create query
query <- elx_make_query("directive", include_date_transpos = TRUE)

# execute query
results <- elx_run_query(query)
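A CELEX number encodes the sector, the year, the document type and a sequential number, so it can be split into its parts with base R. The sketch below uses a hypothetical helper, parse_celex(), with a simplified pattern that assumes a one-character sector, a four-digit year and a one- or two-letter type code; the official guide to CELEX numbers describes the full scheme, which has cases this pattern does not cover.

```r
# Hypothetical helper: split CELEX numbers into sector, year, type and number.
# Simplified pattern (an assumption, not the official grammar):
# 1 sector character, 4-digit year, 1-2 letter type code, then the remainder.
parse_celex <- function(celex) {
  pattern <- "^([0-9A-Z])([0-9]{4})([A-Z]{1,2})(.+)$"
  ok <- grepl(pattern, celex)
  # Extract the i-th capture group; inputs that do not match become NA
  part <- function(i) ifelse(ok, sub(pattern, paste0("\\", i), celex), NA_character_)
  data.frame(
    celex  = celex,
    sector = part(1),
    year   = as.integer(part(2)),
    type   = part(3),
    number = part(4),
    stringsAsFactors = FALSE
  )
}

# 32014L0024: Directive 2014/24/EU; 62014CJ0362: Court of Justice case C-362/14
parse_celex(c("32014L0024", "62014CJ0362"))
```

Assuming the CELEX column of the results above is named celex, parse_celex(results$celex) would return one row per work, which makes it easy to tabulate acts by year.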
One of the most useful things about the API is that we obtain a comprehensive list of identifiers that we can subsequently use to obtain more data relating to the document in question. While the results of the SPARQL queries can also be useful for web-scraping, the function elx_fetch_data() makes it possible to send GET requests to retrieve data on documents with known identifiers (including Cellar URIs). For example, the function enables downloading the title and the full text of a document in all available languages.
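A minimal sketch of that second step, assuming the results object from the example above with the Cellar identifiers in a column named work, and assuming the default language settings of elx_fetch_data() return the English title. Because elx_run_query() keeps only the identifier by default, the hypothetical helper cellar_uri() rebuilds the full Cellar URI first; fetch_titles() is likewise a hypothetical wrapper, not part of the package.

```r
# Hypothetical helper: rebuild a full Cellar URI from a stripped identifier
cellar_uri <- function(id) {
  if (length(id) == 0) return(character(0))  # avoid paste0() recycling ""
  paste0("http://publications.europa.eu/resource/cellar/", id)
}

# Hypothetical wrapper: fetch the title of the first n works,
# returning NA where a request fails or returns nothing
fetch_titles <- function(work_ids, n = 5) {
  uris <- cellar_uri(head(work_ids, n))
  vapply(uris, function(u) {
    out <- tryCatch(
      eurlex::elx_fetch_data(u, type = "title"),
      error = function(e) character(0)
    )
    if (length(out) > 0) as.character(out[1]) else NA_character_
  }, character(1), USE.NAMES = FALSE)
}

# Requests need the eurlex package, an internet connection and `results`
if (interactive() && requireNamespace("eurlex", quietly = TRUE)) {
  titles <- fetch_titles(results$work)
  print(titles)
}
```

Each call fires one GET request, so limiting n and catching errors per document keeps a single failed request from stopping a larger batch.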
Neither this package nor its author is in any way affiliated with the EU, its institutions, offices or agencies. Please refer to the applicable data reuse policies.
Please consider contributing to the maintenance and development of the package by reporting bugs or suggesting new features.
elx_run_query() now strips URIs (except Eurovoc ones) by default and keeps only the identifier to reduce object size.
When elx_fetch_data() is used to retrieve texts from an HTML document, it now uses rvest::html_text2() by default instead of rvest::html_text(). This is slower but in some cases more closely resembles how the page renders. The new argument html_text = "text2" controls the setting.
elx_make_query(..., include_court_origin = TRUE) retrieves the country of origin of a court case. As per Eur-Lex documentation, this is primarily intended to be the country of the national court referring a preliminary question, but other countries are present in the data as well at the moment. It is recommended to interact this variable with the court procedure.
elx_make_query(..., include_original_language = TRUE) retrieves the authentic language of a document, typically a court case.
Variables retrieved via elx_make_query(include_... = TRUE) are now properly named.
elx_make_query(include_citations_detailed = TRUE) retrieves additional details about the citation where available; the retrieval is currently slow.
elx_make_query(include_directory = TRUE) now retrieves the directory code instead of the URI.
elx_make_query(include_proposal = TRUE) retrieves the CELEX number of the proposal for a requested legal act.
elx_make_query() no longer includes previous versions of the same record (new versions typically fix incorrect or missing metadata).
elx_fetch_data(type = "notice", notice = c("tree", "branch", "object")) now mirrors the behaviour of elx_download_xml() but, instead of saving to a path, gives access to the XML notice in R.
include_ options in elx_make_query()
elx_download_xml() parameter checking
elx_download_xml(notice = "object") now retrieves metadata correctly.
Guide to CELEX numbers: https://eur-lex.europa.eu/content/tools/TableOfSectors/types_of_documents_in_eurlex.html
List of resource types in Cellar (NAL): http://publications.europa.eu/resource/authority/resource-type
NAL of corporate bodies: http://publications.europa.eu/resource/authority/corporate-body
Query builder: https://op.europa.eu/en/advanced-sparql-query-editor
Common data model: https://op.europa.eu/en/web/eu-vocabularies/dataset/-/resource?uri=http://publications.europa.eu/resource/dataset/cdm
SPARQL endpoint: http://publications.europa.eu/webapi/rdf/sparql