Articles Database Analysis
A script for querying, analyzing, and plotting data from several research-related databases.
This module provides functions for querying the number of published papers, or Google searches, that contain a user-defined keyword over a custom time range, and for plotting the results to show how interest in the topic has evolved over the years.
The module workflow follows my personal preferences: most data is handled as pandas DataFrames, which can be saved as CSV files, and stored CSV files can be read back for analysis and plotting.
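The DataFrame-to-CSV round trip described above can be sketched as follows. This is a minimal illustration, not the module's own code: the column names `date` and `articles` and the counts are made up for the example, and an in-memory buffer stands in for a CSV path on disk.

```python
import io

import pandas as pd

# Hypothetical monthly counts; real values would come from a database query.
monthly = pd.DataFrame(
    {
        "date": pd.to_datetime(["2000-01-01", "2000-02-01", "2001-01-01"]),
        "articles": [12, 15, 20],
    }
)

# Save as CSV. A file path such as "counts.csv" works the same way as the buffer.
buffer = io.StringIO()
monthly.to_csv(buffer, index=False)

# Read the stored CSV back, parsing the date column again.
buffer.seek(0)
restored = pd.read_csv(buffer, parse_dates=["date"])

# Aggregate monthly counts into yearly totals for analysis or plotting.
yearly = restored.groupby(restored["date"].dt.year)["articles"].sum()
print(yearly)
```

With matplotlib installed, calling `yearly.plot()` on the aggregated result is one way to draw the trend over the years.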
The code cells were designed for users without a paid API key, so they are not optimized for users who have one of the paid API plans.
More specifically, this notebook presents interactions with:
Google Trends
Scopus Database
PubMed Database
API Keys
The PubMed NCBI API key is optional, and the script was developed for a workflow without it. Feel free to dig further into the API if it interests you.
PubMed NCBI Database
Fetches data from the PubMed NCBI database, providing ways to analyze how interest in a specific subject has evolved over time in scientific articles.
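Counting articles per month comes down to one date-restricted search per month. The sketch below only builds the request URLs for NCBI's E-utilities `esearch` endpoint and sends nothing over the network. It is not how this module works internally (the module goes through metapub), and the function name `monthly_count_urls` is hypothetical.

```python
from calendar import monthrange
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def monthly_count_urls(keyword, start_year, end_year, api_key=None):
    """Yield (year, month, url) for one count-only PubMed search per month."""
    for year in range(start_year, end_year + 1):
        for month in range(1, 13):
            last_day = monthrange(year, month)[1]  # handles leap years
            params = {
                "db": "pubmed",
                "term": keyword,
                "datetype": "pdat",  # filter on publication date
                "mindate": f"{year}/{month:02d}/01",
                "maxdate": f"{year}/{month:02d}/{last_day:02d}",
                "rettype": "count",  # only the number of hits is needed
                "retmode": "json",
            }
            if api_key:  # the key is optional
                params["api_key"] = api_key
            yield year, month, f"{ESEARCH_URL}?{urlencode(params)}"


for year, month, url in monthly_count_urls("crispr", 2000, 2000):
    print(year, month, url)
```

Without an API key, NCBI limits clients to about three requests per second, so a real loop over many months would need a short pause between requests.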
The Scopus database requires the user to generate a free API key, which requires an Elsevier account.
Elsevier Scopus Database
Fetches data from the Elsevier Scopus articles database for analyzing how article counts evolve over time.
A free Springer Nature API key is required to use this database's functions.
Springer Nature Database
Fetches data from the Springer Nature database for analyzing how interest in article subjects evolves over time.
Packages Installation
All interactions with the PubMed NCBI database are done through the Python package metapub.
This package can be installed by running the following in a terminal:
python -m pip install metapub
Other important requirements are listed in the requirements.txt file in the database_analysis module folder.
All interactions with the Scopus database are done through the Python package pybliometrics.
This database interaction is done through the API key and the requests Python package.
Client usage
To generate a CSV with the number of published articles per month containing a keyword, from 2000 to 2023, run:
python cli.py --pubmed '<keyword>' 2000 2023 -o <output_path>
Note: the PubMed database was created in January 1996.
If no interval is given, the default querying interval is 1996-2023.
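A command line of this shape, with the 1996-2023 fallback, could be parsed along the following lines. This is an illustrative argparse sketch and not the actual contents of cli.py: the `--output` long option, the default file name `pubmed.csv`, and the function name `parse_args` are assumptions.

```python
import argparse

# Default interval, matching the note above.
DEFAULT_START, DEFAULT_END = 1996, 2023


def parse_args(argv=None):
    """Return (keyword, start_year, end_year, output_path)."""
    parser = argparse.ArgumentParser(prog="cli.py")
    parser.add_argument(
        "--pubmed",
        nargs="+",
        metavar="ARG",
        required=True,
        help="keyword, optionally followed by a start year and an end year",
    )
    parser.add_argument(
        "-o", "--output",
        default="pubmed.csv",  # hypothetical default file name
        help="path of the CSV file to write",
    )
    args = parser.parse_args(argv)

    keyword, *years = args.pubmed
    if len(years) not in (0, 2):
        parser.error("expected '<keyword>' or '<keyword>' <start> <end>")
    try:
        # Fall back to the default interval when no years are given.
        start, end = (int(y) for y in years) if years else (DEFAULT_START, DEFAULT_END)
    except ValueError:
        parser.error("start and end must be integer years")
    if start > end:
        parser.error("start year must not be after end year")
    return keyword, start, end, args.output


print(parse_args(["--pubmed", "crispr", "2000", "2023", "-o", "out.csv"]))
print(parse_args(["--pubmed", "crispr"]))
```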
This notebook requires the following packages to be installed to be fully executed:
python -m pip install pandas pytrends metapub matplotlib pybliometrics numpy
The basic workflow is executed by the client module, as in the following examples: