Quick Tutorial

Installation

Required Dependencies

!pip install rispy
!pip install pandas
!pip install matplotlib
!pip install seaborn

Installing systematic-reviewpy

!python3 -m pip install systematic-reviewpy

Google Colab (Jupyter notebook) instructions:
Ctrl+M M converts a code cell to a text cell.
Ctrl+M Y converts a text cell to a code cell.

Install the pdftotext dependencies: the Python PDF readers are needed to validate the downloaded articles and to count search words in the PDF text.

Run only the cell for your OS and keep the other cells as markdown.

Debian, Ubuntu, and friends
!sudo apt install build-essential libpoppler-cpp-dev pkg-config python3-dev

Fedora, Red Hat, and friends

!sudo yum install gcc-c++ pkgconfig poppler-cpp-devel python3-devel

macOS

!brew install pkg-config poppler python

Windows using conda

!conda install -c conda-forge poppler

Install Python PDF readers

## https://pypi.org/project/PyMuPDF/
!python -m pip install --upgrade pip
!python -m pip install --upgrade pymupdf
## https://pypi.org/project/pdftotext/
!pip install pdftotext

Importing systematic-reviewpy

import systematic_review

Most of the objects provide methods such as to_csv and to_excel for writing output files.

Check the documentation for details of the available string manipulation methods:

  • preprocess_string (default and applied before all other implemented functions)

  • custom_text_manipulation_function: for supplying your own function to preprocess the text

  • nltk_remove_stopwords

  • pattern_lemma_or_lemmatize_text

  • nltk_word_net_lemmatizer

  • nltk_porter_stemmer

  • nltk_lancaster_stemmer

  • spacy_lemma

  • nltk_remove_stopwords_spacy_lemma

  • convert_string_to_lowercase

  • preprocess_string_to_space_separated_words
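For illustration only, the sketch below shows the kind of normalization that convert_string_to_lowercase and preprocess_string_to_space_separated_words suggest by their names: lowercase the text, then reduce it to space-separated words. This is an assumption, not the library's implementation, and both helper names are hypothetical. A function like this is also the sort of thing you might write as a custom_text_manipulation_function; check the documentation for how the library expects it to be passed.

```python
import re


def to_lowercase(text: str) -> str:
    """Hypothetical stand-in for a lowercase conversion step."""
    return text.lower()


def to_space_separated_words(text: str) -> str:
    """Hypothetical normalizer: lowercase the text, replace every run of
    characters outside a-z and 0-9 (including non-ASCII letters) with a
    single space, and trim the ends."""
    return re.sub(r"[^a-z0-9]+", " ", to_lowercase(text)).strip()


if __name__ == "__main__":
    print(to_space_separated_words("Deep-Learning, for NLP!"))
```

Running it prints `deep learning for nlp`. Note that this simple rule drops accented letters; a real preprocessing method may handle them differently.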

Provide the name of the string manipulation method.

string_manipulation_method = 'convert_string_to_lowercase'

Optional: converting and wrangling citation files

Wrangling or modifying the citation files is only required if a format error occurs when importing them into a reference manager.

# Uncomment only if your citation files need fixing.
#import os
#from systematic_review import citation
#citation.csv_citations_to_ris_converter("./Data files and Python Code/Downloaded files/springer.csv", "./Data files and Python Code/Modified files/springer.ris")
#citation.remove_empty_lines("./Data files and Python Code/Downloaded files/entropy-v12-i12_20210610.ris", "./Data files and Python Code/Modified files/MDPI.ris")
#citation.edit_ris_citation_paste_values_after_regex_pattern("./Data files and Python Code/Modified files/MDPI.ris", "./Data files and Python Code/Modified files/mdpi.ris")
## Remove the intermediate file
#os.remove("./Data files and Python Code/Modified files/MDPI.ris")
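The remove_empty_lines step above presumably strips the blank lines that some exported RIS files contain. The stdlib sketch below illustrates that idea under this assumption; remove_blank_lines is a hypothetical name, not the library's function.

```python
from pathlib import Path


def remove_blank_lines(src: str, dst: str) -> int:
    """Copy src to dst without whitespace-only lines.

    Hypothetical helper illustrating the idea behind citation.remove_empty_lines.
    Returns the number of lines that were dropped.
    """
    lines = Path(src).read_text(encoding="utf-8").splitlines()
    kept = [line for line in lines if line.strip()]
    Path(dst).write_text("\n".join(kept) + "\n", encoding="utf-8")
    return len(lines) - len(kept)
```

Writing to a separate destination file, as the tutorial does, keeps the original download untouched in case the fix has to be redone.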

Citations

All citation files were uploaded to the Mendeley reference manager, updated using the Mendeley database, and exported in RIS format.

Provide the path of the folder that contains all citation RIS files.

CITATIONS_FILES_PARENT_DIR_PATH = "./Data files and Python Code/Articles_by_sources"
citations = systematic_review.citation.Citations(CITATIONS_FILES_PARENT_DIR_PATH)
citations_df = citations.get_dataframe()
citations_df
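Citations reads the RIS files in this folder into a single dataframe. To show what a RIS record contains once parsed, here is a minimal stdlib sketch that turns RIS text into a list of tag-to-values dictionaries. It keeps the raw two-character tags instead of mapping them to readable field names (rispy, installed above, is a full RIS parser), and it is an illustration under that simplification, not how Citations is implemented.

```python
import re
from collections import defaultdict
from typing import Dict, List

# A RIS line is a two-character tag, two spaces, a hyphen, an optional space, then the value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(text: str) -> List[Dict[str, List[str]]]:
    """Parse RIS text into records mapping each tag to its list of values.

    Repeated tags such as AU (author) collect multiple values. A record is
    only emitted when its ER (end of record) line is reached.
    """
    records = []
    current = defaultdict(list)
    for line in text.splitlines():
        match = RIS_LINE.match(line)
        if not match:
            continue  # blank or malformed lines are skipped in this sketch
        tag, value = match.group(1), match.group(2).strip()
        if tag == "ER":
            records.append(dict(current))
            current = defaultdict(list)
        else:
            current[tag].append(value)
    return records
```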

Search Words

Provide the path of search_words.json, or create a keyword dictionary.

systematic_review.search_count.SearchWords().get_sample_keywords_json()

Edit the template to suit your needs and provide its file path in the cell below. If you have not changed the filename or location, you do not need to change anything.

#KEYWORDS_JSON_FILE_PATH = "./Data files and Python Code/keywords.json"
SEARCH_WORDS_JSON_FILE_PATH = "./sample_search_words_template.json"
search_words = systematic_review.search_count.SearchWords(SEARCH_WORDS_JSON_FILE_PATH, string_manipulation_method)
print(search_words.value)

Search and count words in citations

citations_search_words_count = systematic_review.search_count.SearchCount(citations_df, search_words, string_manipulation_method)
citations_search_words_count_df = citations_search_words_count.get_dataframe()
citations_search_words_count_df
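The exact layout of the search-words JSON comes from the template generated above. Purely as an illustration, assume it maps a group name to a list of words; the sketch below counts whole-word occurrences of each word in already preprocessed text and sums them per group. The dictionary layout, the group names, and count_search_words are assumptions, not the SearchCount implementation.

```python
import re
from typing import Dict, List


def count_search_words(text: str, search_words: Dict[str, List[str]]) -> Dict[str, int]:
    """Count whole-word matches of every word in each group and sum them per group."""
    counts = {}
    for group, words in search_words.items():
        total = 0
        for word in words:
            # \b anchors stop "model" from also matching inside "models".
            pattern = r"\b" + re.escape(word) + r"\b"
            total += len(re.findall(pattern, text))
        counts[group] = total
    return counts


if __name__ == "__main__":
    # Hypothetical search-words dictionary; real keys come from your JSON file.
    example_words = {"method": ["deep learning", "neural network"], "domain": ["health"]}
    text = "deep learning and neural network models in health; deep learning again"
    print(count_search_words(text, example_words))
```

Running it prints `{'method': 3, 'domain': 1}`. Because the text is preprocessed first (here assumed lowercase), the search words should be normalized with the same string manipulation method.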

citations_search_words_count.to_csv("./Data files and Python Code/OutputFiles/citations_keywords_count_df.csv")

Sort and Filter the citations

Provide the number of research papers needed.

# Filter the citations down to the required number
required_citations_number = 500
filter_sorted_citations = systematic_review.filter_sort.FilterSort(citations_search_words_count_df, search_words, required_citations_number)
filter_sorted_citations_df = filter_sorted_citations.get_dataframe()
print(len(filter_sorted_citations_df))
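FilterSort receives the search-word counts, the search words, and the required number of papers. One plausible ranking rule, assumed here for illustration only, is to drop records in which any keyword group has zero matches and then sort by the total count. The record layout (one count field per group) and the rule itself are assumptions; check the documentation for the criteria FilterSort actually applies.

```python
from typing import Dict, List


def filter_sort(records: List[Dict], groups: List[str], required_number: int) -> List[Dict]:
    """Keep records that match every group, sort by total count, and return the top N."""
    matching = [r for r in records if all(r[g] > 0 for g in groups)]
    # sorted() stays stable with reverse=True, so ties keep their original order.
    ranked = sorted(matching, key=lambda r: sum(r[g] for g in groups), reverse=True)
    return ranked[:required_number]


if __name__ == "__main__":
    records = [
        {"title": "A", "method": 3, "domain": 1},
        {"title": "B", "method": 0, "domain": 5},  # no "method" match: filtered out
        {"title": "C", "method": 2, "domain": 4},
    ]
    print([r["title"] for r in filter_sort(records, ["method", "domain"], 2)])
```

Running it prints `['C', 'A']`. Requiring a match in every group favours papers that cover all facets of the research question, at the cost of possibly returning fewer than the requested number.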

filter_sorted_citations.to_csv("./Data files and Python Code/OutputFiles/filter_sorted_citations_df.csv")

Research paper files

Download the PDFs of the citations selected above from the databases.

This step was completed with browser-automationpy.

Validating the downloaded articles

Provide the path of the parent directory that contains all downloaded research papers.

DOWNLOADED_ARTICLES_PATH = "./Data files and Python Code/downloadedArticles"

Provide the path of a text file listing the names of inaccessible research papers, one per line, or set the path to None.

IN_ACCESSIBLE_ARTICLES_TEXT_FILE_PATH = "./Data files and Python Code/not_accessible_articles.txt"
validation = systematic_review.validation.Validation(filter_sorted_citations_df, DOWNLOADED_ARTICLES_PATH, IN_ACCESSIBLE_ARTICLES_TEXT_FILE_PATH)
validated_research_papers = validation.get_dataframe()
validation.info()

validation.to_csv("validation.csv")
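Validation compares the selected citations with the PDFs found under DOWNLOADED_ARTICLES_PATH and with the titles listed in the inaccessible-articles file. The sketch below illustrates one way such a check could work, assuming each PDF file is named after its article title; the naming rule, normalize, and validate are hypothetical and not the library's matching logic.

```python
import re
from pathlib import Path
from typing import Dict, List, Optional


def normalize(title: str) -> str:
    """Lowercase and collapse runs of non-alphanumeric characters to single underscores."""
    return re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")


def validate(titles: List[str], pdf_dir: str, inaccessible_file: Optional[str]) -> Dict[str, List[str]]:
    """Split titles into downloaded, inaccessible, and missing articles."""
    # Search the parent directory and all subdirectories for PDF files.
    downloaded = {normalize(p.stem) for p in Path(pdf_dir).rglob("*.pdf")}
    inaccessible = set()
    if inaccessible_file is not None:
        lines = Path(inaccessible_file).read_text(encoding="utf-8").splitlines()
        inaccessible = {normalize(line) for line in lines if line.strip()}
    result = {"downloaded": [], "inaccessible": [], "missing": []}
    for title in titles:
        key = normalize(title)
        if key in downloaded:
            result["downloaded"].append(title)
        elif key in inaccessible:
            result["inaccessible"].append(title)
        else:
            result["missing"].append(title)
    return result
```

Anything that ends up in "missing" is a paper that was neither downloaded nor recorded as inaccessible, which is usually the list worth rechecking by hand.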

Search and count words in the research paper files.

research_paper_search_words_count = systematic_review.search_count.SearchCount(validated_research_papers, search_words, string_manipulation_method)
research_paper_search_words_count_df = research_paper_search_words_count.get_dataframe()

research_paper_search_words_count.to_csv("./Data files and Python Code/OutputFiles/pdf_keywords_count_df.csv")

Filter and sort the PDF search-count dataframe

Provide the number of research papers needed.

required_full_text_documents = 100
filter_sorted_research_papers = systematic_review.filter_sort.FilterSort(research_paper_search_words_count_df, search_words, required_full_text_documents)
selected_review_articles_df = filter_sorted_research_papers.get_dataframe()

filter_sorted_research_papers.to_csv("./Data files and Python Code/OutputFiles/selected_review_articles_df.csv")

Generating research paper review files:

Choose any of the following:

  • Sorted by source: makes it easier to find the articles in the folder.

sorted_Finaldf = systematic_review.filter_sort.sort_dataframe_based_on_column(selected_review_articles_df, 'source')
#sorted_Finaldf.to_csv("./Data files and Python Code/OutputFiles/sorted_Finaldf.csv")

  • Creating the sample literature review file: review columns are added so that details can be entered manually. The keyword counts are not needed at this point, so they are dropped.

selected_citation = systematic_review.citation.drop_search_words_count_columns(sorted_Finaldf, search_words)
selected_citation_review = systematic_review.analysis.creating_sample_review_file(selected_citation)

selected_citation_review.to_csv("./Data files and Python Code/OutputFiles/selected_citation_review.csv")
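creating_sample_review_file adds empty columns in which review details are entered manually. The stdlib sketch below shows the same idea on plain rows: copy each row and append blank review columns, then serialize to CSV. The column names are hypothetical placeholders, not the columns the library creates.

```python
import csv
import io
from typing import Dict, List

# Hypothetical review columns; replace with whatever your review protocol needs.
REVIEW_COLUMNS = ["research_question", "methodology", "findings", "reviewer_notes"]


def add_review_columns(rows: List[Dict[str, str]], columns: List[str] = REVIEW_COLUMNS) -> List[Dict[str, str]]:
    """Return copies of rows with each review column appended as an empty string.

    Existing values in a column of the same name are kept, and the input rows
    are not modified.
    """
    result = []
    for row in rows:
        new_row = dict(row)
        for column in columns:
            new_row.setdefault(column, "")
        result.append(new_row)
    return result


def to_csv_text(rows: List[Dict[str, str]]) -> str:
    """Serialize rows to CSV text, using the first row's keys as the header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```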

Systematic Review workflow diagram and info

my_analysis = systematic_review.analysis.SystematicReviewInfo(CITATIONS_FILES_PARENT_DIR_PATH, filter_sorted_citations_df,
                 validated_research_papers, selected_review_articles_df)
my_analysis.info()
my_analysis.systematic_review_diagram()

Analysis

Analysis needed                                     | Fact table | Diagram
--------------------------------------------------- | ---------- | -------
Number of articles                                  | yes        | no
Period of publication                               | yes        | yes
Number of authors                                   | yes        | no
Articles with a single author                       | yes        | no
Articles per author                                 | yes        | no
Authors per article                                 | yes        | no
Top N countries with the highest number of articles | yes        | yes
Top N journals with the highest number of articles  | yes        | yes
Top N most-used keywords in the articles            | yes        | yes
Year with the highest number of articles            | yes        | yes

my_cite_analysis = systematic_review.analysis.CitationAnalysis(sorted_Finaldf)
my_cite_analysis.publication_year_info()
my_cite_analysis.publication_year_diagram()
my_cite_analysis.authors_info()
my_cite_analysis.publication_place_info()
my_cite_analysis.publication_place_diagram()
my_cite_analysis.keywords_info()
my_cite_analysis.keyword_diagram(top_result=10)
my_cite_analysis.publisher_info()
my_cite_analysis.publisher_diagram()
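Several facts in the analysis table are simple aggregations over the citation records. The sketch below computes a few of them (number of articles, publication period, number of distinct authors, single-author articles, articles per author, authors per article, and the year with the most articles) from a hypothetical list of records with "authors" and "year" fields. It illustrates the arithmetic only, not what CitationAnalysis prints.

```python
from collections import Counter
from typing import Dict, List


def citation_facts(records: List[Dict]) -> Dict:
    """Compute simple bibliometric facts from records with 'authors' and 'year' fields."""
    if not records:
        raise ValueError("records must not be empty")
    years = [r["year"] for r in records]
    articles_per_author = Counter(a for r in records for a in r["authors"])
    articles_per_year = Counter(years)
    return {
        "number_of_articles": len(records),
        "period": (min(years), max(years)),
        "number_of_authors": len(articles_per_author),
        "single_author_articles": sum(1 for r in records if len(r["authors"]) == 1),
        "articles_per_author": dict(articles_per_author),
        "authors_per_article": [len(r["authors"]) for r in records],
        # most_common(1) returns the first-seen year if several years tie.
        "top_year": articles_per_year.most_common(1)[0],
    }
```

Author names are counted as exact strings, so "Smith, J." and "Smith, John" would be treated as two authors; real citation data usually needs name disambiguation first.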