Syntax COVID-19 Analysis
The syntax analysis included 75889 abstracts from 100754 published articles. The last update, by LitCovid, was on 2021-03-01. The most common publication types were Journal Article (59.39%), Journal Article-Review (8.83%), Letter (6.8%), Journal Article-Research Support, Non-U.S. Gov’t (3.86%) and Editorial (3.28%).
Graphic 1. Daily (red) and cumulative (green) publications about COVID-19
The United States (16.2%), China (8.07%), the United Kingdom (7.28%), Italy (5.94%) and India (5%) were the main sources of scientific literature. About 42.49% of the articles analyzed came from these five countries.
Map 1. Sources of COVID-19 literature. This map was generated with the tmap (version 3.0) and sp (version 1.4-1) R packages.
PlatCOVID performed four descriptive syntax analyses on these abstracts:
(1) Word atomization of all abstracts
(2) Categorization based on word atomization
(3) Word atomization of each category
(4) Sentence atomization and Human Literature Curation
Using the atomization process, 224033 words/terms were found. After 28220 common words were excluded, 195813 words remained. The table below shows the top 10 terms. All words are available in the supplementary information.
Box 1. Top 10 words in abstracts of the COVID-19 literature.
Word | Frequency |
---|---|
pandemic | 54183 |
disease | 49770 |
health | 44194 |
during | 36519 |
study | 33816 |
infection | 32435 |
clinical | 30975 |
care | 29374 |
severe | 28116 |
respiratory | 26438 |
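The counting behind a table like Box 1 can be sketched in a few lines of Python. This is a minimal illustration and not the PlatCOVID code: the tokenizing rule, the tiny `COMMON_WORDS` set and the two sample abstracts are all assumptions made for the example, since the actual list of 28220 excluded words is not shown here.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only; the real list of
# excluded common words is much larger and is not reproduced in this text.
COMMON_WORDS = {"the", "of", "and", "in", "a", "to", "is", "with", "for"}


def atomize(text):
    """Lowercase the text and return tokens that start with a letter
    and may continue with letters, digits or hyphens."""
    return re.findall(r"[a-z][a-z0-9-]*", text.lower())


def word_frequencies(abstracts, common_words=COMMON_WORDS):
    """Count every token across all abstracts, skipping common words."""
    counts = Counter()
    for abstract in abstracts:
        counts.update(w for w in atomize(abstract) if w not in common_words)
    return counts


# Made-up sample abstracts, not taken from the analyzed corpus.
sample_abstracts = [
    "The pandemic of respiratory disease is severe.",
    "Clinical care in the pandemic.",
]
frequencies = word_frequencies(sample_abstracts)
top_terms = frequencies.most_common(10)  # list of (word, count) pairs
```

Calling `most_common(10)` on the resulting counter gives the ranked word/frequency pairs in the same shape as the table above.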
Our analysis suggests that the scientific focus so far has been on summarizing the main clinical symptoms of COVID-19 (terms: respiratory, clinical, severe, acute, pneumonia, syndrome, symptoms, fever, chest and lung). It can also be inferred that many articles set out to describe the spread of the virus (terms: novel, severe, virus, outbreak, epidemic and spread). Other scientific efforts addressed the transmission, prevention, treatment, health care management and diagnosis of SARS-CoV-2 and COVID-19.
Categorization Process: The 5 Classes of Scientific Interest
Based on the global word tokenization/atomization of the abstracts, we sorted the COVID-19 studies into five categories: (1) clinical, signs & symptoms, (2) epidemiology, (3) transmission, (4) treatment and (5) diagnosis (Fluxogram 1). The categorization process used the MeSH and DeCS term lists.
Fluxogram 1. Workflow of categorization.
A total of 65 articles fit all five categories. These articles can be accessed through the following PMIDs: 32112886, 32278065, 32317810, 32347772, 32362969, 32397688, 32447742, 32499983, 32603887, 32605194, 32605661, 32623083, 32636542, 32811406, 32840614, 32881628, 32957928, 32989413, 33014150, 33014984, 33175702, 33186230, 33199136, 33374759, 33442244, 32145185, 32183901, 32185921, 32220177, 32228809, 32271601, 32300673, 32357503, 32442265, 32442720, 32475877, 32498762, 32506768, 32532933, 32534188, 32565599, 32584236, 32591667, 32641059, 32647672, 32679582, 32702935, 32729367, 32730095, 32754600, 32764417, 32773409, 32774008, 32790891, 32934940, 33005276, 33062082, 33080715, 33240881, 33363098, 33490198, 33493922, 33537362, 32297723, 33318893.
Venn 1. Categorizations of abstracts.
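One way to picture term-based categorization, and the overlap a Venn diagram summarizes, is the sketch below. It is illustrative only: the `CATEGORY_TERMS` entries are hypothetical stand-ins for the MeSH and DeCS term lists (which are not reproduced in this section), matching is by exact lowercase word, and the article identifiers are made up.

```python
import re

# Hypothetical term lists, for illustration only; these are not the
# MeSH/DeCS terms actually used in the categorization.
CATEGORY_TERMS = {
    "clinical": {"symptom", "symptoms", "fever", "cough"},
    "epidemiology": {"epidemiological", "incidence"},
    "transmission": {"transmission", "spread"},
    "treatment": {"treatment", "therapy"},
    "diagnosis": {"diagnosis", "pcr"},
}


def categorize(abstract_text):
    """Return the set of categories with at least one term among the
    abstract's lowercase words (exact word match, no stemming)."""
    words = set(re.findall(r"[a-z]+", abstract_text.lower()))
    return {cat for cat, terms in CATEGORY_TERMS.items() if words & terms}


def in_all_categories(abstracts_by_id):
    """Return the sorted identifiers of abstracts that match every category."""
    every_category = set(CATEGORY_TERMS)
    return sorted(
        article_id
        for article_id, text in abstracts_by_id.items()
        if categorize(text) == every_category
    )


# Made-up abstracts keyed by made-up identifiers (not real PMIDs).
sample = {
    "article-1": "Fever and cough; PCR diagnosis.",
    "article-2": (
        "Epidemiological data on transmission, symptoms, "
        "diagnosis and treatment."
    ),
}
full_overlap = in_all_categories(sample)
```

Because an abstract can match several term lists, the categories overlap; the central region of a five-set Venn diagram corresponds to the identifiers returned by `in_all_categories`.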
Then, we performed word atomization on the abstracts of each category. The full word atomization report for each category is available for viewing.
Box 2. Top 10 Words/terms atomization of each category.
Diagnosis (n) | Treatment (n) | Epidemiology (n) | Transmission (n) | Signs (n) |
---|---|---|---|---|
disease (8280) | treatment (20574) | disease (4055) | transmission (10980) | disease (35523) |
diagnosis (7701) | disease (17844) | clinical (3032) | disease (6306) | pandemic (33958) |
clinical (6654) | pandemic (14019) | epidemiological (2852) | pandemic (5586) | clinical (30890) |
pandemic (6233) | clinical (13684) | health (2763) | infection (5386) | health (28256) |
infection (5770) | severe (11375) | infection (2688) | health (5226) | study (24810) |
study (4948) | care (11135) | pandemic (2579) | during (4266) | during (23741) |
respiratory (4612) | infection (11104) | severe (2069) | respiratory (4089) | infection (23191) |
health (4468) | respiratory (9839) | respiratory (2043) | virus (3847) | severe (20941) |
severe (4417) | health (9510) | study (1992) | risk (3820) | care (20286) |
during (4373) | during (9421) | risk (1700) | study (3345) | respiratory (19519) |
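A side-by-side ranking like Box 2 can be assembled by counting words per category and then reading the ranked lists row by row. The sketch below assumes the abstracts are already grouped by category; the function names, the sample texts and the one-word exclusion set are hypothetical.

```python
import re
from collections import Counter


def top_words(abstracts, n, common_words=frozenset()):
    """Return the n most frequent lowercase words, skipping common words."""
    counts = Counter(
        word
        for text in abstracts
        for word in re.findall(r"[a-z]+", text.lower())
        if word not in common_words
    )
    return counts.most_common(n)


def side_by_side(abstracts_by_category, n, common_words=frozenset()):
    """Build one text row per rank; each row holds the 'word (count)'
    entry of every category at that rank. Rows stop at the shortest
    column, because zip truncates to its shortest input."""
    columns = [
        [f"{word} ({count})" for word, count in top_words(texts, n, common_words)]
        for texts in abstracts_by_category.values()
    ]
    return [" | ".join(row) for row in zip(*columns)]


# Made-up abstracts grouped under two of the five categories.
grouped = {
    "Diagnosis": ["diagnosis of disease", "disease diagnosis disease"],
    "Treatment": ["treatment treatment care"],
}
rows = side_by_side(grouped, n=2, common_words={"of"})
```

Each element of `rows` corresponds to one line of the table, with columns in the insertion order of the category dictionary.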
Finally, we performed sentence tokenization on the abstracts of each category.
First, we collected the last 4 sentences of each abstract, assumed to be the conclusion of the work, using pubmed.mineR. Around 303556 conclusion sentences were obtained.
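The idea of treating the last sentences of an abstract as its conclusion can be illustrated with a plain Python sketch. This is not pubmed.mineR, which is an R package; the naive regular-expression splitter and the function names below are assumptions for the example, and the splitter will mis-handle abbreviations such as "e.g.". Taking four sentences from each of the 75889 abstracts gives at most 4 × 75889 = 303556 sentences.

```python
import re


def split_sentences(abstract):
    """Naive splitter: break after '.', '!' or '?' when followed by
    whitespace. Abbreviations and decimals are not handled specially."""
    parts = re.split(r"(?<=[.!?])\s+", abstract.strip())
    return [part for part in parts if part]


def conclusion_sentences(abstract, n=4):
    """Return the last n sentences (fewer if the abstract is shorter)."""
    return split_sentences(abstract)[-n:]


# Made-up abstract, not taken from the analyzed corpus.
example = (
    "Background one. Methods two. Results three. "
    "Results four. Conclusion five."
)
tail = conclusion_sentences(example)

# Upper bound on the number of conclusion sentences in the corpus.
max_sentences = 4 * 75889
```

Abstracts with fewer than four sentences simply contribute all of their sentences, so the real total can fall below the upper bound.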
Second, we extracted the context sentences for each category's terms, used previously, with a tokenizer. In total, 8542, 16570, 1366, 6844 and 24056 sentences were retrieved for diagnosis, treatment, epidemiology, transmission and clinical, signs and symptoms, respectively. Articles with no context sentence were excluded.
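Selecting context sentences by category term, and dropping articles that yield none, might look like the following sketch. It assumes the conclusion sentences are already available per article; the whole-word, case-insensitive matching rule, the function name and the sample data are hypothetical choices for illustration.

```python
import re


def context_sentences(sentences_by_article, terms):
    """Keep, for each article, the sentences that contain at least one
    term as a whole word (case-insensitive). Articles without any
    matching sentence are left out of the result."""
    if not terms:
        return {}
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in sorted(terms)) + r")\b",
        re.IGNORECASE,
    )
    kept = {}
    for article_id, sentences in sentences_by_article.items():
        hits = [s for s in sentences if pattern.search(s)]
        if hits:  # articles with no context sentence are excluded
            kept[article_id] = hits
    return kept


# Made-up sentences keyed by made-up identifiers (not real PMIDs).
sample = {
    "article-1": ["Early diagnosis matters.", "No relevant words."],
    "article-2": ["Nothing here."],
}
diagnosis_context = context_sentences(sample, {"diagnosis"})
```

Running the same function once per category term list gives one set of context sentences per category, which can then be counted or passed on to manual curation.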
Third, we began the human curation process (Fluxogram 2):
Fluxogram 2. Human curation process from PlatCOVID based on 5 categories.