Journal of Peking University(Health Sciences) ›› 2018, Vol. 50 ›› Issue (2): 352-357. doi: 10.3969/j.issn.1671-167X.2018.02.025

Construction of chemical information database based on optical structure recognition technique

LV Chuan-yu, LI Ming-na, ZHANG Liang-ren, LIU Zhen-ming△   

  1. (State Key Laboratory of Natural and Biomimetic Drugs, Peking University School of Pharmaceutical Sciences, Beijing 100191, China)
  • Online: 2018-04-18 Published: 2018-04-18
  • Contact: LIU Zhen-ming, E-mail: zmliu@bjmu.edu.cn
  • Supported by: the National Natural Science Foundation of China (21772005, 21572010) and the Peking University Seed Fund for Medicine-Information Interdisciplinary Research Project (BMU20160579)

Abstract: Objective: To create a protocol for constructing a chemical information database from scientific literature quickly and automatically. Methods: Scientific literature, patents and technical reports from different chemical disciplines were collected and stored in PDF format as the fundamental datasets. Chemical structures in published documents and images were converted into machine-readable data using name conversion technology and the optical structure recognition tool CLiDE. During molecular structure extraction, Markush structures were enumerated into well-defined individual molecules with QueryTools in the molecule editor ChemDraw. The document management software EndNote X8 was used to acquire bibliographical references, including title, author, journal and year of publication. The text mining toolkit ChemDataExtractor was used to retrieve information from figures, tables and text paragraphs for populating the structured chemical database. The extracted data were then manually revised and annotated in detail to ensure accuracy and completeness. In addition to the literature data, the computational simulation platform Pipeline Pilot 7.5 was used to calculate physical and chemical properties and to predict molecular attributes. Furthermore, the open database ChEMBL was linked to fetch known bioactivities, such as indications and targets. After information extraction and data expansion, five separate metadata files were generated: the molecular structure data file, molecular information, bibliographical references, predictable attributes and known bioactivities. With the canonical simplified molecular input line entry specification (SMILES) as the primary key, the metadata files were associated through common key nodes, including molecular number and PDF number, to construct an integrated chemical information database.
Results: A reasonable protocol for constructing a chemical information database was created successfully. A total of 174 research articles and 25 reviews published in Marine Drugs from January 2015 to June 2016 were collected as the essential data source, and an elementary marine natural product database named PKU-MNPD was built in accordance with this protocol, containing 3 262 molecules and 19 821 records. Conclusion: This data aggregation protocol greatly helps the construction of chemical information databases from original documents in terms of accuracy, comprehensiveness and efficiency. The structured chemical information database can facilitate access to medical intelligence and accelerate the translation of scientific research achievements.
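
The linking step described in the Methods (five metadata files associated through the canonical SMILES primary key, the molecular number and the PDF number) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the table names, column names and the use of SQLite are hypothetical, the SMILES strings are assumed to have been canonicalized beforehand by a cheminformatics toolkit, and the sample row in the usage section is placeholder data rather than content from PKU-MNPD.

```python
import sqlite3

# Hypothetical schema: one table per metadata file named in the abstract.
# Column names are illustrative assumptions, not taken from the paper.
SCHEMA = """
CREATE TABLE structure    (smiles TEXT PRIMARY KEY, mol_no TEXT UNIQUE);
CREATE TABLE mol_info     (mol_no TEXT, pdf_no TEXT, name TEXT);
CREATE TABLE bibliography (pdf_no TEXT PRIMARY KEY, title TEXT, year INTEGER);
CREATE TABLE predicted    (mol_no TEXT, mol_weight REAL);
CREATE TABLE bioactivity  (mol_no TEXT, target TEXT);
"""


def build_database(structures, mol_info, bibliography, predicted, bioactivity):
    """Load the five metadata tables into an in-memory SQLite database.

    Each argument is a list of tuples matching the column order in SCHEMA.
    SMILES strings are assumed to be canonical already.
    """
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO structure VALUES (?, ?)", structures)
    con.executemany("INSERT INTO mol_info VALUES (?, ?, ?)", mol_info)
    con.executemany("INSERT INTO bibliography VALUES (?, ?, ?)", bibliography)
    con.executemany("INSERT INTO predicted VALUES (?, ?)", predicted)
    con.executemany("INSERT INTO bioactivity VALUES (?, ?)", bioactivity)
    con.commit()
    return con


def records_for(con, smiles):
    """Return the integrated records for one canonical SMILES string.

    Molecule-level tables are joined on the molecular number, and the
    bibliography is reached through the PDF number.
    """
    query = """
        SELECT s.smiles, i.name, b.title, b.year, p.mol_weight, a.target
        FROM structure s
        JOIN mol_info i        ON i.mol_no = s.mol_no
        JOIN bibliography b    ON b.pdf_no = i.pdf_no
        LEFT JOIN predicted p  ON p.mol_no = s.mol_no
        LEFT JOIN bioactivity a ON a.mol_no = s.mol_no
        WHERE s.smiles = ?
    """
    return con.execute(query, (smiles,)).fetchall()


if __name__ == "__main__":
    # Placeholder rows for illustration only (ethanol, invented identifiers).
    con = build_database(
        structures=[("CCO", "M0001")],
        mol_info=[("M0001", "PDF001", "ethanol")],
        bibliography=[("PDF001", "Placeholder title", 2015)],
        predicted=[("M0001", 46.07)],
        bioactivity=[("M0001", "placeholder target")],
    )
    for row in records_for(con, "CCO"):
        print(row)
```

Predicted attributes and bioactivities are attached with left joins in this sketch so that a molecule without an entry in those tables still appears in the integrated view.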

Key words: Scientific literature, Optical structure recognition, Data mining, Chemical information database
