Journal of Peking University(Health Sciences) ›› 2018, Vol. 50 ›› Issue (2): 368-374. doi: 10.3969/j.issn.1671-167X.2018.02.028

• Article • Previous Articles     Next Articles

A retrieval method of drug molecules based on graph collapsing

QU Jing-wei1, LV Xiao-qing1,2△, LIU Zhen-ming3, LIAO Yuan1, SUN Peng-hui1, WANG Bei1, TANG Zhi1,2   

  1. (1.Institute of Computer Science & Technology, Peking University, Beijing 100080, China; 2. State Key Laboratory of Digital Publishing Technology, Beijing 100080, China; 3. Department of Medical Chemistry, Peking University School of Pharmaceutical Sciences, Beijing 100191, China)
  • Online:2018-04-18 Published:2018-04-18
  • Contact: LV Xiao-qing E-mail:lvxiaoqing@pku.edu.cn
  • Supported by:
    Supported by National Natural Science Foundation of China (61573028, 61673029), Key Laboratory of Science, Technology and Standard in Press Industry (Key Laboratory of Intelligent Press Media Technology) and Peking University Seed Fund for Medicine-Information Interdisciplinary Research Project (BMU20160579)

Abstract: Objective: To establish a compact and efficient hypergraph representation and a graph-similarity-based retrieval method of molecules to achieve effective and efficient medicine information retrieval. Methods: Chemical structural formula (CSF) was a primary search target as a unique and precise identifier for each compound at the molecular level in the research field of medicine information retrieval. To retrieve medicine information effectively and efficiently, a complete workflow of the graph-based CSF retrieval system was introduced. This system accepted the photos taken from smartphones and the sketches drawn on tablet personal computers as CSF inputs, and formalized the CSFs with the corresponding graphs. Then this paper proposed a compact and efficient hypergraph representation for molecules on the basis of analyzing factors that directly affected the efficiency of graph matching. According to the characteristics of CSFs, a hierarchical collapsing method combining graph isomorphism and frequent subgraph mining was adopted. There was yet a fundamental challenge, subgraph overlapping during the collapsing procedure, which hindered the method from establishing the correct compact hypergraph of an original CSF graph. Therefore, a graph-isomorphism-based algorithm was proposed to select dominant acyclic subgraphs on the basis of overlapping analysis. Finally, the spatial similarity among graphical CSFs was evaluated by multi-dimensional measures of similarity. Results: To evaluate the performance of the proposed method, the proposed system was firstly compared with Wikipedia Chemical Structure Explorer (WCSE), the state-of-the-art system that allowed CSF similarity searching within Wikipedia molecules dataset, on retrieval accuracy. The system achieved higher values on mean average precision, discounted cumulative gain, rank-biased precision, and expected reciprocal rank than WCSE from the top-2 to the top-10 retrieved results. Specifically, the system achieved 10%, 1.41, 6.42%, and 1.32% higher than WCSE on these metrics for top-10 retrieval results, respectively. Moreover, several retrieval cases were presented to intuitively compare with WCSE. The results of the above comparative study de-monstrated that the proposed method outperformed the existing method with regard to accuracy and effectiveness. Conclusion: This paper proposes a graph-similarity-based retrieval approach for medicine information. To obtain satisfactory retrieval results, an isomorphism-based algorithm is proposed for dominant subgraph selection based on the subgraph overlapping analysis, as well as an effective and efficient hyper-graph representation of molecules. Experiment results demonstrate the effectiveness of the proposed approach.

Key words: Information storage and retrieval, Molecular structure, Graph structure, Hypergraph, Algorithms

CLC Number: 

  •  
[1] WANG Si-wei, LI Min, YANG Hui-fang, ZHAO Yi-jiao, WANG Yong, LIU Yi1. Evaluation of three methods for constructing craniofacial mid-sagittal plane based on the cone beam computed tomography [J]. Journal of Peking University(Health Sciences), 2016, 48(2): 330-335.
[2] LIU Wen-Long, WANG Lu-Man, HE Dong-Qi, ZHANG Tian-蓝, GOU Bao-Di, LI Qing. Molecular structure and fractal analysis of oligosaccharide [J]. Journal of Peking University(Health Sciences), 2014, 46(5): 739-743.
[3] YAO Yi-Sang, GAO Ling, LI Yu-Ling, MA Shao-Li, WU Zi-Mei, TAN Ning-Zhi, WU Jian-Yong, NI Lu-Qun, ZHU Jia-Shi. Amplicon density-weighted algorithms for analyzing dissimilarity and dynamic alterations of RAPD polymorphisms of Cordyceps sinensis [J]. Journal of Peking University(Health Sciences), 2014, 46(4): 618-628.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!