Journal of Peking University(Health Sciences) ›› 2018, Vol. 50 ›› Issue (2): 256-263. doi: 10.3969/j.issn.1671-167X.2018.02.010

• Article •

A customized method for information extraction from unstructured text data in the electronic medical records

BAO Xiao-yuan1,2, HUANG Wan-jing3, ZHANG Kai4, JIN Meng1,2, LI Yan2,5, NIU Cheng-zhi6△   

  (1. Medical Informatics Center, Peking University, Beijing 100191, China; 2. National Clinical Service Data Center, Beijing 100191, China; 3. School of Mathematical Sciences, Peking University, Beijing 100871, China; 4. Peking University School of Basic Medical Science, Beijing 100191, China; 5. Department of Hospital Management, Peking University Health Science Center, Beijing 100191, China; 6. Department of Information, the First Affiliated Hospital of Zhengzhou University, Zhengzhou 450052, China)
  • Online:2018-04-18 Published:2018-04-18
  • Contact: NIU Cheng-zhi E-mail:nczfkb@126.com
  • Supported by: Peking University Seed Fund for Medicine-Information Interdisciplinary Research Project (BMU20140434)

Abstract: Objective: Electronic medical records (EMRs) contain a large amount of diagnostic and treatment information that directly reflects clinicians' actual practice. Many EMR sections, such as chief complaints, history of present illness, past history, differential diagnosis, diagnostic imaging reports and surgical records, are written as Chinese natural-language narrative. Extracting useful information from this Chinese narrative text and organizing it into tabular form, so that real-world clinical data can be used in medical research, remains a difficult problem in Chinese medical data processing. Methods: Based on EMR narrative text data from a tertiary hospital in China, we propose a customized method that first learns information extraction rules and then applies them for rule-based information extraction. The method consists of three steps. (1) A random sample of 600 EMR records (including history of present illness, past history, personal history, family history, etc.) was drawn as the raw corpus. Using a Chinese clinical narrative text annotation platform that we developed, trained clinicians and nurses marked the tokens and phrases to be extracted, taking history of diabetes as the example. (2) Extraction templates were first summarized and induced from the annotated corpus and then rewritten as extraction rules using Perl regular expressions. With these rules as the basic knowledge base, we developed Perl extraction packages to extract data from the EMR text. The extracted data items were then organized in tabular format for later use in clinical research or hospital surveillance. (3) The method was evaluated and validated on the National Clinical Service Data Integration Platform, where the extraction results were checked by a combination of manual and automated verification. Results: Among patients in the hospital's Department of Endocrinology with a diagnosis of diabetes, 1 436 were discharged in 2015; extraction of diabetes history from their medical history sections achieved a recall of 87.6%, a precision of 99.5%, and an F-score of 0.93. For 10% of the patients with diabetes discharged from the same department as of August 2017 (1 223 patients in total), extraction of diabetes history achieved a recall of 89.2%, a precision of 99.2%, and an F-score of 0.94. Conclusion: This study combines natural language processing with rule-based information extraction to design and implement an algorithm for extracting customized information from unstructured Chinese EMR text. The method achieved better results than existing work.
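As a rough illustration of the rule-based step and of how the reported F-scores follow from the other two rates, here is a minimal Python sketch. The study itself used Perl regular expressions, and its actual rules are not given in the abstract, so the three patterns, the function names `extract_diabetes_history` and `f_score`, and the sample sentences below are all hypothetical. The F-score check assumes the second reported rate is precision in the usual sense.

```python
import re

# Hypothetical extraction rules for illustration only; NOT the paper's rule base.
# Negated mention, e.g. "否认高血压、糖尿病史" (denies hypertension, diabetes history).
NEGATED = re.compile(r"(?:否认|无)[^，。；]{0,10}糖尿病")
# Mention with a duration, e.g. "糖尿病史10年" (diabetes history of 10 years).
WITH_DURATION = re.compile(r"糖尿病(?:病史|史)?(?P<n>\d+)余?(?P<unit>年|月)")
# Bare mention of diabetes.
MENTIONED = re.compile(r"糖尿病")


def extract_diabetes_history(text):
    """Return one tabular row (a dict) describing diabetes history in `text`.

    has_history is True, False (explicitly denied), or None (not mentioned).
    """
    row = {"has_history": None, "duration": None, "unit": None}
    if NEGATED.search(text):
        row["has_history"] = False
        return row
    match = WITH_DURATION.search(text)
    if match:
        row.update(has_history=True,
                   duration=int(match.group("n")),
                   unit=match.group("unit"))
        return row
    if MENTIONED.search(text):
        row["has_history"] = True
    return row


def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up sample sentences, not taken from the study's corpus.
    samples = [
        "患者有糖尿病史10年，口服二甲双胍治疗。",
        "否认高血压、糖尿病史。",
        "既往体健。",
    ]
    for sentence in samples:
        print(extract_diabetes_history(sentence))
    # Figures reported in the abstract for the 2015 and 2017 evaluations.
    print(round(f_score(0.995, 0.876), 2))
    print(round(f_score(0.992, 0.892), 2))
```

Under that assumption the two rounded F-scores come out to 0.93 and 0.94, matching the reported values. A real rule base would need far more patterns; the simple negation rule above, for instance, would misfire on an unrelated "无" that happens to appear shortly before a diabetes mention.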

Key words: Medical records systems, computerized; Access to information; Diabetes mellitus; Medical history taking

CLC Number: R319