Pyresparser

A simple resume parser used for extracting information from resumes

Features

Getting Started

Installation

pip install pyresparser
# spaCy
python -m spacy download en_core_web_sm

# nltk
python -m nltk.downloader words
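Before running the parser, a quick check can confirm that the packages installed above are importable. This is a minimal sketch that uses only the standard library. The module names it checks are assumptions based on the commands above (spaCy installs en_core_web_sm as a regular package). It does not verify that the NLTK words corpus was downloaded.

```python
import importlib.util

# Module names assumed from the installation commands above.
REQUIRED_MODULES = ["pyresparser", "spacy", "en_core_web_sm", "nltk"]


def missing_modules(names):
    """Return the module names in `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```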

Usage

from pyresparser import ResumeParser
data = ResumeParser('/path/to/resume/file').get_extracted_data()

Result

The module returns the extracted data as a list of dictionary objects, as follows:

[
  {
    'college_name': ['Marathwada Mitra Mandal’s College of Engineering'],
    'company_names': None,
    'degree': ['B.E. IN COMPUTER ENGINEERING'],
    'designation': ['Manager',
                    'TECHNICAL CONTENT WRITER',
                    'DATA ENGINEER'],
    'email': '[email protected]',
    'mobile_number': '8087996634',
    'name': 'Omkar Pathak',
    'no_of_pages': 3,
    'skills': ['Operating systems',
              'Linux',
              'Github',
              'Testing',
              'Content',
              'Automation',
              'Python',
              'Css',
              'Website',
              'Django',
              'Opencv',
              'Programming',
              'C',
              ...],
    'total_experience': 1.83
  }
]
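Some fields can be None (company_names in the sample above), and others can be lists or plain values. Code that consumes the output should therefore not assume that every key holds a non-empty list. The following sketch shows one defensive way to handle this. The helper names are hypothetical. as_records accepts either a single dictionary or a list of them, so the sketch works with either shape.

```python
from typing import Any, Dict, List, Union


def as_list(value: Any) -> List[Any]:
    """Normalize a field that may be None, a single value, or a list into a list."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def as_records(data: Union[Dict[str, Any], List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Accept either one result dictionary or a list of them."""
    return data if isinstance(data, list) else [data]


def summarize(result: Dict[str, Any]) -> str:
    """Build a one-line summary, tolerating missing or None fields."""
    name = result.get("name") or "Unknown"
    skills = as_list(result.get("skills"))
    companies = as_list(result.get("company_names"))
    experience = result.get("total_experience")
    exp_text = f"{experience} years" if experience is not None else "unknown experience"
    return f"{name}: {len(skills)} skills, {len(companies)} companies, {exp_text}"


if __name__ == "__main__":
    # A trimmed copy of the sample result shown above.
    results = [{
        "name": "Omkar Pathak",
        "company_names": None,
        "skills": ["Python", "Django", "Linux"],
        "total_experience": 1.83,
    }]
    for record in as_records(results):
        print(summarize(record))  # Omkar Pathak: 3 skills, 0 companies, 1.83 years
```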

Supported Resume File Formats

Advanced Options

Explicitly specifying skills file

Pyresparser ships with a built-in skills file that covers many common technical skills and is used by default. The default skills file is bundled with the package source.

To extract data against your own list of skills, create a CSV file with no header row and pass its path as the skills_file argument:

from pyresparser import ResumeParser
data = ResumeParser('/path/to/resume/file', skills_file='/path/to/skills.csv').get_extracted_data()
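Python's csv module offers one way to produce such a file. This sketch assumes that the skills are listed on a single comma-separated line with no header row above them. The helper names and the example skills are hypothetical. The resulting path can then be passed as skills_file.

```python
import csv
import os
import tempfile


def write_skills_csv(path, skills):
    """Write skills as one comma-separated row, with no separate header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(skills)


def read_skills_csv(path):
    """Read the file back and return the skills from its first (only) row."""
    with open(path, newline="", encoding="utf-8") as f:
        return next(csv.reader(f), [])


if __name__ == "__main__":
    # Hypothetical skill list; replace it with the skills you care about.
    path = os.path.join(tempfile.mkdtemp(), "skills.csv")
    write_skills_csv(path, ["Python", "Django", "Machine Learning"])
    print(read_skills_csv(path))  # ['Python', 'Django', 'Machine Learning']
```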

Explicitly providing regex to parse phone numbers

Pyresparser parses most phone numbers correctly, but it may not recognize every format, and support for new formats may be needed in the future. In such cases, you can explicitly provide the regex needed to parse the desired phone numbers through the custom_regex argument:

from pyresparser import ResumeParser
data = ResumeParser('/path/to/resume/file', custom_regex='pattern').get_extracted_data()
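Before passing a pattern as custom_regex, it can help to test it against sample text with Python's re module. The pattern below is a hypothetical example for international numbers written with a leading "+". It is not part of pyresparser. The pattern uses non-capturing groups because, when a pattern contains capturing groups, re.findall returns tuples of groups rather than whole matches.

```python
import re

# Hypothetical pattern for numbers such as "+44 20 7946 0958" or "+91 80879 96634".
# (?:...) groups are non-capturing, so re.findall returns whole matches.
PHONE_PATTERN = r"\+\d{1,3}(?:[\s-]?\d{2,5}){2,4}"


def find_phone_numbers(text, pattern=PHONE_PATTERN):
    """Return every substring of `text` matched by `pattern`."""
    return re.findall(pattern, text)


if __name__ == "__main__":
    sample = "Contact: +44 20 7946 0958 (office), +91 80879 96634 (mobile)"
    print(find_phone_numbers(sample))
```

Once the pattern behaves as expected, it can be passed as custom_regex=PHONE_PATTERN in the ResumeParser call above.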