Add first parser

IamTheFij 2018-02-02 11:23:52 -08:00
parent 1d0d1b5cc2
commit 05182a65fe
6 changed files with 123 additions and 0 deletions

Design.md Normal file

@ -0,0 +1,49 @@
# Email Parser Design
## Purpose
A service that reads emails from an IMAP inbox, extracts data from them, and sends that data to other services.
## Functionality
* Extract tracking numbers and send to a package tracking service
* Extract flight numbers and confirmations and send to trip tracking service
* Extract dates and events and send to a calendar service
## Secondary services
### Package Tracking service
* Receive tracking info via API and store in database
* Web interface for viewing current status of all packages
* Filters by date and status
* ical subscription for arrival dates
### Flight Tracking Service
* Receive tracking info via API and store in database
* Web interface for viewing current status of all flights
* Filters by date and status
* ical subscription for flight times
## Architecture 1: Micro-services
A single service reads email content and sends it to a list of parser services.
A parser service would conform to an interface with an API that accepts an email with several attributes: sender, recipients, subject, body, and datetime. It would then extract some attribute and send it to a tracking service. The attribute would be something like the tracking number for a flight or package, or a calendar time for an event.
A tracking service would accept this info and store it in its database. It would then provide a front end to this data via a website and an iCal calendar URL. It may also be possible to further abstract the schemas and interface so that all trackers share a common infrastructure but make unique requests to metadata services.
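As a rough illustration of the parser interface described above, the email payload and a parser's contract might be sketched like this. The `Email` and `ParserResult` structs, the `ToyTrackingParser` class, and the `TRACK-` pattern are all hypothetical names invented for this sketch; only the attribute list comes from the design.

```ruby
require 'time'

# Hypothetical container for the attributes a parser service would accept.
Email = Struct.new(:sender, :recipients, :subject, :body, :datetime, keyword_init: true)

# Hypothetical result: the extracted attribute plus a label for its kind.
ParserResult = Struct.new(:token, :type, keyword_init: true)

# Toy parser that follows the contract: take an Email, return results.
# The "TRACK-" pattern is made up and is not a real carrier format.
class ToyTrackingParser
  PATTERN = /\bTRACK-\d{6}\b/

  def parse(email)
    email.body.scan(PATTERN).map do |match|
      ParserResult.new(token: match, type: 'SHIPPING')
    end
  end
end

email = Email.new(
  sender: 'shop@example.com',
  recipients: ['me@example.com'],
  subject: 'Your order has shipped',
  body: 'Your package TRACK-123456 is on its way.',
  datetime: Time.parse('2018-02-02 11:23:52 -0800')
)

results = ToyTrackingParser.new.parse(email)
puts results.map(&:token).inspect
```

In the micro-service version, `parse` would be an HTTP call and each result would be forwarded to the matching tracking service rather than printed.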
## Architecture 2: Micro-services
A scanner service scans emails and sends them to an indexer. The indexer receives the email contents and makes requests to the parser services. The parser services respond with extracted text, which the indexer inserts into the database. The indexer also exposes a RESTful API on top of the data model.
Viewing services would use the RESTful API to display content and expose additional metadata.
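The indexer flow above could be sketched in memory as follows. This is only an illustration under stated assumptions: the `Indexer` class, its method names, and the `ORDER-` pattern are hypothetical, the parsers are plain lambdas standing in for HTTP services, and an array stands in for the database.

```ruby
# Hypothetical in-memory indexer: asks each parser for extracted tokens
# and stores whatever comes back.
class Indexer
  attr_reader :records

  def initialize(parsers)
    @parsers = parsers # objects responding to #call(message) -> array of hashes
    @records = []      # stand-in for the database
  end

  # Sends the message to every parser and stores each result.
  # Returns the total number of stored records.
  def index(message)
    @parsers.each do |parser|
      parser.call(message).each do |result|
        @records << result.merge(source: message)
      end
    end
    @records.length
  end

  # Stand-in for the read side of the RESTful API.
  def find_by_type(type)
    @records.select { |record| record[:type] == type }
  end
end

# Toy parser; the "ORDER-" pattern is made up for the example.
order_parser = lambda do |message|
  message.scan(/\bORDER-\d+\b/).map { |token| { token: token, type: 'ORDER' } }
end

indexer = Indexer.new([order_parser])
indexer.index('Thanks for your purchase, ORDER-42 is confirmed.')
puts indexer.find_by_type('ORDER').inspect
```

Keeping storage in the indexer means parsers stay stateless, which is the main difference from Architecture 1.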
## Architecture 3: Message based queue
A scanner scans emails and inserts a task into a queuing service (RabbitMQ with a fanout exchange). Multiple parsers read from these queues, attach extracted data, and make requests to the indexing service for storage. Front-end services sit on top of a RESTful API over the database.
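The fanout idea can be imitated with Ruby's built-in `Queue` to show how one scanned email reaches every parser. This is a sketch of the routing behaviour only, not the RabbitMQ client API; `FanoutExchange`, the parser patterns, and the `indexed` queue (a stand-in for the indexing service) are hypothetical.

```ruby
# In-memory stand-in for a fanout exchange: every bound queue receives
# a copy of each published message.
class FanoutExchange
  def initialize
    @queues = []
  end

  # Creates a new queue bound to this exchange and returns it.
  def bind
    queue = Queue.new
    @queues << queue
    queue
  end

  def publish(message)
    @queues.each { |queue| queue << message }
  end

  def close
    @queues.each(&:close)
  end
end

exchange = FanoutExchange.new
indexed = Queue.new # stand-in for the indexing service

# Two hypothetical parsers, each reading from its own queue.
patterns = { 'SHIPPING' => /\bTRACK-\d+\b/, 'FLIGHT' => /\bFL\d{3}\b/ }
workers = patterns.map do |type, pattern|
  queue = exchange.bind
  Thread.new do
    # Queue#pop returns nil once the queue is closed and drained.
    while (message = queue.pop)
      message.scan(pattern).each { |token| indexed << { token: token, type: type } }
    end
  end
end

exchange.publish('Package TRACK-991 ships on flight FL123.')
exchange.close
workers.each(&:join)

stored = []
stored << indexed.pop until indexed.empty?
puts stored.sort_by { |record| record[:type] }.inspect
```

Because each parser owns its queue, a slow parser delays only its own results, which is the main appeal of this design over direct requests.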
## Useful packages
Golang package for extracting numbers and carriers from unstructured text: https://github.com/lensrentals/trackr
Python package for retrieving status from a tracking number: https://github.com/alertedsnake/packagetracker
Ruby gem for extracting shipping info from a number or unstructured text: https://github.com/jkeen/tracking_number
Ruby gem for retrieving tracking info based on an ID: https://github.com/travishaynes/trackerific

docker-compose.yml Normal file

@ -0,0 +1,6 @@
version: '2'
services:
  parser_package_tracking:
    build: ./parsers/package-tracking
    ports:
      - "8183:3000"

parsers/Readme.md Normal file

@ -0,0 +1,22 @@
# parsers
A parser should conform to a simple API spec so that other services can access it easily.
# Healthcheck
Simple endpoint that accepts nothing and returns 'OK' on success.
|Field   |Value|
|--------|-----|
|Path    |`/`|
|Method  |`GET`|
|Response|`"OK"`|
# Parse
The primary endpoint, which parses a message. The request body is JSON of the form `{"message": "Email body"}`.
|Field   |Value|
|--------|-----|
|Path    |`/parse`|
|Method  |`POST`|
|Response|`json`|
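
A client call to this endpoint might be built as in the sketch below. The helper name `build_parse_request` is hypothetical, the `{"message": ...}` body follows the comment in the package-tracking parser's `main.rb`, and the host and port in the commented-out lines are assumptions based on the compose file.

```ruby
require 'json'
require 'net/http'

# Builds (but does not send) a POST /parse request carrying the email body.
def build_parse_request(message)
  request = Net::HTTP::Post.new('/parse', 'Content-Type' => 'application/json')
  request.body = JSON.generate({ 'message' => message })
  request
end

request = build_parse_request('Your package 1Z879E930346834440 has shipped')
puts request.body

# Sending it requires a running parser; host and port here are assumptions:
# response = Net::HTTP.start('localhost', 8183) { |http| http.request(request) }
# puts response.body
```
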
Response
|Key |Example Value |Description|
|--------|----------------------|-----------|
|token |`"1Z879E930346834440"`|String token that was extracted|
|type    |`"SHIPPING"`          |A string that indicates what type of metadata was extracted. Other services use it to understand what kind of data this is.|
|metadata|`{"carrier": "UPS"}`  |A dictionary with any additional metadata that may be used by other services|
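
A consumer could check a response against the keys in this table roughly as follows. This is a sketch, not part of the spec: `parse_response` is a hypothetical helper, and it assumes the body is a JSON array of such objects, which is what the package-tracking parser returns.

```ruby
require 'json'

# Returns the parsed entries, raising if any entry lacks a documented key.
# Assumes the response body is a JSON array of objects.
def parse_response(body)
  required_keys = %w[token type metadata]
  entries = JSON.parse(body)
  entries.each do |entry|
    missing = required_keys - entry.keys
    raise ArgumentError, "missing keys: #{missing.join(', ')}" unless missing.empty?
  end
  entries
end

sample = '[{"token": "1Z879E930346834440", "type": "SHIPPING", "metadata": {"carrier": "UPS"}}]'
entries = parse_response(sample)
puts entries.first['metadata']['carrier']
```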


@ -0,0 +1,14 @@
FROM ruby:2.5.0
# TODO: Move to Gemfile
RUN gem install sinatra -v 2.0
RUN gem install tracking_number -v 0.10.3
EXPOSE 3000
RUN mkdir -p /src
WORKDIR /src
COPY main.rb /src/
CMD ruby main.rb


@ -0,0 +1,6 @@
version: '2'
services:
  main:
    build: .
    ports:
      - "127.0.0.1:8183:3000"


@ -0,0 +1,26 @@
require 'json'
require 'sinatra'
require 'tracking_number'
set :bind, "0.0.0.0"
set :port, 3000
# Simple status endpoint on root
get '/' do
  'OK'
end
# Standard parser API: receives POST /parse with body {"message": "Email body"}
# Returns [{"token": "extracted token", "type": "token type", "metadata": {}}]
post '/parse' do
  body = JSON.parse(request.body.read)
  trackers = TrackingNumber.search(body["message"])
  # Build one result hash per tracking number found in the message
  results = trackers.map do |tracker|
    {
      :token => tracker.tracking_number,
      :type => "SHIPPING",
      :metadata => {}
    }
  end
  JSON.dump(results)
end