Bank Statement Automation
I've been trying over the last couple of years to get my financial life in order, and part of that has meant finding bank statements from the different providers and ensuring that they're filed away on the off chance that I need them (surprise: I needed them). To do that, I've spent some time building up a small library of Hazel scripts, and I thought I'd share them with the world now.
Hazel is an amazing tool, but I'm not going to re-hash how awesome it is here. You can find a myriad of articles waxing poetic about Hazel. I'm here to share some of the rules and related scripts that I've been generating over the last while, in hopes that maybe someone else will find some use out of them.
These screenshots should help get Hazel set up and monitoring your folders properly.
Note, of course, that this doesn't work with every single provider; for example, I have a credit card with Lowe's (they give a pretty good discount) whose statements the script can't extract a date from, so it fails. Even with OCR (tesseract, etc.), the contents of the file are simply too difficult to work with reliably. It would be awesome if there were some kind of requirement for all of the generated files to follow a specific format (at least around provider name, account number, statement date, etc.), but that seems as unlikely as me winning the lottery.
I've made a change to the way dates are handled that should hopefully do a better job of determining the statement date. Essentially, we attempt to parse anything with numbers to see if it looks like a date, and then work from there. It worked on the Lowe's statement from above, and also on a Discover statement that had proven difficult.
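To make that a little more concrete, here is a rough sketch of the filtering half of the idea: split the text into whitespace-separated tokens and keep only the ones that contain a digit and a `/` or `-`. The function name `date_candidates` and the sample text are made up for illustration; the real script then hands each surviving token to PHP's `DateTime` to see whether it actually parses.

```shell
#!/bin/bash
# Hypothetical helper: list the tokens on stdin that might be dates.
date_candidates() {
  # One token per line, de-duplicated, then keep only tokens that
  # contain a digit AND a slash or hyphen.
  tr -s '[:space:]' '\n' | sort -u | grep '[0-9]' | grep -E '[/-]'
}

# Made-up sample text standing in for pdftotext output.
sample='Statement Date: 03/15/2021 Account ending 1234 pay-by-phone due 2021-04-10'
printf '%s\n' "$sample" | date_candidates
```

With this sample, only `03/15/2021` and `2021-04-10` survive: `1234` has no separator and `pay-by-phone` has no digit.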
There's a provider_map in the code below that lets you override the detected name of the statement provider; if the provider found in the file is not to your liking, you can add an entry and manually re-run the classifier below to update the path. There are also failsafes: if (for example) either the provider or the statement date cannot be found, the file is moved and a record is added to a failure log file so that the statement can be researched further.
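As a sketch of how an extra entry would work: each item is a single "text to look for:name to file under" string, and the part before the first colon is matched case-insensitively against whatever was detected. The `examplebank.com` entry and the `map_provider` helper are hypothetical; the helper just wraps the same lookup loop the script uses so it can be tried on its own.

```shell
#!/bin/bash
shopt -s nocasematch   # same case-insensitive matching the script enables

# Each entry is "text to look for:name to file under".
provider_map=(
  "PennyMacUSA.com:Penny Mac"
  "lowes:Lowes"
  "examplebank.com:Example Bank"   # hypothetical new entry
)

# Hypothetical helper mirroring the lookup loop in the script.
map_provider() {
  local found="$1" mapping
  for mapping in "${provider_map[@]}"; do
    # ${mapping%%:*} is everything before the first colon,
    # ${mapping#*:} is everything after it.
    if [[ "$found" == *"${mapping%%:*}"* ]]; then
      echo "${mapping#*:}"
      return 0
    fi
  done
  echo "$found"   # no override: keep whatever was detected
}

map_provider "Visit www.examplebank.com to manage your account"
```

Anything that doesn't match an entry is passed through unchanged, so the map only needs to list the providers whose detected names are unhelpful.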
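Finally, this tool requires pdftotext and terminal-notifier. pdftotext is part of Poppler; I believed it came installed on macOS by default, but it usually has to be installed separately (for example with Homebrew). terminal-notifier is a Ruby gem that gives system notifications, and can be installed via gem. The date parsing also calls php, so that needs to be on your PATH as well. I'm open to improvements to the system.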
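Since a missing tool only shows up as a confusing failure halfway through a run, a quick check up front can help. This is a minimal sketch, not part of the original script; `missing_tools` is a made-up helper name, and the install hints in the comments are assumptions about a typical Homebrew setup.

```shell
#!/bin/bash
# Hypothetical helper: print the name of every command that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# The script below shells out to these three commands.
missing=$(missing_tools pdftotext terminal-notifier php)
if [ -n "$missing" ]; then
  # Possible fixes (assumptions, adjust for your setup):
  #   brew install poppler            # provides pdftotext
  #   gem install terminal-notifier
  echo "Missing required tools:" $missing
fi
```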
So, without further ado...
#!/bin/bash

shopt -s nocasematch
set -e
set -x

dest=~/Documents/Personal/Statements
error_log="$dest/fail.log"

## Bash 3 hack for not having associative arrays
provider_map=(
    "PennyMacUSA.com:Penny Mac"
    "lowes:Lowes"
)

dryrun=0
provided_file="$1"
shift

while [[ $# -gt 0 ]]; do
    key="$1"
    case $key in
        -d|--dryrun)
            dryrun=1
            shift # past argument
            ;;
        -v|--verbose)
            set -v
            shift
            ;;
        *)
            # unknown option
            echo "Unsupported argument ${key}"
            exit 1
            ;;
    esac
done

# Try to get the provider name from the contents of the file
pdftextcontents=$( pdftotext -f 1 "$provided_file" - 2>/dev/null )

# Move the file aside, record the failure (plus the extracted text)
# in the error log, raise a notification, and stop
function log_and_exit() {
    message=$1
    provided_file=$2
    dest=$3

    if [ $dryrun -ne 1 ]; then
        mv "$provided_file" "$dest"
    fi

    echo "[ $( date +"%F %H:%M:%S" ) ] Couldn't determine ${message} in ${provided_file}; moving to ${dest} and bailing" | tee -a "$error_log" | terminal-notifier -title "Error handling bank statement ${provided_file}" -timeout 60
    echo "$pdftextcontents" | tee -a "$error_log"
    echo "" | tee -a "$error_log"
    echo "------------------------------------------------------------------------------------------" | tee -a "$error_log"
    exit 1
}

function provider_exit() {
    provided_file=$1
    dest=$2

    log_and_exit "provider" "$provided_file" "$dest"
}

function statement_exit() {
    provided_file=$1
    dest=$2
    found_date=$3

    if [ -n "$found_date" ]; then
        echo "Found date ${found_date}, which is invalid"
    fi

    log_and_exit "statement date" "$provided_file" "$dest"
}

provider=$( echo "$pdftextcontents" | grep 'www\.' | head -n 1 )

# No provider found; sometimes, the first line of the pdf
# contains a provider that we can use
if [ -z "$provider" ]; then
    provider=$( echo "$pdftextcontents" | head -n 1 )
fi

if [ -z "$provider" ]; then
    provider_exit "$provided_file" "$dest"
fi

# This allows us to change the mapping of a provider from
# what may have been found to something more helpful to us
for mapping in "${provider_map[@]}"; do
    _found_provider=${mapping%%:*}
    _mapped_provider=${mapping#*:}

    if [[ "$provider" == *"${_found_provider}"* ]]; then
        provider="${_mapped_provider}"
        break
    fi
done

dest="$dest/${provider}"
mkdir -p "$dest"/

## this code attempts to parse any value that comes in as a date
## anything that fails generates no output. any date that is in
## the future generates no output. only things that are in the
## past will generate anything
## ([0-9] instead of \d, which not every grep understands; a plain
## reverse sort is enough because YYYY-MM sorts chronologically)
statement_dt=$( echo "${pdftextcontents}" | tr ' ' '\n' | sort | uniq | grep '[0-9]' | grep -E '[/-]' | xargs -I{} php -r 'try { $f = (new \DateTime("{}")); $d = $f->diff((new \DateTime())); if ($d->days > 0 && $d->invert == 1) { } else { echo $f->format("Y-m\n"); } } catch (\Exception $e) { }' | sort -r | head -n 1 )

if [ -z "$statement_dt" ]; then
    statement_exit "$provided_file" "$dest"
fi

formatted_dt=$( echo "${statement_dt}" | php -r '$dt=fgets(STDIN); echo date("Y-m", strtotime($dt));' )

if [ $dryrun -ne 1 ]; then
    mkdir -p "$dest"/
    mv "$provided_file" "$dest"/"$formatted_dt".pdf
else
    echo "Would mv $provided_file $dest/$formatted_dt.pdf"
fi
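One detail worth spelling out: the script picks the statement month by reverse-sorting the `YYYY-MM` candidates and taking the first line. That works because the year comes first and the month is zero-padded, so plain text order is also chronological order. A tiny sketch of just that step, with a made-up helper name (`latest_month`) and made-up values:

```shell
#!/bin/bash
# Hypothetical helper: pick the most recent YYYY-MM value from stdin.
latest_month() {
  sort -r | head -n 1
}

printf '%s\n' 2020-11 2021-03 2019-07 | latest_month   # prints 2021-03
```

The same trick would not work for formats like `3/2021` or `Mar 2021`, which is why the PHP step normalizes everything to `Y-m` before sorting.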