This project prepares a subsection of the US tax code for translation from natural language into either Catala-lang (https://catala-lang.org/) or ErgoAI from Coherent Knowledge (http://coherentknowledge.com).
We hope to obtain initial, basic translations of the US tax code into either Catala-lang or ErgoAI that meet some threshold of acceptability. Once the basic translations meet that threshold, we want to see whether a local startup can complete them by engaging people with expertise in the US tax code.
This very early-stage startup (Neutral Tax Networks, Greenwich, CT) holds a patent in this area and is developing other intellectual property.
Link to the House of Representatives site where the US Code is located. The Internal Revenue Code is Title 26 (partway down the list): https://uscode.house.gov/browse/prelim@title26/subtitleA/chapter1/subchapterA/part1&edition=prelim
Here is the IRS web page that links to the Internal Revenue Code and regulations, which are provided as a public service by Cornell Law School's Legal Information Institute: https://www.irs.gov/privacy-disclosure/tax-code-regulations-and-official-guidance#irc
Link to the Internal Revenue Code sections provided by Cornell's Legal Information Institute (accessible by clicking one of the links on the IRS website): https://www.law.cornell.edu/uscode/text
Project Information Subsection
- All legal tax code XML from https://www.irs.gov/…, https://uscode.house.gov, or https://www.law.cornell.edu/uscode/text transformed into English tax code text.
- A validated algorithmic mapping from the English legal tax code to a format suitable for storage in a relational database.
- All of the legal tax code stored in a relational database (MySQL), using a mapping suitable for translation to ErgoAI or Catala-lang.
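The first deliverable, turning tax code XML into plain English text, can be sketched roughly as follows. This is a minimal illustration only: the element names (`section`, `subsection`, `num`, `content`), the `identifier` attribute, and the placeholder provision text are all assumptions made for this example. The real XML published by the sources above follows its own schema and namespaces, which would need to be handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup with placeholder text.
# The real tax code XML uses its own schema and namespaces.
SAMPLE_XML = """
<section identifier="s1">
  <num>1</num>
  <heading>Example heading</heading>
  <subsection identifier="s1/a">
    <num>(a)</num>
    <content>First   example provision.</content>
  </subsection>
  <subsection identifier="s1/b">
    <num>(b)</num>
    <content>Second example
      provision.</content>
  </subsection>
</section>
"""


def extract_provisions(xml_text):
    """Return one (identifier, label, plain_text) tuple per subsection."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for sub in root.iter("subsection"):
        label = (sub.findtext("num") or "").strip()
        content = sub.find("content")
        raw = "".join(content.itertext()) if content is not None else ""
        # Collapse the whitespace left over from the markup layout.
        text = " ".join(raw.split())
        rows.append((sub.get("identifier"), label, text))
    return rows


rows = extract_provisions(SAMPLE_XML)
for row in rows:
    print(row)
```

Flat tuples like these are one possible intermediate form: each already carries an identifier that could later serve as a key in the relational database.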
{Empty}
They should know how to code in Python or a similar language, or be able to learn Python.
They should be able to learn to parse XML with Python.
They should also be able to learn to work with one of several Python NLP libraries, such as NLTK (https://realpython.com/nltk-nlp-python/).
{Empty}
Some hands-on experience
{Empty}
University of Connecticut - Stamford
Stamford, Connecticut
CR-Yale
{Empty}
Yes
Already behind3Start date is flexible
6
{Empty}
12/08/2021
{Empty}
06/08/2022
Milestone Title: Capture tax code Milestone Description: Pull all legal tax code XML from https://www.irs.gov/…, https://uscode.house.gov, or https://www.law.cornell.edu/uscode/text.
Milestone Title: Transform tax code to English Milestone Description: Transform all of the IRS XML tax code into English legal tax code text.
Milestone Title: Design organizational mapping Milestone Description: Validate a useful organizational mapping so the English legal tax code text is stored in a relational database (MySQL). This will likely require NLP processing of the tax code to make it suitable for ErgoAI or Catala-lang. This is the first part of the threshold of acceptability.
Milestone Title: Apply organizational mapping Milestone Description: Apply the organizational mapping to all of the tax code.
Milestone Title: Store mapped tax code Milestone Description: Store the mapped legal tax code in a relational database (MySQL)
Milestone Title: Leverage mapped tax code for deduction Milestone Description: Validate that the mapped legal tax code can be expressed as basic terms in ErgoAI or Catala-lang. This is the final part of the threshold of acceptability for the tax code translation.
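The NLP processing mentioned in the mapping milestone could start from something as simple as splitting provisions into sentences and flagging words that often signal definitions, obligations, or conditions. The sketch below uses only the standard library `re` module as a stand-in for an NLP library such as NLTK. The cue-word list and the sample text are hypothetical placeholders, not actual tax code language, and a real mapping would need input from tax experts.

```python
import re

# Hypothetical cue words chosen for illustration only.
CUE_WORDS = {"if", "unless", "except", "shall", "means"}


def split_sentences(text):
    """Naive splitter: break after '.', ';' or ':' followed by whitespace."""
    parts = re.split(r"(?<=[.;:])\s+", text.strip())
    return [p for p in parts if p]


def tag_cues(sentence):
    """Return the cue words present in a sentence, in alphabetical order."""
    tokens = re.findall(r"[A-Za-z]+", sentence.lower())
    return sorted(set(tokens) & CUE_WORDS)


def analyze(text):
    """Pair each sentence with the cue words found in it."""
    return [(s, tag_cues(s)) for s in split_sentences(text)]


# Placeholder text, not a quotation from the tax code.
SAMPLE = "The term widget means any gadget. A fee shall apply unless the widget is exempt."
result = analyze(SAMPLE)
for sentence, cues in result:
    print(cues, "<-", sentence)
```

Sentences tagged this way could then be grouped by kind (definition, obligation, exception) before anyone attempts to express them as terms in ErgoAI or Catala-lang.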
{Empty}
{Empty}
{Empty}
The student will learn how to parse, transform (using XSLT), and store the transformed data.
The student will learn to parse the English tax code using a Python NLP library such as NLTK.
The student will learn to organize the legal text for storage so that retrieval and mapping to either ErgoAI or Catala-lang are easy.
The student will learn some data architecture.
This transformed/organized tax code will be stored in a relational database such as MySQL.
The student will learn SQL and how to interact with a relational database through a database workbench.
The student will learn how to work with a relational database from a language like Python.
If there is time, the student will learn about deduction in ErgoAI or Catala-lang.
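As a rough illustration of working with a relational database from Python, the sketch below stores provisions with a parent/child link so the hierarchy of the code can be queried. It uses the standard library `sqlite3` module as a stand-in for MySQL so that it runs without a server. The table name, column names, and sample rows are assumptions made for this example, and a MySQL driver would differ in connection setup and placeholder style.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE provision (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES provision(id),
        citation  TEXT NOT NULL UNIQUE,
        body      TEXT NOT NULL
    )
    """
)


def add_provision(conn, citation, body, parent_citation=None):
    """Insert a provision, linking it to its parent when one is given."""
    parent_id = None
    if parent_citation is not None:
        row = conn.execute(
            "SELECT id FROM provision WHERE citation = ?", (parent_citation,)
        ).fetchone()
        if row is None:
            raise ValueError(f"unknown parent: {parent_citation}")
        parent_id = row[0]
    cur = conn.execute(
        "INSERT INTO provision (parent_id, citation, body) VALUES (?, ?, ?)",
        (parent_id, citation, body),
    )
    return cur.lastrowid


def children_of(conn, citation):
    """Return (citation, body) for the direct children of a provision."""
    return conn.execute(
        """
        SELECT c.citation, c.body
        FROM provision AS c
        JOIN provision AS p ON c.parent_id = p.id
        WHERE p.citation = ?
        ORDER BY c.citation
        """,
        (citation,),
    ).fetchall()


# Placeholder rows, not actual tax code text.
add_provision(conn, "s1", "Example section heading")
add_provision(conn, "s1/a", "First example provision.", "s1")
add_provision(conn, "s1/b", "Second example provision.", "s1")
conn.commit()

kids = children_of(conn, "s1")
print(kids)
```

A self-referencing `parent_id` column is only one way to model the nesting; other layouts may suit retrieval or later translation better.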
{Empty}
{Empty}
No clear need for HPC, though the tax code is substantial, so the NLP application may require a good deal of CPU cycles.
{Empty}
Final Report
This project had a solid impact on understanding automated knowledge authoring for legal reasoning. We explored a number of ways to simply transform legal text into logical reasoning in ErgoAI (a variation of Prolog).
{Empty}
{Empty}
Yes - there was a positive impact both on our student, Krutika Patel, and on managing student research.
{Empty}
{Empty}
{Empty}
Yes - there is an impact towards technology transfer. In addition to leadership by Phil Bradford, this project was done with a Connecticut entrepreneur (Henry Orphys) and a faculty member (Paul Fodor) from Stony Brook University. Henry has a distinguished law and tax accounting background, and he is focused on launching a startup using the technology we explored. Paul is both a faculty member and an entrepreneur. We isolated several challenges and now better understand the resources necessary for launching a product in this space.