Major Updates to the Registry

April 28, 2020, 8:22 p.m. | jashley

If you've wondered why the Registry hasn't been updated for a while, there's a reason. Over the past several months we've moved the site to infrastructure that will accommodate future growth; corporate crime doesn't seem to be stopping. In technical terms, we've migrated from a home-grown collection of Python scripts that generated a static, flat-file website to Django, a Python web framework with a SQL back-end. While the site looks the same, these changes will let us input and update records faster and provide an easier path to doing fun things with the data, like visualizations and machine learning projects. Along with the updates to the site's infrastructure, the following goodies are also available:

  • NAICS codes for nearly all companies in the database. This was a great suggestion from a NY attorney whose name has been lost. It took a while, but it's done.

  • An expanded set of fields to sort on, particularly for deferred and non-prosecution agreements. Many of these fields previously held a "Yes/No" value and an explanation within a single entry. Each of the two now has its own column, which should make sorting easier.

  • A dedicated page for downloading nearly all data on the site. You can download current CSV/Excel and JSON files.

  • While it looked like we'd stepped away, the new site was both under construction and constantly maintained, meaning the freshest data is now available with this release. Run, don't walk, to the "Downloads" page or the "Browse" page.
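
For the curious, the field split mentioned in the list above can be sketched in a few lines of Python. This is only an illustration, not the code behind the site: the function name split_flag_and_note and the assumed entry format (a leading "Yes" or "No", optional punctuation, then free text) are hypothetical.

```python
import re

# Assumed (hypothetical) format: the combined entry starts with "Yes" or "No",
# optionally followed by punctuation and a free-text explanation.
_COMBINED = re.compile(r"^\s*(yes|no)\b[\s:;,.\-]*(.*)$", re.IGNORECASE | re.DOTALL)


def split_flag_and_note(entry):
    """Split a combined entry into (flag, explanation).

    Returns (None, original_text) when the entry does not start with Yes/No,
    so that nothing is silently discarded.
    """
    text = (entry or "").strip()
    match = _COMBINED.match(text)
    if match is None:
        return None, text
    flag = match.group(1).lower() == "yes"
    return flag, match.group(2).strip()


if __name__ == "__main__":
    for raw in ["Yes; independent monitor for three years", "No", "Unknown"]:
        print(split_flag_and_note(raw))
```

Entries that don't fit the assumed pattern come back with a flag of None, which leaves them easy to find and fix by hand.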

Looking forward, here's where work on the Registry is headed:

  • We've submitted another FOIA request for a baker's dozen of NPAs that have not been made publicly available. Those records will be posted if and when they become available. Hopefully, it won't turn out like this or this. Special thanks to the Reporters Committee for Freedom of the Press and the Univ. of Virginia First Amendment Clinic, particularly Jennifer Nelson and Shira Mendelson, for their work and expertise with this.

  • Related to the above, we've requested a list of all deferred and non-prosecution agreements the DOJ has struck. The DOJ typically (?) reveals information about these agreements in a press release or as part of a court filing. The NPAs mentioned above were identified this way. However, it's not clear whether the DOJ always releases information about the agreements it strikes, and this FOIA request is an attempt to answer that question.

  • We've downloaded all case information for all district courts for all cases in which the U.S.A. was a party. It's a large dataset and we'll be using some machine learning techniques to extract entity names from a sea of individual names. When complete, we'll add it to the database.

  • An expanded set of visualizations to help make sense of the data.

  • Implementing a better search for the website. The "Browse" page has a number of delimiters for searching but there's currently no good way to search across the entire site. The old way was too frustrating and will not be named. However, you can find them in the Registry for other crimes. Or Google.

  • Likely a refresh of the site's graphics and overall design. It's been a while.

We view the Corporate Prosecution Registry as a public resource meant to foster research and debate on a very particular corner of law. We welcome suggestions on improving the site and you can find our contact information below.