Outta Harm’s Way | Willful Compassion

Frank Howd
13 min read · Jun 24, 2021
“The greatest nations are defined by how they treat their weakest inhabitants” — Jose Ramos

Tell me a little about the organization, the project, and the team please…

Over the course of the last month, I was given an opportunity to work on a meaningful project — Human Rights First, Asylum Analysis. The Human Rights First (HRF) organization aims to help people who have fled dangerous situations in their home countries to obtain asylum in the United States. Human Rights First is a non-profit, nonpartisan, 501(c)(3), international human rights organization. Per HRF, “Asylum is a legal status that the U.S. government can grant to people who are at risk of harm in their home countries because of who they are — because of their religion, political opinion, sexual orientation, or ethnicity, for example — if the governments in their home countries will not protect them. We help people living in the greater Washington D.C., New York City, Los Angeles, and Houston metropolitan areas who do not already have legal representation, cannot afford an attorney, and need help with a claim for asylum or other protection-based form of immigration status.”

The Asylum Analysis project is an application being built collaboratively by data science and web development teams to assist immigration attorneys and refugee representatives in advocating for clients in asylum cases. Our application uses data science to help identify patterns in judicial decisions, assist lawyers with gathering quick insights and drawing inferences from data, and assist in predicting possible trial outcomes based on those patterns. Understanding how a specific judge has ruled on certain grounds may add insight to an attorney’s trial preparation and increase the odds of a successful outcome. The hope is that advocates for asylum seekers can use our tools to tailor their arguments before a particular judge and maximize their client’s chances of receiving asylum.

Attorneys are invited as administrators to use the application and upload PDF files of prior cases they have participated in. The application then uses Python and Tesseract for optical character recognition (OCR) to convert PDF images into text data that can be searched via natural language processing (NLP) techniques. Key data, which we refer to as structured fields, are extracted from the text and sent to the back-end for storage. These fields are then plotted so that relationships can be seen and inferences drawn quickly from the extracted data.
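To make the extraction step concrete, here is a minimal sketch of pulling structured fields out of OCR’d text. The field names and regex patterns are hypothetical stand-ins, not the project’s actual schema; in the real pipeline the input text would come from running Tesseract over each page of the uploaded PDF.

```python
import re

def extract_structured_fields(text: str) -> dict:
    """Pull a few structured fields out of OCR'd case text.

    `text` would come from running Tesseract (e.g. via pytesseract)
    over each page image of an uploaded PDF. The patterns below are
    illustrative guesses, not the project's real extraction rules.
    """
    fields = {}
    judge = re.search(r"Judge[:\s]+([A-Z][a-z]+ [A-Z][a-z]+)", text)
    outcome = re.search(r"\b(granted|denied)\b", text, re.IGNORECASE)
    fields["judge"] = judge.group(1) if judge else None
    fields["outcome"] = outcome.group(1).lower() if outcome else None
    return fields

sample = "Before Judge Jane Smith. The application for asylum is GRANTED."
print(extract_structured_fields(sample))
# {'judge': 'Jane Smith', 'outcome': 'granted'}
```

Fields that fail to match simply come back as `None`, which is one reason the structured data can contain NaN-like gaps downstream.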

How’d I help out with so many participants working on the same goal?

My role on this team was Data Scientist. I spent my time analyzing back-end data objects and worked on returning flexible, dynamic visualizations to help lawyers gain insights and draw inferences during trial preparation. At times, I also acted as one of the data science team’s points-of-contact for project managers, stakeholders, and various other team members during project reviews, stakeholder meetings, and stand-ups.

So what’s the problem? And what is the solution?

This project has been worked on iteratively over the course of seven months, in one-month cycles. Each cycle comprises a front-end team, a back-end team, and a data science team. During my one-month tenure on the project, I worked closely with colleagues from all of these teams. So many truly amazing people — but I digress…

By the time our cohort was introduced to the project, much of the application was mature. The conversion of PDF images into text data and structured fields was largely in place, but the visualizations being returned were limited to three graphs with hard-coded static inputs, and the y-values were displayed as floating point values. My first task was to fix the y-axis so it no longer showed floats.

Appropriately ‘stepping’ the y-axis is important to the project. Lawyers want to draw inferences fast (their time is money, literally), and this modification to the codebase removes the ambiguity of the prior presentation, making data interpretation more efficient. The users of this application are lawyers who may have up to 150 cases going at a time, and may only have a couple of minutes to review the site for help in developing strategies. Readability matters when visualizing data, and this fix definitely aids the readability and absorption of the presented data.

After a review of the codebase, a Trello card was created to address this issue, and I was eager for a win. I put my name on the card and communicated my interest in working on this code to my fellow team members. The first step was to break the card into tasks and create a timeline for completion. I needed to understand the tools currently in use, which required further review of the established codebase, and to review the associated package documentation before modifying the y-axis. I made the technical decision to test my code first in a Jupyter notebook, and to create a dummy dataset for testing — as an aside, the dataset was also leveraged by other team members, so it proved useful on multiple levels. Finally, I needed to add and commit my code to a branch on the repo, have it reviewed, and merge it into the codebase. I projected this to take three working days — one day to review the codebase and documentation, one day to build the dummy dataset and test ideas, and a final day for implementation and a code review by my peers. While finalizing my plan of attack, I added one additional day for unforeseen problems and unknown unknowns.

Since we, as a team, had previously decided to leverage the prior codebase, I started with the Plotly documentation to decide how to fix this issue. Plotly is the plotting package used by the project and part of the existing data science architecture, so it was very important for me to familiarize myself with the library — I’d never used it in any capacity before. I next created the dummy dataset, with columns of key data to be displayed in relation to a judge’s rulings — the judge’s outcomes on the y-axis and a specific factor plotted along the x-axis. The dataset included some of the structured fields found in the key data, and I was sure to sprinkle in some NaN values to help me assess edge-case responses. I had something to test my ideas against — awesome. I then imported the existing codebase into a Jupyter notebook and broke the code down into sections to better understand what had been done previously. Applying what I learned from the Plotly docs, I found some arguments that I thought would correct the issue.

Once the function was fully running, with the y-axis properly stepped in my Jupyter notebook, I created a branch on the repo and committed the notebook. I then zeroed in on updating the codebase. Once that was completed, I made another commit to the main data science repo and created a pull request that was accepted. It felt good to merge that branch into main.

Here is a peek at the original codebase —

And here are the changes I made to correct the problems with the y-axis —

This solved the problem with the y-axis being displayed as floating point values.
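The embedded gists aren’t reproduced here, but in spirit the fix boils down to a Plotly layout setting along these lines — a minimal sketch with illustrative values, not the project’s actual diff:

```python
# Minimal sketch of forcing whole-number ticks on a Plotly y-axis.
# Ruling counts are integers, so fractional tick labels are meaningless.
# (Illustrative values only — not the project's actual change.)
layout = {
    "yaxis": {
        "dtick": 1,         # place a tick at every whole number
        "tickformat": "d",  # label ticks as integers, not floats
        "rangemode": "tozero",
    },
}

# In the app, a dict like this would be passed to
# plotly.graph_objects.Figure(data=[...], layout=layout)
# before the figure is serialized and returned to the back-end.
```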

But wait, there’s more dear reader…

My next order of business was to act as the data science point-of-contact during our first stakeholder meeting. The stakeholders provided us with additional background about the project, the functionality of the site’s current iteration, and what they would like to see before the final production push. Questioning the stakeholders revealed what mattered most to them — the flexibility to select the factors that meant the most to them individually, the factors that helped them understand the likelihood of outcomes for the cases they were currently litigating.

I heard loud and clear that there was additional desired functionality that would help them better prepare for trial. I created a card on the Trello board that day with this User story — “As a User, I can pick key metrics to have visualized to aid in my trial preparation.”

As a result, the teams decided to collectively aim higher and stretch our goals — we all wanted to work together and try to make an impact on such an important project. We now wanted key data to be graphed dynamically, based on a User’s filtered selections, allowing for improved insights to aid their trial preparations. The aim was also to allow either stacked bar charts or traditional bar charts, depending on User preference. We also wanted to give the User the ability to filter the visualized content by hearing type — was it an initial hearing, or was this ruling from an appellate court?

Whoa — a lot more to do.

I won’t go into as much detail about Trello cards and task ideation, as I’ve bored you with this process already. For the sake of your time, I will run through what happened from a slightly higher altitude.

My role, along with another data science colleague, was to implement dynamic visualizations based on User-defined filters. Rather than return three charts with static x- and y-axis values, we were to return a single object to the back-end — one that may have any number of key data points plotted along the x- and y-axes, data points determined and selected by the User. The User selections move from the front-end to the back-end, ultimately hitting a data science API endpoint. From there the data is interpreted and turned into a visualization. The process is then reversed, moving from the data science endpoint to the back-end, and ultimately the end result is displayed to the User on the front-end. Want to see trial outcomes plotted against gender? How about the asylum seeker’s language versus trial outcome? How about protected grounds versus hearing outcomes? Maybe you want to look at things differently and plot ‘type-of-violence’ versus ‘indigenous-group’. The idea was to make the visualizations a LOT more flexible, adding more impact and more insight, and allowing the User to draw additional inferences from the key data gained by the data science OCR tool.
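The aggregation at the heart of that flow can be simulated in plain Python. The records and field names below are hypothetical stand-ins for rows from the structured-fields table:

```python
from collections import Counter

def cross_tab(cases, x_field, y_field):
    """Count case records for each (x, y) pair of User-selected fields.

    `cases` stands in for rows pulled from the structured-fields table;
    the resulting counts are what would be handed to the plotting code.
    (Field names are hypothetical, not the project's schema.)
    """
    return Counter((c.get(x_field), c.get(y_field)) for c in cases)

cases = [
    {"gender": "F", "outcome": "granted"},
    {"gender": "F", "outcome": "denied"},
    {"gender": "M", "outcome": "granted"},
    {"gender": "F", "outcome": "granted"},
]
counts = cross_tab(cases, "gender", "outcome")
print(counts[("F", "granted")])  # 2
```

Because the axes are just dictionary keys, swapping ‘gender’ for ‘protected-grounds’ or any other structured field requires no new code — which is the flexibility the stakeholders asked for.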

The two of us set out to work. We bounced around many ideas, and trialed many more. We broke the data science needs down into tasks to be executed over a two-week period. This afforded us three additional days in case we ran into problems — a cushion, if you will. I started with further reading of the Plotly documentation and riffed on ideas inside a Jupyter notebook. I understood that we needed a way to accept any of the structured text fields included in the PostgreSQL database, so I worked on ways to accept these fields for our visualization generation.

I attempted a strategy that looped over the available structured fields, searching for User-specified key data. I was making some progress, but it was a slow, memory-intensive process. There was also a problem with testing — we were waiting to receive a POST request object from the back-end, and they were having problems sending us that data. My data science colleague then conceived and tried out an idea that used a GET rather than a POST request — this allowed the data science API to query the back-end database directly. It absolutely worked, and was executed amazingly well; however, this approach still did not allow for the dynamism we were striving for, and we truly wanted to delight the User.
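A toy version of that GET approach, with an in-memory SQLite database standing in for the project’s PostgreSQL back-end (table and column names are made up for illustration):

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL back-end here;
# the table and column names are hypothetical, not the project's schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (judge TEXT, outcome TEXT)")
conn.executemany(
    "INSERT INTO cases VALUES (?, ?)",
    [("Smith", "granted"), ("Smith", "denied"), ("Jones", "granted")],
)

def rulings_for_judge(judge):
    """Query the back-end directly, as a GET handler could, instead of
    waiting on a POST payload from the back-end team."""
    rows = conn.execute(
        "SELECT outcome, COUNT(*) FROM cases WHERE judge = ? GROUP BY outcome",
        (judge,),
    ).fetchall()
    return dict(rows)

print(rulings_for_judge("Smith"))
```

Querying directly unblocked testing, but the query itself was still fixed in shape — which is why this alone didn’t deliver the dynamism we wanted.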

We had hit a massive blocker. We were starting to get worried and we were at a standstill. I next arranged a couple of meetings with the Data Science Project Lead, who offered some guidance on the task at hand. One morning, over coffee before our work began that day, he shared with me an example of something he had built that was similar to what we were trying to accomplish. He first provided me with feedback about my code and where it needed work. To put it more bluntly, he advised that I was looking the wrong way for an answer and should just, well, start over. I had been trying to leverage the existing codebase, and his advice was to start from scratch. I found this feedback frustrating at first, but as he kindly walked me through his thinking I was not only able to understand what he was saying, but I came around to his perspective. Ultimately, I was grateful for his feedback and insights. His feedback was constructive and helpful, and it helped me grow as a professional. I was becoming as flexible as the visualizations we were trying to create.

I asked the Data Science Project Lead if he would share these insights with the colleague I was working with, and he graciously shared his time again. We ultimately decided to run with the ideas he proposed. While there were risks to finishing on time, there were also risks tied to continuing down the path we were on — circling around an idea without it ever coming to fruition. After weighing the costs and benefits of our situation, we made the technical decision to drop the portion of the established codebase related to visualizations and start from scratch. We ultimately repurposed the essence of his ideas, and I was able to successfully merge a pull request that allowed for all of the functionality we were hoping to include.

The User now has the ability to plot any of the OCR structured fields along either the x or y axes, to help them draw inference as they see fit. They now also have the option to visualize data as a stacked bar chart or as a standard bar chart. And the User is now allowed to specify outcomes from initial hearings or appellate decisions.

Here is a copy of the original code with static visualization returns —

Here is the updated code that I successfully merged to the main repository —

So why am I so excited writing this?

Asylum attorneys who work with Human Rights First are now able to specify what they want to see visually, and this helps them draw inferences to prepare for trial with a specific ruling judge the way they see fit. We no longer return to them what we think is most helpful. The User will now have the functionality they desire, which will allow them to draw more meaningful inferences that aid them in their trial preparation, and ultimately help them succeed in achieving safe asylum for disadvantaged people who have been put in harm’s way.

Challenges ahead…

The amount of data we had available during my time on the project was sparse, so graphical representations were created from small amounts of data — never fun from a data science perspective. Having only a small set of data to draw inferences from creates challenges. One is the possibility of sample bias associated with small data pools. Until the project has enough users to generate a meaningful number of observations, a user might see a plot based on a limited number of data points and draw inferences from an incomplete dataset — this may not be helpful, and may even lead to inaccurate assumptions. With only a small amount of data currently available, it is possible that the collected data doesn’t accurately represent reality. The best solution is to make a lot more data available. Until there is a meaningful amount of data in the database, it might be wise to include a warning, or notice, showing the number of cases used to create the visualizations. This lends additional perspective and hopefully keeps a User from drawing inferences when there is not enough data to do so. As the user base grows and case uploads increase, this should become less of a challenge over time.
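One way to surface that notice is a caption attached to every visualization. The threshold below is an arbitrary assumption for illustration, not a statistically derived cutoff:

```python
MIN_CASES = 30  # arbitrary threshold; the right cutoff is a judgment call

def sample_size_note(n_cases: int) -> str:
    """Caption text to attach to a visualization so Users can see how
    much data backs it, with a warning when that amount is small."""
    note = f"Based on {n_cases} case(s)."
    if n_cases < MIN_CASES:
        note += " Caution: small sample; inferences may not generalize."
    return note

print(sample_size_note(7))
print(sample_size_note(120))
```

As uploads accumulate, the warning disappears on its own — no code change needed when the data problem solves itself.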

Takeaways

My time working on the Human Rights First Asylum Analysis project was very rewarding. I learned to work with FastAPI endpoints and to use a visualization package I had no prior experience with. I became more proficient with Python. I had the opportunity not only to work collaboratively on a team, but also to work with teams from different disciplines. I was fortunate enough to make some new friends, receive positive feedback, and deliver feedback to my fellow colleagues. Everyone brought different skillsets to the table — at times I was contributing, at times I was grateful for the other team members by my side, at times I was inspired, and at other times I was humbled. In the end, I am ever grateful for the experience, and for the opportunity to work on a project that may bring some benefit to others in the world. We truly live in the golden age.


Frank Howd

Studying Data Science at Lambda School | UCONN Alumni