After about a year at Merchant Advance Capital, a fast-growing financial company, I have come to accept the limits on how eagerly I can dive into data that is unique to the industry. Initially, it was frustrating to see that so many modelling practices and standards learned throughout my education could not simply be followed within the alternative business lending industry. As I slowly peeled back what I knew and began to open myself up to things I did not know in practice, I noticed that I needed to adapt my attitude and skills to what the company really needed from my role. I want to talk a little about what I have learned thus far and reflect on these lessons so that they may help me become a better Data Scientist.

1. Applying data science is pointless if you don’t know the data you’re working with and how it relates to your problem at hand.

The bulk of Merchant Advance Capital’s alternative lending practice is providing loans to subprime businesses within Canada. Many of these businesses lack the collateral to obtain loans from a bank, or are looking for quick and cheap alternatives for their business needs. One important thing to note here is that building models to predict the risk levels of different businesses requires knowing exactly what kinds of businesses you are lending to. It is super easy, and sometimes tempting, to gather a bunch of business characteristics and immediately send them through a machine learning algorithm to obtain predictions. It is always better to carefully choose, craft and analyze these characteristics and ensure that the relationships drawn from them make intuitive business sense. Domain expertise is crucial.

2. Refrain from using machine learning algorithms where you cannot fully interpret the relationship between business characteristics and your model predictions.

I had to learn this the hard way when several reporting issues came up through different avenues. One such avenue was the operations department, where loan application administrators had a difficult time explaining machine learning predictions to business owners and their respective sales representatives. As a result, trust in the scorecard regime began to erode. In the event that a loan is rejected, these parties deserve a fair reason as to why they have been declined. If you build risk scorecards using black-box methods, more often than not your predictions will be very hard to interpret at a characteristic-by-characteristic level. It would also be difficult to explain why a business owner scored a certain way if a sales representative demanded a specific reason for the decline.
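To illustrate what characteristic-level interpretability can look like, here is a minimal sketch of a linear (logistic regression style) scorecard that reports which characteristics pushed an applicant's risk up the most. Everything in it is an assumption made for illustration: the feature names, coefficients, baseline values and the `reason_codes` helper are hypothetical, not the company's actual scorecard.

```python
import math

# Hypothetical coefficients of an already-fitted logistic regression.
# Positive weights increase the predicted probability of default.
INTERCEPT = -1.0
WEIGHTS = {
    "years_in_business": -0.30,
    "monthly_revenue_k": -0.02,
    "prior_defaults": 0.90,
}

# Hypothetical portfolio averages, used as the reference point for explanations.
BASELINE = {
    "years_in_business": 5.0,
    "monthly_revenue_k": 40.0,
    "prior_defaults": 0.0,
}


def default_probability(applicant):
    """Predicted probability of default from the linear score."""
    z = INTERCEPT + sum(WEIGHTS[k] * applicant[k] for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def reason_codes(applicant, top_n=2):
    """Characteristics that raise risk most, relative to the baseline applicant."""
    contributions = {
        k: WEIGHTS[k] * (applicant[k] - BASELINE[k]) for k in WEIGHTS
    }
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    # Keep only characteristics that actually increased the risk score.
    return [(name, round(value, 2)) for name, value in ranked[:top_n] if value > 0]


applicant = {"years_in_business": 1.0, "monthly_revenue_k": 15.0, "prior_defaults": 2.0}
probability = default_probability(applicant)
reasons = reason_codes(applicant)
```

Because each contribution is just a weight times the distance from a baseline, a loan administrator could read the ranked list as plain-language decline reasons, which is much harder to do with a black-box model.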

3. Refrain from using machine learning algorithms where you cannot fully understand the costs and benefits of your model predictions.

When I first developed a risk scorecard, I had little idea how deeply its use would be woven into the core business of the company. The predictions of your machine learning model can translate into restrictions on product pricing and the promotion of certain products to different population segments. It is so important that the characteristics used to describe and understand your target population are quantifiable and make intuitive business sense. These characteristics may well identify the part of your customer base that generates the most money or the most loss.
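One simple way to connect a predicted default probability to its costs and benefits is an expected-profit calculation. The sketch below is only an illustration under strong simplifying assumptions: a single fixed gain when a loan is repaid, a single fixed loss when it defaults, and made-up dollar amounts. The function names are hypothetical.

```python
def expected_profit(p_default, gain_if_repaid, loss_if_default):
    """Expected profit of approving one loan with default probability p_default."""
    return (1.0 - p_default) * gain_if_repaid - p_default * loss_if_default


def break_even_probability(gain_if_repaid, loss_if_default):
    """Default probability at which approving the loan has zero expected profit."""
    return gain_if_repaid / (gain_if_repaid + loss_if_default)


# Hypothetical figures: fee income on a repaid loan vs. principal lost on default.
GAIN = 2_000.0
LOSS = 8_000.0

threshold = break_even_probability(GAIN, LOSS)

# Loans scored below the threshold are profitable in expectation; above it, not.
profit_low_risk = expected_profit(0.10, GAIN, LOSS)
profit_high_risk = expected_profit(0.30, GAIN, LOSS)
```

With these made-up numbers, a model that misjudges an applicant's default probability by even ten points can move a loan from the profitable side of the threshold to the losing side, which is why the cost of a wrong prediction needs to be understood before the model drives pricing.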

4. There must be a balance between the implementation of machine learning algorithms and the use of them at the operational level.

One of the biggest hypes in data science is the ability to utilize, understand and process big data in a matter of minutes. Applied data scientists, however, often face operation-specific challenges such as a lack of data automation, collection and organization. In a subprime lending industry where the bulk of our customer base is somewhat technologically averse, the simplest way to take loan applications is through e-mail and paper submissions.

With huge technological inefficiencies as a restriction on the data pipeline, I often run into a give-and-take situation with respect to predictive modelling and process automation. Sometimes efficiency is accomplished by not including every business characteristic in the model because it either cannot be automated, its availability is costly or it is simply untrustworthy. I often run into unfavourable validation statistics that could have easily been solved with the provision of more uncorrelated predictive features, but the data collection is inefficient and expensive.

Sometimes predictive prowess and operational efficiency have to go hand in hand. Of course, short-term shortcomings such as these can slowly be overcome as operations improve, technological capabilities are enhanced, and further research is done to understand which data points are worth collecting.

5. The financial industry is well known for its standard modelling practices and conservatism. Sometimes, it is more beneficial to use these practices as benchmarks and gain flexibility through alternative underwriting practices.

It is important to know what kinds of data are unique to the company and what would not typically be looked at by major financial institutions. With the rise of social media presence among today’s businesses, bad online reviews, nicely composed websites or product images can make or break the decision to grant financing. In cases like these, data science can immensely enhance the power of underwriting applications. Social media text analytics, geo-locational analysis, and human judgment can trump the analysis of the few financial ratios that financial institutions would normally be restricted to using.