Artemio Rimando - Evaluating a life measured in smiles | A data scientist lifestyle blog
Data Science•Soft Skills

What to do as a Beginner Data Scientist and How to Improve

Growth as a data scientist takes many forms and follows different paths depending on the function you serve within your work environment. The learning curve for an early-stage Data Scientist depends on several things, such as your education and background knowledge, your prior experience within the field and industry, and whether you work within a team of data scientists or as a standalone data scientist.

For me, the learning curve was and continues to be steep and challenging. I began my career as a standalone data scientist for a start-up company, coming straight out of school with very limited knowledge of the financial industry. All I had in my knowledge base at the time was an in-depth understanding of logistic regression, some economic analysis projects involving time series, and a toolkit consisting of R and Microsoft Excel. I say this as encouragement: I could have done more to add to my skill set before I started my job, but with what I knew, an eagerness to learn, and an immense curiosity, I had exactly what I needed to begin my career.

My role as a data scientist is to build and maintain proprietary credit scoring models, and to provide ad hoc analysis and reports upon request. I faced a whole list of challenges when I first started: a lack of credit scorecard building knowledge, a lack of knowledge of advanced data analytic techniques, uncertainty about whether my work met industry standards, and a lack of knowledge of how to closely monitor model effects.

These challenges pushed me to figure out best practices and processes as well as I could. Here are some of the ways I addressed the challenges I faced at the start of my career.

Conducting Independent Research

My gut instinct when approaching a problem where you have almost no background experience and no one to turn to for answers is to research! Having obtained a Master’s degree from a program that built independent research heavily into its curriculum, this came naturally to me. For example, the most important thing in tackling a scorecard building project was first understanding it in its entirety and breaking it down into manageable and understandable pieces. It was extremely important to know why a scorecard is used, how it is used, and how it would benefit my company’s operations.

What often happened throughout my research was that I would find complex solutions that were difficult to implement without advanced enterprise software or advanced programming knowledge, or I would find solutions that seemed too easy and not convincing enough to use. This process of researching and attempting to reproduce projects I found on the internet definitely increased my technical understanding and in many ways helped boost my proficiency in R. Along the way, I even picked up some Python, and I also learned how to write queries in Microsoft SQL Server and MySQL to better streamline my data and model building processes.

Networking

Another challenge was ensuring that the credit scoring models were built following best practices within the financial industry. This was a little more difficult for two reasons. First, a scorecard for an alternative business-lending company differs immensely from the more common scorecards developed in the industry, such as those for personal loans. Second, modelling practices for alternative subprime business lending are still relatively new, with the emergence of these industries dating back to the 2008 Financial Crisis. Therefore, research is limited and most of the ideas behind these practices are proprietary.

To overcome this challenge, I did some more internet research, but more importantly, I networked with industry professionals and took what I could from my discussions with them. Most of our discussions involved understanding which techniques were widely used in the industry. During this time, LinkedIn and my personal connections helped me learn how to overcome this challenge. I learned to set up interactions with professionals online, as well as to generate and connect ideas from those professionals within my own work.

Engaging in Trial and Error

There is high pressure when you first start as a data scientist, with expectations of completing your projects within specified deadlines. The scorecard was my very first project, and with the limited knowledge that I had, I was almost forced into a situation of trial and error. Initially, my practice involved researching and building in an endless cycle, where I often updated the scorecard to meet new standards and practices I learned along the way. At the time, there was very little internal user feedback on the scorecard because it was assumed to be performing exactly the way it should. Throughout this trial and error process, constant communication and understanding across the company were essential in order to continue building a robust scorecard. Here, I learned a lot about the technical side of model building, and I also found that my role as a standalone data scientist has a unique place within the operational team.

Being Prepared and Building Confidence

No data science problem at a high level of technicality and knowledge can be solved easily. As a standalone data scientist, where you are mostly doing things on your own and are expected to make educated executive decisions, you are bound to run into personal hurdles such as worries and frustrations. When something goes wrong with your models, you become the first person accountable, which in many ways can be unsettling. I came to realize that all of these feelings were natural and that it was perfectly fine!

In order to overcome this challenge, it was always in my best interest to be prepared to provide thorough answers to questions that the company asked me, and to be able to address concerns. Whenever a problem or concern was raised with the models I built or with my data analysis methodologies, I was always forthcoming with a constructive answer or came up with a solution. It was in my best interest to be accountable and honest about my abilities. This stemmed from the realization that I do not know everything, but I do want to learn so that I can do my best work and help the company grow. With good communication with upper management and their moral support, these personal challenges slowly faded, and I actually began to speed up my learning of more applied business data science.

Moving Forward

What I appreciate the most about the early stages of my career is the amount of learning that I have done and the huge amount of growth I have experienced as a person. With that said, the learning never ends as new modelling needs arise, data repositories grow with new data to be analyzed, and new modelling techniques and solutions are introduced with new technologies.

I know that as I continue along this career path, I am bound to learn more programming, apply other predictive models, and conduct interesting kinds of analysis. With these ongoing changes within a fast-growing company, for every problem solved there are bound to be ten more arising. The best part of being in the early stage of my career is that I know I still have a lot to learn, and as I move forward, I will anticipate the challenges ahead and be more than happy to tackle them one step at a time.

Data Science•Soft Skills

5 Lessons in Applied Data Science from Alternative Business Lending

After about a year at Merchant Advance Capital, a fast-growing financial company, I have come to accept the limitations I face when wanting to eagerly dive into data that is unique to the industry. Initially, it was frustrating to see that so many modelling practices and standards learned throughout my education could not simply be followed within the alternative business lending industry. As I slowly started to peel back what I knew, and began to open myself up more to things that I did not know in practice, I soon noticed that I needed to adapt my attitude and skills to what the company really needed from my role. I want to talk a little bit about what I have learned thus far and reflect on these lessons so that they may help me push forward in becoming a better Data Scientist.

1. Applying data science is pointless if you don’t know the data you’re working with and how it relates to your problem at hand.

The bulk of Merchant Advance Capital’s alternative lending practice is providing loans to subprime businesses within Canada. Many of these businesses lack the collateral to successfully obtain loans from a bank or are considering quick and cheap alternatives for their business needs. One important thing to note here is that building models to predict the risk levels of different businesses requires knowing exactly what kinds of businesses you are lending to. It is easy and sometimes tempting to gather a bunch of business characteristics and immediately send them through a machine learning algorithm to obtain predictions. It is always better to carefully choose, craft and analyze these characteristics and ensure that the relationships drawn from them make intuitive business sense. Domain expertise is crucial.

2. Refrain from using machine learning algorithms where you cannot fully interpret the relationship between business characteristics and your model predictions.

I had to learn this the hard way when several reporting issues came about through different avenues. One such avenue was within the operations department, where loan application administrators had a difficult time translating machine learning predictions for business owners and their respective sales representatives. As a result, trust in the scorecard regime began to erode. In the event that a loan is rejected, these parties deserve a fair reason as to why they have been declined. If you were to build risk scorecards using black-box methods, more often than not, your predictions would be very hard to interpret at a characteristic-by-characteristic level. It would also be difficult to explain why a business owner scored a certain way if a sales representative demanded a specific reason for the decline.
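To make the interpretability point concrete, here is a minimal sketch of how a logistic regression score can be broken down characteristic by characteristic to suggest decline reasons. The coefficients, characteristic names, and baseline applicant are all hypothetical and made up for illustration; they are not taken from any real scorecard, and this is only one of several ways reason codes could be derived.

```python
import math

# Hypothetical coefficients for illustration only (not from a real scorecard).
# Positive coefficients raise the log-odds of default; negative ones lower it.
INTERCEPT = -1.0
COEFFICIENTS = {
    "years_in_business": -0.30,
    "monthly_revenue_thousands": -0.02,
    "prior_defaults": 0.90,
}


def default_probability(applicant):
    """Logistic regression: map a linear log-odds score to a probability."""
    log_odds = INTERCEPT + sum(
        COEFFICIENTS[name] * applicant[name] for name in COEFFICIENTS
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


def decline_reasons(applicant, baseline, top_n=2):
    """Rank characteristics by how much they raise the applicant's log-odds
    of default relative to a (hypothetical) baseline applicant."""
    contributions = {
        name: COEFFICIENTS[name] * (applicant[name] - baseline[name])
        for name in COEFFICIENTS
    }
    risky = [(name, c) for name, c in contributions.items() if c > 0]
    risky.sort(key=lambda item: item[1], reverse=True)
    return risky[:top_n]


# Hypothetical applicant and baseline, chosen only to exercise the functions.
applicant = {"years_in_business": 1, "monthly_revenue_thousands": 20, "prior_defaults": 2}
baseline = {"years_in_business": 5, "monthly_revenue_thousands": 50, "prior_defaults": 0}

for name, contribution in decline_reasons(applicant, baseline):
    print(f"{name}: +{contribution:.2f} log-odds versus baseline")
```

Because the model is linear in log-odds, each characteristic's contribution can be read off separately, which is the kind of explanation a black-box method does not give as directly.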

3. Refrain from using machine learning algorithms where you cannot fully understand the costs and benefits of your model predictions.

When first developing a risk scorecard, little did I know how heavily it would be used within the core business of the company. The predictions of your machine learning model can translate into restrictions on product pricing and the promotion of certain products to different population segments. It is so important that the characteristics used to describe and understand your target population are quantifiable and make intuitive business sense. It may well turn out that one of these characteristics captures the part of your customer base that generates the most money or the most loss.
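One way to reason about the costs and benefits of predictions is to attach a dollar value to each outcome and see how an approval cutoff changes expected profit. The sketch below assumes a deliberately simple setup that is not from the original post: every approved loan earns a fixed hypothetical gain if repaid and incurs a fixed hypothetical loss if it defaults.

```python
def expected_profit(p_default, gain_if_repaid, loss_if_default):
    """Expected profit of approving one loan with default probability p_default."""
    return (1.0 - p_default) * gain_if_repaid - p_default * loss_if_default


def break_even_probability(gain_if_repaid, loss_if_default):
    """Default probability at which approving a loan has zero expected profit."""
    return gain_if_repaid / (gain_if_repaid + loss_if_default)


def portfolio_profit(probabilities, cutoff, gain_if_repaid, loss_if_default):
    """Total expected profit when approving every applicant at or below the cutoff."""
    return sum(
        expected_profit(p, gain_if_repaid, loss_if_default)
        for p in probabilities
        if p <= cutoff
    )


# Hypothetical figures for illustration only.
GAIN = 2000.0   # assumed profit on a repaid loan
LOSS = 8000.0   # assumed loss on a defaulted loan
predicted = [0.05, 0.15, 0.30]  # assumed model outputs for three applicants

print(break_even_probability(GAIN, LOSS))
for cutoff in (0.2, 0.5):
    print(cutoff, portfolio_profit(predicted, cutoff, GAIN, LOSS))
```

Under these assumed numbers, approving applicants above the break-even probability lowers total expected profit, which is why the cutoff matters as much as the model itself.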

4. There must be a balance between the implementation of machine learning algorithms and the use of them at the operational level.

One of the most hyped promises of data science is the ability to utilize, understand and process big data in a matter of minutes. Applied data scientists, however, often face operation-specific challenges such as a lack of data automation, collection and organization. In a subprime lending industry where the bulk of our customer base is somewhat technology-averse, the simplest way to handle loan applications is through e-mail and paper submissions.

With huge technological inefficiencies restricting the data pipeline, I often run into a give-and-take situation between predictive modelling and process automation. Sometimes efficiency is achieved by not including every business characteristic in the model, because a characteristic either cannot be automated, is costly to obtain, or is simply untrustworthy. I often run into unfavourable validation statistics that could easily have been improved with more uncorrelated predictive features, but collecting that data is inefficient and expensive.

Sometimes predictive prowess has to be traded off against operational efficiency. Of course, short-term drawbacks such as these can slowly be overcome as operations improve, technological capabilities are enhanced, and further research is done to understand which data points are worth collecting.

5. The Financial Industry is well-known for its standard modelling practices and conservatism. Sometimes, it is more beneficial to use these practices as benchmarks and gain flexibility using alternative underwriting practices.

It is important to know what kinds of data are unique to the company and what would not typically be looked at by major financial institutions. With the rise of social media presence among today’s businesses, bad online reviews, nicely composed websites or product images can make or break a financing decision. In cases like these, data science can immensely enhance the power of underwriting. Social media text analytics, geolocation analysis, and human judgement can trump the analysis of the few financial ratios that financial institutions would normally be restricted to using.

Meet Artemio!

Artemio is a Torontonian-at-heart living in Vancouver, BC. You can find him in and around the city sipping bubble tea and playing Pokemon GO.

Welcome!

You will find blog posts written about a passion for data science, travel, and the joys of life.

© 2019 ARTEMIO RIMANDO // All rights reserved.