Loan Descriptions – Can They Be Helpful When Choosing Loans? Part 1

by Peter Renton on December 10, 2012

This is the first post of a two-part series on Lending Club loan descriptions by guest writer Sam Kramer. Sam has spent the last 15 years working in the finance industry, where he was exposed to financial analysis and consumer credit. Sam is married, has two children, and includes investing in Lending Club amongst his hobbies. He can be contacted on Twitter @P2P_CT.

When I first started investing in Lending Club (“LC”) about one year ago, I was drawn to the large data files and the ability to analyze the loan data. At the time, I conducted an analysis of default rates relative to loan description length. The result surprised me: loans with no description showed a lower default rate than loans with any length of description. (In this analysis, a default is any loan that is 16 or more days late, defaulted, or charged off.) I did not feel comfortable forming my investment thesis around loans with no description, so I decided to test my ability to choose loans based on their descriptions by reading a number of old ones. I was not happy with my performance at selecting loans this way, so I decided to focus on other metrics when deploying my initial investment on the LC platform.

I have given this topic a lot of thought over the past year, and now find myself revisiting loan description lengths to determine whether they might be useful when selecting loans. I have updated my findings in this chart (Chart 1). Loan description length is calculated in Microsoft Excel, with an adjustment for phrases like “Borrower 123456 added on 8/14/10>”:

The above chart shows that very short loan descriptions (1-10 characters) have quite a high default rate. However, short loan descriptions (11-350 characters) have a default rate closer to that of no-description loans. Once again, no-description loans appear to have a lower-than-average default rate.
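The length measure behind the chart was computed in Excel. As a rough illustration only, the same adjustment could be sketched in Python as follows. The regular expression is an assumption inferred from the single example stamp quoted above, and `description_length` is a hypothetical helper name. Note that this sketch strips every stamp it finds, which may differ from the spreadsheet adjustment.

```python
import re

# Assumed shape of the stamp prepended to each description entry,
# e.g. "Borrower 123456 added on 8/14/10>". The pattern is a guess based on
# that single example, not a documented Lending Club format.
STAMP = re.compile(r"Borrower\s+\d+\s+added on\s+\d{1,2}/\d{1,2}/\d{2,4}\s*>")


def description_length(raw):
    """Return the number of characters the borrower actually wrote."""
    if not raw:
        return 0  # treat missing and empty descriptions alike
    # Remove every stamp, then trim the whitespace left behind.
    return len(STAMP.sub("", raw).strip())


sample = "Borrower 123456 added on 8/14/10> Pay off cards"
print(description_length(sample))  # length of "Pay off cards"
```

A loan whose description field holds nothing but a stamp would come out with length 0 under this sketch, so it would be counted as a no-description loan.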

I decided to break my analysis into three areas: no descriptions, very short descriptions (1-10 characters), and all other lengths (11+ characters).
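The three-way split can be sketched as a small grouping routine. This is a minimal illustration, not the spreadsheet used for the charts: the status strings in `BAD_STATUSES` are assumed labels standing in for the definition of default used in this post (16 or more days late, defaulted, or charged off), and the sample loans are made up.

```python
from collections import defaultdict

# Assumed status labels for "16 or more days late, defaulted or charged off".
BAD_STATUSES = {"Late (16-30 days)", "Late (31-120 days)", "Default", "Charged Off"}


def bucket(length):
    """Map a description length to one of the three areas analyzed."""
    if length == 0:
        return "none"
    if length <= 10:
        return "1-10"
    return "11+"


def default_rates(loans):
    """loans: iterable of (description_length, status) pairs."""
    totals = defaultdict(int)
    bad = defaultdict(int)
    for length, status in loans:
        key = bucket(length)
        totals[key] += 1
        if status in BAD_STATUSES:
            bad[key] += 1
    # Share of bad loans within each bucket.
    return {key: bad[key] / totals[key] for key in totals}


# Made-up rows, purely to show the shape of the input.
example_loans = [
    (0, "Current"), (0, "Charged Off"), (0, "Fully Paid"), (0, "Current"),
    (5, "Late (16-30 days)"), (5, "Fully Paid"),
    (40, "Fully Paid"), (120, "Current"),
]
print(default_rates(example_loans))
```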

No description loans

I noticed that no-description loans make up a large proportion of the total population (36%). This surprised me – surely more than two thirds of borrowers would feel the need to enter some type of loan description when submitting their loan application?

Perhaps an additional factor is at work here.  The data appears to support this intuition:

The chart above shows loan description lengths as a percentage of loans issued by quarter. The bright red bars represent the percentage of quarterly loan issuance without descriptions. The chart implies that around Q4 2009, something happened that caused a large proportion of loans to be issued without descriptions. A closer look at the underlying data shows that the shift began in October 2009. This rapid change makes me think that borrowers might not be the ones responsible for the large proportion of no-description loans.
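A quarterly breakdown like this one could be reproduced along the following lines. It is a sketch under assumptions: each loan is reduced to an issue date and a description length, and the sample rows are invented to show the mechanics rather than taken from the LC data.

```python
from collections import Counter
from datetime import date


def quarter(issue_date):
    """Label a date with its calendar quarter, e.g. 2009Q4."""
    return f"{issue_date.year}Q{(issue_date.month - 1) // 3 + 1}"


def no_description_share(loans):
    """loans: iterable of (issue_date, description_length) pairs."""
    issued = Counter()
    blank = Counter()
    for issue_date, length in loans:
        label = quarter(issue_date)
        issued[label] += 1
        if length == 0:
            blank[label] += 1
    # Fraction of each quarter's issuance that has no description.
    return {label: blank[label] / issued[label] for label in sorted(issued)}


# Invented rows, only to illustrate the calculation.
example_loans = [
    (date(2009, 8, 3), 150), (date(2009, 9, 20), 0),
    (date(2009, 10, 5), 0), (date(2009, 11, 17), 0), (date(2009, 12, 1), 80),
]
print(no_description_share(example_loans))
```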

This data also made me think that my initial default analysis needed to be revisited. Loans issued in 2007 – 2009 are, for the most part, fully repaid and have an established default rate, while loans from 2010 onward still have a larger portion of their balances outstanding, so default rates on these vintages will in all likelihood increase. Put another way, the newer loans (the only vintages in which no-description loans appear in large volumes) currently have a lower default rate than the older loans. Because no-description loans are a relatively new phenomenon, they will naturally show a lower default rate than loans with descriptions.

In order to more closely analyze the default rate on no-description loans (and to avoid the period when no-description loans were rare), I narrowed my analysis to 36-month loans issued after January 1, 2010. The revised default rate is as follows:

This revised analysis shows a no-description default rate which is slightly higher than the average default rate for this population.  The default rates on the other loan description lengths have a shape similar to Chart 1.
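The narrowing step amounts to a filter followed by the same default-rate calculation on each group. A minimal sketch, assuming each loan is a dictionary with hypothetical field names (`term_months`, `issue_date`, `status`, `description_length`) and assumed status labels:

```python
from datetime import date

# Assumed status labels for "16 or more days late, defaulted or charged off".
BAD_STATUSES = {"Late (16-30 days)", "Late (31-120 days)", "Default", "Charged Off"}
CUTOFF = date(2010, 1, 1)


def revised_population(loans):
    """Keep only 36-month loans issued after the cutoff date."""
    return [
        loan for loan in loans
        if loan["term_months"] == 36 and loan["issue_date"] > CUTOFF
    ]


def default_rate(loans):
    """Share of loans in a bad status, or None for an empty group."""
    if not loans:
        return None
    bad = sum(1 for loan in loans if loan["status"] in BAD_STATUSES)
    return bad / len(loans)


def compare(loans):
    """Return (no-description rate, overall rate) for the revised population."""
    population = revised_population(loans)
    blank = [loan for loan in population if loan["description_length"] == 0]
    return default_rate(blank), default_rate(population)
```

Restricting both groups to the same term and vintage is what removes the age effect described earlier: the no-description loans are then compared only with loans of similar maturity.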

This analysis demonstrates that no-description loans do not perform better than average, but rather perform in line with the overall population. As such, I do not believe that no-description loans give insight into credit risk. Rather, investing in no-description loans is akin to ignoring loan descriptions entirely.

This is interesting, but its only practical use here is that it tells us we can ignore these loans when establishing a loan selection strategy based on descriptions.

In the second part of this series we will take a look at both the very short loan descriptions that seem to have a much higher default rate as well as other loan description lengths.

19 comments

Peerlend December 10, 2012 at 10:35 am

Great stuff – look forward to the next installment. Might want to look at credit quality correlation with length of description – my suspicion is that the hidden variable may be different borrower funnels / levels of informational prompting per credit grade.


Peter Renton December 10, 2012 at 4:19 pm

That would be a logical extension of this kind of analysis. Maybe that is an idea for a follow up post.


Sam Kramer December 11, 2012 at 1:03 am

That is a good point. An initial run by loan grade shows similar patterns for B-D grade loans (long description A’s perform in-line with other A’s). However, the loan population used above (issued in 2010 or newer, 36 months) doesn’t provide a useful sample size for F and G loans (only 51 G’s are in my population).

As Peter mentions, this could be an idea for a follow up post.


Andrew N December 10, 2012 at 10:42 am

Very interesting analysis. I wish there was an easy way to analyze the grammar of a loan description. I would like to see an analysis on the performance of loans where the description has proper grammar and punctuation versus a loan without those attributes.


Peter Renton December 10, 2012 at 4:22 pm

Grammar analysis would be very difficult and I don’t know anyone who has even attempted such an analysis. I am sure some English major with a love of statistics and p2p lending will tackle this one day.


Charlie December 10, 2012 at 11:14 pm

Google has a spelling API that one could leverage to build such a tool. Passing loan descriptions off to that service could make the project “relatively” simple. ( ;

http://code.google.com/p/google-api-spelling-java/


HighROI December 10, 2012 at 7:24 pm

Run a search for “dept” which is by far the most common (and perhaps the dumbest) misspelled word that I noticed. You should have over 400 results which may be enough to get some idea.


Sam Kramer December 11, 2012 at 1:55 am

A spelling / grammar analysis would be very interesting and could provide an objective (albeit time-intensive) tool for use in loan screening.

This analysis might also be interesting when combined with the loan description analysis.

Can anyone suggest a simple way to flag spelling and grammar errors in Excel?

Charlie, your mention of the Google spelling API above sounds like it is beyond my limited programming skills.


Russell December 11, 2012 at 10:47 am

A question about methodology. You say you remove text a la “Borrower 123456 added on 8/14/10>” … do you also remove the text that was subsequently added? I can’t remember the last time I read an actual user generated loan description aside from the responses to lender questions. But, I am a low volume investor, so that might just be me running into the 34% of loans that have no description.

Otherwise, description length other than 0 is (obviously) confounded with whether any lender asked the borrower a question. Moreover, the actual length of the “description” might then also be a proxy for another interesting variable: the number of questions responded to. In that case, interesting values might be the mean response length per question as well as the number of questions asked, the latter being a likely indicator of the class of investor who is putting money into the loan and the former being some indicator about the borrower.


Sam Kramer December 11, 2012 at 12:19 pm

Questions / responses are not included in the loan description field. No description loans that have questions responded to will still show up as no description.

However, in instances where borrowers make multiple loan description entries (say, on different days), the adjustment to remove “Borrower 123456 added on 8/14/10>” only takes into account the first instance of this phrase, not subsequent instances. In this event, loan description length could be overstated by roughly 33 characters for each additional entry.


Dan B December 11, 2012 at 1:21 pm

Please correct me if I missed something, but is it not true that if you instead redid chart #3 and made just 2 categories, loans with no description and loans with some description, both categories would then fall within the margin of error of the entire sample (3.1%)? The 3.3% no-description ones certainly already do.

I look at chart #3 above as it stands and see that everything falls close to the average except the 5 spikes. These 5 spikes, whose sample sizes are 206, 364, 301, 862 and 173, even when combined account for a mere 3% of the total number of loans. What am I missing?


Peerlend December 11, 2012 at 4:00 pm

What you’re missing is that some people enjoy thinking they have a systematic approach! :)
(Yes, yes, I know – you enjoy thinking that your system – no system – is better than theirs is!)

BTW – lots of the anomalies are attributable to LC – “pool” is especially funny, if one knows that LC did a financing deal with a big pool company during the time when those show up.


Dan B December 11, 2012 at 4:17 pm

Agh! I knew I was missing something. :)


gharkness December 19, 2012 at 11:01 am

Wow, that’s really interesting! I would NEVER loan money to anyone foolish enough to go into debt over a pool, a vacation or a wedding. (OK so beat up on me, but that is the way I feel about it – if I wouldn’t do it, I am definitely not loaning money to someone who would.) So I completely missed out on a whole bunch of loans, but maybe that’s not such a bad deal.

Someone (sorry – can’t remember who) once did an analysis of loan descriptions with words like “need,” “help,” “deserve,” and such, and found that – at least among the loans they analyzed, there was a much higher default rate when these words were used. Have you looked at any of those specific words?

This is a very interesting analysis and I find myself, along with many others, trying to figure out exactly how I can mitigate the risk and still get the highest returns possible. Can’t wait for the next installment!


Peter Renton December 19, 2012 at 5:41 pm

The site you are talking about is Lending Tuber: http://lendingtuber.blogspot.com/. They did a lot of analysis of different words from Lending Club loan descriptions.

And the next installment in this series is already here:
http://www.lendacademy.com/lending-club-loan-descriptions-2/


HighROI December 11, 2012 at 4:21 pm

@Sam Here’s another common spelling mistake: “consolodate” and “consolodation”. Between the two you will get over 300 results. Combine that with the 400 for “dept” and you will have plenty of notes to run an analysis on and see whether they have a higher default rate.
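For anyone who wants to try this outside Excel, a keyword flag along these lines might be a starting point. It is only a sketch: the word list is limited to the misspellings mentioned in this thread, `has_misspelling` is a hypothetical helper name, and “dept” can also be a legitimate abbreviation of “department”, so flagged notes would still need a manual look.

```python
import re

# Misspellings mentioned in this thread; whole-word, case-insensitive match.
MISSPELLINGS = re.compile(r"\b(?:dept|consolodate|consolodation)\b", re.IGNORECASE)


def has_misspelling(description):
    """True if the description contains one of the listed misspellings."""
    return bool(MISSPELLINGS.search(description or ""))


for text in ["Pay off my credit card dept", "Debt consolidation", "Consolodation loan"]:
    print(text, "->", has_misspelling(text))
```

The word-boundary anchors keep the pattern from firing on longer words such as “department” that merely contain one of the strings.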


rasras December 11, 2012 at 5:23 pm

a real $ maker/alpha strategy is to find a way to go SHORT the loans with “consolodate” and “consolodation” and/or “dept” in them!


Mike S December 15, 2012 at 1:21 pm

I’m glad he considered age. I find it helpful to also consider average interest rate and sample size. One of the analyses I’ve done was on “family words” (family, child, children, son, daughter, mother, father, sister, brother, wife, husband). On my pre-filtered data, I found:

1124 notes contain a family word
Of those, 76 (6.8%) are bad (anything besides current or fully paid)
13.52% average interest rate

13486 notes do not contain a family word
Of those, 682 (5.1%) are bad
13.59% average interest rate

My theory is that people use “family words” to invoke sympathy. The assumption is that the need to invoke sympathy correlates with a higher credit risk. Can a statistician chime in and confirm whether this means that notes containing a family word are 33% (6.8/5.1) more likely to be bad?


Russell December 15, 2012 at 2:22 pm

A quick simulation shows that, assuming an underlying default rate of 5.2% (the weighted average of the two default rates), one would expect an absolute difference that large between the two groups less than 2% of the time. The actual increase in risk for loans with family words can be estimated from the value observed, but we (obviously) cannot know the true value. How to interpret the 6.8/5.1 ratio… I’ll leave that up to someone else.
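The comparison can also be checked without a simulation. The sketch below swaps in a pooled two-proportion z-test (a normal approximation, not the simulation described above) and applies it to the counts quoted in the family-words comment. `two_proportion_test` is a hypothetical helper name.

```python
import math


def two_proportion_test(bad_a, n_a, bad_b, n_b):
    """Return (relative risk, z statistic, two-sided p-value) for two groups."""
    rate_a = bad_a / n_a
    rate_b = bad_b / n_b
    # Pooled rate under the hypothesis that both groups share one default rate.
    pooled = (bad_a + bad_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided tail probability of a standard normal beyond |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a / rate_b, z, p_value


# Counts quoted above: 76 bad of 1124 with a family word, 682 bad of 13486 without.
relative_risk, z, p_value = two_proportion_test(76, 1124, 682, 13486)
print(f"relative risk {relative_risk:.2f}, z {z:.2f}, two-sided p {p_value:.3f}")
```

On these counts the relative risk should come out at roughly 1.34 with a two-sided p-value under 2%, which is in line with the simulation result. The ratio is a point estimate only. It says nothing about why the groups differ, and the notes were pre-filtered before counting.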

