Logit transform | 分数对数转换

After discussing logarithms (“log”) last week, we explored a few commonly used methods that have a log built into them. One example is the logit function, or logit transform, which uses the “natural” logarithm. We explained its definition with the following Python code.

>>> import numpy as np
>>> epsilon = 0.001  # small shift so that 0 and 1 do not blow up
>>> def logit(c):
...     d = np.log((c + epsilon) / (1 + epsilon - c))
...     return d

The following is the inverse, which brings a transformed value back to what it was before.

>>> def inverse_logit(a):
...     b = ((1 + epsilon) * np.exp(a) - epsilon) / (np.exp(a) + 1)
...     return b

>>> print(logit(0.1))  # -2.1883847407670785
>>> print(inverse_logit(logit(0.1)))  # 0.09999999999999999
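
To see why the small epsilon is there, here is a minimal sketch that applies the same two formulas to a whole array, including the endpoints 0 and 1. The names `xs`, `ys` and `round_trip` are only for illustration and are not from the class.

```python
import numpy as np

epsilon = 0.001

def logit(c):
    # log of the (slightly shifted) odds c / (1 - c)
    return np.log((c + epsilon) / (1 + epsilon - c))

def inverse_logit(a):
    # undo logit: the formula above solved for c
    return ((1 + epsilon) * np.exp(a) - epsilon) / (np.exp(a) + 1)

xs = np.array([0.0, 0.1, 0.5, 0.9, 1.0])
ys = logit(xs)                  # stays finite at 0 and 1 because of epsilon
round_trip = inverse_logit(ys)  # should land back on xs, up to rounding

print(ys)
print(round_trip)
```

Without epsilon, the two endpoints would give log(0) and a division by zero, so the shift is what keeps 0 and 1 usable.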

Pictures show what the logit transform is doing much better than formulas do. See how quickly the values spread out after the transform! Why are they stretched instead of shrunk? We know that taking a log counts repeated division (recall that log10 of a number is how many times it must be divided by 10 to become 1). But for numbers between 0 and 1 the effect is the opposite. A small positive number less than one has to be divided by 10 a negative number of times to become 1, which really means multiplying by 10. For example, 0.01 must be divided by 10 “negative two” times, that is, multiplied by 10 twice, to get back to 1. That is why there are negative values on the y-axis.
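
A quick way to check the “negative times” idea is to take log10 of a few numbers below 1 and a few above 1. This is just a small sketch; the arrays `small` and `big` are made up for the illustration.

```python
import numpy as np

small = np.array([0.1, 0.01, 0.001])    # numbers between 0 and 1
big = np.array([10.0, 100.0, 1000.0])   # their mirror images above 1

small_logs = np.log10(small)  # negative: "divide by 10 a negative number of times"
big_logs = np.log10(big)      # positive: ordinary repeated division

print(small_logs)
print(big_logs)
```

The two results should mirror each other: about -1, -2, -3 for the small numbers and 1, 2, 3 for the big ones.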

On the other hand, there are also positive values on the y-axis. That is because for about half of the inputs (those above 0.5) the ratio (c+epsilon)/(1+epsilon-c) (the odds) is greater than 1, and the log of a number greater than 1 is positive. Play around with it and you will surely get it.
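
One way to play around with it is to print the odds next to their logs for a handful of inputs. This sketch uses the same formula and the same epsilon as above; the sample values in `c` are arbitrary.

```python
import numpy as np

epsilon = 0.001
c = np.array([0.1, 0.25, 0.5, 0.75, 0.9])   # arbitrary sample inputs

odds = (c + epsilon) / (1 + epsilon - c)     # below 1 when c < 0.5, above 1 when c > 0.5
log_odds = np.log(odds)                      # so the sign flips at c = 0.5

for ci, o, lo in zip(c, odds, log_odds):
    print(f"c={ci:.2f}  odds={o:.3f}  logit={lo:+.3f}")
```

The table should show negative logits below 0.5, roughly zero at 0.5, and positive logits above it, with the two halves mirroring each other.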

[Figure: scatter plot of the logit transform, produced by the code below]

>>> import matplotlib.pyplot as plt
>>> x = np.linspace(0, 1, 1000)
>>> y = logit(x)
>>> plt.scatter(x=x, y=y, alpha=0.3)
>>> title = "logit transform"
>>> plt.title("%s" % title)
>>> plt.xlabel("numbers between 0 and 1 (inclusive)")
>>> plt.ylabel("after logit transform")
>>> plt.xlim(-6, 6)
>>> plt.ylim(-6, 6)
>>> plt.gca().set_aspect('equal', adjustable='box')  # same scale on both axes
>>> plt.draw()

Look at the same plot with the axes scaled differently:

[Figure: the same logit transform plot with differently scaled axes]

Cosmic distance ladder | 宇宙距离

The class has no homework today.  We watched the video lecture by Terence Tao (see link below).   The name of the video is “Cosmic Distance Ladder”.  Quite a mystifying name.

The stories Terence Tao told in the lecture were about philosophers and astronomers, from ancient ones such as Aristotle to those closer to us in history. What they all have in common is that they used good observations and ingenious reasoning to indirectly measure the distances from the Earth to the Moon, to the Sun, and to the galaxies, without any technology (the earliest did not even know the number Pi) and with amazing accuracy (as verified by what we know today).

You should definitely watch the video a few times. Think about this: compared with human observation and reasoning, what computers can do is still just technology and tools. Computers can’t do the indirect reasoning that connects the dots between disparate pieces of information. It makes zero sense to believe computers (including phones) are smarter than you are.

So, use your great mind. Let your mind observe and reason, and make computers help you along the way.

Logarithm | 对数

As we explored in previous classes, multiplication is adding again and again, and division is subtracting again and again. Exponentiation is multiplying again and again. They are all inventions to simplify repeated computation.
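
The “again and again” idea can be written out as plain loops. This is only a sketch for non-negative whole numbers; the function names are made up for illustration, and Python's built-in `*`, `**` and `divmod` already do the same jobs.

```python
def multiply(a, times):
    """Multiplication as adding a to itself again and again."""
    total = 0
    for _ in range(times):
        total += a
    return total

def power(base, times):
    """Exponentiation as multiplying by base again and again."""
    result = 1
    for _ in range(times):
        result *= base
    return result

def divide(a, b):
    """Division as subtracting b again and again; returns (quotient, remainder)."""
    count = 0
    while a >= b:
        a -= b
        count += 1
    return count, a

print(multiply(3, 4))
print(power(2, 10))
print(divide(17, 5))
```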

So is the logarithm: taking a log is dividing again and again and again. Logarithms were introduced in 1614 by John Napier, a Scottish mathematician, physicist, and astronomer, as a means to simplify calculations.

🙂 Today’s Python numpy class summary:

log10 tells you how many times a number must be divided by 10 to get back to 1. log10(100) gives 2 because 100 divided by 10 twice returns us to 1.
>>> np.log10(100)
One trillion divided by 10 twelve times returns to 1.
>>> np.log10(1000000000000)
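
The “how many times do we divide by 10” description can be checked with a small counting loop. This is a sketch that only makes sense for exact powers of 10; `count_divisions_by_10` is a made-up helper, not a numpy function.

```python
import numpy as np

def count_divisions_by_10(n):
    """Count how many times n is divided by 10 before it reaches 1.

    Only meant for exact powers of 10 such as 100 or 10**12.
    """
    count = 0
    while n > 1:
        n //= 10
        count += 1
    return count

for n in [100, 1_000_000_000_000]:
    # the loop count and np.log10 should agree
    print(n, count_divisions_by_10(n), np.log10(n))
```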

>>> np.linspace(0.0, 3.0, num=4)
Out: array([0., 1., 2., 3.])

>>> np.logspace(0.0, 3.0, num=4)
Out: array([   1.,   10.,  100., 1000.])

>>> np.linspace(0.0, 12.0, num=13)
Out: array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.])
>>> np.logspace(0.0, 12.0, num=13)
Out: array([1.e+00, 1.e+01, 1.e+02, 1.e+03, 1.e+04, 1.e+05, 1.e+06, 1.e+07, 1.e+08, 1.e+09, 1.e+10, 1.e+11, 1.e+12])
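
Comparing the two outputs suggests how they are related: with the default base of 10, `np.logspace` gives 10 raised to the evenly spaced numbers that `np.linspace` gives. A short sketch to confirm this (the variable names are just for illustration):

```python
import numpy as np

exponents = np.linspace(0.0, 3.0, num=4)   # 0, 1, 2, 3
powers = np.logspace(0.0, 3.0, num=4)      # 1, 10, 100, 1000

# logspace is 10 ** linspace, and log10 takes us back to the exponents
print(np.allclose(powers, 10.0 ** exponents))
print(np.log10(powers))
```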

Bonus: Did you know that engineers and scientists used a tool called a “slide rule” (计算尺) to do logarithmic computations until the 1970s, when electronic computers and calculators came into use? Check whether any of your grandparents have one.
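
The trick behind a slide rule is that adding logs does the same job as multiplying the original numbers. Here is a small sketch of that idea in numpy, with 2 and 8 picked as arbitrary example numbers:

```python
import numpy as np

a, b = 2.0, 8.0

# "slide" the two log lengths end to end (add them) ...
log_sum = np.log10(a) + np.log10(b)
# ... then read the answer back off the log scale
product = 10 ** log_sum

print(product)   # very close to a * b
```

A real slide rule does the adding with two sliding scales instead of a computer, so its answers are only as precise as your eyes.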

What is social credit score | 什么是社会信用分

Social credit score 社(shè)会(huì)信(xìn)用(yòng)分(fēn) means a rating of how trustworthy you are, based on your spending habits, social connections, and online behavior on social media. Traditional credit scores are what lending institutions use to judge how likely you are to pay back the money before they lend it to you. Traditional credit scores use information such as whether and how long you have had a job, how much money you owe relative to how much you earn, and so on. Social credit is used in similar ways with a different set of data. Tencent Credit 腾讯信用分 and Sesame Credit 芝麻信用分 are the prime examples.

For example, a social credit score can be like a test score that ranges between 300 and 850 and is made up of five dimensions/categories (5 个维度):

  • social connections
  • consumption behavior
  • security
  • wealth
  • compliance

The WSJ reported on social credit in China in 2016 in an article titled “China’s New Tool for Social Control: A Credit Rating for Everything”. The words “social control” and “Big Brother” have bad connotations. However, politics aside, we do appreciate people who have good social credit.

What can social credit scores be used for? Traditional credit scores are used by banks and other lenders to approve loans, by employers to screen candidates, and for some other kinds of approvals. Likewise, social credit scores can be used for similar purposes. Alibaba’s affiliate Ant Financial opened a strictly online bank called MyBank that serves small businesses and individuals. This online bank takes deposits the same way Synchrony or Ally Bank do in the US. But it also gives out unsecured loans (loans without any collateral) of up to $850,000. That is a lot of money to lend without collateral (by the way, mortgages are secured loans, with the house as collateral). Underlying the decision to lend or not to lend is the social credit score, calculated from a huge amount of online transaction data.

Back to the US: which companies may have data that could generate a full or partial social credit score? I think Airbnb, Amazon, Facebook, Lyft, Uber, and others all have huge amounts of data that could be used for credit scoring. Because of the high regulatory cost (banks are highly regulated), it is not likely that any of them would want to become a bank. But they can take over some of the business that banks have always done. For example, Amazon has Amazon Cash and variations of it in other countries such as the U.K. Uber has also recently offered Uber Cash. These are banking businesses: they allow people to shop or ride without a credit or debit card as long as they load their accounts with cash.

What does aggregate mean?

In math, and especially in statistics, to aggregate means to summarize a set of numbers or measures so that you get an overall idea of it.
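
As a rough sketch of what that can look like in numpy, here are a few common aggregates computed on a short list. The numbers in `values` are made up purely for illustration.

```python
import numpy as np

values = np.array([3.5, 4.0, 4.0, 5.5, 8.0])   # made-up numbers, for illustration only

# each aggregate boils the whole list down to a single number
summary = {
    "count": values.size,
    "total": values.sum(),
    "average": values.mean(),
    "smallest": values.min(),
    "largest": values.max(),
}

for name, number in summary.items():
    print(name, number)
```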

For example, let’s say we are having lunch at the school cafeteria.  You may want to know some aggregated measures, such as:

API

Yes, this is still a Magic Math Mandarin blog–but to have tomorrow’s skills means a lot more than just knowing basic math.
API = plugin

This post is a quick note (updates are in progress).

Site that documents most APIs
https://www.programmableweb.com/

hash function / cryptography / digital signature

Hash function (散列函數):
A hash function maps an arbitrary string of data to a fixed-length string of bits.
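
To see the “fixed length” part in action, here is a small sketch using SHA-256 from Python's standard `hashlib` module. SHA-256 is chosen only because it is a familiar example; the post itself does not name a specific hash function.

```python
import hashlib

inputs = ["cat", "a much longer piece of text " * 100]
bit_lengths = []

for text in inputs:
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    bit_lengths.append(len(digest) * 8)   # bytes -> bits
    print(len(text), "characters ->", bit_lengths[-1], "bits")
```

However long the input is, the output has the same number of bits.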

The simplest one (a bad one) is to take the ASCII codes of the letters, sum them up, and take the result mod some number. This creates a problem: “cat”, “act” and “tca” all map to the same hash. This problem is called a “hash collision” 碰撞.
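
The bad hash described above can be sketched in a few lines. The function name `bad_hash` and the choice of 100 as the modulus are assumptions made for this illustration.

```python
def bad_hash(text, buckets=100):
    """Toy hash: add up the character codes, then take the remainder.

    The order of the letters is ignored, which is exactly the weakness.
    """
    return sum(ord(ch) for ch in text) % buckets

for word in ["cat", "act", "tca"]:
    # same letters in a different order -> same sum -> same hash
    print(word, bad_hash(word))
```

Since addition does not care about order, any rearrangement of the same letters collides.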


A CS lecture 马克十几年前教的课

It is amazing how much one can learn from this old video, recorded 2 years after Mark Zuckerberg left Harvard. Here is what I remember after listening to it 5 times:

1. Don’t follow what the big guys are doing. There is so much you can do on your own with technology (find your own problem and get to it).

Teaching long division | 长除法

Long division is still taught exactly the same way it was many years ago: as a highly mechanical process that is worse than doing dishes.

Perfecting the steps is essentially the same as robot training.

Our classes should discuss why long division was invented and why it works, and why there are many other ways.
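
As one possible starting point for that discussion, here is a minimal sketch of the paper method written as a loop. It assumes a non-negative whole-number dividend and a positive whole-number divisor, and the function name is made up for illustration. Seeing the steps as code makes it easier to ask why each one is there.

```python
def long_division(dividend, divisor):
    """Divide digit by digit, the way it is done on paper.

    Returns (quotient, remainder).
    """
    quotient_digits = []
    remainder = 0
    for digit in str(dividend):
        # "bring down" the next digit next to what is left over
        remainder = remainder * 10 + int(digit)
        # how many times does the divisor fit into it?
        q = remainder // divisor
        quotient_digits.append(str(q))
        # keep only what is left over for the next step
        remainder -= q * divisor
    return int("".join(quotient_digits)), remainder

print(long_division(7425, 6))
print(divmod(7425, 6))   # Python's built-in answer, for comparison
```

The method works because each step only ever has to handle a small left-over number, one place value at a time.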