Entropy | 熵 shāng

Nothing is lost, nothing is created, everything is transformed.
― Antoine Lavoisier (26 August 1743 – 8 May 1794)

Unlike in earlier classes, we started today's class with a quote, because entropy is really difficult to talk about. We made many analogies (water flows from high to low; a broken mirror never, or almost never, becomes whole again) to draw attention to how everyday things work in ways we take for granted.

Some theories hold that the universe started with the Big Bang, in a state of very low entropy. There are many more high-entropy states than low-entropy states (imagine 10…000 to 1), so a system would have to cycle through a great many high-entropy states before its entropy is low again. We have barely touched the topic, though. Our true goal is to talk about the so-called decision tree model, which we will cover tomorrow.

To help you remember the word “entropy” and its meaning (as if we knew!): “en” echoes “energy”, and “tropy” comes from the Greek word for “transformation”.

Entropy is a measure of the number of possible ways energy can be distributed in a system.
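One way to get a feel for why there are many more high-entropy states than low-entropy states is to count arrangements. The sketch below is only an analogy, not a physics calculation: it assumes 100 coins standing in for a system and counts how many arrangements give a chosen number of heads. The name `ways` and the choice of 100 coins are made up for illustration.

```python
import math

N_COINS = 100  # assumed size of the toy system

def ways(heads):
    """Number of distinct arrangements of N_COINS coins with this many heads."""
    return math.comb(N_COINS, heads)

ordered = ways(0)   # all tails: exactly one arrangement (very "low entropy")
mixed = ways(50)    # half heads, half tails: an enormous number of arrangements

print("all tails:", ordered)
print("half and half:", mixed)
print("mixed arrangements per ordered arrangement:", mixed // ordered)
```

If the coins are shuffled at random, the system will almost always be found in one of the mixed arrangements, simply because there are so many more of them.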

By the way, Lavoisier 拉瓦锡 was a great chemist.

Use snippets to code faster | snippet 加速编程

Until Neuralink or other developments let us bypass typing altogether, we want to find ways to save time typing code. Using snippets is a must for children, who may not be great at typing. Here is how to do it in Sublime Text 3 (ST) and Visual Studio Code (code).

Sublime Text:
Tools -> Developer -> New Snippet.

VS Code:
Ctrl + Shift + P opens the Command Palette; type “snippets” and choose the command for configuring user snippets.

Write the snippet:

VS Code snippets are JSON files. Use this web app or a package to convert your code into the JSON format, or write the JSON by hand if you prefer.

Explaining .json snippet in VS Code:

  • The first set of “” encloses the name of the snippet (call it anything you like).
  • prefix defines the trigger text shown in the IntelliSense drop-down. For example, if you have a snippet for plotting, you may want to prefix it with “plot”.
  • body is the snippet content.
  • $1, $2 and so on mark tab stops inside the body.
  • description is the description. When you start typing the prefix of a snippet, its description will come up.
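To make the fields above concrete, here is a small sketch that builds one snippet as a Python dictionary and prints it as JSON. The snippet name “Scatter plot”, its prefix and its body are hypothetical, chosen only for illustration.

```python
import json

# Hypothetical snippet: the name, prefix and body are made up for illustration.
snippet = {
    "Scatter plot": {                      # name of the snippet
        "prefix": "plot",                  # what you type to trigger it
        "body": [                          # one string per line of code
            "plt.scatter(x=$1, y=$2)",     # $1 and $2 are tab stops
            "plt.show()",
        ],
        "description": "Scatter plot with matplotlib",
    }
}

snippet_json = json.dumps(snippet, indent=2)
print(snippet_json)
```

The printed text has the shape that a VS Code snippets file expects; pasting it into your own snippets file is one way to try it out.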

Digital currency | 数字货币

DC/EP stands for Digital Currency/Electronic Payment.

An important piece of recent financial news that you might have missed is that the Chinese central bank (央行) has issued its own digital currency, saying that its design is similar to Facebook’s proposed cryptocurrency Libra.

Digital currency is a big deal.

It not only affects how people go about their daily lives, but also world politics.

For example, we know that the United States sanctions certain countries by blocking them from the banking system. But what if these countries and the rest of the world start to use digital currency as the world currency instead of relying on the US dollar?

That would be really bad for the US.

For the young, it is time to learn how DC and EP work, and be ready for change.

[Figure: Chinese central bank digital currency design]

Decision trees | 决策树 jué cè shù

Today we focused the class on decision trees. A decision tree is a way to organize data. You can look at it this way: you ask a series of questions, make a decision at each one, and organize the data based on those decisions.

For example, suppose our data consists of the colors and shapes of 3 pieces of fruit: 1 yellow apple, 1 red apple and 1 yellow banana. We have two features: shape and color.

By organizing our data, we can identify types of fruit. We go through shape and color one by one. If we first organize our data by color, we will incorrectly group the yellow apple with the yellow banana. But if we first organize by shape, the apples and the banana are separated right away. So we organize this data by shape, and then by color (if we want to distinguish the yellow apple from the red apple).

[Image: Magic Math Mandarin]

The best way to organize the data will very likely be different for another dataset of fruits. But you get the point: we organize data to group things as well as we can. With each step, our data gets more organized. In entropy terms, each split leaves the groups less mixed, that is, lower in entropy.
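The claim that splitting by shape organizes the fruit better than splitting by color can be checked with a few lines of code. This is a minimal sketch using Shannon entropy as the measure of how mixed a group is; the helper names `entropy` and `entropy_after_split` and the way the three fruits are written down are assumptions made for illustration.

```python
from collections import Counter
from math import log2

# The three fruits from the example, written as small records.
fruits = [
    {"shape": "round", "color": "yellow", "label": "apple"},
    {"shape": "round", "color": "red", "label": "apple"},
    {"shape": "long", "color": "yellow", "label": "banana"},
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of labels; 0 means perfectly sorted."""
    total = len(labels)
    return sum((n / total) * log2(total / n) for n in Counter(labels).values())

def entropy_after_split(rows, feature):
    """Average entropy of the groups we get when we organize rows by one feature."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row["label"])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

before = entropy([row["label"] for row in fruits])
by_shape = entropy_after_split(fruits, "shape")
by_color = entropy_after_split(fruits, "color")

print("entropy before any split:", round(before, 3))
print("entropy after split by shape:", round(by_shape, 3))
print("entropy after split by color:", round(by_color, 3))
```

The split that leaves the lowest entropy is the one a decision tree would ask about first.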

That was a classification tree model.

When we build lots of decision trees, each on a different random part of a larger dataset, we have the so-called “random forest” 随机森林 model, originated by Leo Breiman.

We showed in class how to code a decision tree from scratch.  Here is a shorter version using the Python sklearn library.
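As a minimal sketch (not necessarily the code used in class), the fruit example might be fit with scikit-learn's `DecisionTreeClassifier` like this. The 0/1 encoding of shape and color is an assumption made for illustration, and the data is the three-fruit example from above.

```python
from sklearn.tree import DecisionTreeClassifier

# Assumed encoding: shape 0 = round, 1 = long; color 0 = red, 1 = yellow.
X = [
    [0, 1],  # yellow apple
    [0, 0],  # red apple
    [1, 1],  # yellow banana
]
y = ["apple", "apple", "banana"]

# criterion="entropy" asks the tree to pick the split that lowers entropy most.
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)

# Index of the feature used at the root of the tree (0 = shape, 1 = color).
root_feature = clf.tree_.feature[0]
print("first question is about feature:", root_feature)

# Classify a round yellow fruit and a long yellow fruit.
predictions = clf.predict([[0, 1], [1, 1]])
print(predictions)
```

With only three fruits the tree needs a single question, but the same few lines work unchanged on a larger table of features and labels.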

 

Tycho Brahe and Johannes Kepler | 第谷·布拉赫 和 约翰尼斯·开普勒

In today’s class we deviated from our normal computer work and went further into one of the stories told by Terence Tao in the “Cosmic Distance Ladder” video.  This particular story was about Tycho Brahe and Johannes Kepler.

Statue of Johannes Kepler (left) and Tycho Brahe (right) in Prague

It was a very important story of data and analysis.

Tycho Brahe and Johannes Kepler had totally disparate backgrounds and temperaments.

In spite of this, Tycho’s painstaking and detailed observational data of the planet Mars, combined with Kepler’s mathematical genius, allowed Kepler to derive the three laws of planetary motion. Both Tycho and Kepler made significant contributions to the change in the prevailing world view of a geocentric universe. It was the beginning of a systematic study that transformed Medieval thinking: alchemy became chemistry and astrology led to astronomy.

[Figure: Kepler’s 3 Laws of Planetary Motion]

https://chandra.harvard.edu/edu/formal/icecore/The_Astronomers_Tycho_Brahe_and_Johannes_Kepler.pdf

Link for those students who can read in Chinese:
https://baike.baidu.com/item/约翰尼斯·开普勒/973574?fromtitle=%E5%BC%80%E6%99%AE%E5%8B%92&fromid=158768

Logit transform | 分数对数转换

After we discussed logarithms (‘log’) last week, we explored some commonly used methods that have log embedded in them.  One example is the logit function, or logit transform (using the “natural” logarithm).  We explained its definition with the following Python code.

>>> import numpy as np
>>> epsilon = 0.001  # small constant that keeps log() away from 0 and infinity
>>> def logit(c):
...     d = np.log((c + epsilon) / (1 + epsilon - c))
...     return d
...

The following is the inverse, which is to bring what was transformed back to what it was before.

>>> def inverse_logit(a):
...     b = ((1 + epsilon) * np.exp(a) - epsilon) / (np.exp(a) + 1)
...     return b
...
>>> print(logit(0.1)) #-2.1883847407670785
>>> print(inverse_logit(logit(0.1))) #0.09999999999999999

Pictures reveal much more about what the logit transform is doing.  See how fast the values spread out once they are transformed!  Why are they stretched instead of shrunk?  We know that taking a log counts repeated divisions (recall that log10 of a number is how many times it must be divided by 10 to become 1).  But applied to numbers between 0 and 1, it has the opposite effect: a small positive number less than one has to be divided by 10 a negative number of times, that is, multiplied by 10, to become 1.  For example, 0.01 needs to be divided by 10 negative two times (multiplied by 10 twice) to be restored to 1.  That’s why the y-axis has negatives.

On the other hand, we also have positives on the y-axis.  That’s because for about half of the inputs (those above 0.5) the ratio (c+epsilon)/(1+epsilon-c) (the odds) is greater than 1, and the log of a number greater than 1 is positive.  Play around with it and you will surely get it.
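To see the negative and positive halves in numbers rather than in a picture, here is a small sketch that applies the same logit definition to a handful of inputs. The sample inputs are chosen only for illustration.

```python
import numpy as np

epsilon = 0.001  # same small constant as in the definition above

def logit(c):
    """Log of the (slightly padded) odds c / (1 - c)."""
    return np.log((c + epsilon) / (1 + epsilon - c))

inputs = np.array([0.01, 0.25, 0.5, 0.75, 0.99])  # sample inputs, chosen for illustration
outputs = logit(inputs)

# Inputs below 0.5 come out negative, 0.5 comes out at (about) zero,
# and inputs above 0.5 come out positive.
for c, d in zip(inputs, outputs):
    print(f"{c:5.2f} -> {d:+.3f}")
```

Inputs at equal distance from 0.5 land at equal distance from zero on opposite sides, which is why the plot looks symmetric.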

>>> import matplotlib.pyplot as plt
>>> x = np.linspace(0, 1, 1000)
>>> y = logit(x)
>>> plt.scatter(x=x, y=y, alpha=0.3)
>>> title = "logit transform"
>>> plt.title(title)
>>> plt.xlabel("numbers between 0 and 1 (inclusive)")
>>> plt.ylabel("after logit transform")
>>> plt.xlim(-6, 6)
>>> plt.ylim(-6, 6)
>>> plt.gca().set_aspect('equal', adjustable='box')
>>> plt.draw()

Look at the same plot with the axis scaled differently:

[Figure: the logit transform plot with rescaled axes]

Cosmic distance ladder | 宇宙距离

The class has no homework today.  We watched the video lecture by Terence Tao (see link below).   The name of the video is “Cosmic Distance Ladder”.  Quite a mystifying name.

The stories Terence Tao told in the lecture were about philosophers and astronomers from ancient times, such as Aristotle, and about others closer to us in history.  What they all have in common is that they used good observations and ingenious reasoning to measure indirectly the distance from the Earth to the Moon and the Sun, and the distances of the galaxies. The earliest of them did this without any technology (they did not even know the number Pi), and with amazing accuracy (as verified by what we know today).

You should definitely watch the video a few times.  Think about this: compared with human observation and reasoning, what computers can do is still just technology and tools.  Computers can’t do the indirect reasoning that connects the dots between disparate pieces of information. It makes zero sense to believe computers (including phones) are smarter than you are.

So, use your great mind. Let your mind observe and reason, and make computers help you along the way.

 

 

 

zero, one and two | 零,一,二

It is not easy for a young child to comprehend multiplication by 1, because multiplication is often taught in school as a robotic multiplication table.  She or he can very quickly answer multiplications by 2 or 3.  Because of this, questions like “what is the product of 1, 2, 3, 4” (i.e. 4 factorial) can get a wide range of answers, because the number “1” confuses the young mind.

Psychologists say that an infant learns the number 2 before the number 1.  And we can see why: with 2, there is something to compare against, like two fingers.  If there is only one finger, there is no variation, and it is confusing.

When we teach multiplication, we should not forget to show that math is an integral part of the real world around us.  Multiplication was invented to simplify repeated addition.  Multiplying by 1 means just the thing itself.  Multiplying by 2 means adding two of the thing together.  Multiplying by 3 means adding three of the thing together.  The thing can be a bag of candies or the square footage of a home.

Finally, we should show children how to use computers (not calculators) to do computations.  While a question like “give me the sum from 1 to 199” can be solved within seconds with math tricks, a slightly different question, “give me the product from 1 to 199”, cannot be answered with the same trick.  But if you know how to make the computer do the job, you can still answer it within seconds.
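Here is a short sketch of how the computer can answer both questions. The variable names are made up for illustration; the sum is also checked against the pairing trick n × (n + 1) / 2.

```python
import math

numbers = range(1, 200)          # 1, 2, 3, ..., 199

total = sum(numbers)             # the sum from 1 to 199
trick = 199 * 200 // 2           # the math trick: n * (n + 1) / 2
product = math.prod(numbers)     # the product from 1 to 199 (199 factorial)

print("sum:", total)
print("same as the trick:", total == trick)
print("number of digits in the product:", len(str(product)))
```

Python integers can grow as large as needed, so the product is exact even though it is far too long to work out by hand.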

 

argmax, argmin, argsort and quick sort | 快速排序

In this Saturday’s class we went over indexing, and ordering a group of items by their sorted indices. For those who are more advanced, please go over the section on quick sort.

For example,

>>> import numpy as np
>>> packpack = np.array(['snack', 'book', 'pen', 'eraser', 'apple'])
# Position of the word that comes last alphabetically
>>> np.argmax(packpack)

[out]: 0
# Position of the word that comes first alphabetically
>>> np.argmin(packpack)

[out]: 4

# Positions of the words in the order that sorts them alphabetically
>>> np.argsort(packpack)

[out]: array([4, 1, 3, 2, 0], dtype=int64)

Now let us sort them:
>>> packpack[np.argsort(packpack)]

[out]: array(['apple', 'book', 'eraser', 'pen', 'snack'], dtype='<U6')

Then we tried sorting numbers:

>>> numbers = np.array([2,3,5,7,1,4,6,15,5,2,7,9,10,15,9,17,12])
>>> numbers[np.argsort(numbers)]

[out]: array([ 1, 2, 2, 3, 4, 5, 5, 6, 7, 7, 9, 9, 10, 12, 15, 15, 17])


Finally we dug deeper: how do you really sort things fast and systematically? Using quick sort!
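The idea of quick sort is to pick one item (the pivot), put everything smaller on one side and everything larger on the other, and then repeat the same step on each side. Below is a minimal sketch of that idea, written for readability rather than speed; it is one common way to write it, not necessarily the version from class, and the function name `quick_sort` is an assumption.

```python
def quick_sort(items):
    """Return a new sorted list using the quick sort idea."""
    if len(items) <= 1:
        return list(items)                       # nothing left to sort
    pivot = items[len(items) // 2]               # pick the middle item as the pivot
    smaller = [x for x in items if x < pivot]    # everything before the pivot
    equal = [x for x in items if x == pivot]     # the pivot and its duplicates
    larger = [x for x in items if x > pivot]     # everything after the pivot
    return quick_sort(smaller) + equal + quick_sort(larger)

numbers = [2, 3, 5, 7, 1, 4, 6, 15, 5, 2, 7, 9, 10, 15, 9, 17, 12]
words = ["snack", "book", "pen", "eraser", "apple"]

sorted_numbers = quick_sort(numbers)
sorted_words = quick_sort(words)
print(sorted_numbers)
print(sorted_words)
```

Each round splits the work into smaller piles, which is why quick sort is usually much faster than comparing every item with every other item.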

 

Logarithm | 对数

As we explored in previous classes, division is subtracting again and again and again, and multiplication is adding again and again.  Exponentiation is multiplying again and again and again. They are all inventions to simplify repeated computation.

So is the logarithm: taking a log counts dividing again and again and again.  Logarithms were introduced in 1614 by John Napier, a Scottish mathematician, physicist and astronomer, as a means to simplify calculations.

🙂 Today’s  Python numpy class summary:

log10 of a number is how many times you divide it by 10 to get back to 1. log10(100) gives 2 because dividing 100 by 10 twice returns us to 1.
>>> np.log10(100)
One trillion divided by 10 twelve times returns to 1.
>>> np.log10(1000000000000)
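The “divide by 10 again and again” idea can be written out directly. This is a minimal sketch that assumes the input is an exact power of 10 (1, 10, 100, ...); the function name `count_divisions_by_10` is made up for illustration.

```python
def count_divisions_by_10(n):
    """Count how many times n must be divided by 10 to reach 1.

    Assumes n is an exact power of 10 (1, 10, 100, ...).
    """
    count = 0
    while n > 1:
        n //= 10       # divide by 10 once
        count += 1     # and remember that we did
    return count

hundred = count_divisions_by_10(100)
trillion = count_divisions_by_10(1000000000000)
print(hundred, trillion)
```

For powers of 10 the count agrees with what `np.log10` returns; for other numbers the logarithm fills in the values in between.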

>>> np.linspace(0.0, 3.0, num=4)
Out: array([0., 1., 2., 3.])

>>> np.logspace(0.0, 3.0, num=4)
Out: array([   1.,   10.,  100., 1000.])

>>> np.linspace(0.0, 12.0, num=13)
Out: array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.])
>>> np.logspace(0.0, 12.0, num=13)
Out: array([1.e+00, 1.e+01, 1.e+02, 1.e+03, 1.e+04, 1.e+05, 1.e+06, 1.e+07, 1.e+08, 1.e+09, 1.e+10, 1.e+11, 1.e+12])

Bonus:  Did you know that engineers and scientists used a tool called the “slide rule” (计算尺) to do logarithmic computations until the 1970s, when electronic calculators and computers came into use?  If any of your grandparents have one, you should go and check it out.