A great collection of Python notebooks | Python 笔记本集

Here is a really great collection of Python notebooks with lots and lots of links.  We start with some appetizers:

But there are so many more! Okay, I am just going to copy (fork) most of this page:


  • Linear algebra with Cython. A tutorial that styles the notebook differently to show that you can produce high-quality typography online with the Notebook. By Carl Vogel.

Math olympiad medal count analysis | 奥数奖牌分析

The International Mathematical Olympiad (IMO) is an annual six-problem mathematical olympiad for pre-college students younger than 20. The first IMO was held in Romania in 1959. As we will see, Eastern European countries were the top performers in the early years. You can find the summary data analysis in our Jupyter Notebook on GitHub.

It has since been held annually, except in 1980 (what happened in 1980?). More than 100 countries, representing over 90% of the world’s population, send teams of up to six students (under 20 years old) to compete.

Problems cover extremely difficult algebra, pre-calculus, and branches of mathematics not conventionally covered at school and often not at university level either, such as
– projective and complex geometry
– functional equations
– combinatorics
– number theory (where extensive knowledge of theorems is required).

No calculus is required. Supporters of leaving calculus out claim that this allows “more universality and creates an incentive to find elegant, deceptively simple-looking problems which nevertheless require a certain level of ingenuity”.

Top 10 countries by IMO gold medal count:

Rank  Country        Appearances  Gold  Silver  Bronze  Honorable mentions
1     China          32           147   33      6       0
2     United States  43           119   111     29      1
3     Russia         26           92    52      12      0
4     Hungary        57           81    160     95      10
5     Soviet Union   29           77    67      45      0
6     Romania        58           75    141     100     4
7     South Korea    30           70    67      27      7
8     Vietnam        41           59    109     70      1
9     Bulgaria       58           53    111     107     10
10    Germany        40           49    98      75      11
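The table ranks countries by total gold medals, but the number of appearances varies a lot (26 for Russia, 58 for Romania and Bulgaria). A quick way to see how much that matters is to divide gold medals by appearances. This is a minimal sketch in plain Python (no pandas) so it runs anywhere; the rows are copied from the table above, and the function names are hypothetical, not taken from our notebook.

```python
# Rows copied from the table above: (country, appearances, gold, silver, bronze).
MEDALS = [
    ("China", 32, 147, 33, 6),
    ("United States", 43, 119, 111, 29),
    ("Russia", 26, 92, 52, 12),
    ("Hungary", 57, 81, 160, 95),
    ("Soviet Union", 29, 77, 67, 45),
    ("Romania", 58, 75, 141, 100),
    ("South Korea", 30, 70, 67, 27),
    ("Vietnam", 41, 59, 109, 70),
    ("Bulgaria", 58, 53, 111, 107),
    ("Germany", 40, 49, 98, 75),
]


def rank_by_gold_per_appearance(rows):
    """Sort countries by gold medals per IMO appearance, highest first."""
    ranked = sorted(rows, key=lambda row: row[2] / row[1], reverse=True)
    return [(country, round(gold / appearances, 2))
            for country, appearances, gold, _silver, _bronze in ranked]


def total_medals(rows):
    """Gold + silver + bronze for each country."""
    return {country: gold + silver + bronze
            for country, _appearances, gold, silver, bronze in rows}


for country, rate in rank_by_gold_per_appearance(MEDALS):
    print(f"{country:<15} {rate:.2f} gold per appearance")
```

With these numbers Russia moves ahead of the United States on a per-appearance basis, while Hungary, which has the most total medals of these ten countries, falls to seventh.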

Debt | 债务

Exactly two years ago, we wrote about national debt.  It was close to $20 trillion at that time.  Now it is $22 trillion.

These are very large numbers.

But “large” is a relative term: it depends on the unit we use and on what we compare it with.

According to the Institute of International Finance, global debt was close to $244 trillion as of Q3 2018.
About one third of that debt was added in the last ten years or so. If one third of today's debt is new, then ten years ago the debt was about two thirds of today's level, so over those ten years total global debt grew by half: (1/3) / (2/3) = 1/2.

You can see this in the IIF's Global Debt Monitor report from January 2019.

This probably does not mean much to you or me unless we have something to compare it with.

Visuals help us see the numbers, but they stop short of helping us understand them: an amount in dollars is just an amount in dollars until we compare it with something.

How about we compare it with GDP (gross domestic product)? GDP is the dollar value of all the goods and services people produce over a period of time, usually a year.

So the debt-to-GDP ratio is like the amount of money you owe at the end of the year relative to the amount of money you made during the year. When the ratio is over 1, what we owe is more than what we made in a year.
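The household analogy above can be written as a one-line formula. This is a minimal sketch; the function name and the household figures ($60,000 owed, $50,000 earned) are hypothetical and chosen only to show a ratio above 1.

```python
def debt_to_gdp_ratio(debt, gdp):
    """Debt divided by GDP (or income) for the same period.

    A result above 1 means more is owed than was produced (or earned) in that period.
    """
    return debt / gdp


# Hypothetical household: owes $60,000 at year end, earned $50,000 during the year.
ratio = debt_to_gdp_ratio(60_000, 50_000)
print(f"ratio = {ratio:.2f} ({ratio:.0%} of a year's income)")
```

Country ratios are usually quoted as percentages, so a ratio of 1.2 is the same thing as "120% of GDP".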

Hopefully the ratio now makes a little more sense.

For a great narrative of the history of the US debt-to-GDP ratio, see “The Long Story of U.S. Debt, From 1790 to 2011, in 1 Little Chart” by Matt Phillips in The Atlantic.

The article was published on November 13, 2012, but the history it tells still matters.

You can connect the dots with the following chart from the Federal Reserve Bank of St. Louis. The debt-to-GDP ratio seems to be getting close to its historical high.

That high came right after World War II.

So what is in the US debt?

The total US debt is now about $22 trillion.

U.S. debt held by China was $1.138 trillion as of October 2018. That is 29 percent of the $3.9 trillion in Treasury bills, notes, and bonds held by foreign countries.

The rest of the $22 trillion national debt is owned either by the American people or by the U.S. government itself. China holds more U.S. debt than any other foreign country.
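The percentages in the last two paragraphs follow from simple division. This short sketch only re-checks the arithmetic using the figures quoted above (in trillions of dollars); it adds no new data, and the "domestic share" is just whatever is left after the quoted foreign holdings.

```python
# Figures quoted in the text above, in trillions of US dollars.
china_holdings = 1.138    # U.S. debt held by China, October 2018
foreign_holdings = 3.9    # Treasuries held by foreign countries (as quoted)
total_debt = 22.0         # approximate total U.S. debt

china_share_of_foreign = china_holdings / foreign_holdings
domestic_share = (total_debt - foreign_holdings) / total_debt

print(f"China: {china_share_of_foreign:.0%} of foreign-held debt")
print(f"Held by Americans or the U.S. government: {domestic_share:.0%} of total debt")
```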

Domestically, total US household debt stood at $13.54 trillion as of Q4 2018 (New York Fed). For a fascinating visual account of the numbers, see the New York Fed's report.

You can easily find the numbers and reports from the regional Federal Reserve Banks and from government offices such as the Congressional Budget Office and the US Treasury.

These numbers, ratios, and time series broken down by component are far more interesting, and tell us far more, than the noisy everyday news.

Herstory of money-1 | 钱的历史

This is the first installment of a series of posts about money, cryptocurrency, and credit scoring, accompanied by a Python Jupyter Notebook in our GitHub repo on credit scoring.

In this post we talk about paper money (纸币). We keep it in the practical math category because the herstory of money is also the herstory of math. In God we trust, and in math we trust. God made the universe with beautiful math.

Did you know that paper money (纸币) was first used in China around the 11th century, during the Northern Song dynasty (北宋朝)?

Paper money was widely used in those days because of a shortage of copper and because paper money was convenient. However, that convenience, combined with the government's unlimited power to print money, led to inflation, then to a loss of the government's credibility, and eventually to its downfall. So even though the Northern Song dynasty had an advanced monetary system, its credit failed under the burden of long and costly wars.

Did you know that the government of China's Song (宋) dynasty printed money in no fewer than six ink colors to prevent counterfeiting?

They printed notes with intricate designs, sometimes even mixing unique fibers into the paper to foil counterfeiters. That was in 1107!

Backed by gold or silver too?

Isn’t it amazing that their nationwide standard currency of paper money was backed by gold or silver?! That was between 1265 and 1274.

In the 13th century, the paper money of China's Mongol Yuan (元) dynasty became known in Europe through the accounts of travelers such as Marco Polo:

“All these pieces of paper are issued with as much solemnity and authority as if they were of pure gold or silver… with these pieces of paper, made as I have described, Kublai Khan causes all payments on his own account to be made; and he makes them to pass current universally over all his kingdoms and provinces and territories, and whithersoever his power and sovereignty extends… and indeed everybody takes them readily, for wheresoever a person may go throughout the Great Khan’s dominions he shall find these pieces of paper current, and shall be able to transact all sales and purchases of goods by means of them just as well as if they were coins of pure gold”
— Marco Polo, The Travels of Marco Polo

Python for fun 4

Today we work on an animation with hundreds of triangles, balls, squares, and vickies (you can call the shapes anything you like). Each shape is defined by its coordinates (x, y). A huge improvement over last time is that we use a loop to add shapes to a list of shapes.
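The post does not name the drawing library, so this sketch only covers the part described in the text: a loop that builds a list of hundreds of shapes, each with (x, y) coordinates, and a per-frame update that moves every shape. The canvas size, the `Shape` class, and the bounce-off-the-edges rule are hypothetical choices for illustration; drawing each shape at its (x, y) would be handled by whatever graphics library you use.

```python
import random
from dataclasses import dataclass

WIDTH, HEIGHT = 400, 300  # hypothetical canvas size in pixels


@dataclass
class Shape:
    kind: str
    x: float
    y: float
    dx: float  # horizontal speed per frame
    dy: float  # vertical speed per frame

    def move(self):
        """Advance one frame; bounce off the canvas edges by reversing direction."""
        self.x += self.dx
        self.y += self.dy
        if not 0 <= self.x <= WIDTH:
            self.dx = -self.dx
            self.x = min(max(self.x, 0), WIDTH)  # keep the shape on the canvas
        if not 0 <= self.y <= HEIGHT:
            self.dy = -self.dy
            self.y = min(max(self.y, 0), HEIGHT)


def make_shapes(count, seed=None):
    """Use a loop to add `count` randomly placed shapes to a list."""
    rng = random.Random(seed)
    kinds = ["triangle", "ball", "square", "vicky"]
    shapes = []
    for _ in range(count):
        shapes.append(Shape(kind=rng.choice(kinds),
                            x=rng.uniform(0, WIDTH), y=rng.uniform(0, HEIGHT),
                            dx=rng.uniform(-3, 3), dy=rng.uniform(-3, 3)))
    return shapes


shapes = make_shapes(300, seed=1)
for frame in range(100):   # 100 animation frames
    for shape in shapes:
        shape.move()       # a real animation would redraw each shape here
```

Because the shapes live in one list, changing 300 to 3,000 is a one-number edit instead of 2,700 new lines.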

Pineapple math

This Saturday morning, Victoria went on an adventure: counting how many “eyes” a pineapple has. She used a marker to mark the ones she had counted: 10, 15,

It is not the Fibonacci sequence we expected. Why?
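To check the counts against the Fibonacci sequence (1, 1, 2, 3, 5, 8, 13, 21, ...), here is a minimal sketch using a known identity: a non-negative integer n is a Fibonacci number exactly when 5n² + 4 or 5n² − 4 is a perfect square. The function names are hypothetical. It only confirms that 10 and 15 are not Fibonacci numbers; it does not answer the "why".

```python
import math


def is_perfect_square(n):
    """True if n is a non-negative perfect square."""
    return n >= 0 and math.isqrt(n) ** 2 == n


def is_fibonacci(n):
    """True if n is a Fibonacci number, via the 5n^2 +/- 4 perfect-square test."""
    return is_perfect_square(5 * n * n + 4) or is_perfect_square(5 * n * n - 4)


counts = [10, 15]  # the counts recorded in the post
print({n: is_fibonacci(n) for n in counts})
```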

Geometry and machine learning

Geometry helps us see clearly without cluttering our minds with jargon and notation. We will begin with something everyone in machine learning does.

Linear regression: “linear regression is a linear approach for modeling the relationship between a scalar dependent variable y and one or more explanatory variables (or independent variables) denoted X.”

What does that mean? Let’s think about it geometrically. Suppose we don’t know how tall Emma is (the dependent variable y, or “label”) because we haven’t met her. We can still guess her height from other numbers, such as the length of her shadow (the independent variable X, or “feature”). Our guess can be quite accurate if we are in the same city where she lives and we measure everything at the same time of day.

That was easy. Now we are going to guess the heights of many different people based on the lengths of their shadows. Again, our guess can be quite accurate if we all live in the same city and we measure everything at the same time of day. But this is usually not the case. Say we live in northeastern America and the measurements are all taken in the early afternoon in autumn: our predictions may not be perfect, but they are still in the ballpark. If the measurements come from different places and times of day, however, we really cannot make a good prediction.

What are we to do with measurements that are not taken under perfect conditions? If we know the location and time of each measurement, we can adjust the numbers before using them. But there can still be lots of problems if we made mistakes in writing down the numbers, the locations, or the times. Like a tradesperson, we have different tools to fix problems. One is the “least squares method”. Others include “weighted least squares”, “maximum likelihood”, and “gradient descent”.
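The least squares idea can be sketched in a few lines of plain Python. This is a minimal illustration, not any particular library's implementation: it fits height = a + b × shadow by ordinary least squares using the closed-form formulas (b is the covariance of shadow and height divided by the variance of shadow). The shadow and height numbers are made up and assume every measurement was taken in one place at one time of day.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x  # the fitted line passes through (mean_x, mean_y)
    return a, b


# Hypothetical measurements in metres, all taken in one city at one time of day.
shadows = [1.30, 1.45, 1.52, 1.60, 1.71]
heights = [1.42, 1.60, 1.66, 1.77, 1.87]
a, b = fit_line(shadows, heights)

emma_shadow = 1.50  # hypothetical: Emma's shadow length
print(f"height ~ {a:.3f} + {b:.3f} * shadow; Emma's guess: {a + b * emma_shadow:.2f} m")
```

"Least squares" means choosing a and b so that the sum of squared vertical gaps between the line and the measured heights is as small as possible; the two formulas above are the exact solution to that minimization for a straight line.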

What is gradient (descent), and how is it used

As I remember from my college math class, the gradient is the vector of partial derivatives of a function with respect to each of its variables. Nowadays people seem to use the word interchangeably with the derivative, or the rate of change of the function. The gradient vector at a point tells you the rate of change along the x direction, the rate of change along the y direction, and so on. From the gradient you can get the rate of change in any direction: it is the dot product of the gradient with a unit vector pointing that way. By the Pythagorean theorem, the length of the gradient is the rate of change in the steepest direction.
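Here is a minimal numerical sketch of those statements, assuming the example surface f(x, y) = x² + 3y² (a hypothetical choice). It estimates each partial derivative with a central finite difference, then gets the rate of change in an arbitrary direction as a dot product with a unit vector, and the steepest rate as the gradient's length.

```python
import math


def numerical_gradient(f, point, h=1e-6):
    """Estimate the vector of partial derivatives with central differences."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def directional_derivative(grad, direction):
    """Rate of change along `direction`: dot product with its unit vector."""
    length = math.sqrt(sum(d * d for d in direction))
    return sum(g * d / length for g, d in zip(grad, direction))


def bowl(p):
    """Hypothetical example surface f(x, y) = x^2 + 3y^2."""
    x, y = p
    return x ** 2 + 3 * y ** 2


g = numerical_gradient(bowl, [1.0, 2.0])     # exact gradient here is (2, 12)
print("gradient:", g)
print("rate along (1, 1):", directional_derivative(g, [1, 1]))
print("steepest rate |gradient|:", math.hypot(*g))
```

Pointing the direction along the gradient itself gives the largest possible rate, which is exactly the gradient's length.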

So what is gradient descent? And why descent? The word “descent” refers to problems where we are trying to find the minimum of a function. If your problem is to find a maximum instead, you would use gradient ascent. So let’s focus on minimization problems.

Taking a small enough step in the direction opposite to the gradient always takes you downhill (unless the gradient is zero), and on a convex surface repeating this leads to the minimum. If a surface is hard to visualize, use the parabola y = x^2 as an example. The gradient (derivative) is 2x. When x is positive, we have to move in the negative direction; when x is negative, we have to move in the positive direction. Why? A positive derivative means y increases as x increases, so to go down we must decrease x; a negative derivative means the opposite. Either way, moving against the sign of the derivative heads toward the minimum at x = 0.
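The parabola example translates directly into code. This is a minimal sketch; the learning rate 0.1 and the 50 steps are arbitrary illustrative choices, not recommended settings.

```python
def gradient_descent_parabola(x0, learning_rate=0.1, steps=50):
    """Minimize y = x**2 by repeatedly stepping against its derivative 2x."""
    x = x0
    for _ in range(steps):
        derivative = 2 * x
        x -= learning_rate * derivative  # move opposite to the slope
    return x


print(gradient_descent_parabola(3.0))    # approaches 0 from the positive side
print(gradient_descent_parabola(-3.0))   # approaches 0 from the negative side
```

Each step multiplies x by 1 − 2 × learning_rate (0.8 here), so both starting points shrink toward 0 from their own side. With a learning rate above 1 that factor has magnitude greater than 1 and the iterates blow up, which is one reason the step size matters.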

However, in real life a function may have several local minima besides its global minimum. The gradient is still a useful tool for finding a minimum, but making it work well can be complicated.
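To see why more than one minimum complicates things, here is a sketch on the hypothetical function f(x) = x⁴ − 3x² + x, which has one local minimum and one global minimum. The same gradient descent loop ends up in a different valley depending on where it starts. The learning rate and step count are illustrative assumptions.

```python
def f(x):
    """Hypothetical example with two valleys."""
    return x ** 4 - 3 * x ** 2 + x


def df(x):
    """Derivative of f."""
    return 4 * x ** 3 - 6 * x + 1


def descend(x, learning_rate=0.01, steps=1000):
    """Plain gradient descent on f starting from x."""
    for _ in range(steps):
        x -= learning_rate * df(x)
    return x


right = descend(2.0)
left = descend(-2.0)
print(f"start  2.0 -> x = {right:.3f}, f = {f(right):.3f}")
print(f"start -2.0 -> x = {left:.3f}, f = {f(left):.3f}")
```

Starting at 2.0 the descent settles near x ≈ 1.13, a local minimum with f ≈ −1.07; starting at −2.0 it reaches x ≈ −1.30, the global minimum with f ≈ −3.51. Nothing in the gradient itself tells the first run that a deeper valley exists.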

For a simple surface, you can watch an “opera” version of this interpretation (totally unnecessary, although somewhat entertaining) in the video below:


Three types of gradient descent:

There are two extremes and one approach in the middle: batch, stochastic (aka “SGD”), and mini-batch.

At one extreme, SGD goes through the full loop of computing the error and updating the model one example at a time. Intuitively, if the data is not very noisy, this method can give quick insight into what is happening. It also needs little computer memory. But it may not produce stable results when the data is noisy (which is usually the case).

At the other extreme, batch GD sweeps through all the data and computes all the errors (in memory) before updating the model. Intuitively, this gives a more stable result, but the result may converge to a local minimum. Furthermore, it faces hardware challenges when the dataset is large.

In the middle is mini-batch GD. It splits the training data into mini-batches and performs a batch GD update on each mini-batch in turn.
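All three variants can share one loop; only the batch size changes. This is a minimal sketch for fitting a straight line y ≈ w·x + b by minimizing mean squared error: batch_size equal to the dataset size gives batch GD, batch_size = 1 gives SGD, and anything in between gives mini-batch GD. The learning rate, epoch count, and the noise-free data y = 2x + 1 are hypothetical choices for illustration.

```python
import random


def gradient_descent(xs, ys, batch_size, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w*x + b by minimizing mean squared error with (mini-)batch updates."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the examples in a new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of this batch's mean squared error with respect to w and b.
            grad_w = grad_b = 0.0
            for i in batch:
                error = (w * xs[i] + b) - ys[i]
                grad_w += 2 * error * xs[i]
                grad_b += 2 * error
            w -= lr * grad_w / len(batch)
            b -= lr * grad_b / len(batch)
    return w, b


# Hypothetical noise-free data on the line y = 2x + 1.
XS = [i / 10 for i in range(-10, 11)]
YS = [2 * x + 1 for x in XS]

for name, size in [("SGD", 1), ("mini-batch", 5), ("batch", len(XS))]:
    w, b = gradient_descent(XS, YS, batch_size=size)
    print(f"{name:<10} w = {w:.4f}, b = {b:.4f}")
```

With batch GD there is one update per epoch, with SGD there are 21 (one per example), and with a batch size of 5 there are five (the last batch holds the single leftover example).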

Essential tips to remember when actually running GD:
Normalize the data first: subtract each column's mean so that it is centered at 0, then divide by its standard deviation so that its standard deviation becomes 1. This puts every column on the same scale; otherwise some columns may overpower others purely because of their size. Normalizing data is also an essential step in some clustering methods and in principal component analysis (PCA). Nearly all software can do this easily.
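The normalization tip can be sketched with Python's standard `statistics` module. This is a minimal version, assuming the population standard deviation (dividing by n); using the sample version (n − 1) is also common. The age and income numbers are made up to show two columns on very different scales.

```python
import statistics


def standardize_columns(rows):
    """Center each column at 0 and scale it to standard deviation 1 (z-scores).

    Uses the population standard deviation. A constant column has standard
    deviation 0 and raises ZeroDivisionError.
    """
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [[(value - m) / s for value, m, s in zip(row, means, stdevs)]
            for row in rows]


# Hypothetical data: [age in years, income in dollars]; the scales differ hugely.
data = [[25, 40_000], [35, 52_000], [45, 61_000], [55, 90_000]]
scaled = standardize_columns(data)
for row in scaled:
    print([round(v, 3) for v in row])
```

After scaling, a one-unit change means "one standard deviation" in every column, so income no longer dominates age just because its numbers are thousands of times larger.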

What is machine learning

AlphaGo has just become world champion at the game of Go. But this is expected. We have seen sewing machines sew, and cars run, faster than people. Ingenious inventors and engineers create machines that do things better than we can, and that is the purpose of creating machines: machines doing what we make them do. Now, what is machine learning?

First of all, we know that computers (machines) compute with numbers represented as 0s and 1s.

Second, we know that lots of information can be represented by numbers or converted into numbers.

  1. Things that are measured in numbers:
    • age
    • temperature measures hot or cold
    • weight measures heavy or light
    • speed measures fast or slow
    • prices measure expensive or cheap
    • an amount of money measures how much money a person or company has
  2. Things that are not directly measured in numbers but can be represented by numbers:
    • letters of the alphabet
    • colors and, by extension, images (colors in lots of pixels)
    • sound
    • just about anything, if we understand what it is.
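Here is a small sketch of point 2, showing how letters, colors, and images become plain numbers in Python. The "Go" string, the 8-bit RGB convention, and the 2×2 image are illustrative choices.

```python
# Letters: each character has a Unicode code point, which is just a number.
word = "Go"
codes = [ord(ch) for ch in word]
print(codes)                               # [71, 111]
print("".join(chr(c) for c in codes))      # Go

# Colors: one common convention is 8-bit RGB, three numbers from 0 to 255.
RED, BLACK = (255, 0, 0), (0, 0, 0)

# Images: a grid of pixels, each pixel a color, so an image is a grid of numbers.
tiny_image = [
    [RED, BLACK],
    [BLACK, RED],
]
flat = [channel for row in tiny_image for pixel in row for channel in pixel]
print(len(flat))                           # 12 numbers for a 2x2 image
```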

If we put 1 and 2 together, we can make computers process information for us. That is what we have done since computers were invented: we write programs that tell computers what to do.

Machine (computer) learning is figuring out ways to make machines learn the way we do. The keyword is “learn”, which means getting better and better at the subject being learned. The picture below shows a very simple kind of learning: finding the lowest point on an error surface.

There is a great series of articles on machine learning called “Machine Learning is Fun” (where the image above came from). Besides the good explanations, what I like most about the articles is a statement that many people (including people who are supposed to know this well) should remember: machine learning is NOT magic! If you don’t understand how machine learning works, there are people who will be more than happy to sell you products that supposedly perform machine learning magic.

“So remember, if a human expert couldn’t use the data to solve the problem manually, a computer probably won’t be able to either. Instead, focus on problems where a human could solve the problem, but where it would be great if a computer could solve it much more quickly.”

But the best ideas come from DeepMind founder Demis Hassabis, whose team created AlphaGo. I have watched some of Hassabis’ talks multiple times. Watch the video below carefully. It is exciting!

What is a convolutional neural network

In practice, a CNN learns the values of these filters on its own during training (although we still need to specify parameters such as the number of filters, the filter size, and the architecture of the network before training). The more filters we have, the more image features get extracted and, up to a point, the better our network becomes at recognizing patterns in unseen images.
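To make "filter" concrete, here is a minimal sketch of what a single filter does: it slides over the image and, at each position, sums the element-wise products (a "valid" 2D cross-correlation, which is what CNN layers usually compute under the name convolution). In a real CNN the filter values are learned during training; here they are hand-picked as a hypothetical vertical-edge detector, and the tiny image is made up.

```python
def convolve2d(image, kernel):
    """'Valid' 2D cross-correlation of a 2D list `image` with a 2D list `kernel`."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


# Hypothetical 3x6 grayscale image: dark on the left (0), bright on the right (1).
image = [[0, 0, 0, 1, 1, 1]] * 3

# Hand-picked vertical-edge filter; a CNN would learn values like these itself.
kernel = [[-1, 0, 1]] * 3

print(convolve2d(image, kernel))  # large values where dark meets bright
```

The output is zero over the flat regions and largest where the window straddles the dark-to-bright boundary. Each additional filter in a layer produces another such map, responding to a different pattern.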