# Logarithm | 对数

As we explored in previous classes, division is subtracting again and again, and multiplication is adding again and again. Exponentiation is multiplying again and again. They are all inventions to simplify repeated computation.

So is the logarithm: taking a log is dividing again and again and again. Logarithms were introduced in 1614 by John Napier, a Scottish mathematician, physicist, and astronomer, as a means to simplify calculations.

🙂 Today’s  Python numpy class summary:

`log10` tells you how many times you must divide a number by 10 to get back to 1. `log10(100)` gives 2 because dividing 100 by 10 twice returns us to 1.
>>> np.log10(100)
One trillion divided by 10 twelve times returns to 1.
>>> np.log10(1000000000000)
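
To make the "divide again and again" idea concrete, here is a minimal sketch that counts the divisions by hand and compares the count with `np.log10`. The helper name `count_divisions_by_10` is made up for this illustration, and it only behaves like a logarithm for exact powers of 10.

```python
import numpy as np

def count_divisions_by_10(n):
    """Count how many times n must be divided by 10 to reach 1.

    Hypothetical helper for illustration; only meaningful for exact
    powers of 10 such as 100 or 1000000000000.
    """
    count = 0
    while n > 1:
        n = n // 10   # divide by 10 once more
        count += 1
    return count

hundred_count = count_divisions_by_10(100)
trillion_count = count_divisions_by_10(1000000000000)

# Compare the hand count with numpy's answer.
print(hundred_count, np.log10(100))
print(trillion_count, np.log10(1000000000000))
```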

>>> np.linspace(0.0, 3.0, num=4)
Out: array([0., 1., 2., 3.])

>>> np.logspace(0.0, 3.0, num=4)
Out: array([   1.,   10.,  100., 1000.])

>>> np.linspace(0.0, 12.0, num=13)
Out: array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.])
>>> np.logspace(0.0, 12.0, num=13)
Out: array([1.e+00, 1.e+01, 1.e+02, 1.e+03, 1.e+04, 1.e+05, 1.e+06, 1.e+07, 1.e+08, 1.e+09, 1.e+10, 1.e+11, 1.e+12])
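
The pairs of outputs above suggest how the two functions relate: `np.logspace` raises 10 to each of the evenly spaced exponents that `np.linspace` produces, and `np.log10` takes you back. A small sketch to check this (the variable names are chosen here for illustration):

```python
import numpy as np

exponents = np.linspace(0.0, 3.0, num=4)   # evenly spaced exponents
powers = np.logspace(0.0, 3.0, num=4)      # 10 raised to those exponents

# logspace should match 10 ** linspace
same = np.allclose(powers, 10.0 ** exponents)

# log10 undoes the exponentiation and recovers the exponents
back = np.log10(powers)

print(same)
print(back)
```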

Bonus: Did you know that engineers and scientists used a tool called a "slide rule" (计算尺) to do logarithmic computations until the 1970s, when electronic computers and calculators came into use? You should go and check one out if any of your grandparents have one.

# Find if something is also somewhere else (contd) | 找一找那里是不是也有

In today's class we continued the game of finding matches, expanding from numbers to names.

Let’s pretend that there is a room out there that has the following famous people:

And another room with these famous people:

`room2 = pd.Series(['Isaac Newton', 'Thomas Edison', 'Mary Somerville', 'Matilda', 'Ada Lovelace'])`

These two rooms are in a building:

building = pd.concat([room1, room2], axis=1)

building.columns = ['room1', 'room2']

room1            room2
0     Grace Hopper     Isaac Newton
1  Albert Einstein    Thomas Edison
3     Emmy Noether          Matilda

Now we want to check whether a list of people, `who`, is in the building and in each room.

Are they in room 1?

`np.isin(who, room1)`
Out: array([ True,  True,  True])

Are they in room2?

`np.isin(who, room2)`

Out: array([False, False,  True])

Are they in the building?

`np.isin(who, building)`
Out: array([ True,  True,  True])
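
Here is a self-contained sketch of the same kind of check. The rooms and the guest list below are made up for illustration and are not the ones used in class; plain numpy arrays stand in for the pandas Series to keep the example short.

```python
import numpy as np

# Hypothetical rooms and guest list, invented for this example.
room_a = np.array(['Marie Curie', 'Alan Turing', 'Katherine Johnson'])
room_b = np.array(['Isaac Newton', 'Ada Lovelace'])
building_ab = np.concatenate([room_a, room_b])   # everyone in either room

guests = np.array(['Alan Turing', 'Ada Lovelace', 'Leonhard Euler'])

in_a = np.isin(guests, room_a)              # is each guest in room_a?
in_b = np.isin(guests, room_b)              # is each guest in room_b?
in_building = np.isin(guests, building_ab)  # is each guest anywhere in the building?

print(in_a)
print(in_b)
print(in_building)
print(guests[~in_building])   # guests who are not in the building at all
```

The `~` flips the True/False answers, which is a handy way to list who is missing.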

We are constantly comparing things.  How to compare is a very tricky and interesting subject.  You should look up the source code of the function in1d and see how it does it.

# Find if something is also somewhere else | 找一找那里是不是也有

In today’s class we played a game: finding the numbers in a group that are also in another group. As usual, we started simple using numpy (np).   Here is the summary:

`score1 = np.array([0, 1, 3, 5, 10,3])`

`score2 = np.array([2,3])`

We use the `in1d` function from numpy to do the matching:
`np.in1d(score1,score2)`

`score1[np.in1d(score1,score2)]`

Our result is:

array([3, 3])

There is another way to do this:

`np.array([item in score2 for item in score1])`

array([False, False,  True, False, False,  True])

`score1[np.array([item in score2 for item in score1])]`

will give you exactly the same answer.   Try it.
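
To try it, the two approaches can be put side by side. This sketch uses `np.isin`, the function from the follow-up class, in place of `in1d`; that swap is a choice made here, on the assumption that both give the same True/False answers for one-dimensional arrays like these.

```python
import numpy as np

score1 = np.array([0, 1, 3, 5, 10, 3])
score2 = np.array([2, 3])

# Way 1: let numpy do the matching.
mask_isin = np.isin(score1, score2)

# Way 2: check each item one by one with a list comprehension.
mask_loop = np.array([item in score2 for item in score1])

matches = score1[mask_isin]

print(mask_isin)
print(mask_loop)
print(matches)
```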

For very large groups of numbers or words, what we’ve just learned will work just as well.

Please practice and turn in the homework.

# Count non-zeros using numpy.count_nonzero | 数非零数

Today our class practiced making the computer count the non-zero numbers in an array using Python's numpy library. This can be useful if you have a ton of numbers.

`import numpy as np; import pandas as pd`

`some_array = np.array([[0,1,7,0,0],[3,0,0,2,19]])`

array([[ 0,  1,  7,  0,  0],
[ 3,  0,  0,  2, 19]])

`np.count_nonzero(some_array)`

5

`np.count_nonzero(some_array, axis=0)` counts down the rows, i.e. gives one count for each column:

array([1, 1, 1, 1, 1], dtype=int64)

`np.count_nonzero(some_array, axis=1)` counts across the columns, i.e. gives one count for each row:

array([2, 3], dtype=int64)
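
One way to see what `count_nonzero` is doing is to build the True/False table of "is this number non-zero?" and add it up, since True counts as 1 and False as 0. This is a sketch for understanding, not a description of how numpy implements it internally.

```python
import numpy as np

some_array = np.array([[0, 1, 7, 0, 0], [3, 0, 0, 2, 19]])

is_nonzero = some_array != 0          # True where the number is not zero

total = is_nonzero.sum()              # count over the whole array
per_column = is_nonzero.sum(axis=0)   # one count for each column
per_row = is_nonzero.sum(axis=1)      # one count for each row

print(total, np.count_nonzero(some_array))
print(per_column, np.count_nonzero(some_array, axis=0))
print(per_row, np.count_nonzero(some_array, axis=1))
```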

`d = {'Basket1': [3, 0], 'Basket2': [3, 4]}`
`df = pd.DataFrame(data=d, index=['Apple','Chips'])`

Count the number of non-zeros down the rows, i.e. one count for each basket:
`pd.Series(np.count_nonzero(df, axis=0), index=df.columns.tolist())`

This was the result we got.

Basket1    1
Basket2    2
dtype: int64

That was a very tiny dataset. With a dataset of a million rows and columns, letting the computer do the counting this way really pays off!

# GitHub repo vs gist

• For sharing snippets 代码片段 (secret or public), use a gist. A gist keeps all revision history without making a fuss: no tickets or pull requests. Note: a secret gist is not private; anyone who has the URL can access it, whereas public gists can be found without the URL.
• For projects that want publicity and open collaboration, use something like a GitHub repo.

# Parents and students in China – impressions from our summer trip | 中国家长和学生- 暑期中国行印象

We always knew that Chinese parents and teachers are dedicated to education.   Even so, we were still amazed by how Chinese teachers and parents are working hand-in-hand pushing the boundaries of education.

We don't necessarily agree with everything they do, such as the bias towards solving test problems in classes. But there are far more things we agree with or appreciate, such as relentless hard work and practice, than things we don't.

Below is a snapshot of a first-day summer class for seven-year-old children. Just look at how intent the parents are. Some parents sat through the entire class taking notes, and some sat in the lounge in the corridor chatting about schools or doing various things while waiting for their kids. The temperature was about 100 degrees outside. The level of dedication is astonishing.

From a time-investment perspective, at least one family member, whether mother, father, or grandparent, spends 20 hours or more each week on the children's education.

On the money side, summer and winter break, weekend, and after-hours classes are privately run and aren't cheap. Some well-to-do families have spent hundreds of thousands (measured in US dollars) before high school. Some bold ones even send their kids, sometimes as young as eight or nine years old, to overseas private boarding schools in exclusive locations in Switzerland, the UK, and other places. Poorer families still pay for various lessons to make sure their kids are as well nourished in education as possible. For those very constrained in resources, such as parents who must work seven days a week, we saw their kids studying with video lessons on mobile phones in cramped corners instead of hanging out.

The primary motivation is the quality of survival for the next generation: getting into top middle schools, top high schools, and top universities, building a great social network, and ultimately landing great jobs in adulthood. Parents start counting down the days until the college entrance exam even in primary school. Kids routinely study until midnight from third grade on, and don't get a day off until the winter or summer break.

By contrast, in the United States, students and parents are heard asking for less homework and more free time to play. Over the last two decades, the quality of education, as measured by test scores, has steadily declined in the US. Less work is a key factor. However, a deeper question is: why do American parents and kids want less work from schools? Such questions need to be answered by representative data, not ideologies.

# Saving IPython script history

`%save sessionName linesToKeep`

This saves the lines you specify in linesToKeep (for example, `1-20 34-50 64`) into a file called "sessionName.py" in your current working directory.

If you are not so specific on which lines you want to keep, you can save everything.

`%save sessionName ~0/`

This saves everything from the current session, denoted as "~0", into the file "sessionName.py" in your cwd (current working directory).

`%save pastSession ~1/`

This saves everything from the past session, denoted as “~1” into file “pastSession.py” in your cwd.

# Discussion on AI to a child | 和孩子讨论人工智能

One of the children asked in class whether it was true that AI would replace a lot of jobs.   Yes and no, I said.   Our current AI is very good at pattern recognition, but not good at other things.   It does not have a brain like we do.

I started to explain that our current AI is like a model, which they know a bit about. To give a concrete example, I said, suppose you have two sets of numbers: one set is 1, 2, 3, 4, 5, and the other is 2, 4, 6, 8, 10. The computer is very good at (because we make it so) figuring out that the second set is twice the first set. It can recognize faces because all faces have 2 eyes, 1 nose, and 1 mouth, and have similar shapes. In summary, it is very good at figuring out patterns if you let the computer have lots of them.
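
The two-sets example can be tried on a computer. As a minimal sketch, assuming a straight-line fit with `np.polyfit` as the stand-in for "figuring out the pattern" (this is one simple choice made here, not what the class used, and real AI systems are far more elaborate):

```python
import numpy as np

first = np.array([1, 2, 3, 4, 5])
second = np.array([2, 4, 6, 8, 10])

# Fit a straight line: second is approximately slope * first + intercept.
slope, intercept = np.polyfit(first, second, deg=1)

# Use the pattern the computer found to guess what comes after 5.
guess_for_six = slope * 6 + intercept

print(slope, intercept)
print(guess_for_six)
```

The slope comes out very close to 2, which is the computer's way of saying "the second set is twice the first".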

The children asked many great questions. Some were truly profound and could basically summarize the essence of the whole business of modeling. It is amazing how uncluttered minds can go straight to the point and not be fooled by jargon or buzz.

# distill.pub

The site is so good that its site name deserves to be in the title: https://distill.pub

What is it? It is a merging of digital science, math, and art that translates hyped technical topics like AI into something that feels like poetry.

# scikit-learn processes simplified | 简化scikit-learn过程

Code light and concept heavy: that has been the trend as programming languages stack on one another to wrap more and more algorithms in fewer and fewer lines of code. A child can run a linear regression model. Yes, you heard that right. We have done it, and we summarize the building blocks of the Python machine learning library scikit-learn in this table:

| Process | Code |
| --- | --- |
| Import libraries | `%matplotlib inline` |
| | `import numpy as np` |
| | `import pandas as pd` |
| | `import matplotlib` |
| | `import matplotlib.pyplot as plt` |
| | `from sklearn.metrics import r2_score, median_absolute_error` |
| Format | `pd.options.display.float_format = '{:20,.3f}'.format` |
| Import data | `from sklearn.datasets import load_boston` |
| | `dataset = load_boston()` |
| Prepare data | `y = dataset.target` |
| | `X = dataset.data` |
| Split data | `from sklearn.model_selection import train_test_split` |
| | `X_train, X_test, y_train, y_test = train_test_split(X, y)` |
| Algorithm a | `from sklearn.linear_model import RidgeCV` |
| | `my_regr = RidgeCV()` |
| Train-predict | `my_regr.fit(X_train, y_train)` |
| | `y_pred = my_regr.predict(X_test)` |
| Plot performance | `f, ax = plt.subplots(1, 1)` |
| | `ax.scatter(y_test, y_pred)` |
| | `plt.plot([0, 50], [0, 50], '--k')` |
| | `ax.set_ylabel('Target predicted')` |
| | `ax.set_xlabel('True Target')` |
| | `ax.set_title('Ridge regression on test data')` |
| | `ax.text(5, 40, r'$R^2$=%.2f, MAE=%.2f' % (r2_score(y_test, y_pred), median_absolute_error(y_test, y_pred)))` |
| | `ax.set_xlim([0, 50])` |
| | `ax.set_ylim([0, 50])` |
| Interpretation | `interpretation = pd.DataFrame({'X': dataset.feature_names, 'coef': my_regr.coef_})` |
| Algorithm b | `from sklearn.model_selection import cross_val_predict` |
| | `from sklearn import linear_model` |
| Train-predict | `my_regr = linear_model.LinearRegression()` |
| | `y_pred = cross_val_predict(my_regr, X, y, cv=10)` |
| Plot performance | `f, ax = plt.subplots(1, 1)` |
| | `ax.scatter(y, y_pred, edgecolors=(0, 0, 0))` |
| | `plt.plot([0, 50], [0, 50], '--k')` |
| | `ax.set_ylabel('Target predicted')` |
| | `ax.set_xlabel('True Target')` |
| | `ax.set_title('linear regression with cross validation')` |
| | `ax.text(5, 45, r'$R^2$=%.2f, MAE=%.2f' % (r2_score(y, y_pred), median_absolute_error(y, y_pred)))` |

Note: `load_boston` was removed in scikit-learn 1.2, so the "Import data" step runs as written only on older versions.
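
For readers on a recent scikit-learn where `load_boston` is no longer available, the "Algorithm a" steps in the table can be sketched on a different bundled dataset. The choice of `load_diabetes`, the `random_state=0` setting, and the default split size are assumptions made here for illustration and were not part of the class; plotting is left out to keep the sketch short.

```python
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.linear_model import RidgeCV
from sklearn.metrics import median_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Import and prepare data (load_diabetes stands in for load_boston).
dataset = load_diabetes()
X = dataset.data
y = dataset.target

# Split data; random_state is fixed only so reruns give the same split.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Algorithm a: ridge regression with built-in cross-validation.
my_regr = RidgeCV()

# Train-predict.
my_regr.fit(X_train, y_train)
y_pred = my_regr.predict(X_test)

# Performance, using the same two scores as the plot label in the table.
print('R^2 = %.2f, MAE = %.2f' % (
    r2_score(y_test, y_pred), median_absolute_error(y_test, y_pred)))

# Interpretation: one coefficient per input feature.
interpretation = pd.DataFrame({'X': dataset.feature_names, 'coef': my_regr.coef_})
print(interpretation)
```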