# AI Learning Road (7): Generating Training Data

2023-03-18

After the preparations of the previous articles, we now come to a new stage: generating training data. Before generating any data, let's look at a piece of news — though by now it could also be called old news.

In June 2012, The New York Times reported on the Google Brain project, which attracted wide public attention. The project was co-led by Andrew Ng, a machine learning professor at Stanford University, and Jeff Dean, a top expert in large-scale computer systems. Using a deep neural network (DNN, Deep Neural Networks), the system trained itself to recognize 20,000 different objects from 14 million images. Before the analysis began, no one had to manually feed the system any features such as "the appearance of a face, limbs, or a cat." As Jeff Dean put it: "We never told the machine during training, 'this is a cat'" (that is, there were no labeled samples). "The system essentially invented, or learned, the concept of 'cat' on its own."

In March 2014, also using deep learning methods, Facebook's DeepFace project pushed the accuracy of face recognition to 97.25%, only slightly below the 97.5% accuracy of human recognition — practically on par with humans. The project used a nine-layer neural network to extract facial features, with roughly 120 million parameters.

From this news we can see that using TensorFlow requires a lot of training data — 14 million images here, and the amount of computation is staggering. With enough data, the machine can learn by itself and construct its own deep neural network. But since we are all learning TensorFlow on ordinary PCs, the amount of data should not be too large, or the computation will be too slow.

Back to our example: it is a very simple linear regression problem. With traditional programming we would start from the mathematics, abstract and decompose the problem — for instance, solve it with the least-squares method — and then have developers implement the algorithm line by line. With TF, the linear regression problem becomes simple: no algorithm development is needed. An operator only has to feed the experimental data to TF, which automatically works out the equation. Moreover, TF can handle not only one-dimensional linear regression but also multi-dimensional linear regression without modifying any code — just train it on different input data. It is a bit like a template in C++: generic, flexible, and adaptive.
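To make the "traditional programming" approach concrete, here is a minimal sketch of solving the same linear regression problem with the least-squares method in NumPy, before handing the problem to TF. The variable names (`k`, `b`) and the sample points are my own illustration, not from the original article.

```python
import numpy as np

# Traditional approach: fit y = k*x + b with closed-form least squares.
x = np.array([0.0, 1.0, 2.0, 3.0], dtype=np.float32)
y = 0.1 * x + 0.3  # points known to lie on y = 0.1x + 0.3

# Build the design matrix [x, 1] and solve for [k, b].
A = np.vstack([x, np.ones_like(x)]).T
k, b = np.linalg.lstsq(A, y, rcond=None)[0]
print(k, b)  # recovers approximately 0.1 and 0.3
```

The point of the comparison is that here the programmer chose the method (least squares) and implemented it; with TF, the same recovery of `k` and `b` happens without writing the algorithm at all.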

In fact, you can treat TF like a child: if you want it to understand words, you must first tell it what each word corresponds to. These words and their meanings are the training data — the basis on which TF builds its neural network.

In this example, for simplicity, the following code is used to generate the training data:

```python
# Create 100 phony x, y data points in NumPy, y = x * 0.1 + 0.3
x_data = np.random.rand(100).astype(np.float32)
y_data = x_data * 0.1 + 0.3
```

This code uses the `rand` function from NumPy's `random` module to generate random numbers. A previous article covered this function; look back at it if you are unfamiliar. Because `rand` produces 64-bit floating-point numbers by default, `astype(np.float32)` is used to convert them to 32-bit floats for simplicity. Running this code produces data like the following:
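The dtype conversion mentioned above is easy to verify directly. This is just a small check (the variable names `a` and `b` are my own):

```python
import numpy as np

# rand() returns float64 by default; astype() narrows to float32.
a = np.random.rand(5)
print(a.dtype)  # float64
b = a.astype(np.float32)
print(b.dtype)  # float32
```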

```
====================== RESTART: D:/AI/sample/tf_1.15.py ======================
[ 0.70405102  0.61570853  0.87308896  0.45296744  0.37736508  0.6797002
0.09379926  0.32359388  0.46764055  0.77653575  0.15795791  0.65861601
0.26614654  0.2556974   0.89700532  0.60780615  0.43122688  0.89048529
0.77601838  0.15203112  0.98827535  0.5890581   0.32155743  0.23476918
0.42037088  0.88411021  0.76413459  0.54980898  0.61089933  0.06307018
0.70900387  0.89800489  0.42763299  0.52450663  0.91588277  0.09274247
0.06676489  0.948502    0.00355364  0.82583654  0.45627272  0.23299092
0.27515557  0.36546576  0.12408578  0.8973456   0.83187389  0.94323117
0.20377819  0.27644819  0.59617561  0.8375017   0.6347205   0.37442347
0.94081807  0.36718944  0.85045648  0.04433619  0.07171503  0.14927267
0.55873674  0.63533461  0.15528481  0.21342377  0.0684417   0.33737803
0.07325422  0.13422777  0.47799423  0.08758666  0.0748972   0.16612834
0.87103868  0.51996118  0.64916074  0.59876722  0.13151391  0.58004898
0.40890983  0.12511343  0.24550414  0.89369571  0.44549868  0.5793246
0.97435832  0.00236449  0.74470139  0.28310984  0.41833594  0.15089646
0.19401258  0.19051966  0.18906432  0.05238314  0.7398296   0.067056
0.87864172  0.91925138  0.54840708  0.24474047]
[ 0.37040511  0.36157086  0.3873089   0.34529674  0.33773652  0.36797005
0.30937994  0.3323594   0.34676406  0.3776536   0.31579581  0.36586162
0.32661468  0.32556975  0.38970053  0.36078063  0.34312269  0.38904855
0.37760186  0.31520313  0.39882755  0.35890582  0.33215576  0.32347694
0.34203711  0.38841105  0.37641346  0.35498092  0.36108994  0.30630702
0.37090039  0.38980049  0.3427633   0.35245067  0.3915883   0.30927426
0.30667651  0.39485022  0.30035537  0.38258368  0.34562728  0.32329911
0.32751557  0.3365466   0.3124086   0.38973457  0.38318741  0.39432314
0.32037783  0.32764482  0.35961756  0.3837502   0.36347207  0.33744237
0.39408183  0.33671895  0.38504565  0.30443364  0.30717152  0.31492728
0.35587367  0.36353347  0.31552848  0.32134238  0.30684417  0.33373782
0.30732542  0.3134228   0.34779942  0.30875868  0.30748972  0.31661284
0.38710389  0.35199612  0.36491609  0.35987672  0.31315139  0.3580049
0.340891    0.31251135  0.32455042  0.38936958  0.34454989  0.35793248
0.39743584  0.30023646  0.37447014  0.328311    0.34183359  0.31508964
0.31940126  0.31905198  0.31890646  0.30523834  0.37398297  0.30670562
0.38786417  0.39192516  0.35484073  0.32447407]
```

The code that generates and prints this data:

```python
import tensorflow as tf
import numpy as np

x_data = np.random.rand(100).astype(np.float32)
y_data = x_data * 0.1 + 0.3
print(x_data)
print(y_data)
```

With just two simple lines of code, 100 training data points are generated. Here they are generated from a linear equation with slope 0.1 and intercept 0.3. With this data we can feed TF for training. Note that we never tell TF the slope and intercept; we ask TF to find them from the x and y data. It is rather like handing the data to a junior high school student and asking him to observe it and recover the slope and intercept — what method he uses, and through what process, is not our concern. By the same logic, students doing physics experiments in the future will find their lab-report homework easy: just collect the experimental data, hand it to TF, and it works out the equation automatically. Can TF actually find the right slope and intercept? If you are a little worried, stay tuned for the next installment.
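As a quick sanity check (this is not the TF training itself, which the next article covers), we can play the role of the "junior high school student" ourselves and confirm that the slope and intercept really are recoverable from the 100 points — here using NumPy's polynomial fit as a stand-in:

```python
import numpy as np

# Generate the same 100 training points as in the article.
x_data = np.random.rand(100).astype(np.float32)
y_data = x_data * 0.1 + 0.3

# Fit a degree-1 polynomial; polyfit returns [slope, intercept].
slope, intercept = np.polyfit(x_data, y_data, deg=1)
print(slope, intercept)  # approximately 0.1 and 0.3
```

Since the data is noiseless, the fit recovers the original coefficients almost exactly; TF will have to do the same, but by iterative training rather than a direct fit.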
