What is Data Distribution?
A data distribution is a list of all possible values together with how often each value occurs (its relative frequency).
The methods in the random module of the NumPy library can be used to generate random data that follows a given distribution.
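To make the idea of "values and their relative frequencies" concrete, here is a minimal sketch that tabulates the distribution of a small array. The sample data is made up for illustration, and numpy.unique is just one convenient way to count occurrences.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
data = np.array([6, 8, 8, 9, 8, 6, 9, 8, 8, 9])

# Distinct values and how many times each one occurs.
values, counts = np.unique(data, return_counts=True)

# Relative frequency = count of a value / total number of values.
rel_freq = counts / counts.sum()

for v, c, f in zip(values, counts, rel_freq):
    print(v, c, f)
```

The relative frequencies always add up to 1, which is the same constraint placed on the probabilities passed to a random generator.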
Random Distribution
A random distribution is a set of random numbers that follow a certain probability density function. A probability density function describes a continuous probability distribution; for a discrete set of values, as in the examples here, the equivalent description is the probability assigned to each value in an array.
The choice() method of the random module generates such a distribution. It takes an array of values, the probability of each value (the p parameter), and the shape of the array to generate (the size parameter).
Example: generate a 1-D array containing 10 values, where each value is 6, 7, 8 or 9.
from numpy import random
x = random.choice([6, 7, 8, 9], p=[0.1, 0.0, 0.6, 0.3], size=(10))
print(x)
Output (one possible result; the values change from run to run)-
[6 6 8 8 8 8 8 8 9 9]
Explanation-
p=[0.1, 0.0, 0.6, 0.3]: the probabilities of a value being 6, 7, 8 and 9 are set to 0.1, 0.0, 0.6 and 0.3 respectively. Because the probability of 7 is 0.0, it never appears in the output.
The probabilities [0.1, 0.0, 0.6, 0.3] must sum to 1; otherwise choice() raises a ValueError.
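The effect of p is easier to see on a large sample. The sketch below draws 10,000 values and compares the observed frequencies with the requested probabilities; it also shows what happens when the probabilities do not sum to 1. The seeded generator (default_rng(0)) and the sample size are assumptions added here so the result is repeatable; they are not part of the original example.

```python
import numpy as np
from numpy import random

values = [6, 7, 8, 9]
p = [0.1, 0.0, 0.6, 0.3]

# Seeded generator (an assumption for repeatability, not required).
rng = random.default_rng(0)
samples = rng.choice(values, p=p, size=10000)

# Observed relative frequency of each value in the sample.
observed = [float(np.mean(samples == v)) for v in values]
print(observed)  # close to p, but not exactly equal

# Probabilities that do not sum to 1 are rejected.
try:
    random.choice(values, p=[0.1, 0.0, 0.6, 0.5], size=10)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```

The observed frequencies approach p as the sample grows, and 7 never occurs because its probability is 0.0.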
Example: generate a 2-D array with 3 rows, each containing 2 values.
from numpy import random
x = random.choice([6, 7, 8, 9], p=[0.1, 0.0, 0.6, 0.3], size=(3,2))
print(x)
Output (one possible result; the values change from run to run)-
[[9 8]
 [8 9]
 [8 9]]
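The size argument only controls the shape of the result, which can be checked without looking at the random values themselves. A short sketch, assuming the same values and probabilities as above:

```python
from numpy import random

values = [6, 7, 8, 9]
p = [0.1, 0.0, 0.6, 0.3]

# size=(10) is just the integer 10, so the result is 1-D.
a = random.choice(values, p=p, size=(10))
print(a.shape)

# A tuple (rows, columns) gives a 2-D array.
b = random.choice(values, p=p, size=(3, 2))
print(b.shape)
```

Note that (10) is not a tuple in Python; writing size=10 is equivalent, while a tuple such as (3, 2) is needed for more than one dimension.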