Overview

Dataset statistics

Number of variables: 5
Number of observations: 1000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 39.2 KiB
Average record size in memory: 40.1 B
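
The memory figures can be roughly cross-checked by hand. The sketch below assumes that every value is stored as an 8-byte float and that the row index takes 128 bytes; neither is stated in the report, so treat both constants as assumptions.

```python
# Rough cross-check of the memory figures in the dataset statistics.
# Assumptions (not stated in the report): 8-byte float values, 128-byte index.
BYTES_PER_VALUE = 8
INDEX_BYTES = 128

n_rows, n_cols = 1000, 5

column_bytes = n_rows * BYTES_PER_VALUE              # one numeric column
total_bytes = n_cols * column_bytes + INDEX_BYTES    # all columns plus index

column_kib = round(column_bytes / 1024, 1)    # per-variable memory size, KiB
total_kib = round(total_bytes / 1024, 1)      # total size in memory, KiB
record_bytes = round(total_bytes / n_rows, 1) # average record size, bytes

print(column_kib, total_kib, record_bytes)
```

Under these assumptions the results agree with the reported total size, the average record size, and the per-variable memory size listed for each variable.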

Variable types

NUM: 5

Reproduction

Analysis started: 2020-08-25 13:49:51.681350
Analysis finished: 2020-08-25 13:49:56.523497
Duration: 4.84 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

a has unique values (Unique)
b has unique values (Unique)
c has unique values (Unique)
d has unique values (Unique)
e has unique values (Unique)
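
A variable is flagged as unique when its distinct count equals the number of observations. A minimal sketch of that check follows; the function name `unique_columns` and the toy data are hypothetical and not part of the report or of pandas-profiling.

```python
def unique_columns(columns):
    """Return the names of columns whose values are all distinct.

    `columns` maps a column name to a list of hashable values.
    """
    return [
        name
        for name, values in columns.items()
        if len(values) > 0 and len(set(values)) == len(values)
    ]


# Hypothetical toy data, not taken from the report.
toy = {"x": [0.1, 0.2, 0.3], "y": [1.0, 1.0, 2.0]}
flagged = unique_columns(toy)
print(flagged)
```

Here only `x` is flagged, because `y` contains a repeated value.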

Variables

a
Real number (ℝ≥0)

UNIQUE

Distinct count: 1000
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.49603561662410556
Minimum: 0.00011036852237689132
Maximum: 0.9983580831890366
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.8 KiB

Quantile statistics

Minimum: 0.0001103685224
5-th percentile: 0.03892341171
Q1: 0.2410683702
median: 0.5024830366
Q3: 0.7375589717
95-th percentile: 0.9471630622
Maximum: 0.9983580832
Range: 0.9982477147
Interquartile range (IQR): 0.4964906016
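
Range and IQR are derived from the other quantile statistics: range is maximum minus minimum, and IQR is Q3 minus Q1. The sketch below computes quartiles with linear interpolation between order statistics (the "inclusive" method of Python's `statistics.quantiles`) and then checks the figures reported for variable a. Whether the report uses exactly this interpolation rule is an assumption; the helper name `quantile_summary` is hypothetical.

```python
import statistics


def quantile_summary(values):
    """Quartiles, range and IQR using linear interpolation."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "range": max(values) - min(values),
        "IQR": q3 - q1,
    }


# Toy data, not taken from the report.
summary = quantile_summary([1, 2, 3, 4, 5])

# Consistency check on the rounded figures reported for variable a.
reported = {
    "min": 0.0001103685224,
    "Q1": 0.2410683702,
    "Q3": 0.7375589717,
    "max": 0.9983580832,
    "range": 0.9982477147,
    "IQR": 0.4964906016,
}
range_gap = abs((reported["max"] - reported["min"]) - reported["range"])
iqr_gap = abs((reported["Q3"] - reported["Q1"]) - reported["IQR"])
print(summary, range_gap, iqr_gap)
```

Both gaps are well below the rounding precision of the displayed values.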

Descriptive statistics

Standard deviation: 0.2898457978
Coefficient of variation (CV): 0.5843245688
Kurtosis: -1.202722056
Mean: 0.4960356166
Median Absolute Deviation (MAD): 0.2492625755
Skewness: -0.008708593557
Sum: 496.0356166
Variance: 0.0840105865
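
Several of these statistics follow from one another: CV is the standard deviation divided by the mean, variance is the squared standard deviation, and sum is the mean times the number of observations. A short sketch that checks these relations against the rounded figures reported for variable a (the tolerance of 1e-8 is an arbitrary choice made for this illustration):

```python
import math

# Rounded figures as reported for variable a.
n = 1000
mean = 0.4960356166
std = 0.2898457978
reported_cv = 0.5843245688
reported_variance = 0.0840105865
reported_sum = 496.0356166

cv = std / mean        # coefficient of variation
variance = std ** 2    # variance is the squared standard deviation
total = mean * n       # sum of all observations

checks = {
    "cv": math.isclose(cv, reported_cv, abs_tol=1e-8),
    "variance": math.isclose(variance, reported_variance, abs_tol=1e-8),
    "sum": math.isclose(total, reported_sum, abs_tol=1e-8),
}
print(checks)
```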
Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
0.9385864061        1      0.1%
0.114433137         1      0.1%
0.4752931621        1      0.1%
0.2102463925        1      0.1%
0.4603792023        1      0.1%
0.2113915275        1      0.1%
0.5986569384        1      0.1%
0.7086391378        1      0.1%
0.4138992015        1      0.1%
0.6962643163        1      0.1%
Other values (990)  990    99.0%

Value               Count  Frequency (%)
0.0001103685224     1      0.1%
0.000167063331      1      0.1%
0.001534646043      1      0.1%
0.001744181929      1      0.1%
0.002829276984      1      0.1%

Value               Count  Frequency (%)
0.9983580832        1      0.1%
0.9969940265        1      0.1%
0.9962974759        1      0.1%
0.9956938526        1      0.1%
0.994672249         1      0.1%

b
Real number (ℝ≥0)

UNIQUE

Distinct count: 1000
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4976896441671239
Minimum: 3.255334367957552e-05
Maximum: 0.9998802914813884
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.8 KiB

Quantile statistics

Minimum: 3.255334368e-05
5-th percentile: 0.05840948171
Q1: 0.2440723592
median: 0.48119514
Q3: 0.7571758405
95-th percentile: 0.9540105859
Maximum: 0.9998802915
Range: 0.9998477381
Interquartile range (IQR): 0.5131034814

Descriptive statistics

Standard deviation: 0.2934853901
Coefficient of variation (CV): 0.5896955936
Kurtosis: -1.247295364
Mean: 0.4976896442
Median Absolute Deviation (MAD): 0.2556933251
Skewness: 0.04871458042
Sum: 497.6896442
Variance: 0.08613367422
Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
0.104371909         1      0.1%
0.805434504         1      0.1%
0.2812391539        1      0.1%
0.6613548265        1      0.1%
0.2604039842        1      0.1%
0.962532327         1      0.1%
0.46133387          1      0.1%
0.8242477535        1      0.1%
0.2014581129        1      0.1%
0.337153214         1      0.1%
Other values (990)  990    99.0%

Value               Count  Frequency (%)
3.255334368e-05     1      0.1%
0.0009543377432     1      0.1%
0.001843832395      1      0.1%
0.002320559076      1      0.1%
0.002669353878      1      0.1%

Value               Count  Frequency (%)
0.9998802915        1      0.1%
0.9986568675        1      0.1%
0.9984117679        1      0.1%
0.9971561539        1      0.1%
0.9963002548        1      0.1%

c
Real number (ℝ≥0)

UNIQUE

Distinct count: 1000
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5002124154518173
Minimum: 0.00016299075369019533
Maximum: 0.9990704237749324
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.8 KiB

Quantile statistics

Minimum: 0.0001629907537
5-th percentile: 0.04327553595
Q1: 0.2474555422
median: 0.5167601724
Q3: 0.7461510712
95-th percentile: 0.956067994
Maximum: 0.9990704238
Range: 0.998907433
Interquartile range (IQR): 0.498695529

Descriptive statistics

Standard deviation: 0.2900708012
Coefficient of variation (CV): 0.579895245
Kurtosis: -1.172289288
Mean: 0.5002124155
Median Absolute Deviation (MAD): 0.2489490884
Skewness: -0.02498797573
Sum: 500.2124155
Variance: 0.08414106973
Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
0.7673086287        1      0.1%
0.2235953947        1      0.1%
0.9810037169        1      0.1%
0.5828179623        1      0.1%
0.2463940261        1      0.1%
0.5164020475        1      0.1%
0.4128799841        1      0.1%
0.6223114139        1      0.1%
0.5204983621        1      0.1%
0.772549966         1      0.1%
Other values (990)  990    99.0%

Value               Count  Frequency (%)
0.0001629907537     1      0.1%
0.001207473876      1      0.1%
0.001583146709      1      0.1%
0.001681737516      1      0.1%
0.001760754771      1      0.1%

Value               Count  Frequency (%)
0.9990704238        1      0.1%
0.9979050199        1      0.1%
0.99683349          1      0.1%
0.9963737955        1      0.1%
0.9954980932        1      0.1%

d
Real number (ℝ≥0)

UNIQUE

Distinct count: 1000
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5108146611339072
Minimum: 0.00037029279084022093
Maximum: 0.9995547781577581
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.8 KiB

Quantile statistics

Minimum: 0.0003702927908
5-th percentile: 0.05405086738
Q1: 0.2512262066
median: 0.5231901836
Q3: 0.7652953195
95-th percentile: 0.9562950624
Maximum: 0.9995547782
Range: 0.9991844854
Interquartile range (IQR): 0.5140691129

Descriptive statistics

Standard deviation: 0.2896394878
Coefficient of variation (CV): 0.567014829
Kurtosis: -1.188396004
Mean: 0.5108146611
Median Absolute Deviation (MAD): 0.254591609
Skewness: -0.06505746225
Sum: 510.8146611
Variance: 0.08389103287
Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
0.4462812582        1      0.1%
0.9626243634        1      0.1%
0.4035546722        1      0.1%
0.07846311922       1      0.1%
0.7741706145        1      0.1%
0.4028344023        1      0.1%
0.6812746488        1      0.1%
0.5275647013        1      0.1%
0.7307168041        1      0.1%
0.6819473024        1      0.1%
Other values (990)  990    99.0%

Value               Count  Frequency (%)
0.0003702927908     1      0.1%
0.001359281988      1      0.1%
0.00186156393       1      0.1%
0.003046201856      1      0.1%
0.003101275089      1      0.1%

Value               Count  Frequency (%)
0.9995547782        1      0.1%
0.9965432486        1      0.1%
0.9960923487        1      0.1%
0.9953510554        1      0.1%
0.9925925808        1      0.1%

e
Real number (ℝ≥0)

UNIQUE

Distinct count: 1000
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5010268385918255
Minimum: 0.00037950967686839476
Maximum: 0.9997855084461895
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.8 KiB

Quantile statistics

Minimum: 0.0003795096769
5-th percentile: 0.04960697893
Q1: 0.2526117816
median: 0.4910308642
Q3: 0.7677758696
95-th percentile: 0.9342226726
Maximum: 0.9997855084
Range: 0.9994059988
Interquartile range (IQR): 0.5151640881

Descriptive statistics

Standard deviation: 0.289226879
Coefficient of variation (CV): 0.5772682353
Kurtosis: -1.251931332
Mean: 0.5010268386
Median Absolute Deviation (MAD): 0.259204655
Skewness: 0.01192021794
Sum: 501.0268386
Variance: 0.08365218752
Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
0.2301025349        1      0.1%
0.06605608778       1      0.1%
0.193505822         1      0.1%
0.5597152983        1      0.1%
0.870400768         1      0.1%
0.1509152324        1      0.1%
0.7581694317        1      0.1%
0.882948668         1      0.1%
0.7432413958        1      0.1%
0.2032994164        1      0.1%
Other values (990)  990    99.0%

Value               Count  Frequency (%)
0.0003795096769     1      0.1%
0.000859878597      1      0.1%
0.001314535632      1      0.1%
0.004629828063      1      0.1%
0.004929238282      1      0.1%

Value               Count  Frequency (%)
0.9997855084        1      0.1%
0.9969876266        1      0.1%
0.996225163         1      0.1%
0.9945595067        1      0.1%
0.9935423968        1      0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
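
The calculation described above can be sketched in plain Python. This is a minimal illustration, not the implementation used by the report; the function name `pearson_r` and the toy data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products and squares are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined for a constant variable")
    return cov / (spread_x * spread_y)


# Toy data, not taken from the report.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)

# Shifting and rescaling x by a positive factor leaves r unchanged.
r_shifted = pearson_r([10 * v + 3 for v in x], y)
print(r, r_shifted)
```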

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
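
In other words, ρ is Pearson's r computed on ranks. A minimal sketch follows; assigning tied values the average of their ranks is an assumption made here, and the helper names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def spearman_rho(xs, ys):
    """Pearson's r applied to the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Toy data: y = x**3 is monotonic in x but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

On this toy data ρ reaches 1 because the relationship is perfectly monotonic, while r stays below 1 because it is not linear.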

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
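
The pair counting can be sketched directly. This follows the definition given above, which has no adjustment for ties (often called tau-a); library implementations may use a tie-adjusted variant, so results can differ when the data contain ties. The function name `kendall_tau` is hypothetical.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """(concordant - discordant) / total number of pairs, without tie adjustment."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1    # both variables move in the same direction
        elif direction < 0:
            discordant += 1    # the variables move in opposite directions
        # direction == 0 means a tie in x or y; it counts for neither side
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Toy data: 5 of the 6 pairs are concordant, 1 is discordant.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau(x, y)
print(tau)
```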

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.

Missing values

Sample

First rows

     a         b         c         d         e
0    0.168605  0.088419  0.631029  0.533096  0.917958
1    0.315895  0.073848  0.829413  0.143737  0.280338
2    0.137830  0.869474  0.501873  0.808615  0.149282
3    0.493717  0.034421  0.528937  0.600067  0.932171
4    0.361675  0.205724  0.742665  0.129008  0.637564
5    0.769593  0.772678  0.245068  0.148680  0.726782
6    0.473384  0.456181  0.665889  0.177821  0.019648
7    0.183022  0.476833  0.001207  0.688401  0.688314
8    0.532989  0.141323  0.514437  0.855395  0.127710
9    0.196139  0.836204  0.568191  0.389468  0.274022

Last rows

     a         b         c         d         e
990  0.732259  0.562789  0.447696  0.804464  0.634221
991  0.739007  0.753929  0.165127  0.844081  0.801953
992  0.447996  0.585246  0.954670  0.059409  0.783304
993  0.295182  0.211035  0.040836  0.169337  0.150936
994  0.335973  0.252420  0.566453  0.024700  0.012688
995  0.265495  0.548424  0.958761  0.291487  0.488196
996  0.917419  0.897077  0.582288  0.037852  0.588344
997  0.507682  0.816218  0.843139  0.769346  0.287649
998  0.394697  0.463418  0.188358  0.829562  0.928909
999  0.909544  0.696754  0.168389  0.948427  0.036411