Clean up the m_ssing data
September 22, 2017
"Garbage in, garbage out." Do not produce a bad report because of bad data. Welcome to the new package, CleanData, a solution for handling missing data in your reports. The package is free to download.
Problem
Missing data is a recurring problem in data analysis in general and in report making in particular. It causes:
- Errors in your report. Missing data shows up as NaN or null values, which cause trouble in calculations.
- Wrong or biased results. For example, a null value that is counted as 0 will drive the average down.
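To make the second point concrete, here is a minimal sketch in plain PHP (it does not use the CleanData package). The `$sales` numbers are made up for illustration. PHP arithmetic treats null as 0, so a naive average is pulled down by the missing entry.

```php
<?php
// Made-up sample: three sales figures, one of them missing.
$sales = [10, 20, null];

// array_sum() counts null as 0, while count() still counts the entry,
// so the naive mean is lower than the mean of the observed values.
$naiveMean = array_sum($sales) / count($sales);

// Keep only the observed values before averaging.
$observed = array_filter($sales, fn($v) => $v !== null);
$cleanMean = array_sum($observed) / count($observed);

echo $naiveMean, "\n"; // 10
echo $cleanMean, "\n"; // 15
```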
Solution
Missing data can generally be handled in two ways: you can drop the rows with missing data, or fill the missing data with a reasonable value. The CleanData package provides both solutions through two classes, DropNull and FillNull.
DropNull detects null values in your data table. It can look for null in all columns or only in specific columns, selected either by column name or by column type (number, string, datetime). DropNull also takes into account how serious the missing data is: we can set a threshold, which is the minimum number of missing values a row must have before it is dropped.
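The dropping rule described above can be sketched in plain PHP. This is only an illustration of the idea, not the CleanData implementation: the function `dropNullRows` and its parameters `$columns` and `$thresh` are hypothetical names chosen for this example.

```php
<?php
/**
 * Hypothetical helper (not part of CleanData): drop every row that has
 * at least $thresh null values among the inspected columns.
 */
function dropNullRows(array $rows, ?array $columns = null, int $thresh = 1): array
{
    $kept = [];
    foreach ($rows as $row) {
        // Inspect every column unless a subset of names was requested.
        $cols = $columns ?? array_keys($row);
        $missing = 0;
        foreach ($cols as $col) {
            if (!isset($row[$col])) { // true for null and for absent keys
                $missing++;
            }
        }
        if ($missing < $thresh) {
            $kept[] = $row;
        }
    }
    return $kept;
}

$rows = [
    ["name" => "A",  "price" => 10,   "qty" => 2],
    ["name" => "B",  "price" => null, "qty" => 1],
    ["name" => null, "price" => null, "qty" => 5],
];

// Threshold 1 (default): any null drops the row, so only "A" remains.
$strict = dropNullRows($rows);
// Threshold 2: "B" has one null and is kept; the last row has two and is dropped.
$lenient = dropNullRows($rows, null, 2);
// Only inspect "qty": no row is missing it, so all three rows remain.
$qtyOnly = dropNullRows($rows, ["qty"]);
```

A higher threshold keeps rows that are only slightly incomplete, which matters when the table is small and every row counts.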
FillNull also detects null values in all columns or in particular columns, selected by name or type. It replaces each null value with a new value. We can set that value ourselves, for example replacing every null value with 0, or we can use a smarter number such as the MEAN or MEDIAN. FillNull can help you do that:
->pipe(new FillNull(array(
"newValue"=>FillNull::MEAN,
"targetColumnType"=>"number",
)))
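To show what filling with the mean or the median amounts to, here is a minimal sketch in plain PHP. It is an illustration under our own assumptions, not the code inside FillNull: the function `fillNullColumn` and its `$method` argument are hypothetical, and the sample prices are made up.

```php
<?php
/**
 * Hypothetical helper (not part of CleanData): replace nulls in one
 * column with the mean or median of that column's observed values.
 */
function fillNullColumn(array $rows, string $col, string $method = "mean"): array
{
    // Collect the observed (non-null) values of the column.
    $values = [];
    foreach ($rows as $row) {
        if (isset($row[$col])) {
            $values[] = $row[$col];
        }
    }
    if (count($values) === 0) {
        return $rows; // nothing to base a fill value on
    }

    if ($method === "median") {
        sort($values);
        $n = count($values);
        $mid = intdiv($n, 2);
        // Odd count: middle value. Even count: average of the two middle values.
        $fill = ($n % 2 === 1) ? $values[$mid] : ($values[$mid - 1] + $values[$mid]) / 2;
    } else {
        $fill = array_sum($values) / count($values);
    }

    foreach ($rows as $i => $row) {
        if (!isset($row[$col])) {
            $rows[$i][$col] = $fill;
        }
    }
    return $rows;
}

$rows = [
    ["price" => 10],
    ["price" => null],
    ["price" => 20],
    ["price" => 90],
];

$byMean   = fillNullColumn($rows, "price", "mean");   // null becomes 40
$byMedian = fillNullColumn($rows, "price", "median"); // null becomes 20
```

The sample includes one large value (90) on purpose: the mean is pulled toward it, while the median is not, which is one reason to prefer the median when a column contains outliers.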
Summary
The CleanData package is a good solution for datasets with missing data. In the future, we would like to apply AI techniques to fill in missing data better. The AI could suggest a reasonable value based on the relationships it observes among the surrounding data. KoolReport will be the best reporting tool for PHP.
<3 koolreport team