When you are looking at data it can be helpful to place it into categories based using what I call a category tree breakdown. This can give us additional information that may not be apparent without looking at the breakdown.
In this simple example, say we have gathered a simple dataset on various political candidates, including: First Name, Last Name, Party Affiliation, Gender, State Affiliation, and Job. Maybe we would want to see which genders were associated with which jobs or what the similarities are across party or state affiliations.
Each of these could be done individually in SQL, but that would be inefficient and wouldn't expose the power of being able to perform a breakdown on the fly.
For example, let's say I want to look at party affiliation. Then for each party I want to know what are the positions held by those candidates. Then for each of those jobs I want to see which states have an affiliation.
That can be done by specifying the breakdown array as follows:
When this breakdown is applied to the dataset, the following result displays:
Republican
Senator
Texas
Governor
Ohio
Chairman
New York
Democrat
Secretary of State
New York
Senator
Vermont
The breakdown can be specified in any order we want. Say we want to look again at the party affiliation, but this time by gender and first name:
Republican
male
Ted
John
Donald
Democrat
female
Hillary
male
Bernie
The way this code works is that it continually slices off a category index to look for in the dataset. Then it loops through the passed rows and puts them into groups based on their values for the index. These newly categorized values (which are stored in the $to_add array) need to be again gone through for the next breakdown (if there is one) and so are passed along to the next iteration so the next breakdown can be checked. The remaining breakdown list, the unchecked values that were just added, and the $data object containing the result are all passed back into the method as a recursive call.
When there are no more rows to be added and subsequently checked, the recursive call stops being made and the resulting $data object is returned to the original calling function.
Unfortunately this call can be expensive when many rows and categories are being parsed. If you have 'n' rows and 'm' columns to review so the big O(n) notation and where k is the number of distinct entries for each category is something like O(n) = n+k^m.
Try this algorithm to find interesting facts on your own datasets.
In this simple example, say we have gathered a simple dataset on various political candidates, including: First Name, Last Name, Party Affiliation, Gender, State Affiliation, and Job. Maybe we would want to see which genders were associated with which jobs or what the similarities are across party or state affiliations.
Each of these could be done individually in SQL, but that would be inefficient and wouldn't expose the power of being able to perform a breakdown on the fly.
The Dataset in PHP
$rows = array(All I need to do to perform a breakdown is to supply the function with the full dataset and specify which keys to look at to start building categories. The order of the breakdown is important and determines the structure of the tree that is generated as a result.
array('fname' => 'Ted', 'lname' => 'Cruz', 'party' => 'Republican', 'gender' => 'male', 'state' => 'Texas', 'job' => 'Senator'),
array('fname' => 'John', 'lname' => 'Kasich', 'party' => 'Republican', 'gender' => 'male', 'state' => 'Ohio', 'job' => 'Governor'),
array('fname' => 'Donald', 'lname' => 'Trump', 'party' => 'Republican', 'gender' => 'male', 'state' => 'New York', 'job' => 'Chairman'),
array('fname' => 'Hillary', 'lname' => 'Clinton', 'party' => 'Democrat', 'gender' => 'female', 'state' => 'New York', 'job' => 'Secretary of State'),
array('fname' => 'Bernie', 'lname' => 'Sanders', 'party' => 'Democrat', 'gender' => 'male', 'state' => 'Vermont', 'job' => 'Senator'),
);
For example, let's say I want to look at party affiliation. Then for each party I want to know what are the positions held by those candidates. Then for each of those jobs I want to see which states have an affiliation.
That can be done by specifying the breakdown array as follows:
$breakdown_arr = array('party', 'job', 'state');I like to use named keys in my array because that makes it easier to read than integer indexes.
show_breakdown($breakdown_arr, $rows);
When this breakdown is applied to the dataset, the following result displays:
Republican
Senator
Texas
Governor
Ohio
Chairman
New York
Democrat
Secretary of State
New York
Senator
Vermont
Specify Any Type of Breakdown
The breakdown can be specified in any order we want. Say we want to look again at the party affiliation, but this time by gender and first name:
$breakdown_arr = array('party', 'gender', 'fname');This shows me a completely different take on the data. We see here that while the Republican party is dominated by males, the Democratic party is split evenly by gender.
show_breakdown($breakdown_arr, $rows);
Republican
male
Ted
John
Donald
Democrat
female
Hillary
male
Bernie
The Algorithm For Performing the Breakdown
function get_array_breakdown($breakdown_indexes, $rows, &$data = null) {
if (count($breakdown_indexes) < 1) {
return array();
}
if (!$data) {
$data = new datum();
}
$metric_index = array_shift($breakdown_indexes);
$to_add = array();
foreach ($rows as $row) {
$val = $row[$metric_index];
$to_add[$val][] = $row;
}
//Now loop through the children to continue the breakdown
foreach ($to_add as $key => $to_add_val) {
$datum = new datum();
$datum->name = $key;
$datum->data = $to_add_val;
$data->addChild($datum);
get_arrays_for_breakdown($breakdown_indexes, $to_add[$key], $datum);
}
return $data;
}
The way this code works is that it continually slices off a category index to look for in the dataset. Then it loops through the passed rows and puts them into groups based on their values for the index. These newly categorized values (which are stored in the $to_add array) need to be again gone through for the next breakdown (if there is one) and so are passed along to the next iteration so the next breakdown can be checked. The remaining breakdown list, the unchecked values that were just added, and the $data object containing the result are all passed back into the method as a recursive call.
When there are no more rows to be added and subsequently checked, the recursive call stops being made and the resulting $data object is returned to the original calling function.
Unfortunately this call can be expensive when many rows and categories are being parsed. If you have 'n' rows and 'm' columns to review so the big O(n) notation and where k is the number of distinct entries for each category is something like O(n) = n+k^m.
Try this algorithm to find interesting facts on your own datasets.
Complete Example For Performing Category Tree Breakdowns in PHP
<?php
//first name, last name, position
$rows = array(
array('fname' => 'Ted', 'lname' => 'Cruz', 'party' => 'Republican', 'gender' => 'male', 'state' => 'Texas', 'job' => 'Senator'),
array('fname' => 'John', 'lname' => 'Kasich', 'party' => 'Republican', 'gender' => 'male', 'state' => 'Ohio', 'job' => 'Governor'),
array('fname' => 'Donald', 'lname' => 'Trump', 'party' => 'Republican', 'gender' => 'male', 'state' => 'New York', 'job' => 'Chairman'),
array('fname' => 'Hillary', 'lname' => 'Clinton', 'party' => 'Democrat', 'gender' => 'female', 'state' => 'New York', 'job' => 'Secretary of State'),
array('fname' => 'Bernie', 'lname' => 'Sanders', 'party' => 'Democrat', 'gender' => 'male', 'state' => 'Vermont', 'job' => 'Senator'),
);
$breakdown_indexes = array('party', 'job', 'state');
show_breakdown($breakdown_indexes, $rows);
/*
Output:
For breakdown in form of: ["party","job","state"]
Breakdown result is:
Republican
Senator
Texas
Governor
Ohio
Chairman
New York
Democrat
Secretary of State
New York
Senator
Vermont
*/
$breakdown_indexes = array('party', 'gender', 'fname');
show_breakdown($breakdown_indexes, $rows);
/*
Output:
For breakdown in form of: ["party","gender","fname"]
Breakdown result is:
Republican
male
Ted
John
Donald
Democrat
female
Hillary
male
Bernie
*/
function show_breakdown($breakdown_indexes, $rows) {
$new_data = get_array_breakdown($breakdown_indexes, $rows);
echo 'For breakdown in form of: '. json_encode($breakdown_indexes).PHP_EOL;
echo 'Breakdown result is: '.PHP_EOL;
echo $new_data;
echo PHP_EOL;
return $new_data;
}
function get_array_breakdown($breakdown_indexes, $rows, &$data = null) {
if (count($breakdown_indexes) < 1) {
return array();
}
if (!$data) {
$data = new datum();
}
$metric_index = array_shift($breakdown_indexes);
$to_add = array();
foreach ($rows as $row) {
$val = $row[$metric_index];
$to_add[$val][] = $row;
}
//Now loop through the children to continue the breakdown
foreach ($to_add as $key => $to_add_val) {
$datum = new datum();
$datum->name = $key;
$datum->data = $to_add_val;
$data->addChild($datum);
get_arrays_for_breakdown($breakdown_indexes, $to_add[$key], $datum);
}
return $data;
}
class datum {
public $name = '';
public $data;
public $children = array();
private $depth = 0;
public function addChild($child) {
$child->depth = $this->depth + 1;
$this->children[] = $child;
}
public function __toString() {
$str = $this->toStringHelper();
foreach ($this->children as $child) {
$str .= $child;
}
return $str;
}
private function toStringHelper() {
$str = str_pad(' ', $this->depth * 4);
$str .= $this->name;
$str .= PHP_EOL;
return $str;
}
}
This blog is unique from all others. Thanks for sharing this content in an excellent way. Waiting for more updates.
ReplyDeleteEnglish Speaking Classes in Mulund
IELTS Classes in Mulund
German Classes in Mulund
French Classes in Mulund
Spoken English Classes in Chennai
IELTS Coaching in Chennai
English Speaking Classes in Mumbai
IELTS Classes in Mumbai
Spoken English Class in Porur
IELTS Coaching in Anna Nagar