CHAPTER 2
Data Acquisition
The National Center for Atmospheric Research (NCAR) is an academic and research institution with a facility called Foothills Laboratory in Boulder, CO. This campus of three similar two- and three-story buildings has approximately 1200 identified rooms, with 1174 having individual occupancy sensors. The facility has a local-host control hierarchy with individual control zones communicating with the central facility control computer [Morrow, 1997]. Using the STAEFA MS2000 software system, this computer can ascertain the status of any of the occupancy sensors at specified regular intervals.
The NCAR Facilities Support Services permitted and assisted in setting up "history blocks" in the MS-2000 system which recorded the occupancy status for the selected sensors every five minutes. This was performed by the central MS-2000 system sometime during a five minute interval, from the status obtained at that moment from the local controller.
The MS2000 system requirements limited each "history block" to at most four sensors. Therefore, the 195 selected sensors required 53 separate "history blocks" in the MS2000 system for recording the raw occupancy data of the assigned sensors with a time/date stamp. The MS2000 system coordinated the recording within each "history block" but did not coordinate among them, so each of the 53 "history blocks" has its own timing between recorded events.
When the system operated properly, all the "history blocks" would be recorded within a five minute interval, but not in a set order, sequence or period. For individual "history blocks" the time between entries would not be fixed five minute intervals, but would vary, typically by a few seconds. Therefore, the status of each sensor was recorded at some time within each five minute period.
Problems with the internal computer clocks resulted in significant error in the data timing during the initial months, with out-of-sequence time stamps or longer periods between recorded occupancy. The preliminary results from this work led to a revision of the NCAR computer clock system which for the most part eliminated this problem during the period of data accumulation. However, the data from the first month, October 1994 (9410), does have substantially more timing problems than the later months.
While there were inconsistencies in the timing due to the organization of the MS-2000 system and problems with the computer clocks, in general the occupancy status for each of the 195 sensors can be considered as reported at some time within regular five minute intervals.
2.2 Room Selection
By random selection within the applicable room categories (information provided by NCAR), 195 rooms were picked for this data set. The room selection was made by identifying the number of available rooms in the entire facility in a certain category and establishing the number which should be included in the data set. The objective was to include approximately one-sixth of the rooms of each designated room type that were available for collecting occupancy data. A random number generator was used in a custom software routine to build a list of rooms of the desired number from the available rooms. The results are a random distribution of rooms among the total of eight floors of the facility, including rooms on the interior and exterior, adjacent and isolated. Figure 2.1 shows a typical floor plan, with hatching in the rooms or areas which were part of the 195 rooms selected. A total of 107 rooms in the categories of office and office service were included in the set of 195 rooms.
2.3 Data Accumulation and Formatting
The raw data is each room's status as either "occupied" or "unoccupied" and an associated time/date stamp, taken from the central facility management computer, for each room at nominal five minute intervals, for over 14 months. Due to problems with the computer systems involved, this was reduced to a twelve month period from October 1994 (9410) through September 1995 (9509).
The data was reformatted from the 53 "history blocks" to files showing the status of each room for a specific five minute period by assigning the raw data to the appropriate period. All 195 rooms were combined into twelve files; each file represents a single month. Each monthly file is organized in rows; each row represents a specific five minute period and shows the status of each room as either "0" for unoccupied or "1" for occupied. This is consistent with the information recorded in the "history blocks". The reformatting meant reading the time/date stamp of each record in each "history block", determining the appropriate five minute window for that record, and then resolving conflicts and identifying any gaps. Out-of-sequence records were typically discarded, although extended groups were patched when feasible. In this way the variable time intervals in the raw data of the 53 "history blocks" were recast into fixed intervals of five minutes for all of the 195 rooms.
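The recasting of variable-interval records into fixed five minute windows can be sketched as follows. This is only an illustration under stated assumptions, not the routine used in this work: the function name recast, the use of None as the gap marker, and the rule that a later record overwrites an earlier one in the same window are all hypothetical, and the patching of extended out-of-sequence groups is not attempted.

```python
from datetime import datetime

ERROR = None  # hypothetical marker for a window with no usable record


def recast(records, period_start, n_windows, width_s=300):
    """Recast (time stamp, status) records for one sensor into fixed windows.

    Assumptions, not taken from the source: a record whose time stamp does
    not advance past the previous accepted record is discarded, and if two
    records land in the same window the later one overwrites the earlier.
    """
    row = [ERROR] * n_windows
    last_stamp = None
    for stamp, status in records:
        if last_stamp is not None and stamp <= last_stamp:
            continue  # out-of-sequence record: discard
        last_stamp = stamp
        index = int((stamp - period_start).total_seconds() // width_s)
        if 0 <= index < n_windows:
            row[index] = status
    return row


start = datetime(1994, 10, 3, 8, 0, 0)
raw = [
    (datetime(1994, 10, 3, 8, 0, 3), 1),
    (datetime(1994, 10, 3, 8, 5, 7), 1),
    (datetime(1994, 10, 3, 8, 4, 0), 0),   # out of sequence, dropped
    (datetime(1994, 10, 3, 8, 15, 2), 0),  # the 08:10 window has no record
]
print(recast(raw, start, 4))  # [1, 1, None, 0]
```

Windows that receive no record keep the gap marker, which corresponds to the third status value discussed in the next paragraph.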
Errors in the timing and data stream produced a third value ('error') for those periods without proper raw data. Therefore, for each of 195 rooms and for each five minute interval of each day over a twelve month period from October 1994 (9410) through September 1995 (9509), the occupancy status can be determined as "occupied", "unoccupied" or "error".
2.4 Determination of Set of Office Rooms
The subset of sensors identified as offices and office services comprises 107 rooms. Almost all are intended to have only one occupant. From preliminary analysis of the occupancy data, eight rooms show no change in occupancy status for at least one month during the period of particular interest, from Monday through Friday from 8 AM to 5 PM. In some cases, the room has no reports of occupancy, but in other cases the data indicate that the room is always occupied for the period between 8 AM and 5 PM for a particular month. For most of these rooms the occupancy rate changes from month to month, but if there was one month with constant occupancy rate, the room was removed from the set for this work. With these eight rooms removed from the subset under consideration, a set of 99 office or office service rooms comprises the available data set.
The set of 99 rooms was evaluated for "error" rates for the 8 to 5 portion of workdays. The error rate is calculated as the number of errors in a given period divided by the number of five minute intervals in that period.
Table 2.1 shows the summary of the error rates by months and by hours of the day, as averages and as peak values, calculated in the manner described below. The columns are for individual hours in the period between 8 AM and 5 PM. The column labels refer to the sequential hour of the day, so that Hr10 refers to the period between 9 and 10 AM. The row labels refer to the specific year-month combination, from October 1994 (9410) through September 1995 (9509).
The average error rate is the average of all the error rates calculated separately for each hour. The peak error rate is the maximum error rate among the quarter-hours within a particular hour during the month. This definition of "peak" is the same as defined below in Section 2.5, that is, as the average over one of the four quarter-hours in each hour. Therefore, if there are twenty workdays in a month, the average of each hour is an average of the twenty separate one-day hourly values, while the peak is the maximum of eighty values. Each one-day hourly value is calculated from twelve records (representing five minute periods) with a separate datum for each of the 99 rooms, or 1,188 entries (12 records x 99 rooms). For a month with twenty workdays, each hourly value would represent 23,760 entries (20 days x 1,188), so each monthly daily value would represent 213,840 entries (9 hours x 23,760).
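As an illustration of the two statistics just described, the sketch below computes the average and peak error rates for one clock hour of one month. The data layout (a list of workdays, each holding twelve five minute records, each record a list of room statuses) and the use of None as the "error" value are assumptions made for this example only.

```python
ERROR = None  # hypothetical stand-in for the "error" status


def error_rate(records):
    """Errors divided by total entries, over a list of five minute records."""
    entries = [status for record in records for status in record]
    return sum(1 for status in entries if status is ERROR) / len(entries)


def hourly_error_stats(days):
    """Average and peak error rate for one clock hour of one month.

    `days` holds twelve records per workday. The average is the mean of the
    one-day hourly error rates; the peak is the maximum error rate over all
    quarter-hours (three consecutive records each).
    """
    hourly = [error_rate(day) for day in days]
    quarters = [error_rate(day[q:q + 3]) for day in days for q in range(0, 12, 3)]
    return sum(hourly) / len(hourly), max(quarters)


# Two rooms, two workdays: the first quarter-hour of day 1 is all "error".
day1 = [[ERROR, ERROR]] * 3 + [[1, 0]] * 9
day2 = [[1, 0]] * 12
average, peak = hourly_error_stats([day1, day2])
print(average, peak)  # 0.125 1.0
```

The example reproduces the pattern noted for Table 2.1: a single quarter-hour of missing data drives the peak to 1.000 while the average stays low.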
The second column in the average section displays the average of each month's values for the hours from 8 AM to 5 PM, or the monthly daily average. The second column in the peak section shows the maximum of the values for each month's hours from 8 AM to 5 PM, or the monthly daily peak.
The average error rate for the October 1994 (9410) data is around 10%, significantly greater than for any other months, which typically have error rates under 2.5%. The peak error rates in Table 2.1 indicate that in some months (e.g. 9410) there is at least one quarter-hour that is entirely "error" data. This means that for all 99 rooms for all three of the five minute periods in a quarter-hour, the data is all "error". Conversely, any month with Max of 0.000 (e.g. 9412) has no errors during the entire month.
Comparison of the two portions of these tables provides some information about the pattern of errors. For example, for November 1994 (9411), the peak value for Hr09 is 1.000, while the average is 0.021. Since a typical month has twenty to twenty-three workdays, each hour of a typical month has eighty to ninety-two quarter-hours. Therefore for Hr09 of 9411, while one quarter-hour is entirely "error" data, there are few other errors in that hour of that month.
As discussed in detail below, the high rate of errors and the maximum peak occupancy do not occur in the same hours of the 8 to 5 period in a particular month. Therefore the impact of the errors on overall results is considered minimal.
2.5 Average Occupancy Statistics
The average occupancy rate can be defined in many ways, particularly in terms of the period over which the average is calculated. For the purpose of this discussion, averages are typically taken over a period of one month, for either the entire nine hour workday period (8 AM to 5 PM) or for each hour separately.
Calculations include every five minute period within the daily period of interest over the month, counting the occupied and unoccupied records for all the rooms in the specified set. The average occupancy rate is equal to the number of occupied records divided by the number of both occupied and unoccupied records. In this way "error" records were not reflected in the average occupancy statistics.
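A minimal sketch of this calculation follows, assuming statuses are coded as 1 (occupied), 0 (unoccupied) and None ("error"); the function name and the coding of the error value are illustrative choices, not part of the original processing.

```python
def average_occupancy(records):
    """Occupied records divided by occupied plus unoccupied records.

    `records` is a list of five minute records, each a list of room statuses.
    Error entries (None here) are left out of both numerator and denominator.
    Returns None if the period contains no valid entries at all.
    """
    occupied = unoccupied = 0
    for record in records:
        for status in record:
            if status == 1:
                occupied += 1
            elif status == 0:
                unoccupied += 1
    valid = occupied + unoccupied
    return occupied / valid if valid else None


# Three rooms over two five minute periods; the third room reports errors.
print(average_occupancy([[1, 0, None], [1, 1, None]]))  # 0.75
```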
The average daily occupancy is the monthly average of the occupancy rate over the nine hour period of workdays between 8 AM (Hr09) and 5 PM (Hr17). Workdays are Monday through Friday, including any holidays. This definition of workdays means the monthly and seasonal variations reflect the occurrence of holidays during each period. For a given set of rooms, there is one average daily occupancy rate for each month.
The average hourly occupancy is the monthly average of the occupancy rate in that particular hour of all workdays. Therefore for any given set of rooms, there are nine average hourly occupancy rates associated with each month. The hours are labeled in the manner of DOE-2, so the hour from 8 AM to 9 AM is Hr09.
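The workday period and the hour labeling can be expressed compactly. The helper names below are hypothetical; the sketch only assumes that each record carries a time stamp and that, as stated above, holidays are not excluded.

```python
from datetime import datetime


def in_workday_period(stamp):
    """True for Monday through Friday between 8 AM and 5 PM (holidays included)."""
    return stamp.weekday() < 5 and 8 <= stamp.hour < 17


def hour_label(stamp):
    """Label by the sequential hour of the day, so 8 AM to 9 AM is Hr09."""
    return f"Hr{stamp.hour + 1:02d}"


monday_morning = datetime(1994, 10, 3, 8, 30)
print(in_workday_period(monday_morning), hour_label(monday_morning))  # True Hr09
```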
The determination of peak occupancy is performed to conform with the definition of peak demand used by the local utility. The peak demand refers to the highest electrical demand in kilowatts occurring during the specified month. The rate is determined over 15 minute periods, which are specified for quarter hours starting at the hour. Within each 15 minute period, the rate is averaged.
Following this definition, the peak occupancy is the maximum value of the quarter-hour averages during the month. This refers to the average over 15 minutes of the occupancy rate for the specified set of rooms. The 15 minute period means that three records, each representing five-minute occupancy information, are used in each averaging calculation. Since the 15 minute periods are defined to be the quarter-hours, the three records would be combined in a regular pattern e.g. the 00, 05 and 10 records would be averaged to calculate the peak for the first quarter-hour period. As with the calculation of average values discussed above, "error" values are not included in the calculation of a quarter-hour average which is a peak occupancy calculation.
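The quarter-hour averaging and the selection of the maximum can be sketched as below. The layout (a list of days, each a list of five minute records in clock order starting on the hour) and the None error marker are assumptions for this example, as is the decision to skip a quarter-hour that contains no valid entries.

```python
ERROR = None  # hypothetical stand-in for the "error" status


def occupancy_rate(records):
    """Occupancy rate over a group of records, ignoring error entries."""
    statuses = [s for record in records for s in record if s is not ERROR]
    return sum(statuses) / len(statuses) if statuses else None


def peak_occupancy(days):
    """Maximum of the quarter-hour average occupancy rates.

    Each day is a list of five minute records starting on the hour, so
    records 0-2 form the first quarter-hour (the 00, 05 and 10 records).
    """
    rates = []
    for day in days:
        for q in range(0, len(day) - len(day) % 3, 3):
            rate = occupancy_rate(day[q:q + 3])
            if rate is not None:
                rates.append(rate)
    return max(rates) if rates else None


# One hour for a set of four rooms, built from four quarter-hours.
hour = ([[1, 1, 0, 0]] * 3 + [[1, 1, 1, 0]] * 3
        + [[1, 0, 0, ERROR]] * 3 + [[0, 0, 0, 0]] * 3)
print(peak_occupancy([hour]))  # 0.75
print(occupancy_rate(hour))    # 0.4
```

In this example the hourly average is 0.4 while the peak is 0.75, set by the second quarter-hour.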
It is important to note that peak occupancy is a function of the specific set of rooms and the specific time period considered. While average values can be accumulated from data for individual rooms into overall averages, this process is not valid with peak occupancy rates. For averages, a simple mathematical accumulation or summation can be made, in which individual rooms can be grouped together into a set and the average of that set calculated from the averages of the individual rooms. Similarly the averages of a set of rooms for different hours, days or months can be combined to establish that room set's average for longer periods.
This is clearly not the case for peak occupancy rates, which are dependent on both the particular rooms in a set and the period of time over which the peak is determined. As an example, for all 99 offices, the peak occupancy rate over the year is 0.77 while the annual average is 0.49. However for a specific subset of rooms, selected from this overall set, the peak occupancy could be 1.0 for the year, yet have the same annual average occupancy rate of 0.49. A different subset's peak occupancy rate could be less than 0.77, yet still have the same annual average occupancy. Also, sets with the same peak occupancy rate may have different average occupancy rates. Thus, the peak occupancy rate is dependent on the composition of the set of rooms being evaluated.
In every case, a set's peak occupancy rate is between its average occupancy rate and 1. Also, the peak occupancy over a month is less than or equal to the peak over a year, while the average may be the same or higher or lower. This condition is based on the means of calculating the peak as the maximum over the period of interest. This calculation procedure creates a hierarchy within accumulated periods, such as the months which make up a year. However, there is no dependence between separate periods, so for example the peak for the same set of rooms can vary substantially over different months. Thus, the peak occupancy rate is dependent on the period being evaluated.
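A small numerical illustration of these properties is given below. The quarter-hour rates are invented for the example and assume every room has the same number of valid records, so that averages combine by a simple mean.

```python
def average(rates):
    return sum(rates) / len(rates)


def peak(rates):
    return max(rates)


# Hypothetical quarter-hour occupancy rates for two single-room sets.
room_a = [1.0, 0.0, 1.0, 0.0]
room_b = [0.0, 1.0, 0.0, 1.0]
both = [(a + b) / 2 for a, b in zip(room_a, room_b)]

# Averages accumulate: the set average equals the mean of the room averages.
print(average(both), (average(room_a) + average(room_b)) / 2)  # 0.5 0.5

# Peaks do not: each room alone peaks at 1.0, the pair together at 0.5.
print(peak(room_a), peak(room_b), peak(both))  # 1.0 1.0 0.5

# Hierarchy over periods: the annual peak is the maximum of the monthly peaks.
months = [[0.4, 0.6], [0.5, 0.7]]
monthly_peaks = [peak(m) for m in months]
annual_peak = peak([rate for m in months for rate in m])
print(monthly_peaks, annual_peak)  # [0.6, 0.7] 0.7
```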
Since peak occupancy data is specific to the set of rooms and period of evaluation, it is necessary to clearly define which set and period are associated with each value. The significance of this feature of peak occupancy data when applying it to building models cannot be overstated. Furthermore, the association of sets of rooms with peak occupancy values means that through the selection of specific sets of rooms, additional peak occupancy data can be developed.
For the purpose of this procedure, the 99 office and office service rooms are combined into one set. The hourly peak and hourly average occupancy rates of this set for each month are provided in Table 2.2. Since the set of rooms is the same throughout, the maximum value of the hourly peak occupancy in a given month is the daily peak occupancy for that month, shown in the far right column of the peak occupancy rates in Table 2.2. Also, the maximum value of the monthly daily peak occupancy rates is the annual daily peak occupancy rate, shown in the lower right corner of the peak occupancy rates in Table 2.2. Similarly, the maximum value of the hourly peak in a given hour is the annual hourly peak occupancy rate, provided in the bottom row of the peak occupancy rates in Table 2.2, and the maximum value of those is also the annual daily peak occupancy rate.
Inspection of Table 2.2 shows the typical relationships between average and peak occupancy rates. First, for every position in the average occupancy rates, the corresponding position in the peak occupancy rates has a greater value, since the peak is always greater than the average. In the average occupancy rates, the far right column and bottom row show the averages of the values in the associated row or column. Over the entire year, the average occupancy rate for these 99 rooms for Monday through Friday from 8 AM to 5 PM is 0.52, shown in the lower right corner of the average occupancy rates. The far right column shows that there are small variations between the months in the average occupancy rates. Not surprisingly, the lowest average occupancy rates are in December and July, which are months known for holidays and vacations. The bottom row shows the daily occupancy profile, averaged over the entire year. This profile, shown in Figure 2.2, is typical of daily average occupancy profiles, with a low value in the morning increasing toward noon, a dip at noon and then a falling off as the afternoon progresses. The shape is similar to the ASHRAE/IES 90.1 profiles shown in Figure 1.1, but the values are approximately 60% of the ASHRAE/IES rates.
For the peak occupancy rates, there is a difference in the way the values in the far right column and bottom row are developed. As is appropriate for peak occupancy rates, these values are the maximums for the associated row or column respectively. Similarly in Table 2.2 the value at the lower right corner is the maximum of both the monthly daily peaks (far right column) and the annual hourly peaks (bottom row). The peak occupancy profile shown in Figure 2.2 consists of the annual hourly peaks presented in the bottom row of Table 2.2. The use of maximums instead of averages is at the heart of the distinction between peak and average occupancy data, the difference in how the statistics are developed.
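The difference in how the margins of the two parts of the table are formed can be sketched with a single helper that takes the reducing function as a parameter. The table values below are invented, and the helper name is hypothetical; rows stand for months and columns for hours.

```python
def margins(table, reduce):
    """Row margins, column margins and corner value for a rectangular table.

    `reduce` is applied to each row, to each column, and then to the row
    margins to obtain the corner (lower right) value.
    """
    rows = [reduce(row) for row in table]
    cols = [reduce(list(col)) for col in zip(*table)]
    return rows, cols, reduce(rows)


def mean(values):
    return sum(values) / len(values)


# Hypothetical rates: two months (rows) by two hours (columns).
peak_table = [[0.60, 0.75], [0.70, 0.65]]
avg_table = [[0.40, 0.60], [0.50, 0.70]]

peak_rows, peak_cols, peak_corner = margins(peak_table, max)
avg_rows, avg_cols, avg_corner = margins(avg_table, mean)

# For peaks the corner is the maximum of either margin.
print(peak_rows, peak_cols, peak_corner)  # [0.75, 0.7] [0.7, 0.75] 0.75
```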
The Average Decile procedure entails using the average occupancy rate as the independent variable and calculating the corresponding set size and peak occupancy rates. The intent is to develop a data set with a range of average occupancy rates.
The 99 rooms are each evaluated separately for monthly daily average occupancy, using the calculation procedure described above. This value is then converted to a "decile" (decile = Int(Avg*10)). If a room in a particular month has a daily average occupancy of between 0.400 and 0.499, then for that month that room is assigned a decile of 4. In a different month the same room may have a daily average occupancy between 0.600 and 0.699, and so would have a decile of 6 for that month.
Grouping the rooms by decile produces 10 sets of differing size for each month. Each set is associated with a decile, so the rooms in the set have similar average occupancy during that month. For each of the ten sets in each month, the average and peak occupancy rates are calculated, producing 120 data triads (12 months x 10 deciles) for each hour or daily period. Each triad has set size, average and peak values, and can be associated with a unique month-decile combination. The resulting hourly average and hourly peak occupancy rates are shown in Table 2.3.
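The decile assignment and grouping for a single month can be sketched as follows. The room identifiers and averages are invented, and the clamping of an average of exactly 1.0 into decile 9 is an assumption added here so that only ten sets (deciles 0 through 9) can result.

```python
from collections import defaultdict


def decile(avg):
    """decile = Int(Avg*10); an average of exactly 1.0 is clamped to 9 (assumption)."""
    return min(int(avg * 10), 9)


def group_by_decile(monthly_averages):
    """Map each decile to the list of rooms whose monthly daily average falls in it."""
    groups = defaultdict(list)
    for room, avg in monthly_averages.items():
        groups[decile(avg)].append(room)
    return dict(groups)


# Hypothetical monthly daily averages for three rooms in one month.
averages = {"room_a": 0.45, "room_b": 0.48, "room_c": 0.62}
groups = group_by_decile(averages)
print(groups)  # {4: ['room_a', 'room_b'], 6: ['room_c']}

# Set size is the first member of each month-decile data triad.
set_sizes = {d: len(rooms) for d, rooms in groups.items()}
print(set_sizes)  # {4: 2, 6: 1}
```

The average and peak occupancy rates of each resulting set would then be calculated from the records of the rooms in that set, giving the remaining two members of each triad.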
Inspection of these tables shows trends and profiles similar to those discussed for the Entire Set procedure. In addition, the average values show that the decile procedure does produce room sets with the desired range of average occupancy rates. The progression in the peak occupancy data across the deciles shows how peak occupancy increases as the average occupancy rate of the associated room set increases.
A slightly different perspective on the results of the Average Decile procedure is shown in Table 2.4, with both the peak occupancy rates and the number of sets associated with each average decile-set size combination. The peak occupancy values tend to decrease as the set size increases, showing the relationship between peak occupancy and set size. The number of sets associated with each combination of average decile-set size shows how few values are contributing to the peak occupancy rates. This low number of entries in each combination indicates that this data would not be a good foundation for a prediction tool, which leads to the development of the final set of occupancy statistics.
The Set Size procedure entails using the number of rooms as the independent variable and calculating the corresponding average and peak occupancy rates. The intent is to develop a data set with a range of set sizes.
This procedure begins with the selection of subsets of a specified number of rooms from the available 99 office and office service rooms, using a random number generator in a custom software routine. The routine produces a subset of rooms which have been randomly selected from the available rooms while avoiding duplication. Therefore, if the specified number is ten, the routine will randomly select a room from the 99 available rooms and, if the selected room is not already in the subset, include it. This process is repeated until the subset has ten rooms in it. Such a subset would have a set size of ten.
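The selection loop described above can be sketched as follows. This is not the custom routine used in this work; the function name, the integer room identifiers and the seeded generator are assumptions for illustration.

```python
import random


def select_subset(rooms, size, rng):
    """Draw rooms at random, skipping duplicates, until `size` rooms are held."""
    if size > len(rooms):
        raise ValueError("subset size exceeds the number of available rooms")
    subset = []
    while len(subset) < size:
        room = rng.choice(rooms)
        if room not in subset:  # avoid duplication
            subset.append(room)
    return subset


rooms = list(range(1, 100))  # hypothetical identifiers for the 99 rooms
rng = random.Random(1995)    # seeded only to make the example repeatable
subset = select_subset(rooms, 10, rng)
print(len(subset), len(set(subset)))  # 10 10
```

In Python the same result could be obtained with rng.sample(rooms, 10), which also selects without replacement; the explicit loop is shown because it mirrors the procedure as described.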
For each subset, the average and peak occupancy rates are determined for each of the twelve months, for both hourly and daily periods. For each subset that is defined, the results are data triads of set size, average occupancy rate and peak occupancy rate for each of the twelve months.
Therefore, any size of subset from 1 to 99 can be evaluated. Proposed subset sizes are 10, 20, 30, 40, 50. Due to the random selection from a rather large pool, a very large number of different subsets can be created (combination theory indicates that for 99 elements the number of available distinct subsets may be as great as 10^15). The proposed number of subsets is 50 for each of the five proposed set sizes. Since each subset of each size would have twelve "daily" data triads of set size, average occupancy and peak occupancy (one for each month), there would be 600 data triads (50 subsets x 12 months) for each size of the subsets, or 3000 "daily" data triads (5 sizes of subsets x 600) in all.
The calculations are repeated for the same sets of rooms to determine the average and peak for each hour in each month. This produces 3000 data triads for each of the nine hours between 8 AM and 5 PM, for a total of 27,000 "hourly" data triads.
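The triad counts quoted above follow directly from the proposed design, as the short check below shows. The variable names are illustrative. The last lines use math.comb (available from Python 3.8) to count the distinct subsets of each proposed size that can be drawn from 99 rooms; the count depends strongly on the subset size.

```python
import math

set_sizes = [10, 20, 30, 40, 50]
subsets_per_size = 50
months = 12
hours = 9  # Hr09 through Hr17

daily_per_size = subsets_per_size * months     # 50 subsets x 12 months
daily_total = len(set_sizes) * daily_per_size  # 5 sizes x 600
hourly_total = hours * daily_total             # 9 hours x 3000

print(daily_per_size, daily_total, hourly_total)  # 600 3000 27000

# Number of distinct subsets of each size from a pool of 99 rooms.
for size in set_sizes:
    print(size, math.comb(99, size))
```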