Using Commodity Hardware as an Affordable Means to Track On-site Visitor Flow


Gray Bowman, USA, Kyle Jaebker, USA

Abstract

Low cost computing enables the cultural sector to pursue distributed tracking and monitoring systems that were out of reach just a couple of years ago. See how the IMA is using ultra-affordable computing to build an onsite visitor tracking system, and how log analysis is performed in order to map tracking data.

Keywords: mobile, visitor tracking, analytics, audience evaluation, commodity hardware, WiFi, RaspberryPi

1. Introduction

Museums have long been interested in the evaluation of their spaces and how visitors interact with works of art. Recording how visitors move from gallery to gallery and where they stop has often been the task of trained observers in the galleries. This process is done to determine if the information displayed is accessible and engaging. As the costs of technology continue to decrease, finding technological means to automate the tracking of visitors could lead to additional findings and help inform the evaluation process. Having access to records of which galleries people entered, from where, and the length of stay can enlighten a museum in many ways. This data can be used to determine gallery layout and wayfinding needs, or to inform other decisions about how a collection is displayed to the visitor. In addition, users can be provided with information through their smartphones about the area of the museum they are in.

An increasing number of museum patrons are bringing smartphones when visiting. Smartphone penetration levels are continually increasing, with the NPD Group (2012) reporting 70 percent penetration in Q3 of 2012. Additionally, comScore (2013) reports that 123.3 million Americans now own smartphones. These numbers will continue to rise as the cost of low-end smartphones further declines and more mobile phone users make the switch to smartphones.

The Audience Engagement department at the Indianapolis Museum of Art (IMA) has administered exit surveys to get a better understanding of who visits the museum. Starting in September 2012, the IMA added the question, “Are you carrying a smartphone with you today?” Based on 350 responses, 73.3 percent of respondents were carrying a smartphone. This percentage falls in line with recent studies, as mentioned earlier. In addition, the IMA found that for the age group from 25 to 44, 83.7 percent of visitors carried a smartphone. This age group accounted for approximately 56 percent of total visitors during this survey period. All of these statistics indicate the continued growth of smartphones and their prevalence in museums.

While commercial applications exist for tracking users throughout a space, they are prohibitively expensive for most institutions and rely on proprietary technologies. In the last five years, an explosion of easily accessible, low-cost technologies has come to market, aimed at making device-building approachable and fostering education about electronics and computers. The Arduino and the Raspberry Pi are examples of open-source projects that have built a large following, producing hardware for the microcontroller and computer markets, respectively. Both are often featured as the core of homebrew projects.

The IMA wanted to find a low-cost device for this purpose, and after a few brainstorming sessions we determined that it could be possible to build our own using off-the-shelf components and software. With this idea in mind, the IMA built a tracking device with a server backend that could aggregate the data collected. Starting December 21, 2012, the tracking device was deployed near the front desk of the IMA. This allowed for the collection of data as visitors entered the museum and moved into the galleries.

2. Building a low-cost tracker

Overview

A majority of smartphones and mobile devices are configured to prioritize Wi-Fi over other types of data connections. As these devices are carried around, they search for a network to connect to. Even if already connected, the device will search for a network that is preferred over the current connection. This is done by broadcasting a packet of data known as a “probe request.” If the requested network is present, it will answer the request and establish a connection.

Generally, one probe request is broadcast for each network the device has previously connected to. Each probe request contains the MAC (Media Access Control) address of the device, an identifier intended to be unique to the device sending it. These probe requests are unencrypted, and other than the MAC address, do not contain any personally identifiable information. It is the capturing of these requests that allows the analysis of visitor traffic.

By utilizing a network of Wi-Fi-equipped, low-cost computers, it is possible to capture these broadcasts from multiple locations, anonymize the device identifiers while keeping them unique, and store them in a central database for analysis. This data can then be used to show traffic patterns, totals, and frequency.
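One way to anonymize identifiers while keeping them unique is a keyed hash. The sketch below is illustrative only and is not taken from the project's code: the function name `anonymize_mac`, the secret key, and the 16-character truncation are all assumptions.

```python
import hashlib
import hmac

# Hypothetical secret; it would need to be generated once and kept private.
SECRET_KEY = b"replace-with-a-long-random-secret"

def anonymize_mac(mac, key=SECRET_KEY):
    """Return a stable token for a MAC address that is hard to reverse without the key."""
    # Normalize case and separators so the same device always maps to the same token.
    normalized = mac.strip().lower().replace("-", ":")
    digest = hmac.new(key, normalized.encode("ascii"), hashlib.sha256).hexdigest()
    # Truncated for readability; keeping more characters lowers the collision risk.
    return digest[:16]

token_a = anonymize_mac("00:11:22:AA:BB:CC")
token_b = anonymize_mac("00-11-22-aa-bb-cc")
token_c = anonymize_mac("00:11:22:aa:bb:cd")
```

Here `token_a` and `token_b` are identical because they describe the same address, while `token_c` differs. A keyed hash is suggested rather than a plain hash because the space of MAC addresses is small enough to enumerate.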

Hardware

The Raspberry Pi Model B is a low-cost, general-purpose computer, about the size of a credit card, that runs Linux. It is based on a 700MHz ARM11 CPU, has multiple on-board peripheral connections, and is available for $35. It requires an SD card for storage of the operating system and software, and a USB Wi-Fi adapter for capturing the wireless traffic.

For the prototype agent, a 16-gigabyte SD card was used. The only storage consideration is the size of the log file, which is reported to the server in real time but continues to grow until truncated. The agent is configured to truncate the log automatically each day, so enough storage to hold a day's worth of data is required.

After researching many Wi-Fi adapters, the TP-Link TL-WN722N was chosen. It supports “b,” “g,” and “n” type networks, and comes with a high-gain antenna. More importantly, this adapter supports monitor mode and allows us to retrieve diagnostic information such as signal strength. Monitor mode is a configuration that allows the adapter to listen to all traffic on the radio band without being associated with any one network, and is required to capture all traffic.

Each agent device built to the specifications above requires a power source and an ethernet connection. A standard micro USB AC adapter is used for power. It is important to use a power source rated at 1A or greater to supply enough current for both the Raspberry Pi and the Wi-Fi adapter. Connecting the agent to an ethernet port allows the collected data to be transmitted back to the server without interrupting the collection of probe requests over the Wi-Fi adapter.

Figure 1. Completed agent hardware

Agent Software

The software stack for the agents consists of Linux, Tshark, and Python. The Raspbian OS is a custom Linux distribution derived from Debian Linux and meant for use on the Raspberry Pi. It has a large collection of packages available for it, precompiled as needed for the ARM processor.

Wireshark is an open-source network packet analyzer that allows the capture and filtering of all data transmitted across the network. It also enables collection of radio diagnostic data not normally exposed by the adapter to the operating system, such as radio signal strength.

While Wireshark provides a graphical interface for these tasks, Tshark is the command-line-only version of Wireshark. It is a good fit for the low-powered Raspberry Pi, using minimal resources and allowing the capture to be easily scripted.

The popular, general-purpose programming language Python is used to control the Tshark process, handle failures, and transmit the collected data back to the central database. All data is transmitted via HTTP.

Server Software

The central server was built using Django, a web application framework backed by MySQL. It provides an endpoint for the capturing agents to report to, as well as an administrative interface to view the collected data. The Django ORM (Object-Relational Mapping) also provides an easy way for developers to form advanced queries against the data.

Availability

All software created for this project has been open-sourced and is available on GitHub (https://github.com/IMAmuseum/visitorflow).

Process

The agent is placed in an area of traffic and connected to Ethernet and power. After booting, the “listen” Python script is invoked by a cron task, and a short delay of a few seconds is inserted to ensure all booting processes are stabilized.

The Tshark process is initiated and monitored by the Python script. Should the process fail or end, it will be restarted automatically. Tshark logs all the captured probe requests to a text file as they are observed. Tshark accepts many options at startup, and the filtering criteria are specified at run time. The Linux utility “stdbuf” is used to switch the output to line buffering, so each probe request reaches the log as soon as it is observed. Our prototype used the following options to isolate the probe requests:

stdbuf -oL tshark -i wlan0 -I -f 'broadcast' -R 'wlan.fc.type == 0 && wlan.fc.subtype == 4' -T fields -e frame.time_epoch -e wlan.sa -e radiotap.dbm_antsignal > tshark.log

The “listen” script runs through a loop that ensures Tshark is running, and monitors the Tshark log. As Tshark adds probe requests to the log, they are read in by the script and forwarded to the back-end server as an HTTP POST request. Should the Python script fail for any reason, the cron task that initiated it will reboot the device and the process will begin anew.
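The forwarding step can be sketched as follows. This is a minimal illustration, not the project's actual “listen” script: the endpoint URL, the form field names, and the helper names `parse_probe_line` and `build_report` are assumptions. It relies on Tshark's default tab separator for `-T fields` output and on the three fields requested in the command above.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; the real URL depends on how the server is deployed.
SERVER_URL = "http://tracker.example.org/report/"

def parse_probe_line(line):
    """Split one tab-separated log line into (epoch, mac, signal_dbm), or None if malformed."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 3 or not parts[0] or not parts[1]:
        return None
    try:
        epoch = float(parts[0])
        # The signal field can be empty if the adapter did not report it.
        signal = int(parts[2]) if parts[2] else None
    except ValueError:
        return None
    return epoch, parts[1].lower(), signal

def build_report(agent_id, line, url=SERVER_URL):
    """Build (but do not send) an HTTP POST request for one probe request."""
    parsed = parse_probe_line(line)
    if parsed is None:
        return None
    epoch, mac, signal = parsed
    payload = {
        "agent": agent_id,          # assumed field names
        "seen_at": epoch,
        "mac": mac,
        "signal_dbm": "" if signal is None else signal,
    }
    return Request(url, data=urlencode(payload).encode("ascii"), method="POST")

request = build_report("lobby-1", "1356100000.123456\t00:11:22:aa:bb:cc\t-67\n")
```

Passing `request` to `urllib.request.urlopen` would transmit it. Building the request separately from sending it keeps the parsing logic easy to check without a network.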

On the server side, the Django web application receives the HTTP request, parses the POST data, and saves it to the database for later analysis. To assist in the data analysis, a Python script was developed to process the data received by the server. As stated earlier, a device sends probe requests for each network it knows of; for instance, if a device has seven previously known networks, seven probe requests are transmitted. The processing script parsed through all requests and converted these groupings of requests to a single record.
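The grouping step might look like the sketch below, which merges requests from the same device that arrive close together in time. The two-second window, the record layout, and the name `collapse_bursts` are assumptions made for illustration; the processing script itself may group requests differently.

```python
def collapse_bursts(requests, window_seconds=2.0):
    """Merge probe requests from one device that arrive within window_seconds of the previous one.

    requests: iterable of (epoch, device_id, signal_dbm) tuples.
    Returns a list of dicts, one per burst, in order of first appearance.
    """
    records = []
    open_bursts = {}  # device_id -> the record currently being extended
    for epoch, device, signal in sorted(requests):
        current = open_bursts.get(device)
        if current is not None and epoch - current["last_seen"] <= window_seconds:
            # Same burst: extend it and keep the strongest signal seen.
            current["last_seen"] = epoch
            current["count"] += 1
            current["max_signal"] = max(current["max_signal"], signal)
        else:
            # Gap too large (or first sighting): start a new record.
            current = {"device": device, "first_seen": epoch, "last_seen": epoch,
                       "count": 1, "max_signal": signal}
            open_bursts[device] = current
            records.append(current)
    return records

sample = [
    (100.00, "dev-a", -70), (100.05, "dev-a", -68), (100.10, "dev-a", -71),
    (100.20, "dev-b", -80),
    (160.00, "dev-a", -60),
]
bursts = collapse_bursts(sample)
```

In this sample the three closely spaced requests from `dev-a` become one record, and its reappearance a minute later becomes a second record.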

3. Visitor analytics

Using the data captured by the visitor tracking agent, one can start to get a picture of how many devices enter the building. The agent captures the MAC address of each device, obfuscates it, and logs a record each time that device is detected. This data can then be used to count the number of devices over a given period of time. Figure 2 shows the number of devices detected per day during the testing period. The gap in the data is due to the agent being offline; no data was collected during that time.

Figure 2. Unique devices per day
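A per-day count like the one in Figure 2 can be computed along these lines. The sketch assumes sightings are available as (epoch, device identifier) pairs and buckets them by UTC date; a museum would more likely bucket by local time, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone

def unique_devices_per_day(sightings):
    """sightings: iterable of (epoch_seconds, device_id). Returns {date: unique device count}."""
    devices_by_day = defaultdict(set)
    for epoch, device in sightings:
        day = datetime.fromtimestamp(epoch, tz=timezone.utc).date()
        devices_by_day[day].add(device)  # a set ignores repeat sightings
    return {day: len(devices) for day, devices in sorted(devices_by_day.items())}

base = datetime(2012, 12, 21, 10, 0, tzinfo=timezone.utc).timestamp()
sightings = [
    (base, "a"), (base + 60, "a"), (base + 120, "b"),  # first day: two devices
    (base + 86400, "a"),                               # next day: one device
]
counts = unique_devices_per_day(sightings)
```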

In addition to tracking how many unique devices are detected per day, the data can show how often a device is detected. This would allow the museum to determine how often a visitor returns, as well as the approximate length of each visit. Figure 3 shows how many times a device was detected over the course of the observing period. Most devices were only detected once by the agent.

Figure 3. Number of devices and how often they were detected
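A distribution like the one in Figure 3 can be derived by counting detections per device and then counting devices per detection total. The sketch below assumes "detected" means "seen on a distinct day"; the figure may have been built on a different unit, and the function name is hypothetical.

```python
from collections import Counter

def detection_frequency(sightings):
    """sightings: iterable of (day, device_id). Returns {days_detected: number_of_devices}."""
    days_per_device = Counter()
    for day, device in set(sightings):  # de-duplicate repeat sightings within a day
        days_per_device[device] += 1
    return dict(sorted(Counter(days_per_device.values()).items()))

frequency = detection_frequency([
    ("day-1", "a"), ("day-1", "a"), ("day-2", "a"),  # device a: two days
    ("day-1", "b"),                                  # device b: one day
    ("day-3", "c"),                                  # device c: one day
])
```

In this sample two devices were seen on one day and one device on two days.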

From September through December 2012, the IMA conducted a visitor exit survey. This survey collected data about how many visitors carried smartphones in the museum. It was determined that 73.3 percent of visitors to the IMA carried smartphones. Using this data, we can start to approximate the actual number of visitors based on the data collected by the agent. Figure 4 shows the number of unique devices and the number adjusted for the 26.7 percent of visitors that do not have smartphones. In addition, the IMA has a Trafsys counting system that counts the number of entrances into the IMA.

Figure 4. Unique devices compared with Trafsys and adjusted devices

During this trial period, the agent detected a total of 11,469 devices. Adjusting for the IMA smartphone adoption rate of 73.3 percent, a total of 14,522 devices would have been detected if everyone had a smartphone. The Trafsys data for the same period of time (removing the dates where the agent was down) shows a total of 13,969 visitors to the IMA. The numbers of visitors counted by the Trafsys system and the agent differ by 3.96 percent. Further testing is needed to determine if these numbers would hold up over time, but it appears the agent has the ability to closely approximate the number of visitors to the museum. As discussed further in the drawbacks section, this calculation does not account for users with multiple devices. Adding some additional survey questions about all devices a user has at the museum could lead to a more accurate calculation of visitors.
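The comparison between the two counts can be reproduced with a short calculation. The adjusted figure is taken from the text as given, since the exact adjustment procedure is not spelled out; the function name is an assumption, and the difference is expressed relative to the Trafsys count.

```python
def percent_difference(estimate, reference):
    """Absolute difference between two counts, as a percentage of the reference count."""
    return abs(estimate - reference) / reference * 100.0

adjusted_estimate = 14522  # adjusted device count reported above
trafsys_count = 13969      # Trafsys count for the same period
difference = percent_difference(adjusted_estimate, trafsys_count)
print(f"{difference:.2f} percent")  # prints "3.96 percent"
```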

4. Possible applications

With one device deployed at the IMA, only basic information about how many devices were being detected could be collected. Installing multiple devices across the museum would allow for some more advanced analytics and possible applications.

Flow of People Between Areas

Observing the flow of visitors throughout the galleries can be beneficial to museums in many ways. Knowing where people are more likely to visit in the museum and where they get lost can help with staffing and signage. Knowing how visitors move throughout a building also could inform where to place certain exhibitions and works to offer greater access and visibility.

By deploying multiple agents throughout the museum, visitor-flow detection becomes a possibility. As visitors move throughout the museum, their smartphones will be detected by different agents. Depending on how well known the range of each agent is, the flow of the user could possibly be determined. Consideration for where each device is placed and how its range overlaps nearby agents is important to note when setting up multiple devices.
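With several agents reporting, flow between areas could be summarized by counting how often devices move from one agent to another. The following is a sketch under simple assumptions: each sighting carries the name of the agent that observed it, agent ranges do not overlap, and the agent names used here are hypothetical.

```python
from collections import Counter, defaultdict

def count_transitions(sightings):
    """sightings: iterable of (epoch, device_id, agent_name). Returns Counter of (from, to) pairs."""
    paths = defaultdict(list)
    for epoch, device, agent in sorted(sightings):
        path = paths[device]
        # Record an agent only when the device moves to a different one.
        if not path or path[-1] != agent:
            path.append(agent)
    transitions = Counter()
    for path in paths.values():
        transitions.update(zip(path, path[1:]))
    return transitions

transitions = count_transitions([
    (0, "a", "lobby"), (10, "a", "lobby"), (60, "a", "gallery-1"), (120, "a", "gallery-2"),
    (5, "b", "lobby"), (70, "b", "gallery-1"),
])
```

Where ranges overlap, a device may appear to bounce between two agents, so a real implementation would likely need to compare signal strength before recording a move.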

Triangulation

With three or more agents positioned in an overlapping arrangement, determining the location of a detected device through triangulation becomes a possibility. This requires a Wi-Fi adapter that will provide the signal strength information to the operating system. When the same probe request is observed by two or more agents, the signal strength can be compared and an estimate of the device location can be made. If estimates were found to be reliable, devices could be plotted on a map in real time.
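As a rough illustration of how signal strengths from several agents could be combined, the sketch below converts each reading to an approximate distance with a log-distance path-loss model and takes a weighted centroid of the agent positions. This is a simpler stand-in for a full triangulation, not the method used in the prototype. The reference signal of -40 dBm at one meter and the path-loss exponent of 3.0 are illustrative assumptions that would need calibration on site.

```python
def rssi_to_distance(rssi_dbm, rssi_at_1m=-40.0, path_loss_exponent=3.0):
    """Approximate distance in meters from a signal reading (log-distance path-loss model)."""
    return 10 ** ((rssi_at_1m - rssi_dbm) / (10.0 * path_loss_exponent))

def estimate_position(observations):
    """observations: non-empty list of ((x, y), rssi_dbm), one per agent.

    Returns a weighted centroid of the agent positions, where nearer agents
    (stronger signal) receive more weight.
    """
    weights = [1.0 / rssi_to_distance(rssi) for _, rssi in observations]
    total = sum(weights)
    x = sum(w * pos[0] for w, (pos, _) in zip(weights, observations)) / total
    y = sum(w * pos[1] for w, (pos, _) in zip(weights, observations)) / total
    return x, y

# Three agents at known positions (meters); the device is heard loudest at the origin.
estimate = estimate_position([((0.0, 0.0), -50), ((10.0, 0.0), -70), ((0.0, 10.0), -70)])
```

With equal readings at all agents the estimate falls at the centroid of the agent positions; a stronger reading at one agent pulls the estimate toward it. Indoor walls and crowds distort signal strength considerably, so estimates of this kind should be validated against known positions before being relied upon.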

5. Drawbacks

Smartphone penetration and Multiple Devices

If every visitor to the museum were carrying a smartphone, calculating statistics on number and frequency of visits would be greatly simplified. General numbers are published for smartphone penetration in the United States and other markets but could vary widely for a particular museum. This causes problems when trying to determine how many actual visitors are visiting based on devices. Conducting on-site surveys could help produce a more accurate penetration percentage for a given location.

In addition to the smartphone penetration problem, the agent does not differentiate what type of device is looking for a network. If a user is carrying a laptop, a smartphone, and a tablet, all three devices could look for a network at the same time. This would inflate the numbers and potentially throw off counting analytics. Again, one way to mitigate this issue would be to do on-site surveys to determine what and how many devices the average visitor brings into the museum.

Range

The effective range of 802.11b/g Wi-Fi is 150 feet indoors or 300 feet outdoors. This 50-percent difference due to material penetration can severely distort the radio footprint of the agent. Varying antenna styles can be used to attempt to shape the signal, or agents can be positioned to compensate for signal loss.

Bleed Over

A degree of overlap in range between agents is desirable for the purpose of triangulation. However, the unpredictability in range and signal penetration can present difficulties in planning the overlap. Testing and tuning the agents should be done to form a consistent overlap between networks.

Perceived Privacy

It is likely that most device carriers are not aware that all of their previous connections are broadcast everywhere they go. Many can find the notion of tracking an individual’s location between rooms or buildings to be an unacceptable use of their digital presence, even if perfectly legal.

Smartphone Wi-Fi Enabled

For the agent to collect any data from a user, the user’s smartphone must have Wi-Fi enabled. Some users looking to extend the battery life of their smartphones might turn off Wi-Fi, so those devices would not be detected. In addition, if a user has never accessed a Wi-Fi hotspot in the past, the packets of data that are sent to detect known networks are not transmitted. This again would make the device invisible to the agent.

6. Next steps

While the current dataset is quite small, the results are promising, and the IMA will continue to evaluate this solution for tracking visitors throughout the museum. The IMA will look to purchase and build more agents to test some of the proposed applications above. The ability to collect from multiple agents will give us a better idea of whether this solution would accommodate a museum-wide fleet of devices and how many devices would be necessary for solid coverage. As more data is collected, it will be important to monitor it closely and determine proper algorithms for processing it into a useful dataset.

In addition to purchasing more devices, continuing development of the software will be important for further study. Building a real-time dashboard that shows the status of each agent will allow easier monitoring to help prevent downtime. Live reporting tools would also allow for easy access to the data to determine how many devices are currently in the museum. Trends over time could be graphed and broken down by agent location. If triangulation were found to be reliable, a map of the museum could be displayed with the estimated positions of all detected devices.

Tracking via Bluetooth signal is another option to be explored. Bluetooth has a shorter range (5 to 30 meters) than Wi-Fi and may be more difficult to observe. However, the shorter range could provide better resolution to the triangulation. Unfortunately, a majority of smartphones disable the discoverability of their Bluetooth presence, so a custom device would likely need to be provided to the visitors.

Providing smartphone users with relevant information about where they are in the museum is also a task to investigate. The agents are able to quickly receive probe requests, but this data will need to be processed into a usable state for a smartphone application or website. Having complete Wi-Fi or cell coverage in the galleries will be required to allow the smartphone user to connect to the app or website to receive the location data processed by the agents.

Indoor location and tracking is a young and constantly changing technology; many new ideas will be formed as this space continues to grow. Museums will be wise to watch how the technology advances so they can provide users with a more engaging experience through their smartphones.

7. Acknowledgements

We would like to thank the Indianapolis Museum of Art Audience Engagement department for conducting the visitor exit surveys, and Silvia Filippini-Fantoni for providing us with useful data from the surveys to use in this paper.

8. References

comScore. (2013). “comScore reports November 2012 U.S. mobile subscriber market share.” Press release. January 3, 2013. Consulted January 27, 2013. http://www.comscore.com/Insights/Press_Releases/2013/1/comScore_Reports_November_2012_U.S._Mobile_Subscriber_Market_Share

NPD Group. (2012). “The NPD Group: Lower prices and larger selection boost pre-paid mobile phone carriers.” November 15, 2012. Consulted January 27, 2013. https://www.npd.com/wps/portal/npd/us/news/press-releases/the-npd-group-lower-prices-and-larger-selection-boost-pre-paid-mobile-phone-carriers/


Cite as:
G. Bowman and K. Jaebker, Using Commodity Hardware as an Affordable Means to Track On-site Visitor Flow. In Museums and the Web 2013, N. Proctor & R. Cherry (eds). Silver Spring, MD: Museums and the Web. Published January 31, 2013. Consulted .
http://mw2013.museumsandtheweb.com/paper/3817/


