DIY Web Scraping: Fetching and Extracting Data

Given the sheer multitude of accessible data available on the world wide web, the “web scraping” phenomenon has caught on like wildfire. Web scraping is a method for extracting data from websites. The scraping can be done manually, but is preferably done in a programmatic way.

Many free programs are out there to assist you with your forays into web scraping.  For a recent project we used iMacros to automate the fetching and extraction of needed data from the Residential Construction Branch of the U.S. Census Bureau.  This website provides data on the number of new housing units authorized by building permits.  Data are available monthly, year-to-date, and annually at the national and state levels and for most counties and county subdivisions.  Prior to January 9, 2017, all building permit data at the county level or below was available only as individual text files.  This meant we would have had to manually download over 3,142 individual text files in order to obtain the data for all the counties in the U.S. A tedious task, to say the least.

Such a manual process would have been too labor intensive to take on without automation. Automating the entire process with iMacros was straightforward. Here’s an outline of the steps:

  • Install the iMacros extension for the Firefox web browser.
  • Test the iMacros recording function by going through the process of selecting and downloading the first file.
  • View the recorded code and add a loop that increments a counter by 1 on each pass, so that the code repeats itself and downloads each text file in turn.
  • Save the files in the same file/folder location to make the process of merging data files into a single file much easier.
  • Extract data easily for every county, with the ability to roll up by state, region and nationally.
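
For readers who prefer a script to a browser macro, the same loop, save, and merge steps can be sketched in Python. This is only an illustration under stated assumptions: the URL pattern, the file naming, and the function names below are hypothetical and do not reflect the actual layout of the Building Permits site or the iMacros code we recorded.

```python
import urllib.request
from pathlib import Path

# Hypothetical URL pattern; the real site's layout is not described in this post.
BASE_URL = "https://example.com/permits/county_{index:04d}.txt"


def build_urls(count, base_url=BASE_URL):
    """Return one URL per file, incrementing the index by 1 each time."""
    return [base_url.format(index=i) for i in range(1, count + 1)]


def fetch(url):
    """Download one file and return its text (this makes a real network call)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def download_all(urls, folder, fetch_fn=fetch):
    """Save every file into the same folder so merging is simple later."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        target = folder / url.rsplit("/", 1)[-1]  # file name = last URL segment
        target.write_text(fetch_fn(url), encoding="utf-8")
        saved.append(target)
    return saved


def merge_files(folder, merged_path):
    """Concatenate all .txt files in the folder into one file; return the file count."""
    merged_path = Path(merged_path)
    parts = sorted(
        p for p in Path(folder).glob("*.txt")
        if p.resolve() != merged_path.resolve()  # never merge the output into itself
    )
    with merged_path.open("w", encoding="utf-8") as out:
        for part in parts:
            text = part.read_text(encoding="utf-8")
            out.write(text if text.endswith("\n") else text + "\n")
    return len(parts)
```

Passing a different `fetch_fn` makes it possible to try the loop on a handful of files, or on canned text, before pointing it at a live site.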

Like many data sites, the Building Permits website now provides access to an FTP directory where you can navigate and download all 3,142 text files without having to enter specific parameters for each file.  However, if you come across websites that do not, we recommend that you get familiar with the site to determine what format the data is in (for example, tables or individual pages).  If you need to scrape from numerous websites, take the time to get familiar with each one, because any change in formatting from site to site can wreak havoc: without noticing, you may download misaligned or incorrect data. Never forget the rule: garbage in, garbage out. Test before you scrape!
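
One simple way to "test before you scrape" is to check that every row in a downloaded file has the number of fields you expect. The sketch below assumes comma-delimited text and a known field count; both are assumptions for illustration, and the function name is hypothetical.

```python
import csv


def find_misaligned_rows(lines, expected_fields, delimiter=","):
    """Return (line_number, field_count) for each row with the wrong field count."""
    problems = []
    reader = csv.reader(lines, delimiter=delimiter)
    for row in reader:
        if not row:
            continue  # blank lines are skipped rather than flagged
        if len(row) != expected_fields:
            problems.append((reader.line_num, len(row)))
    return problems
```

An empty result means the file at least has a consistent shape. Anything else is worth a look by hand before the data goes into a merged file.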

Why Volunteer?

The research industry needs volunteers. Here’s why you should consider playing a part.

Many of us here at MSG serve as active volunteer members of market and survey research industry organizations. It’s part of our company culture to get involved and make a difference. Recently, I attended back to back chapter events, and I began to reflect on the benefits of volunteering. Was it really worthwhile to devote my time to a local chapter organization?

It’s true, the amount of time you need to devote to volunteering can feel like a second job, and it is crucial that you be able to balance your primary and secondary activities. It’s definitely a juggling act, and it isn’t always easy.

That being said, there are loads of good reasons to become a volunteer. Here’s what influenced me to get involved:

Networking. Serving as an industry volunteer will get you talking to people and is a wonderful means for creating and maintaining relationships. I want to meet people whom I can work with, but I also want to build a network of long-lasting professional relationships. In my roles as a volunteer for a local chapter organization and committee memberships, I have encountered industry pros whom I never would have met otherwise.

Learning best practices. Education doesn’t end with a degree, a certification, or on-the-job training. It should be seen as a lifelong habit of mind. By attending events and seminars outside the orbit of your day-to-day business, you will be exposed to new ideas and pick up on new trends within your industry and related industries.

Organic growth. A natural goal we all have is to grow our business. When you volunteer, the cultivation of business growth can tend to happen more organically, as a function of developing relationships within the membership environment. As you discover ways to collaborate and partner with others, those seeds will sprout.

I firmly believe that volunteers are the lifeblood of an association. They keep our communities engaged and informed. Volunteering can take up a lot of spare time, but when I reflect and ask myself whether I should have volunteered, I always answer with a resounding YES!

Who Really Owns the Cell Numbers on Your List?

Say you have a list of cell numbers for consumers and you want to message them. Then you use an automatic dialing system to send text messages out to those numbers. This simple and apparently innocuous action could have drastic consequences that could actually cost millions of dollars.

This could happen to you

Take the case of Philadelphia-based frozen treats company Rita’s Water Ice, which settled a class action lawsuit for three million dollars in May 2016.

The reason? The plaintiff claimed that Rita’s had violated the Telephone Consumer Protection Act (TCPA).

The TCPA requires you to have prior express written consent before using an automatic telephone dialing system for messaging cellular numbers on a list.

In the Rita’s Water Ice court case, the company strongly denied the accusations, but they agreed to a settlement so as to avoid a prolonged lawsuit.

The plaintiff argued the case on behalf of two groups:

  1. Those who had originally given consent but later changed their minds and asked to be removed from the distribution list, yet were never removed.
  2. Those who claimed that they had never given consent to receive text messages.

What’s most interesting for those of us in the research industry is the second group. Upon analysis, it was discovered that certain plaintiffs owned cell phone numbers that had been assigned previously to consumers who HAD in fact agreed to receive text messages from Rita’s.  In effect, written consent had been given originally, then the cell number was reassigned to a new consumer who had no clue about any of that.

Navigating the murky waters of compliance  

All of this points to an issue of great concern to researchers: the vagueness of the TCPA. And it raises a major question: how much due diligence should a company or researcher have to perform to ensure that the cell phone numbers on their list are in fact registered to the names on the list? It’s murky. A grey area. Undoubtedly, more litigation will have to occur before the question is answered definitively.

In the meantime, if TCPA compliance is at the forefront of your data collection, you should contact an MSG account manager. We have the ability to mitigate TCPA risk. We can identify wireless numbers for you, and we can offer identity verification that verifies called-party consent.

Until the TCPA is amended, clarified, or scrapped, the second golden rule always applies: “better safe than sorry.”

“Hope for the best, prepare for the worst”: salvaging the client list

You’ve probably heard the story before. It begins, “The study started with a client list….”

I can’t tell you how many times I had a client call and tell me that. The stories follow a pattern. The client says it’s a great list and you should be able to easily complete the study with it. Sounds great, right?

Here comes the plot twist. They forgot to tell you the list is 4 years old and hasn’t been touched since. Oh, and by the way, only 30% of the records have a phone or email address. Suddenly, easy street is filled with potholes.

This isn’t the end of the story, and it can have a happy ending. A sub-standard client list can be rescued with these investigative approaches and performance enhancements:

• Flag any cell phone numbers so they can be separated out and dialed manually, which also ensures TCPA compliance.

• Ask yourself: what is most important on their list? What is the key sampling element? Is it the individual (contact name)? If so, the file can be run against the National Change of Address (NCOA) database to see if the person has moved. If the person has moved, a search can be run for the new address. The next step is to identify the landline and/or cellular telephone numbers associated with that individual at the new address.

• If location/address is the key element, check for the most up-to-date telephone numbers (either landline or cellular) and name associated with that address.

• Send the call list to a sample provider for verification. Does the information in your list match the sample provider’s database?

• If the information doesn’t match, can you append a new phone number or email address?

• Do you still have open quotas? See if you can append demographics to target for open quotas.

• When you’ve exhausted all options on the client list and the study still isn’t completed, order an additional custom sample that meets the ultimate client’s specifications (or at least comes close). Then dedupe any custom sample orders against the client list so that no one is contacted twice.
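
The dedupe step in the last bullet can be sketched as follows. This is a minimal illustration that assumes records are dictionaries with a "phone" field and that phone number is the matching key; a real project might match on address or name instead, and the function names are hypothetical.

```python
import re


def normalize_phone(raw):
    """Reduce a phone number to its digits so different formats compare equal."""
    digits = re.sub(r"\D", "", raw or "")
    # Drop a leading US country code so "1-215-555-0100" matches "(215) 555-0100".
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


def dedupe_against_client_list(client_records, custom_records, key="phone"):
    """Keep custom-sample records whose phone is not already on the client list.

    Repeats within the custom sample itself are dropped as well.
    Records with no phone number are kept, since they cannot be matched.
    """
    seen = {normalize_phone(record.get(key)) for record in client_records}
    seen.discard("")
    kept = []
    for record in custom_records:
        phone = normalize_phone(record.get(key))
        if phone and phone in seen:
            continue
        if phone:
            seen.add(phone)
        kept.append(record)
    return kept
```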

With the help of a good sample provider, even a subpar client list can be salvaged and the study brought to completion on time.

Using this software? Your data is NOT secure!

Almost every week I hear in the news about another data compromise, whether it be a large corporation having its customers’ credit cards and personal information stolen or computer hacks rumored to be affecting the election. Personally, I find it very scary to think about my own and my family’s data getting into the wrong hands.

If I were managing a large database of panelists or other sensitive data that my business relied on, I am not sure I would sleep well at night knowing that there might be many different access points through which this information could be exposed or accessed by someone else.

I was speaking with one of the world’s largest pharmaceutical companies recently and was told that use of the “brand x” survey platform has been completely forbidden going forward, and that anyone caught using this off-the-shelf tool could lose their job.

I think we all need to take a look in the mirror and evaluate the different software, applications and tools we use in our everyday business and personal lives and ask if there is a better, more secure, way to manage these processes and our data.

In the business world it is very important to involve IT and your security team when choosing a new critical software platform, to make sure that the tools you will use every day meet the security requirements of your business as well as the productivity requirements of your research. I know this adds time, and most likely cost, but not as much as if your database wound up in the wrong hands.

On which side of the firewall do your database, survey data and other critical research tools reside?

Sports Adventure in the UK

It’s been my dream for quite a while to visit the UK.  I’ve been following tennis since I was a kid, as well as Premier League soccer for the past six years.  I finally decided to bite the bullet and take a solo trip to London last week.

I took the red-eye out of Philly and landed at Heathrow Airport at 7:30 a.m.  The first day was spent sightseeing and catching up on some sleep, since the passenger next to me decided to use me as a shoulder rest the entire flight.

The following day was the highlight!  Tottenham vs Everton at White Hart Lane, a stadium almost 120 years old.  I didn’t see much of the hooliganism you hear about, but rather a no-nonsense respect for the game.  If there were 34,000 fans in the stadium that day, 32,000 were dressed in navy blue (Tottenham’s colors), with the visitors’ section in bright blue (Everton’s colors).  I had purchased a rain jacket back in the States and didn’t realize until boarding the train to the match that it matched Everton’s bright blue jersey.  The looks I was given were equivalent to those someone dressed in Cowboys colors would get at an Eagles home game.  I quickly removed my jacket and kept it rolled up until I could gauge the vibe at the stadium.  I noticed a few other Tottenham supporters wearing alternate colors, so I put my jacket back on, and I’m glad to say I had no hassles whatsoever.  Before the match, I had wondered why fans were literally shoveling food down their throats outside the grounds.  Inside, the reason became clear: no one, and I mean no one, was eating, drinking or turning their eyes from the action until halftime.

The following day was spent touring Wimbledon and Shakespeare’s Globe Theatre.  The most interesting tidbit of the entire trip came on the Wimbledon tour.  Near the entrance to the grounds stood what looked like a giant dough roller.

When the All England Club first opened, its members wanted the grass to look immaculate and needed to raise money for a grass flattener, aka the giant dough roller.  To raise the money, they decided to hold a tennis tournament, and since it was so profitable, they have held it ever since.  Talk about the cart before the horse!

4 Surefire Ways to Increase ABS Response Rates Without Breaking the Bank

So you found the perfect sampling source, with nearly 100% coverage and the ability to reach cell-phone-only homes: address-based sample (ABS).  You can expect to get the completes you need, but realistically, what type of response rate will you achieve?  How can you boost it?

Depending on the steps taken, the response rate can vary greatly.  You may only realize 10-15% without a big-name company endorsement to go with your survey and/or a pre-notification postcard, but such an endorsement can kill the budget before the study even begins!

Here are 4 surefire tips to increase response rates…

Tip #1 Append phone numbers and names to the addresses using commercial databases, so that you can personalize the pieces and make reminder calls.  Even where a name is appended, also include “or current resident” to reduce the rate of returned mail.

Tip #2 Add a creatively designed piece with the web link to drive them to participate online.  This allows the respondent to take the survey anytime and on a device of their choice.  Offering a multi-mode approach can increase participation and representation.

Tip #3 Repeat the Message.  Contact potential respondents multiple times via mail, phone, media or social networking sites, which will increase awareness and help entice them to participate.  Messages are more effective when repeated!

Tip #4 Offer an Incentive to motivate your respondent.  Be sure the value reflects the effort and time the survey requires, within budget of course.

With some large, heavily endorsed studies we have seen response rates of up to 50%, helped by long field times, reminder calls, multiple postcards, and refusal conversions.  Use the tips that your study and budget will allow and you can experience a higher response rate too!
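
To see what these response rates mean for a budget, it helps to work backwards from the number of completes you need to the number of addresses you must mail. The sketch below does that arithmetic. The target of 400 completes is a made-up figure for illustration, the function name is hypothetical, and only the 10%, 15% and 50% rates come from the discussion above.

```python
def mailings_needed(target_completes, response_rate_pct):
    """Return how many addresses to mail to reach the target, rounded up."""
    if not 0 < response_rate_pct <= 100:
        raise ValueError("response_rate_pct must be greater than 0 and at most 100")
    # Ceiling division: a fraction of a mailing still means one more address.
    return int(-(-target_completes * 100 // response_rate_pct))


if __name__ == "__main__":
    for rate in (10, 15, 50):
        print(f"{rate}% response rate: mail {mailings_needed(400, rate)} addresses")
```

The gap between the low and high rates is what makes the tips above worth their cost: each point of response rate gained reduces the number of pieces you have to print and post.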

It’s the most wonderful time of the Year!

No – I am not talking about Christmas…I am talking about Halloween.  Most importantly – Candy season.  For me and many others Halloween is the official season of candy, the celebration of the sweet tooth.  The season always brings up the age old discussion – what is your favorite candy?  For me the answer is simple and no candy can even come close to the number one position.  But I would like to take the opportunity to rank my personal top 5 candy choices.  Please note this is for the chocolate division only.  Perhaps in a future post we can analyze the chewy candy division (MIKE AND IKE, SWEDISH FISH!!!)

Hershey’s Dark Chocolate – very basic, but you have to love the dark chocolate version.  Did you know dark chocolate also has many health benefits?  Sign me up!!!

Twix Bar – a very solid candy…the milk chocolate, the wafer, always crisp.  Twix has a ton of flavor but a light taste that allows you to eat several in one sitting without feeling slowed down.

100 Grand Bar – for me this is an underrated favorite that makes the top 5.  Maybe others feel the same, but I always feel the 100 Grand gets overlooked in favor of the typical candy standbys.  This bar is a great mix of chocolate, caramel and crisped rice.

Snickers – gotta love a Snickers bar.  It is the opposite of Twix in my book in that it is a nice, hearty, filling candy bar.  Bonus points for their recent marketing campaigns, which I really enjoy!

Reese’s Peanut Butter Cup – for me this is the be-all, end-all, slam-dunk #1 candy.  I love the traditional cup and love the specialty Reese’s products (the Reese’s Christmas Tree!), whether from the freezer or at room temperature.  It does not matter – that peanut butter and chocolate combo is unmatched.  It is nearly impossible for me to turn down a Reese’s cup!

So there it is – my top five candy choices in the chocolate division.  I hope this sparks some good candy conversation with your coworkers in the office.

Cheers!

A special connection to an athlete at the Rio games

MSG’s own Keith Davis has a special connection to an athlete at the Rio Games, which begin August 5.  Read below to see Keith’s feelings about seeing one of his students make it all the way to the highest venue of athletic competition.

When coaching in any sport, it’s always a blessing to come across an athlete in whom you see great potential and who believes and trusts in you and your instruction. Ultimately, you take NO credit for their abilities, but the fact that they heed your instruction and guidance, allowing you to be a part of their development, brings both humility and a sense of awe to a coach.  Then, to see that athlete make the 2016 Olympic team AND appear on the Kellogg’s cereal box ….

My tears of joy just flow!!!

This year, I have the opportunity to watch a young lady, Ajee’ Wilson of Neptune, NJ (pronounced Ah-Zhay), compete as a US Olympian in the Women’s 800 meters. She’s a compassionate, classy and humble young lady. I coached Ajee’ at the very beginning of her track career in 2005. She was a hard worker, and her talents earned her national honors in the 400, 800, 1500 and 3K meters her very first year! In the following years, Ajee’ won all three events in her age group, setting national records at the AAU and USATF Jr. Olympics. But just how special is this young lady? In her sophomore year in high school, Ajee’ ran a 2:00.91 800m anchor leg in the 1600 Sprint Medley Relay, coming from last to first place at the New Balance National Outdoor meet in 2010! She set various records in NJ and later became World Youth Champion in the 800m (2011), World Jr. Champion (2012) and, in 2014, had the fastest time in the world … at the age of 20!  Her current coach (since 2010), Derek Thompson of Philadelphia, has done an amazing job bringing out her best abilities, and her story continues!

You can learn more at:

http://kelloggs-teamusa.kelloggs.com/en_US/athletes/ajee.html or just Google her name!

Recent Conference Experience: MRA Joint Conference

Earlier this year I had the pleasure of co-chairing the Joint MRA Philly/Greater NY conference.  The conference was held in April in Center City Philadelphia, and planning started back in October/November.  As this was my first time planning a conference of this magnitude, my initial reaction was: how will we get this done in time?  However, I soon found that my fellow co-chair, Bob Granito (of Interactive Media and a member of the Greater NY Chapter), and all the wonderful and dedicated volunteers were equally committed.  From the get-go the entire committee made the planning process engaging and seamless.  In the end it was fulfilling to see everyone’s planning and hard work come to fruition, as many attendees mentioned they enjoyed the conference from beginning to end.  The conference itself was a full-day event with 7 engaging speakers making up 5 thoughtful presentations:

  • Steve Levine (Zeldis) and Jerry Valentine (AstraZeneca) discussed current trends in Disruptive Behavior.
  • Michelle Murphy Niedziela, Ph.D., of HCD Research discussed the 5 phases of Neuroscience.
  • David Dutwin, Ph.D., of SSRS gave the keynote discussing the future of survey research.
  • Nina Hoe, Ph.D., of Temple University presented on building a city-wide panel.
  • John Hartman and John Shiela of Phoenix Marketing shared their research on wearable technology trends.

As I reflect on the planning stages, I am glad to have the experience under my belt as I transition to my new role as President of the Philly MRA.  I cannot thank enough all the board members from both chapters and the volunteers who helped make the event a success.

-Rajesh Bhai