Secure Digital Life #73
Recorded on July 24, 2018 at G-Unit Studios in Rhode Island!
- Check out our On-Demand material! Some of our previously recorded webcasts are now available On-Demand at: securityweekly.com/ondemand.
- Come to our Pool Cabana @ Black Hat and Def Con to pick up a free copy of "Cyber Hero Adventures". Here you will be able to get the comic book signed by Gary Berman.
Topic: AI (Daleks)
You know, with all the fun stuff on the news, I really thought we should talk about bots and botnets a bit.
On this episode of SDL, Russ and Doug discuss AI, how bots work, and botnets in general.
- You know, when we first started seeing bad bots (like the Daleks), there was almost always a malevolent feeling about robots in general. Sci-fi writers in the 30s and onward had a pretty big fascination with them, and Asimov seemed the most in tune with some future ideas where robots were serving all of us and had to be managed carefully lest they turn ugly (Cylons?) and decide that "Exterminate" was a pretty good thing to be saying. The term robot itself is Czech (from "robota", drudgery, introduced in Karel Čapek's 1920 play R.U.R.), and industrial robots were in use by the mid-20th century. GM installed a robot called Unimate on an assembly line in 1961.
- Robots have always ranged on a kind of scale from the very mechanical (think of a device which picks up a piece of metal and moves it to the other side of a track) to autonomous devices that are self-aware (we don't have those yet). Today, we even have people making "sex robots" which are semi-autonomous. The AI piece is complicated, and so far nothing truly autonomous has come about, which is probably a good thing since we can't even deal with each other, let alone an autonomous race of immortal computers.
- But we want to talk about bots. That's not the same thing as a "robot" or even an "AI". The bots we are talking about are web robots, the automated programs that the robots.txt file on a website is meant to control. These are called internet bots, and that of course got shortened to just bots.
So, the original idea that got going was these things called "spiders" crawling around the web just looking for things. A spider basically picked a starting point and then documented the page information into a database. When the spider found an anchor link, it added that link to its list of places and eventually got around to examining that too. This is how things like Google got started. These spiders were the tools of search engines, so that they could locate all the goodies hidden all over the internet. My original experiment was as follows (cover your ears, kids). I put my cat under a heat lamp (well, the cat put itself there), pointed a webcam at it, and put up a web page with a 1-minute-update snapshot of the basket where the cat was lying. I didn't link it to anything, but I put meta tags in the page, and the meta tag "cat in a basket" took several days before it appeared in a search page. I took the page down and changed the meta tag to "Hot Kitty Cam", and it got found in about the same amount of time. Now, "cat in a basket" got 5 hits in 2 weeks, but "hot kitty cam" got about 10,000 in 2 weeks. Wonder why? (Note: it wasn't actually called hot kitty cam.)
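The pick-a-page, record-it, follow-the-anchors loop can be sketched in a few lines. This is a toy, not Google: FAKE_WEB is a made-up dictionary standing in for real HTTP fetches, and the regex "parsing" is only good enough for the demo.

```python
import re
from collections import deque

# A tiny stand-in for the web: URL -> page HTML. A real spider would
# fetch over HTTP; a dict keeps the sketch self-contained.
FAKE_WEB = {
    "http://example.com/": '<title>Home</title><a href="http://example.com/cats">cats</a>',
    "http://example.com/cats": '<title>Hot Kitty Cam</title><a href="http://example.com/">home</a>',
}

def crawl(start_url):
    """Breadth-first spider: fetch a page, record it, queue any anchor links."""
    index = {}                      # the "database": URL -> page title
    queue = deque([start_url])
    seen = {start_url}
    while queue:
        url = queue.popleft()
        html = FAKE_WEB.get(url, "")
        title = re.search(r"<title>(.*?)</title>", html)
        index[url] = title.group(1) if title else ""
        for link in re.findall(r'href="([^"]+)"', html):
            if link not in seen:    # don't revisit pages already queued
                seen.add(link)
                queue.append(link)
    return index

print(crawl("http://example.com/"))
```

The `seen` set is what keeps a spider from looping forever when two pages link to each other, as they do here.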
- OK, so as this stuff matured, the need to interact with the spiders became more important. That meant people started using elaborate meta tag schemes to attract views. This, of course, became a problem, since people put unrelated tags in their lists so that spiders would index them higher. All of this turned into a war between the spiders and the developers to see who could outmaneuver whom. So, spiders are the most basic form of bot.
- Once people realized that this was an automated robot working 24/7 gathering information, all of a sudden they realized that there was more out there than just nice information organization. Suddenly, the idea of scraping started to be a thing. What if you wrote a bot that didn't just look for links but instead looked for personal information? Now the spider became a bot that scraped all sorts of things, like phone numbers and names.
Name: Doug White
If your bot crawled a page and saw "Name:", it could report the name into the database and look for "Phone:". If it sees that too, it can match the two and move on. When this first got started, no one was really thinking about this, and it was very common for everyone to have a web page with personal information on it. Resumes were posted with home addresses, ages, even SSNs on them. Very quickly, scraping became a profitable business, since this personal information could be used for fake credit cards, loans, you name it.
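The Name:/Phone: matching described above might look something like this toy scraper. The page text and the patterns are made up for illustration; real scrapers handle far messier formats.

```python
import re

# Hypothetical page text of the kind that used to be posted publicly.
page = """
Name: Doug White
Phone: 401-555-0123
Office hours: MWF
"""

def scrape_contacts(text):
    """Pair up Name:/Phone: lines the way an early scraper bot might."""
    names = re.findall(r"Name:\s*(.+)", text)
    phones = re.findall(r"Phone:\s*([\d\-() ]+)", text)
    return list(zip(names, phones))

print(scrape_contacts(page))  # [('Doug White', '401-555-0123')]
```

Run that over millions of crawled pages and you have a database of matched identities, which is exactly why this became profitable.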
- It used to be very common for almost every institution from the Dentist, Vet, Doctor, University, High School, etc. to ask you to put your name, phone number, and social security number on a paper form. Even just to sign in at the Vet, they asked. When they started using electronic systems, they just converted the paper form to an electronic form and used a database with the same information. When they went online, you guessed it, they did the same thing.
- When scraping and subsequently identity theft really got rolling, people started trying to fix it. This turned into a chess match with the bots.
- The robots.txt file on a website (see robotstxt.org for examples) contains instructions to "good bots" about what they are allowed to do on that site. For instance, the directive Disallow: / means "don't look at anything". However, this is just like a sign that says "no spitting": it doesn't stop you from spitting, it just tells you not to. Bad bots ignore it and go right ahead doing whatever they want. If the information is in text on the page, a bot can grab it.
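Python's standard library ships a parser for these files, so the "good bot" check can be sketched like this. The bot names and rules here are hypothetical; the point is that the check is entirely voluntary.

```python
from urllib import robotparser

# A sample robots.txt in the style shown at robotstxt.org.
# "Disallow: /" tells well-behaved bots to stay out entirely.
rules = """
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# A polite crawler checks before fetching; a bad bot simply skips this step.
print(rp.can_fetch("GoodBot", "http://example.com/index.html"))  # True
print(rp.can_fetch("GoodBot", "http://example.com/private/x"))   # False
print(rp.can_fetch("BadBot", "http://example.com/index.html"))   # False
```

Nothing in the protocol enforces that last False; a scraper that never calls can_fetch sees every page the server will serve.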
- There are all sorts of solutions out there now for stopping scraping, but most of them involve redesigning the website, black/white listing, traps, CAPTCHAs, etc. Using an anti-bot IP filter that blacklists known bot sources is one common approach.
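An anti-bot IP filter of that kind boils down to checking each client address against a blocklist of networks. A minimal sketch, using made-up documentation addresses:

```python
import ipaddress

# Hypothetical blocklist of networks known to host scraper bots.
BLOCKLIST = [ipaddress.ip_network(n)
             for n in ("203.0.113.0/24", "198.51.100.17/32")]

def is_blocked(client_ip):
    """Return True if the client falls inside any blocklisted network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKLIST)

print(is_blocked("203.0.113.42"))  # True  - inside the /24
print(is_blocked("192.0.2.1"))     # False - not listed
```

The weakness is obvious: the bot operator just moves to an address that isn't on the list yet, which is why this stays a chess match.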
- Other bots now use smarter techniques to infiltrate chats, Facebook, Twitter, etc. These are sometimes called "chatterbots". So, how does this work?
If I want a Twitter chatterbot (twitterbot), I create a Twitter account, subscribe to a bunch of feeds, and start coding. Let's say I want to promote my candidate, Buck Turgidson, in his run for office. Now Buck, he wants to "shove 70,000 megatons of kaboom-boom right up the floating heads' ass". If my bot receives the feeds and sees someone tweet "I really hate those floating heads", or even just "floating heads", the bot can (oversimplification) construct a tweet that replies. Like, "yeah, let's get those floating heads", or "I bet those floating heads are running a child sex shop out of a pizza parlour in New Jersey", or "You know, floating heads aren't as smart as me, friend. Check out my site at cillallfloatingheads.com". What if you had thousands of these bots and they just work day and night? They can also collect information like your Twitter handle and put that in a database, which then searches Facebook for your handle, and another Facebook bot starts spamming your account with anti-floating-head ads and news sites which report negative things about floating heads. Hmmm.
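At its dumbest, a chatterbot like this is just keyword matching plus canned replies. A toy sketch, with the trigger phrase and replies taken from the example above and no real Twitter API involved:

```python
import random

# Hypothetical trigger phrase and canned replies for a propaganda chatterbot.
TRIGGER = "floating heads"
REPLIES = [
    "Yeah, let's get those floating heads!",
    "Floating heads aren't as smart as me, friend.",
]

def maybe_reply(tweet):
    """If the tweet mentions the trigger phrase, pick a canned reply."""
    if TRIGGER in tweet.lower():
        return random.choice(REPLIES)
    return None  # stay silent otherwise

print(maybe_reply("I really hate those floating heads"))
print(maybe_reply("nice weather today"))  # None
```

One of these is a curiosity; thousands of them posting around the clock start to look like public opinion.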
This same tactic is used legitimately by marketing firms to push ads to you on Facebook and other sites. If you search for "trapezoidal dancing supplies" on Amazon, their bots see that search, add it to their database associated with you, and make sure that you see all sorts of ads for trapezoidal dancing bars, trapezoidal dancing partners, and even, you know, tesseractal dancing supplies for those adventurous types. This isn't necessarily bad, but it means that marketing firms start buying this information and these bot databases and using them to target you as well (see Frederik Pohl's Venus, Inc. and The Midas Plague (in Midas World)). Again, maybe not so bad. If you like trapezoidal dancing, wouldn't you rather see ads about it than ads about adult diapers?
But what if it gets more malevolent? So, let's add now: what if we start doing big data on the bots? The bots can scrape, then we collect, and we start doing what is called discriminant analysis and LISREL (look it up) on the data sets. What if I can find out that a person who is a registered glipglop, who likes chocolate, who subscribes to thechristiansatanist.com, and who purchased trapezoidal dancing supplies is very likely to go for our fake story, "Buck Turgidson says 'Neutrinos, harmless particles from outer space? I think not. Neutrinos are neutering our kids and turning them into zombies'", as clickbait. Whoa! Now we're talking. That bot collection was able to do more than just scrape; it was used to, dare I say, subvert someone. Combine that with marketing, and all of a sudden you are flooded with Buck Turgidson ads, connections to theconfederateyankee.com, trabtierb.com, and even srawofni.com (a really crazy one). You start getting ads for tinfoil hats, survival condos in Mexico, etc. Suddenly, this is more than just a bot; it's a way of influencing opinion.
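A real influence operation would use actual statistical models (the discriminant analysis mentioned above); this toy just shows the idea of scoring a scraped profile against a target checklist, with made-up attributes and weights:

```python
# Toy stand-in for the statistical profiling step: each attribute a bot
# scraped gets a (hypothetical) weight toward "likely to click the fake story".
TARGET_PROFILE = {
    "registered_glipglop": 4,
    "likes_chocolate": 1,
    "bought_trapezoidal_supplies": 3,
}

def clickbait_score(person):
    """Sum the weights of the profile attributes this person matches."""
    return sum(w for attr, w in TARGET_PROFILE.items() if person.get(attr))

doug = {"registered_glipglop": True, "bought_trapezoidal_supplies": True}
print(clickbait_score(doug))  # 7 out of a possible 8: send the neutrino story
```

Swap the checklist for a fitted model over millions of scraped profiles and the sketch becomes the targeting step of the operation described above.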
How does it work? Just coding. Simple instructions that sit on top of a large-scale data analytics operation. By doing all that, we can code this down. Look at Eliza (circa 1965) from MIT. That program just took what you said, turned it around a bit, peppered it with witty non sequiturs, and boom, people were fooled. I mean, with a little time you knew something was wrong, but think about tweets. If Eliza could have tweeted, wow!
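Eliza-style turnaround is little more than pattern matching and substitution. A tiny sketch with two hypothetical rules and a non-sequitur fallback:

```python
import re

# A few Eliza-style rules: match a pattern, turn the user's words around.
RULES = [
    (re.compile(r"i feel (.*)", re.I), "Why do you feel {}?"),
    (re.compile(r"i am (.*)", re.I),   "How long have you been {}?"),
]

def eliza(line):
    """Reflect the user's statement back as a question, Eliza-style."""
    for pattern, template in RULES:
        m = pattern.search(line)
        if m:
            return template.format(m.group(1))
    return "Tell me more."  # the witty non sequitur fallback

print(eliza("I feel tired"))         # Why do you feel tired?
print(eliza("I am worried"))         # How long have you been worried?
print(eliza("bots are everywhere"))  # Tell me more.
```

In a long conversation the seams show; in a 280-character tweet, they don't, which is the whole problem.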
So, what's a mother to do? Well, you have to armor yourself. That means: stop letting them scrape your info, start being skeptical and using common sense, use sites like snopes.com to double-check things, and, well, watch out. Certainly, anonymous browsing helps (Tor, just plain old incognito mode, etc.), and using VPNs prevents them from pinpointing you, but the bots are out there working night and day to find you, get your info, categorize you, manipulate what you do, and change an outcome. How do you think I ended up with all these Tesseract Dancing Supplies?