Attack of the Turnitin Bot

So I’m looking through my Apache access log and I come across this:

1195 2.88% TurnitinBot/1.5 http://www.turnitin.com/robot/crawlerinfo.htm

Okay, so it’s a bot reading my web site. It’s generated 1195 hits during the month of February. That’s 2.88% of the total traffic. But I’ve seen bots before. Happens all the time. Google, MSN, they all do it. They’re just indexing the content so they can return it as results when you search for something. (Ever notice how if you search for Oakland Gyros, for example, you come up with me? Neat, huh?) Move along, nothing to see here.

Except there is. Because I know this company, Turnitin. Maybe you do, too. For those of you who don’t know, it works like this. A professor in college makes you turn in your paper to him via the Turnitin service. Turnitin checks the paper to see if you plagiarized. Then it hands over the results of the analysis to your professor, who busts you. But how does Turnitin know whether you plagiarized or not?

Well, for starters they keep a copy of your paper so they can check future papers against it. After all, maybe it was a good one. The kind of paper your roommate might want to use next term. Woe to him if his professor uses Turnitin. They’ll compare it to all the papers they know about – yours included – and rat him out to The Man. He could get expelled. And maybe that’s good, because you know you can barely tolerate him anyway. Never does the dishes, always late on the phone bill. But I digress.

So the body of information they have to check future papers against is growing every day. But that’s not good enough. Apparently they are now sending bots across the web to include web sites in their database of content. So if you thought you might be able to lift a few paragraphs off my blog for your term paper, think again. The Turnitin bot has a copy to check you against.

But what’s the problem? Cheaters should be caught, right? They make it tougher on the students who do things the right way. I agree with that. But somehow I have a bad feeling about Turnitin and their service. Here are a few reasons why.

1. At the request of a professor, Turnitin is more than welcome to check someone’s college paper for plagiarism, but where do they get the right to keep the paper for their records? They are basically making money off that work without any compensation to the author. That paper is increasing the value of their database. The author gets nothing. I wonder if the student is even aware of the fact that his paper is now being used to evaluate other students’ papers. I wonder if Turnitin even asks permission.

2. I bet the push to have instructors use services like Turnitin has more to do with overly large class sizes than it does with rampant plagiarism. How can you teach English 101 without reading a lot of papers? How do you read the papers if you have 200 students? At least with Turnitin you can weed out the cheaters in an automated way. Hopefully that leaves you free to evaluate the papers on other criteria. If you’re reading them at all. (You are reading them, right?)

3. I was thinking about the atmosphere of mistrust that the use of Turnitin must engender. How long do you think it will be before some enterprising young student strikes back by feeding his professors’ doctoral dissertations into Turnitin and discovers something embarrassing? How many professorial publications do you think he or she would have to submit before hitting a jackpot? Not a lot, I bet. Maybe you think that’s vindictive. But, hey, isn’t it important to weed out dishonest professors? Sure it is. Though with the tables turned, suddenly professors might find themselves not liking Turnitin so much.

4. Does Turnitin have the right to use my blog to enrich its database? I do license these works, such as they are, under a Creative Commons license which clearly states that you can do what you want as long as you give me credit, but is Turnitin giving me credit? Do they have to? Are they doing anything that Google isn’t already doing?

Some smart lawyer out there can probably explain to me why this is all perfectly okay to do. Some smart Apache admin can probably explain to me how to lock out the Turnitin bot so I don’t have to contribute to their service if I don’t want to. But the whole thing still gives me the creeps. What do you think? Have you used Turnitin? Have you had it used on you? Do you think their service is a good idea?

UPDATE: I got to thinking, I already know a pretty smart server administrator: me. So ten minutes later I have a robots.txt file at the root of my web server that reads:

User-agent: TurnitinBot
Disallow: /

That should block Turnitin from using any more of my content. At least until I figure out if what they are doing is something I agree with or not.
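
If you want to sanity-check rules like these before trusting them, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `is_allowed` and the `example.com` URL are made up for illustration; only the two robots.txt lines come from my actual file.

```python
from urllib.robotparser import RobotFileParser

# The same two lines as the robots.txt shown above.
ROBOTS_TXT = """\
User-agent: TurnitinBot
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules permit user_agent to fetch url.

    Hypothetical helper: it parses the rules from a string instead of
    fetching them from a live server.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com stands in for the real site.
    for agent in ("TurnitinBot/1.5", "Googlebot"):
        print(agent, is_allowed(agent, "http://example.com/some-post"))
```

Keep in mind that robots.txt is only a request. This checks what a well-behaved crawler would conclude from the file, not what any particular bot actually does.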

8 Responses to “Attack of the Turnitin Bot”

  1. Adam Says:

    I totally enjoyed reading about your opinions on this Turnitin stuff. It’s nice to see someone knows what’s going on. :)

  2. Devin Says:

    My girlfriend’s high school uses Turnitin, and when she was telling me about it I just got madder and madder. It’s such an invasion of privacy. And furthermore, don’t we – the taxpayers – pay these teachers to ACTUALLY READ the students’ work?

    In my girlfriend’s high school, the teachers who use Turnitin don’t really bother to read the papers; they only have other students check them quickly. If a teacher doesn’t read a student’s work, then how will he or she know which areas a student needs improvement in? It’s pretty ridiculous.

    Today I was reading through my website logs and I noticed the same thing. It’s disgusting. I definitely oppose this horrible service and I certainly don’t want to have to pay for the bandwidth they’re using to crawl every single page on my website – even pages that have no links to them from anywhere. I’m definitely going to put a robots.txt file in my server and I encourage anyone else running their own server to do the same.

  3. scott Says:

    Are they doing anything that Google isn’t already doing?

    Note to self: Yes. Google cites me (by providing links to the returned content). Turnitin does not cite me or give me credit of any kind.

  4. Mohammad Khatami Says:

    Thanks for the article. Rather informative.

    Note to NSA bot, my name is not mohammad.

  5. Turnitin Sued for Copyright Infringement » Computers, blogging, education, martial arts and liberal politics. Says:

    [...] Wow, has it really been three years since I wrote about how Turnitin gives me the creeps? Time flies when you’re raging against the machine, I guess. According to slashdot, some high school students in Virginia are suing the plagiarism-detection service for $900,000 in damages, claiming – as I did three years ago – that the company is using their copyrighted works for financial gain without their permission. If the consensus at slashdot is anything to go by (and it’s admittedly often not), Turnitin may be in some trouble on this one. [...]

  6. PSG Says:

    I am waiting to hear from my attorney since I just found out that my instructor uses this for this purpose. Until then, I never knew what it was for, and the instructor deceived me by intentionally not telling me what it is for. I think this is very low and a violation of our constitutional principle of Innocent Until Proven Guilty. I pay high tuition rates for them to do their jobs and read and manually check for plagiarism. I await my attorney’s phone call.

    Also for people with multiple blogs, if they make the same entry into all their blogs, will turnitin check all that too without our authorizations?

  7. scott Says:

    I don’t think you’ll get far with a lawyer on this. You might want to go to the provost or other academic authority at your school, however. Deceiving students is never an acceptable thing at reputable institutions.

    And yes, turnitin will check everything on the web unless someone tells it not to. You must do this for each individual site, even if it has the same content.

  8. RL Says:

    I had no problem with my university (postgraduate) having a statement in their conditions that allowed them to use software to detect plagiarism, but when that ‘software’ turned out to be submitting works to an online database, to a company which directly profits from the use of work owned by third parties, there was no way I could continue studying at that university.
