SpaceBanter.com - View Single Post - Lockheed: Perpetrated the 9-11 Attack on the USA Ping MHVW: VVF nomiation. Was Water: Free Energy

#21 April 14th 04, 06:22 AM

On Tue, 13 Apr 2004 22:07:57 -0700, "Ugly Bob"
wrote:

"*" wrote in message
On 14 Apr 2004 02:01:19 GMT, John Griffin
wrote:

Why did you change it to "Alexa"?

Alexa is a real person. Get her name right.

That must be your real name...?

How you jump to that conclusion is a matter of National Security, right?

What's the madder wid ya, doof? Don't you like the message?

http://makeashorterlink.com/?U1C952408

We like this one much better:

Bill Gates and Lockheed's Echelon

http://news.com.com/2008-1082_3-5065298.html

Ever get the feeling your Usenet newsgroup list is being watched? By
Microsoft?
If so, consider yourself right. Thanks to the expertise of sociologist Marc
Smith, Microsoft is keeping a close eye on newsgroups and other public
e-mail lists, which it has identified as the Internet's undervalued
"knowledge management application."

In Microsoft's research and development labs, Smith has spent the past
several years slicing and dicing data about messages and message authors in
an ambitious effort to help people make sense of the newsgroup manifold--the
hordes of know-it-alls, flame warriors, spammers and neophytes who, by
Smith's estimate, last year numbered more than 100 million in the Usenet
network of e-mail threads, or newsgroups.

Smith's idea is that you can tell a lot about the quality of data by
tracking its newsgroup contributors' social habits--a notion that holds
promise for sorting through millions of messages, and peril for an online
world increasingly skittish about invasions of privacy.

Following the launch of Microsoft's NetScan application for analyzing
newsgroups and the people who post to them, Smith spoke to CNET News.com
about NetScan, about Microsoft's interest in e-mail lists and about an
application under development that would link objects in the real world to
an array of online information.

How did a guy like you get to work for a company like Microsoft?
I'm a sociologist. I've now been at Microsoft Research about four-and-a-half
years. Microsoft has a few social and cognitive psychologists, but I'm the
only sociologist.

Which means what, exactly, in the context of technology employment?
A sociologist studies the attributes of relationships and the group of
relationships that add up to a collective or a community. As a technology
group, our mandate is to both explore and to build tools to study the
phenomenon that we could call online community. We sociologists don't like
to use the term "community," particularly--we like to refer to them as
social cyberspaces.

What's wrong with "community"? The word seems to come up all the time when
we talk about the Internet.
When we say "community," perhaps what we really are looking at is a special
case of a broader phenomenon that sociologists call collective action, when
a group of people do something together. And this turns out to be the No. 1
thing people do with their computers: It's to send each other e-mail. The
No. 2 thing is to send groups of people e-mail--to join the list of people
who like to knit, or who like Microsoft products.

So why exactly does Microsoft need a resident sociologist?
Microsoft has a big investment in online communities, and has not had until
recently many tools to enhance that investment. What Microsoft wants around
communities is what every enterprise does, which is a peer-support,
knowledge-management application. And that means that if you go into Usenet,
you'll find 3,000 Microsoft public newsgroups, with 1.5 million people
posting 10 million messages. And that's 2002--and it's going to more than
double this year, because it more than doubled in '01. We don't see traffic
flagging at all.

My impression was that the use of e-mail lists was on the decline.
To the contrary! It's on the rise. Usenet alone--which is a backwater in
that most people don't know where it is and how to find it--on Usenet alone
there were 13.1 million unique identities who used Usenet in 2002, and by
that we mean that they were a contributor and wrote at least one message.
How many people read the message? We have no idea. That number is invisible
and is fragmented over a half-million servers that are not sharing their
data. But conservatively you could estimate that there are 10 readers for
every writer, so that makes it 130 million Usenet users per year. And that's
a small number compared to majordomo lists, or things like Yahoo Groups, and
the number of people who have a bulletin board on things like UltimateBBS.

What are you doing with these lists, from a sociological standpoint?
What we are about is the thread. It turns out that the core sociological
data type of the Internet is not IP (Internet Protocol) numbers, or any of
that stuff, it's threaded conversations. And it's amazing how little
investment has been put into adding value to the core data structure of the
Internet, which is the conversational thread. I can illustrate that by
suggesting that when you sit in front of your e-mail client, simply try to
sort your messages by thread size.

And by size of the thread you mean...?
I mean the number of messages, the number of generations of messages, the
breadth of the conversation. If eight people reply to a message, it has a
breadth of eight. If 12 reply, it's 12. And it turns out that the frequency
distribution of thread properties is very illuminating.

It turns out that two-thirds of all threads in Usenet, in 2002, had a
whopping two messages. And two-thirds of all authors are the people who
write a message, post once one day, and never again.
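
The three thread properties Smith lists (number of messages, number of generations, breadth) can be sketched in a few lines of Python. This is only an illustration, not NetScan's code: the (message_id, parent_id) input format, the function name thread_stats, the sample data, and the choice to take "breadth" as the largest number of direct replies to any one message are all assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical input: (message_id, parent_id); parent_id is None for a thread root.
messages = [
    ("a1", None), ("a2", "a1"), ("a3", "a1"), ("a4", "a2"),
    ("b1", None), ("b2", "b1"),
    ("c1", None), ("c2", "c1"),
]

def thread_stats(messages):
    """Return size, generations and breadth for each thread, keyed by root id."""
    children = defaultdict(list)
    for mid, pid in messages:
        if pid is not None:
            children[pid].append(mid)

    def generations(mid):
        # The root counts as generation 1, its replies as generation 2, and so on.
        return 1 + max((generations(c) for c in children[mid]), default=0)

    def members(mid):
        out = [mid]
        for c in children[mid]:
            out.extend(members(c))
        return out

    stats = {}
    for mid, pid in messages:
        if pid is None:
            ms = members(mid)
            stats[mid] = {
                "size": len(ms),
                "generations": generations(mid),
                # Assumed reading of "breadth": most direct replies to one message.
                "breadth": max(len(children[m]) for m in ms),
            }
    return stats

stats = thread_stats(messages)
# Frequency distribution of thread sizes, the kind of table Smith calls illuminating.
size_counts = Counter(s["size"] for s in stats.values())
two_message_share = size_counts[2] / len(stats)
```

In this toy data two of the three threads have exactly two messages, which happens to mirror the two-thirds figure quoted above.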

Is that indicative of a spam problem?
No, those aren't spammers, they are the people who post once, get their
answer and go away happy. They post a message that says they can't print,
then they get their answer. What newsgroups are is a form of knowledge
management application. What they are about is leveraging the collective
knowledge of large numbers of people.

So how is it useful to know that people are getting their printing questions
answered? What can you do with that information?
What you can do is say, "Let's look at how many times each of those unique
IDs posted. Twenty-four million times? That's your spammer." Humans have a
limited capacity to type and send and think up messages, while software is
virtually free from those constraints. What we do is say, "By looking at
these properties, the structure of authors, threads and newsgroups, we can
determine a lot of things that are good predictors of value."

Here's an example: Let's say you have a newsgroup with 22,000 messages
posted there per month. You have a problem! What should you read? We have
some suggestions. In an existing browser, you can see the messages sorted by
date, sorted by size or sorted alphabetically, and this is not very useful.
What we want to say is, "There are different vectors through this content
space, different ways of slicing into the data, the conversation, that are
more likely to bring valuable information."

For instance, what are people talking about? What we've done is highlight
the 40 threads that got the most messages in this period--day,
week, month, year. And we'll say, "Here are 40 really big threads." How do
you know those are good? We're not sure they were good, but these were the
things that got people really excited and engaged in this newsgroup. That's
one vector.
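
The "biggest threads in a period" vector described above amounts to counting messages per thread inside a date window and keeping the top entries. A minimal sketch, assuming posts arrive as (thread_id, date) pairs; the function name busiest_threads and the sample data are hypothetical, and only the default of 40 comes from the interview.

```python
from collections import Counter
from datetime import date

# Hypothetical posts: (thread_id, post_date)
posts = [
    ("printer-help", date(2002, 3, 1)),
    ("printer-help", date(2002, 3, 2)),
    ("flame-war", date(2002, 3, 1)),
    ("flame-war", date(2002, 3, 3)),
    ("flame-war", date(2002, 3, 9)),
    ("old-thread", date(2002, 1, 5)),
]

def busiest_threads(posts, start, end, n=40):
    """Return the n threads with the most messages between start and end (inclusive)."""
    counts = Counter(thread for thread, day in posts if start <= day <= end)
    return counts.most_common(n)

top_march = busiest_threads(posts, date(2002, 3, 1), date(2002, 3, 31), n=2)
```

As Smith says, a high count only shows that a thread got people engaged, not that it was good.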

But what about the guy who gets his printer fixed in two messages?
And you can legitimately argue that. "What about small threads of high
value? How can you help me find them?" The answer is that we are, by
leveraging latent structural data that is itself a product of collective
behavior. You have lots of individuals working on their own. If there were
only one person writing Web pages, Google wouldn't work. But Google Groups
doesn't do what we do to Usenet. We're doing something useful to Usenet.
We're not yet a search engine, we're a research project. And we will
eventually be doing things related to the full text of the message.

Let's look at the individual who posts to a list. Does he show the pattern
of participation over time that is an indicator of a valuable contributor?
The question you should raise is, "What do you mean by value?" One man's
flame warrior is another man's poet. It's not for us to tell you. But we do
give you tools to sort patterns of difference.

Let me tell you how to find someone who gives really good technical support
answers using our author tracker. It's a way to slice a vector into the
content space that measures how dedicated are the people to this newsgroup.
Basically, it asks, "Are you a regular?"

And what will that indicate?
Regulars are value contributors. But you could say, "You are sorting people
by--and we do--how many days they come back." For example, you go into some
of our tech support newsgroups, and you'll find that there are
people who have contributed every day in the month. OK, those are regulars.
But how do you know they have value? It's not just the number of days you
come back. There are three other metrics, which tend to be ratios. One is
the ratio of replies: How many times did you reply to someone else, or start
a thread? Spammers may show up every day, but they don't reply. With a very
low reply-to-post ratio, I would say that that is a person who starts a lot
of conversations but never replies to anyone else, and it's probably a
spammer. Showing up every day is not enough--you have to respond to other
people. It's also thread-to-post. How many threads did you touch, how many
messages did you write? If you wrote 10 times, all into one thread, that's a
low ratio. You have a high conversational concentration.

Is that good or bad?
I'm a social scientist--I don't know the difference between good and bad,
only the difference between difference. Do I like flame warriors? Or don't
I? A high reply-to-post indicates a flame warrior, because they tell you
you're an idiot and they put all their messages into a few threads--so they
also have a low thread-to-post ratio.

If you want to find the answer person, flip that ratio around. They differ
from the flame warrior in the following way: Both show up every day, and
both reply. The answer person answers a post once or twice, then moves on.
We've seen people post 500 messages in one week in one thread. If you have
that much time on your hands--it's not to say that it's a good thing or a
bad thing, but a different thing. We give you the opportunity to say, "I
just came here because I can't print." We will guide you to the very real
group of people who are dedicated, for whatever reason, to not just computer
technology, but answering questions about knitting, horseback riding,
dogs--you name it. And the way to do that is to start looking at the social
accounting metadata about authors.
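
The author metrics named in the interview (days active, reply-to-post ratio, thread-to-post ratio) are easy to compute once each post records its author, thread, day and whether it is a reply. The sketch below is an illustration under stated assumptions, not the NetScan author tracker: the Post record, the function names, the sample authors and every threshold in rough_label are made up. The interview mentions three ratio metrics but describes only two, so only those two appear here.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Post:
    author: str
    thread_id: str
    day: date
    is_reply: bool  # False when the post starts a new thread

def author_metrics(posts, author):
    """Compute the per-author measures described in the interview."""
    own = [p for p in posts if p.author == author]
    if not own:
        return None
    return {
        "posts": len(own),
        "days_active": len({p.day for p in own}),
        "reply_to_post": sum(p.is_reply for p in own) / len(own),
        "thread_to_post": len({p.thread_id for p in own}) / len(own),
    }

def rough_label(m, min_days=20, low=0.2, high=0.8):
    """Map metrics to the interview's labels; all thresholds are assumptions."""
    if m["days_active"] < min_days:
        return "not a regular"
    if m["reply_to_post"] <= low:
        return "possible spammer"          # shows up daily, never replies
    if m["reply_to_post"] >= high and m["thread_to_post"] <= low:
        return "possible flame warrior"    # many replies packed into few threads
    if m["reply_to_post"] >= high and m["thread_to_post"] >= 0.5:
        return "possible answer person"    # replies once or twice, then moves on
    return "regular"

# Hypothetical sample: three authors active on each of 20 days.
days = [date(2002, 3, d) for d in range(1, 21)]
sample = (
    [Post("spam", f"ad{d.day}", d, False) for d in days]
    + [Post("flame", "t-war", d, True) for d in days for _ in range(5)]
    + [Post("helper", f"q{d.day}", d, True) for d in days]
)
labels = {a: rough_label(author_metrics(sample, a)) for a in ("spam", "flame", "helper")}
```

The labels are deliberately hedged with "possible": as Smith puts it, the metrics show difference, and deciding what counts as value is left to the reader.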

So could all of this ultimately add up to a better search engine?
If things go well, we'll have a better search engine. This remains early,
initial research, but our results look promising. Reranking results based on
social histories does do a better job, and I do believe we will deliver
interfaces that will find people who are debaters, fine, but also those who
are answer people...It turns out that people have a lot to give each other.
There's a lot of knowledge to share, and 2 percent of every population is
motivated to be a knowledge sharer.

Most of us have to rely on signs or symbols that suggest a person is
reliable. With doctors you have their diplomas, the way the office looks,
and most important, who referred you--these are all indicators that we rely
on. We are trying to create analogous tools for online environments where
that data is latent, is not manifest in the interfaces visibly.

When you talk about a reputation system, I'm reminded of the eBay system.
We're similar but different--eBay is an explicit feedback system, and we are
an implicit feedback system. With eBay, buyers rate sellers, and sellers
rate buyers, after they conduct a transaction. It's what people say about
you. But there are real problems with this--most of all inflation, the
"Beverly Hills-adjacent" problem. If you read the L.A. real estate section,
everything is "Beverly Hills-adjacent." So there is this tendency to
inflate. There have been empirical studies of reputation ratings at eBay
that suggest that just going by reputation ratings at eBay is not an
indication that you're not going to get a fraudulent transaction.

Tell me about the AURA (Advanced User Resource Annotation) project.
AURA is about extending NetScan: "What if you could use NetScan with a
pocket computer and attach threads to things?" We use the Toshiba e740 and a
Compact Flash bar-code reader, run AURA software, and can walk up to any
bar-coded object, any ISBN-coded object, scan it, and the device brings back
information about that object…We imagine being able to walk up and down the
aisle of a grocery store and have a handheld computer rate everything with a
green light, a red light, a skull and crossbones.

In Hong Kong, during the height of the SARS outbreak, there was a system
that could tell you which buildings had had confirmed SARS cases. Now that's
a reputation system.

It's easier to do this with products than with, say, people.
People are one thing, but objects--all the books on my shelves, all the food
in my kitchen, the artworks in the hallway--we at Microsoft have bar-coded
every one of them. AURA is going to become a navigation tool. You can print
a bar code for a penny and slap them on things. Which we do--and then
Facilities comes along and scrapes them off.

It seems that once Microsoft starts tracking the behavior of individuals,
you're asking for trouble. What about privacy?
I think it's a very important thing. And we have built NetScan to protect
what I think are legitimate claims for privacy. Like a Net spider, NetScan
takes publicly accessible documents off the Internet, and it respects
metadata that says "Leave me alone!" There is the robots.txt file that says,
"You can look at this but not that." With Usenet there is one that says
"Leave my messages alone," and we respect that. We will not store your
messages if you put that in them.
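
Smith does not name the Usenet marker he means. The widely used convention is the "X-No-Archive: yes" header, so the sketch below assumes that is the one. It shows how an archiver could check an article before storing it, using Python's standard email parser; the function name may_archive is hypothetical, and treating a first body line of "X-No-Archive: yes" as equivalent is also an assumption based on common practice.

```python
from email import message_from_string

def may_archive(raw_article: str) -> bool:
    """Return False when the article asks not to be archived."""
    msg = message_from_string(raw_article)

    # Header form: "X-No-Archive: yes" (compared case-insensitively).
    header = (msg.get("X-No-Archive") or "").strip().lower()
    if header.startswith("yes"):
        return False

    # Assumed fallback: some posters put the same line first in the body.
    body = msg.get_payload()
    if isinstance(body, str):
        first_lines = body.lstrip().splitlines()[:1]
        if first_lines and first_lines[0].strip().lower().startswith("x-no-archive: yes"):
            return False
    return True

opted_out = "From: a@example.com\nSubject: hi\nX-No-Archive: yes\n\nPlease do not store this.\n"
normal = "From: b@example.com\nSubject: printing\n\nI can't print.\n"
```

Here may_archive(opted_out) is False and may_archive(normal) is True.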

Couldn't a spammer just put that in his or her messages, so you wouldn't be
able to identify them as a spammer?
That's a possibility, and that's something we would have to respect. But the
system still would not fail, because a person with no reputation is a person
who has a reputation. "Let me tell you about the people who the system has
shown to have value." We're about letting the cream float to the top and not
about letting the other stuff sink.

How can you reassure someone who might be concerned that it's not such a
good idea for computers to be keeping track of our belongings and our
whereabouts?
I'm not sure, but we're leaking data all over the place now. And on the one
hand, that has utility for other people. On the other, there's a privacy
risk. In some ways, consider us a form of performance art. Would you like to
see you? This is potent. We accept that and hope we can offer people good
prophylactics against loss of privacy. And that may mean keeping multiple
IDs and e-mail addresses. Ultimately we may have to fragment our identities.