Some Considerations for the Semantics of EM Technology

I have been looking at a semantic algorithm that could quickly tag a note using material mined from the note as stored in the EM. The basic idea is to weight the significant mined words against two things: the user's existing tags, and a dictionary that assigns a semantic load to each word.

Using a dictionary would also allow us to return tags that do not appear in the note but are semantically significant to it. This overcomes the obvious problem that notes often do not contain the words naming the categories they would best fit. For example, if a note talks a lot about mathematics, inference, argumentation, formulae and so forth, a dictionary entry such as logic={1/inference,0.9/formulae,0.7/argumentation,0.7/mathematics,…} could return the top-level word ‘logic’ as a suggested tag, on the grounds that sufficiently many of the note's significant words feature prominently in the semantic load of that word in the dictionary. (This is pretty simple to do using an algorithm employing basic fuzzy set theory.) Also, if ‘logic’ were already among the existing tags, this would increase its likelihood as a suggested tag.
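To make the fuzzy-set idea concrete, here is a minimal sketch in Python. Everything in it beyond the ‘logic’ entry is an assumption made for illustration: the ‘cooking’ entry, the function names, the threshold and boost values, and the particular scoring rule (fuzzy intersection by minimum, normalised by the size of the dictionary entry). Other fuzzy measures would work just as well.

```python
import re
from collections import Counter

# Hypothetical dictionary: tag -> {word: weight in [0, 1]}.
# Only the 'logic' entry comes from the example above.
DICTIONARY = {
    "logic": {"inference": 1.0, "formulae": 0.9,
              "argumentation": 0.7, "mathematics": 0.7},
    "cooking": {"recipe": 1.0, "oven": 0.8, "flour": 0.6},
}

def mine_words(text):
    """Map each word to its prominence: frequency scaled so the top word is 1.0."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    if not counts:
        return {}
    top = max(counts.values())
    return {word: n / top for word, n in counts.items()}

def suggest_tags(text, existing_tags=(), threshold=0.3, boost=0.2):
    """Return dictionary tags whose semantic load overlaps the note enough."""
    prominence = mine_words(text)
    scores = {}
    for tag, load in DICTIONARY.items():
        # Fuzzy intersection (min) of note prominence and dictionary weight,
        # normalised by the total weight of the dictionary entry.
        overlap = sum(min(prominence.get(w, 0.0), weight)
                      for w, weight in load.items())
        score = overlap / sum(load.values())
        # Assumed rule: a tag the user already has gets a fixed boost.
        if tag in existing_tags:
            score = min(1.0, score + boost)
        if score >= threshold:
            scores[tag] = score
    return sorted(scores, key=scores.get, reverse=True)

note = ("Inference and argumentation rely on formulae; "
        "mathematics gives inference its formulae.")
print(suggest_tags(note))
```

Note that the word ‘logic’ never occurs in the sample note; it is suggested purely on the strength of the related words. A note with weaker evidence, such as one that only mentions mathematics once, falls below the threshold unless ‘logic’ is already among the user's tags.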

As suggested by Olli, we can actually do this more elegantly without a dictionary, by generating tags from a social network. A dictionary is, at the end of the day, clumsy and complex to build. We could provide precisely the same semantic functionality by replacing it with social networking, much as Delicious or StumbleUpon currently do. In other words, the semantic load for cross-referencing words would be determined by the tags used across the entire social network the user is connected to, rather than by a dictionary.
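One way the network could stand in for the dictionary is sketched below. It assumes, purely for illustration, that we can see the tagged notes of connected users as (tags, words) pairs, and it takes the weight of a word for a tag to be the fraction of notes carrying that tag that also contain the word. The function name and the data layout are hypothetical.

```python
from collections import defaultdict

def build_semantic_load(network_notes):
    """Derive tag -> {word: weight} from notes tagged across the network.

    network_notes is an iterable of (tags, words) pairs. The weight of a
    word for a tag is the share of that tag's notes containing the word.
    """
    tag_counts = defaultdict(int)
    cooccur = defaultdict(lambda: defaultdict(int))
    for tags, words in network_notes:
        for tag in set(tags):
            tag_counts[tag] += 1
            for word in set(words):
                cooccur[tag][word] += 1
    return {
        tag: {word: n / tag_counts[tag] for word, n in words.items()}
        for tag, words in cooccur.items()
    }

network = [
    (["logic"], ["inference", "formulae"]),
    (["logic"], ["inference", "mathematics"]),
    (["cooking"], ["recipe"]),
]
load = build_semantic_load(network)
print(load["logic"])
```

The result has the same term = {weight/criterion} shape as a hand-made dictionary entry, so whatever scoring is used with the dictionary could be reused unchanged.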

In addition, the software could quickly learn which semantic relationships are significant to the user through a simple form of “thumbing”, except that here the user would simply “swipe” away tags that do not work. (Basically, we could still use the fuzzy intensional model above, i.e. term = {weight/criterion,..}, but populate it using the network and thumbing.)
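As a rough sketch of how swiping might feed back into the model: when the user swipes a suggested tag away, the words in that note that contributed to the suggestion could be given less weight for that tag. The function name, the learning rate and the multiplicative update are all assumptions, not a worked-out design.

```python
def swipe_out(load, tag, note_words, rate=0.5):
    """Return a copy of the model in which a rejected tag's triggers weigh less.

    load is tag -> {word: weight}; note_words are the words of the note on
    which the tag was suggested and then swiped away.
    """
    updated = {t: dict(words) for t, words in load.items()}
    for word in note_words:
        if word in updated.get(tag, {}):
            # Shrink the weight of each word that helped trigger the tag.
            updated[tag][word] *= (1.0 - rate)
    return updated

model = {"logic": {"inference": 1.0, "formulae": 0.9}}
adjusted = swipe_out(model, "logic", ["inference", "holiday"])
print(adjusted)
```

Returning a copy rather than changing the model in place keeps the network-wide weights intact, so each user's swipes only shape that user's own view of the model.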

There is, however, a significant problem shared by the dictionary-based and the network-based approaches: both work only for notes with a substantial amount of data, such as websites, lecture notes and so forth. When I went through my Evernote notes trying to figure out how the algorithm would tag them, it soon became apparent that most notes simply do not contain enough data to mine. Data mining would also be difficult to perform on visual or audio notes.

So the question is: How do we produce enough data from a note to provide appropriate tags?

Addendum by Timo T.:

One idea would be to incorporate the concept of a “neighbour note” into the algorithm. Say you have an Evernote notebook called “Project: Extended mind think tank”: it would contain a bunch of notes which together make up quite a large amount of data. When you make a new note, the algorithm could then take into account the existing tags of neighbour notes in the same notebook and/or process all of the neighbour notes to bring up suggestions.
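A minimal sketch of the neighbour-note idea, assuming each note is a plain dict with a 'tags' list and a 'text' string (a hypothetical layout, not Evernote's actual data model): a short new note borrows the most common tags of its notebook, and its text can be pooled with its neighbours' text before mining.

```python
from collections import Counter

def neighbour_tag_suggestions(notebook_notes, limit=3):
    """Suggest the tags used most often by the other notes in the notebook."""
    counts = Counter(tag for note in notebook_notes
                     for tag in set(note["tags"]))
    return [tag for tag, _ in counts.most_common(limit)]

def pooled_text(new_note_text, notebook_notes):
    """Join a short new note with its neighbours so there is enough to mine."""
    return " ".join([new_note_text] + [note["text"] for note in notebook_notes])

notebook = [
    {"tags": ["extended-mind", "semantics"], "text": "Fuzzy tagging ideas."},
    {"tags": ["extended-mind"], "text": "Meeting notes."},
    {"tags": ["semantics", "extended-mind", "tagging"], "text": "Dictionary draft."},
]
print(neighbour_tag_suggestions(notebook, limit=2))
print(pooled_text("Call Olli", notebook))
```

In practice the pooled text would probably need to be down-weighted relative to the note's own words, so that a large notebook does not drown out what the new note actually says.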

The user will want (and probably need?) to do some kind of sorting of data on her own – like putting similar stuff into the same folder/notebook/etc. We should also use that connection data in our algorithm to make it work better and suit the preferences of the user.

