Assisting with the (digimob) Project -
Re: Assisting with the (digimob) Project -
neutralrobotboy wrote:Ahh, thanks. Forgot to do that. Should be viewable now.
Fantastic progress!
_________________
"Sacred Activism is the fusion of the mystic's passion for God with the activist's passion for justice, creating a third fire, which is the burning sacred heart that longs to help, preserve, and nurture every living thing." - Andrew Harvey

Khephra- Number of posts: 700
Age: 44
Registration date: 2008-08-11
Re: Assisting with the (digimob) Project -
Ideally we could enact a format that was suitable for labour-saving updates.
Yeah, if it doesn't add labor to the releasing team, it would be excellent to have something like this -- future releases could be catalogued almost entirely by script.
Part of the idea behind matching the genres to the structure of the fora was to help confine things a bit. Otherwise we could get really bogged down in sub-genres. However, I'm certainly open to other ways to categorize, so if you - or anyone else - has any ideas, pass 'em on!
Yeah, makes sense. So something on learning Hebrew would be under "Western Magickal Tradition > Qabalah", say?
Anyway, the remaining issues with the script are:
1. Some entries can't currently be put in by the script at all. These are entries with special characters, such as accented vowels. I've figured out how to put these all in their own text file, so they can probably be copied + pasted from that. There aren't that many of them, in any case. The problem seems to somehow be related to unicode/ascii conversion somewhere, but it's a real headache.
2. Unidentified bug: Makes it crash after parsing about 8700 lines of the ls output. Still tracking that one down, it's very strange. The database I linked to earlier is the result of a cheap hack to circumvent it. That's why some of the "Occult Carrot" entries aren't labeled properly.
3. File grouping not implemented. This is the idea I proposed of grouping files that are clearly numbered as part of a series. I'll try to get on this soon-ish, but I'm trying to spend a bit less time at the computer at the moment, and my workload is soon to increase...
After these issues are resolved (can't guarantee a timeframe at the moment), the script will be ready to go "live", basically. Then we can work out how to do the community participation thing, etc.
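A minimal sketch of the file grouping idea from item 3, assuming a series can be recognised by a number sitting just before the extension. The pattern, the function name `group_series`, and the sample file names are all hypothetical; the real script may detect series differently.

```python
import re
from collections import defaultdict

# Hypothetical pattern: optional separators, then digits, right before the extension,
# e.g. "Some Journal - 03.pdf" -> stem "Some Journal", number 3, extension ".pdf".
SERIES_RE = re.compile(r"^(?P<stem>.*?)[\s_\-.]*(?P<num>\d+)(?P<ext>\.[^.]+)$")

def group_series(filenames):
    """Split names into numbered series (two or more members) and single files."""
    groups = defaultdict(list)
    singles = []
    for name in filenames:
        match = SERIES_RE.match(name)
        if match and match.group("stem"):
            key = (match.group("stem").lower(), match.group("ext").lower())
            groups[key].append((int(match.group("num")), name))
        else:
            singles.append(name)
    series = {}
    for key, members in groups.items():
        if len(members) > 1:
            # Sort numerically so that 10 comes after 2.
            series[key] = [name for _, name in sorted(members)]
        else:
            singles.append(members[0][1])
    return series, singles
```

A lone file that merely ends in a number is treated as a single file rather than a one-member series, since a number in a title does not by itself indicate a series.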
_________________
No set of rules or customs can substitute for living wisdom.
neutralrobotboy- Number of posts: 159
Age: 28
Registration date: 2008-12-23

Re: Assisting with the (digimob) Project -
neutralrobotboy wrote:Anyway, the remaining issues with the script are:
Wow! It's exciting to watch this progress!

ankh_f_n_khonsu- Number of posts: 395
Registration date: 2008-09-16
a couple of questions
Great work! I certainly don't want to dissuade you from this project, but I've got a couple of questions.
1) How does someone look up an item if they don't have the exact file name? The hypothetical pdf I wish to contribute might have been named differently at some point: with the title of the book first rather than the author's name, or starting with the author's first name where yours starts with the last name, or listed under "the" (as is the case in mega torrent #2.3), etc.
2) Also, currently these aren't in alphabetical order. Will the program correct that, when you're done entering the information?

Sascrunch- Number of posts: 40
Location: Colorado
Registration date: 2008-08-13
Re: Assisting with the (digimob) Project -
1) How does someone look up an item if they don't have the exact file name? The hypothetical pdf I wish to contribute might have been named differently at some point: with the title of the book first rather than the author's name, or starting with the author's first name where yours starts with the last name, or listed under "the" (as is the case in mega torrent #2.3), etc.
Actually, I was going to bring this up too... I think Google Docs has some search functions, if not, the solutions aren't terribly convenient. The thing is that even with things as they are now, you *should* be able to do searches like, [Author's last name] in the "File Name" column, actually, and have all files that include the author's last name anywhere in the filename turn up. I can look into this. It would be excellent if someone else could also look into this, since I'm not at the computer too much at the moment. In any case, searching should be flexible. Also, if you had a pdf to contribute, I assume the releasers would take care of renaming the file if that needed to be done.
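To illustrate what "flexible" searching could mean here, this is a rough sketch that matches on words rather than on the exact file name, so word order and accents stop mattering. It is not how Google Docs searches; the function names `tokens` and `search` are made up for the example, and it matches whole words only.

```python
import re
import unicodedata

def tokens(text):
    """Lower-case the text, strip accents, and split it into alphanumeric words."""
    decomposed = unicodedata.normalize("NFKD", text)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return set(re.findall(r"[a-z0-9]+", plain.lower()))

def search(query, filenames):
    """Return the file names that contain every word of the query, in any order."""
    wanted = tokens(query)
    return [name for name in filenames if wanted <= tokens(name)]
```

With this approach, a query such as `crowley lies` would find both "Crowley, Aleister - The Book of Lies.pdf" and "The Book of Lies - Aleister Crowley.pdf", which covers the author-first versus title-first naming problem.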
2) Also, currently these aren't in alphabetical order. Will the program correct that, when you're done entering the information?
This can be done with simple column sorting. The script *could* arrange things in alphabetical order, but this can also be done simply even in Google Docs. Part of the idea of a database like this (I assume) is that it can be sorted in a variety of ways, leaving the question of how to sort the data up to the user.
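For anyone working on an exported copy outside a spreadsheet program, sorting by an arbitrary column is short in Python as well. A sketch, assuming each row is a dict; the column names and sample rows below are invented for illustration.

```python
def sort_rows(rows, column, reverse=False):
    """Return rows sorted by one column; text is compared case-insensitively."""
    def sort_key(row):
        value = row[column]
        return value.casefold() if isinstance(value, str) else value
    return sorted(rows, key=sort_key, reverse=reverse)

# Hypothetical rows; the real sheet's column names may differ.
rows = [
    {"file_name": "The Kybalion.pdf", "size_kb": 900},
    {"file_name": "crowley - Magick.pdf", "size_kb": 5200},
    {"file_name": "Agrippa - Three Books.pdf", "size_kb": 3400},
]

by_name = sort_rows(rows, "file_name")
by_size_desc = sort_rows(rows, "size_kb", reverse=True)
```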
All that said, a more convenient solution in the long run would be a custom web app. This would mean hosting and bandwidth costs, though... Not to mention development of the app itself and database administration, which would all become a real pain real quick. Though I could write the web app in theory (probably using Django and MySQL, which I've played with before), I wouldn't trust myself with admin -- It's just not something I have experience with.
At the moment, I assume the "preferred" solution is for the user to actually download (i.e., export) the database and use it locally. Like, I would grab it as an .ods or .xls file and open it in OpenOffice.org and do my searching and sorting that way. But again, that's not terribly convenient and I haven't looked into the best way to do it from within Google Docs.
neutralrobotboy- Number of posts: 159
Age: 28
Registration date: 2008-12-23

Re: Assisting with the (digimob) Project -
Just looked into the search+sort issue; the solution seems to be "gadgets". I think it'll be easy to figure out, actually.
neutralrobotboy- Number of posts: 159
Age: 28
Registration date: 2008-12-23

Re: Assisting with the (digimob) Project -
Yes, in preparation for digests a text file is manually searched for duplicates... having the archival list searchable is a high priority. 
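The manual duplicate check could in principle be scripted. A rough sketch, assuming the text file holds one file name per line and that two names count as duplicates when they match after ignoring case and punctuation; `normalize` and `find_duplicates` are hypothetical names, and the matching rule is only a guess at what would be useful.

```python
import re
from collections import defaultdict

def normalize(name):
    """Reduce a file name to lower-case words so cosmetic differences vanish."""
    return " ".join(re.findall(r"[a-z0-9]+", name.lower()))

def find_duplicates(names):
    """Return groups of names that normalize to the same string."""
    seen = defaultdict(list)
    for name in names:
        seen[normalize(name)].append(name)
    return [group for group in seen.values() if len(group) > 1]
```

This only catches names that differ cosmetically; the same book under a differently worded title would still need a human eye.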

Khephra- Number of posts: 700
Age: 44
Registration date: 2008-08-11
Re: Assisting with the (digimob) Project -
Yes, in preparation for digests a text file is manually searched for duplicates... having the archival list searchable is a high priority.
I've looked into google docs' "gadgets" again for this, and it's still a jumble to me. For a database this large, it may be a big headache. I'm actually not convinced that google docs will wind up being the best solution in terms of searchability, but I'll keep digging and hopefully figure it out. The good thing about the "gadgets" thing is that I can write one in JavaScript if need be. That said, I'm still not totally convinced that this is right considering the number of entries we're dealing with (this affects how long it takes for a gadget to refresh, which happens often). Figuring this out will take a bit of time, unless someone else has already dealt with this in google docs and knows what to do.
Dealing with this stuff, I'm tempted to just host the database myself and code the interface by hand in python using django. It's been a while since I've done that kinda thing, but it still may be worth it over google docs atm. I have a suspicion it'd take less effort and produce better results for flexible searching. The only thing is bandwidth. More on this later.
neutralrobotboy- Number of posts: 159
Age: 28
Registration date: 2008-12-23

Re: Assisting with the (digimob) Project -
Sorry to jump in, I appreciate the progress made so far, but I would like to give some comments.
neutralrobotboy wrote:For a database this large, it may be a big headache.
It ain't big yet. But we need to consider scalability.
neutralrobotboy wrote:Dealing with this stuff, I'm tempted to just host the database myself and code the interface by hand in python using django.
I would suggest that you spend some extra time refining the import to spreadsheet. I think some good use can be made out of spreadsheets, and it can turn out to be a useful tool until we get to the database level.
A suggestion for improvement: convert the extension to lower case.
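The lower-case extension suggestion is a small change in Python. A sketch using the standard library's `os.path.splitext`; the function name `lowercase_extension` is made up for the example.

```python
import os.path

def lowercase_extension(filename):
    """Lower-case only the final extension, leaving the rest of the name alone."""
    root, ext = os.path.splitext(filename)
    return root + ext.lower()
```

Only the last extension is touched, so "Liber AL.PDF" becomes "Liber AL.pdf" while capitals in the title itself are kept.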
Schinder- Number of posts: 4
Location: Beijing, China
Registration date: 2009-07-29
Re: Assisting with the (digimob) Project -
Sorry to jump in, I appreciate the progress made so far, but I would like to give some comments.
No need to apologize! I'm glad to get more input.
It ain't big yet. But we need to consider scalability.
It's true that it ain't big yet. However: When playing with google docs' "gadgets", it seems that they refresh quite often, which seems to mean a complete refresh of all 8,000 (ish) rows of data, which means it takes a long time to do anything with it. Since I'm new to google docs, I'm hoping there's something I just don't understand, or some simple work-around. But as it stands, it's pretty annoying to work with. If you want to try messing around with it to see if you can find a convenient and flexible "gadget"-y way to search by varying criteria, though, please feel free to relay any of your findings.
I would suggest that you spend some extra time refining the import to spreadsheet. I think some good use can be made out of spreadsheets, and it can turn out to be a useful tool until we get to the database level.
One reason I'm bringing this up now is that I have a strong suspicion that telling my script to output to a database directly would resolve the outstanding problems with the script itself. That is, I don't think that the unicode conversion weirdness would persist, since it's an artifact of the module that's writing to spreadsheet format. Actually, the other outstanding problem with the script is also related to format conversion somewhere in that module. I should probably look at other python modules that have the same functionality, see if they fare any better.
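As an illustration of what writing straight to a database might look like, here is a sketch using Python's built-in `sqlite3` module as a stand-in for MySQL. The table name, column names, and sample entries are assumptions, and whether this would actually sidestep the bugs in the spreadsheet module is untested.

```python
import sqlite3

def save_entries(connection, entries):
    """Create the (hypothetical) files table if needed and insert the entries."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "file_name TEXT NOT NULL, genre TEXT, size_bytes INTEGER)"
    )
    # Parameter placeholders let the driver handle accented characters and quotes.
    connection.executemany(
        "INSERT INTO files (file_name, genre, size_bytes) VALUES (?, ?, ?)",
        entries,
    )
    connection.commit()

def find_by_name(connection, fragment):
    """Return file names containing the fragment, sorted alphabetically."""
    cursor = connection.execute(
        "SELECT file_name FROM files WHERE file_name LIKE ? ORDER BY file_name",
        ("%" + fragment + "%",),
    )
    return [row[0] for row in cursor]
```

Usage would be along the lines of `conn = sqlite3.connect("archive.db")`, then `save_entries(conn, entries)` with a list of `(file_name, genre, size_bytes)` tuples. Note that SQLite's `LIKE` ignores case for ASCII letters only.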
Anyway, my thinking with the database is not so much about scalability as flexibility, ease of use, and ease of maintenance. At this stage, maybe it's not really the way to go, I'm not sure, but it *may* turn out to be easiest in the long-run to be in total control of all archive-related functionality.
Even at this early stage, google docs is only practical if we find some way of providing reasonable search functionality. With things as they are, the only way I can think to do that is by allowing people to download the spreadsheet and search using their favorite compatible spreadsheet program (probably Excel or OpenOffice.org). Which I guess is fine, but it puts the burden of sorting and searching on the user, if this is meant for public consumption.
neutralrobotboy- Number of posts: 159
Age: 28
Registration date: 2008-12-23

Re: Assisting with the (digimob) Project -
Regarding the size, I agree that Google Documents is not there yet, but the same is true of other programs. I used to do this type of file indexing on my hard disks, and up through Excel 2003 the number of rows in a spreadsheet was limited to 65536. Spreadsheets are not databases, but you can go a long way with them today.
In fact, for the file format, I would suggest sticking to the basics: CSV (comma-separated values). You can do a lot of work on it using simple tools like grep and only convert to a more powerful format at the last minute. That will also cut the time you need to generate your file.
Also, until you are ready to crowd source the sanitization of your data, the database is not really necessary.
Schinder- Number of posts: 4
Location: Beijing, China
Registration date: 2009-07-29
Re: Assisting with the (digimob) Project -
In fact, for the file format, I would suggest sticking to the basics: CSV (comma-separated values). You can do a lot of work on it using simple tools like grep and only convert to a more powerful format at the last minute. That will also cut the time you need to generate your file.
Well... some file names have commas in them for one thing. But for another, currently the text data is parsed into a list of objects (classes) in python. I can use those objects to render their data into any format I want. I can see no good reason to render first to one file format and then another. This would actually lengthen the time needed to generate the spreadsheet. String searching, etc, can all be done very easily from within python.
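For what it's worth, commas in file names need not rule CSV out: Python's standard `csv` module quotes any field that contains a comma, so the data survives a round trip. A sketch, with hypothetical column names and a plain list of tuples standing in for the script's parsed objects.

```python
import csv
import io

def entries_to_csv(entries):
    """Render (file_name, genre) tuples as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # default quoting wraps fields that contain commas
    writer.writerow(["file_name", "genre"])
    writer.writerows(entries)
    return buffer.getvalue()

def csv_to_entries(text):
    """Parse CSV text produced above back into a list of tuples."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return [tuple(row) for row in reader]
```

Each entry still sits on one line, so line-based tools like grep keep working on the output; only tools that split naively on every comma would be confused by the quoted fields.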
Also, until you are ready to crowd source the sanitization of your data, the database is not really necessary.
Maybe, maybe not. There would be some advantages. For now, I'm thinking that Google Docs is unlikely to be satisfactory for searching through the output. In the meanwhile, though, it's true that there are at least some things I can fix up with the script, which will be useful regardless of the final output format. I'll get on those things in the next few days, hopefully. I think they'll be pretty straightforward. The real "bugs" that remain, though, are in one way or another spreadsheet-related.
neutralrobotboy- Number of posts: 159
Age: 28
Registration date: 2008-12-23

Re: Assisting with the (digimob) Project -
Google Docs may well be insufficient to our needs... As for hosting a database, could we use free hosting? These forums are hosted through Forumotion, and I have to figure we might could find free hosting for that, too. But, given my ignorance of these matters, perhaps I'm completely mistaken. 

Khephra- Number of posts: 700
Age: 44
Registration date: 2008-08-11
Re: Assisting with the (digimob) Project -
Google Docs may well be insufficient to our needs... As for hosting a database, could we use free hosting?
I'll do a little bit of research, but at first glance, the answer seems to be yes. I'm also fairly ignorant on this topic, so I'll see what I can figure out. I'm looking through these sites first:
http://www.free-webhosts.com/free-mysql-database.php
And I'll also look to see if I can find other sites that might not require the extra coding.
neutralrobotboy- Number of posts: 159
Age: 28
Registration date: 2008-12-23

Re: Assisting with the (digimob) Project -
Just a bit of an update:
I've found a MySQL hosting site that's free and looks like it could be fine. The only drawback is that I may have to code an interface in PHP, I'm not sure. But in any case, there are a few other testing steps to take. I checked out one other site and it had terms and conditions that made me think twice. I might look at a couple more in the next few days, but I do have some deadlines coming up...
I've dedicated very little time to this in the last week, and that may continue for a little while. I have a feeling that whenever I manage to sit with this for a few hours, I'll be able to basically figure out what to do and how, so hopefully there should be more to report in another week or two. I feel a bit lame about the snail's pace at the moment, this stuff shouldn't have to take quite this long, but it's not quite priority #1 right now. Nevertheless, work continues, however slowly.
neutralrobotboy- Number of posts: 159
Age: 28
Registration date: 2008-12-23




