User:ProteinBoxBot/Ideas
NOTE: This page is effectively read-only, except by the bot organizers. Please post any ideas and suggestions on the discussion page.
Development plan and future ideas
NOTE: The items below are thoughts for the future and are not included in the initial proposed specs.
See also: User:ProteinBoxBot/Project_proposals
Next up for implementation
- per discussion on Commons, add PDB infobox to all PDB images (Example [1])
- Run bot update
- needs to work with new data Web services
- remove {{PBB_Summary}} and {{PBB_Controls}} from main namespace
- pilot project for {{SWL}}
- find some well-known facts
- encode them in Gene Wiki article using {{SWL}}
- figure out synchronization with wikidraft.org/SMW, converting SWLs to real semantic links
- OUTPUT: demonstrate real inline queries on wikidraft.org
- OUTPUT: export from SMW to RDF
- pilot collaboration with MODs (specifically ZFIN)
- scan through all Gene Wiki pages for inline citations
- retrieve MeSH terms and identify matching species (human, mouse, zebrafish, fly, rat, yeast)
- generate four-column output file:
- WP article name
- cited pubmed ID
- matching organisms by MeSH
- sentence(s) referencing the publication
- Notes
- is there a MeSH-to-taxonomy mapping? or do free-text matching?
- for pubs that reference multiple species, one line per species
- for articles that reference a pub multiple times, concatenate sentences
- redesign infobox to better handle linking to MODs (MGD, RGD, ZFIN, FlyBase, WormBase, etc.)
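The ZFIN pilot steps listed above (scan for inline citations, match organisms by MeSH, write a four-column file) could be sketched roughly as follows. This is only an illustration under several assumptions: citations are taken to be full `<ref>...</ref>` tags containing `pmid=NNN`, named-ref reuse (`<ref name="x" />`) is ignored, `MESH_TO_ORGANISM` is a hand-made stand-in for the MeSH-to-taxonomy mapping that the notes leave open, and `mesh_lookup` is a hypothetical caller-supplied function that returns the MeSH terms for a PubMed ID. Publications with no matching organism are simply dropped here, which the plan does not specify.

```python
import csv
import io
import re
from collections import defaultdict

# Illustrative stand-in for a MeSH-to-taxonomy mapping (an assumption,
# not an existing resource referenced by the plan).
MESH_TO_ORGANISM = {
    "Humans": "human",
    "Mice": "mouse",
    "Zebrafish": "zebrafish",
    "Drosophila melanogaster": "fly",
    "Rats": "rat",
    "Saccharomyces cerevisiae": "yeast",
}

# Full <ref>...</ref> tags only; self-closing named refs are not handled.
REF_RE = re.compile(r"<ref[^>/]*>(.*?)</ref>", re.I | re.S)
PMID_RE = re.compile(r"pmid\s*=\s*(\d+)", re.I)
SENTENCE_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")


def cited_sentences(wikitext):
    """Map each cited PubMed ID to the sentence(s) preceding its citation."""
    found = defaultdict(list)
    for ref in REF_RE.finditer(wikitext):
        pmid = PMID_RE.search(ref.group(1))
        if pmid is None:
            continue  # citation without a PubMed ID
        # Text before this citation, with earlier citations stripped out.
        before = REF_RE.sub("", wikitext[:ref.start()]).strip()
        sentence = SENTENCE_SPLIT_RE.split(before)[-1]
        if sentence and sentence not in found[pmid.group(1)]:
            found[pmid.group(1)].append(sentence)
    return found


def build_rows(article, wikitext, mesh_lookup):
    """One row per (publication, organism); repeated citations are concatenated."""
    rows = []
    for pmid, sentences in cited_sentences(wikitext).items():
        organisms = sorted(
            {MESH_TO_ORGANISM[t] for t in mesh_lookup(pmid) if t in MESH_TO_ORGANISM}
        )
        for organism in organisms:
            rows.append((article, pmid, organism, " ".join(sentences)))
    return rows


def to_tsv(rows):
    """Serialize rows as a tab-separated, four-column file with a header."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["article", "pmid", "organism", "sentences"])
    writer.writerows(rows)
    return buf.getvalue()
```

In practice `mesh_lookup` would need to be backed by a real PubMed query, and the sentence splitter is deliberately naive (it will break on abbreviations such as "e.g."), so the output would still need manual review.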
Add additional links
- GeneCards
- nextbio.com?
- wikiprofessional
- wikigenes
- WikiPathways.org
- KEGG (also add wikilinks to other gene pages in the same KEGG pathways)
- HPRD
- link to Bioinformatic Harvester? -- would need community consensus...
Add/improve stub data (gene-specific)
- change format of the references section to make it small-screen friendly ([2])
- Add GeneRIFs and references from UniProt
- import and display EC number
- import and display protein domain information (through UniProt/PFAM/COGs). See previous discussion.
- UniProt fields: PFAM, "Protein name", "Synonyms", FUNCTION, DOMAIN, SUBCELLULAR LOCATION, CATALYTIC ACTIVITY, COFACTOR, SUBUNIT, and WEB RESOURCE
- Fix the db links for genome locations: the default for mouse has changed to mm9 (see User_talk:ProteinBoxBot#Mouse_location_links_lack_db_name_parameter). Either change the default in the template, or do a second-pass run on all infoboxes to add the parameter.
- Load PPI from Entrez Gene (see User_talk:ProteinBoxBot/Archives/Archive1#Interaction_partners)
- Add a note in infobox showing last-updated date
- for GO section, add small note of evidence code and a link to Pubmed reference, if available.
- add image maps to thumbnail expression images so that tissues can be identified
- add a banner from gene talk pages to portal page ([3])
Add/improve stub data (structure)
- add reference to GO section of infobox linking Entrez Gene
- Add a legend to the protein infobox, especially to explain what the expression profiles mean and how they were generated. See User_talk:ProteinBoxBot/Archives/Archive1#Some_comments_and_a_question
Technical bot stuff
- add MCB template to talk page
- Create more precise PDB caption by using the PDB "title"
- Change PDB image name to correspond to the PDB ID, not the gene Symbol
- change images to upload to Wikipedia:Wikimedia Commons
- Mechanism for users to interrupt actions of bot
- move expression image captions from image to text (Wikipedia:Preparing images for upload#Replace captions in the image with text)
- add template categories to {{PBB}} templates
- SVG instead of PNG for thumbnail expression images
- tag review articles in "Further reading" section with REVIEW (see User_talk:ProteinBoxBot/Archives/Archive1#Alternative_Idea)
- en dashes instead of hyphens in references ([4])
- change PDB image link (which currently references only www.pdb.org) to a structure-specific page. Also reference the license agreement (http://www.pdb.org/robohelp_f/site_navigation/citing_the_pdb.htm) (This item may become obsolete with change to EBI images in wikicommons...)
- test out using flare [5] to visualize usage/editing data
- fix duplicate images [6]
- only show 2-3 refs per protein interaction, biasing toward review articles (as discussed here)
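As a minimal sketch of the last item above (capping the references shown per protein interaction while favoring reviews), one possible reading is a strict "reviews first" ordering followed by truncation. The record layout (dicts with `pmid` and `is_review` keys) and the function name `select_refs` are hypothetical; the plan does not say how review status would be determined or how strong the bias should be.

```python
def select_refs(refs, limit=3):
    """Return up to `limit` references, listing review articles first.

    `refs` is a list of dicts with the hypothetical keys "pmid" and
    "is_review". sorted() is stable, so references keep their original
    relative order within the review and non-review groups.
    """
    ranked = sorted(refs, key=lambda ref: not ref["is_review"])
    return ranked[:limit]
```

A softer bias (for example, always keeping at least one primary research article) would need a different rule; this version only illustrates the simplest interpretation.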
Parallel efforts
- upload all PDB to flickr? allows browsing of entire SCOP sub-trees. maybe geotag by location?
- create a WP category for every GO category? (Piggy back with Enzyme class effort?)
- expand to create pages for each disease using {{Infobox_Disease}}
- second bot to wikilink common biology concepts, specifically on pages with PBB_Controls
- change {{Gene}} templates to internal wikilinks
- systematic creation of articles around protein domains (e.g., SMART database)
- Mass autogeneration of high-quality PDB images
Other
- look into HSPA1A and HSPA1B [7]
- automated way to create this table
- create a mac dashboard widget for the Gene Wiki?
- charting library to combine bar chart with background histogram... (not really Gene Wiki related...)
Completed tasks
- Upload snapshots of all PDB images -- create a gallery? Done!
- get structure image from RCSB. Done!
- modify orthologs box to automatically adjust rows and columns based on data. Done! (I think)...
- possibly add a comment to the protein box area saying that changes (to the protein box only) will be overwritten by the next bot update; this may help us from having to worry about manual edits -- AND/OR -- allow users to manually enter comment in protein box to prevent bot from overwriting. Done! through the PBB_Controls template.
- use "Category: Human proteins" instead of simply "Proteins". Done!
- add "Category: Gene from chromosome N". Done!
- change spacing pattern (e.g., [8]). Fixed when infoboxes moved to template pages
Obsolete tasks
- second bot to create redirects from gene aliases. Removed! Better for a human to do.
- add a comment <!--Add additional text here--> to make it clear where people can/should edit... Removed! Better to constrain areas for PBB edits.
- changing redirects so that primary title is HGNC name; maybe just flag these for manual inspection. Removed! A human should handle anything with regards to page moves.
- adding links to page (e.g., "ITK") from alternate symbols (e.g., EMT; LYK; PSCTK2; MGC126257; MGC126258) and full gene name (e.g., IL2-inducible T-cell kinase). Is redirecting from alternate symbols really a good idea? How would one list ITK on the EMT disambiguation page? Removed! Better that a human does this.
- add an "update_PDB_image" tag in PBB_controls so that people can turn off automated edits for that part of the infobox specifically -- or, don't make any change to existing PDB image, only add if an image didn't previously exist. Removed! Already default behavior.