Today we built a suite of Django management commands that clean up and enrich the `Contact` and `Domain` models of a real estate platform running Django 1.8 and Python 2.7. The commands are designed to work within that legacy stack while still using NLP techniques such as text summarization.
🛠️ Overview of Management Commands
1. update_contact_offer_counts
Purpose: Sets the `count` field of each `Contact` to the number of related `Offer` objects.
python manage.py update_contact_offer_counts
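The core of this command is a per-contact count. Below is a minimal sketch that keeps the logic free of Django so it can be run and tested on its own; the `offer_contact_ids` input (one contact id per `Offer`) is an assumption about how offers reference contacts. Contacts without offers are set to 0, so stale counts are reset as well.

```python
from collections import Counter


def offer_counts(contact_ids, offer_contact_ids):
    """Return {contact_id: number_of_offers} for every contact.

    contact_ids: ids of all Contact rows.
    offer_contact_ids: the contact id referenced by each Offer row
    (assumed input shape; adapt to the real foreign key).
    Contacts without any offers map to 0.
    """
    per_contact = Counter(offer_contact_ids)
    return {cid: per_contact[cid] for cid in contact_ids}


# ORM version for a Django 1.8 command, assuming Offer has a ForeignKey
# to Contact with no custom related_name (reverse lookup name 'offer'):
#   from django.db.models import Count
#   for contact in Contact.objects.annotate(n=Count('offer')):
#       if contact.count != contact.n:
#           contact.count = contact.n
#           contact.save(update_fields=['count'])
```

Saving only when the value actually changed keeps the number of writes down on large tables.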
2. update_domain_contact_counts
Purpose: Updates the `contact_count` field of each `Domain` with the number of `Contact` objects assigned to it.
python manage.py update_domain_contact_counts
3. update_domain_ad_counts
Purpose: Sums the `Contact.count` values of all contacts linked to a `Domain` and saves the total in the `Domain.ad_count` field.
python manage.py update_domain_ad_counts
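Commands 2 and 3 both aggregate contacts per domain, so a single pass can produce both numbers. This sketch works on plain `(domain_id, count)` pairs standing in for `Contact` rows (an assumption made for testability) and treats a `NULL` count as 0.

```python
from collections import defaultdict


def domain_totals(contacts):
    """Compute contact_count and ad_count per domain in one pass.

    contacts: iterable of (domain_id, offer_count) pairs, one per Contact.
    Contacts without a domain (domain_id is None) are skipped.
    Returns {domain_id: (contact_count, ad_count)}.
    """
    contact_count = defaultdict(int)
    ad_count = defaultdict(int)
    for domain_id, offer_count in contacts:
        if domain_id is None:
            continue
        contact_count[domain_id] += 1
        ad_count[domain_id] += offer_count or 0  # NULL count treated as 0
    return {d: (contact_count[d], ad_count[d]) for d in contact_count}


# ORM version, assuming Contact has a ForeignKey named `domain` to Domain
# with no custom related_name:
#   from django.db.models import Count, Sum
#   Domain.objects.annotate(n_contacts=Count('contact'),
#                           n_ads=Sum('contact__count'))
```

Domains without any contacts do not appear in the returned dict, so a real command should reset their counts to 0. The commented `annotate()` query does include them, but `Sum` yields `None` rather than 0 for those rows.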
4. show_contacts_with_multiple_offers_and_no_domain
Purpose: Lists all Contact
objects that:
- Have more than one offer (
count > 1
) - Have a non-empty website
- Do not yet have a
Domain
assigned
python manage.py show_contacts_with_multiple_offers_and_no_domain
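The three conditions translate directly into a filter. The sketch uses dicts with the field names from the post as stand-ins for `Contact` instances; the commented query is the ORM equivalent, assuming `website` is a `CharField` that may also be `NULL`.

```python
def contacts_needing_domain(contacts):
    """Yield contacts with count > 1, a non-empty website and no domain.

    contacts: iterable of dicts with 'count', 'website' and 'domain' keys
    (stand-ins for the Contact fields named in the post).
    """
    for contact in contacts:
        has_offers = (contact.get("count") or 0) > 1  # NULL count fails
        has_website = bool(contact.get("website"))    # None or "" fails
        if has_offers and has_website and contact.get("domain") is None:
            yield contact


# Equivalent ORM query:
#   (Contact.objects
#       .filter(count__gt=1, domain__isnull=True)
#       .exclude(website__isnull=True)
#       .exclude(website=''))
```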
5. assign_domains_to_contacts
Purpose: For every `Domain`, finds the `Contact` objects whose `website` URL contains the domain's URL and assigns that `Domain` to them if it is not already assigned.
python manage.py assign_domains_to_contacts
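A minimal sketch of the matching step as described, on plain tuples. Two choices here are assumptions: the comparison is case-insensitive, and "not already assigned" is read as "the contact has no domain yet", so the first matching domain in iteration order wins. The commented ORM line uses a hypothetical `Domain.url` field name.

```python
def match_domains(domains, contacts):
    """Return {contact_id: domain_id} for newly assigned contacts.

    domains:  list of (domain_id, domain_url) pairs, e.g. (1, "example.com").
    contacts: list of (contact_id, website, current_domain_id) tuples.
    """
    assigned = {cid: dom for cid, _, dom in contacts}
    changes = {}
    for domain_id, domain_url in domains:
        if not domain_url:
            continue
        needle = domain_url.lower()
        for contact_id, website, _ in contacts:
            # Skip contacts that already have a domain (from the data or
            # from an earlier domain in this loop).
            if assigned[contact_id] is not None or not website:
                continue
            if needle in website.lower():
                assigned[contact_id] = domain_id
                changes[contact_id] = domain_id
    return changes


# One UPDATE per domain with the ORM (hypothetical `url` field on Domain):
#   Contact.objects.filter(domain__isnull=True,
#                          website__icontains=domain.url).update(domain=domain)
```

Plain substring matching is permissive: a website such as `http://notexample.com` also contains `example.com`. If that matters, comparing the parsed hostname (for example via `urlparse`) against the domain is a stricter alternative.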
6. copy_contact_logos_to_domains
Purpose: For each `Domain` that has no logo, finds a related `Contact` that has one and copies the logo.
python manage.py copy_contact_logos_to_domains
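The selection logic is small. This sketch uses `(domain_id, logo)` pairs in place of model instances and takes the first contact logo it finds per domain, which is an arbitrary but deterministic choice.

```python
def logos_to_copy(domains, contacts):
    """Return {domain_id: logo} for domains that currently have no logo.

    domains:  list of (domain_id, logo) pairs; None or "" means no logo.
    contacts: list of (domain_id, logo) pairs, one per Contact.
    The first contact (in list order) with a logo supplies the value.
    """
    missing = set(d for d, logo in domains if not logo)
    result = {}
    for domain_id, logo in contacts:
        if domain_id in missing and logo and domain_id not in result:
            result[domain_id] = logo
    return result
```

In Django, assuming both `logo` fields are `FileField`/`ImageField` on the same storage, assigning `domain.logo = contact.logo.name` makes the domain point at the same stored file rather than duplicating it.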
7. generate_summaries_with_gensim
Purpose: Generates a short summary of each `Domain.plain_rewrite` using Gensim's `summarize()` function and stores it in the `description` field.
python manage.py generate_summaries_with_gensim
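Gensim 3.x's `summarize()` returns an empty string for empty input and raises `ValueError` when the text contains only one sentence, which short listing texts often do. A guarded wrapper keeps the command from crashing on such rows. The `word_count` default and the prefix fallback are illustrative choices, not part of the original command.

```python
try:
    # gensim.summarization exists only in Gensim 3.x (removed in 4.0).
    from gensim.summarization import summarize
except ImportError:
    summarize = None


def safe_summary(text, word_count=50, max_chars=300):
    """Summarize with Gensim when possible, else fall back to a prefix."""
    text = (text or "").strip()
    if not text:
        return ""
    summary = ""
    if summarize is not None:
        try:
            summary = summarize(text, word_count=word_count)
        except ValueError:  # raised for single-sentence input
            summary = ""
    if not summary:
        # Fallback (assumption): first max_chars characters, cut at a
        # word boundary when the text is longer than that.
        summary = text[:max_chars]
        if len(text) > max_chars and " " in summary:
            summary = summary.rsplit(" ", 1)[0]
    # summarize() joins sentences with newlines; collapse all whitespace
    # so the result fits a one-line description field.
    return " ".join(summary.split())
```

Inside the command, `domain.description = safe_summary(domain.plain_rewrite)` would then run for every row without special-casing short texts.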
8. generate_rewrite_and_summary
Purpose: If `plain_rewrite` is empty, first strips `html_rewrite` down to plain text. Then generates a summary with Gensim and saves it in `description`.
python manage.py generate_rewrite_and_summary
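Inside the project, Django's `django.utils.html.strip_tags` is probably the simplest way to do the stripping step. The sketch below uses only the standard library `HTMLParser` so it runs on its own; it also drops `<script>`/`<style>` contents and keeps words from adjacent block elements apart. The set of block tags is an assumption.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7

BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style"}


class _TextExtractor(HTMLParser):
    """Collects text nodes, ignoring <script>/<style> contents."""

    def __init__(self):
        HTMLParser.__init__(self)  # explicit call: old-style class on 2.7
        self.text_parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.text_parts.append(" ")  # separate adjacent blocks

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.text_parts.append(" ")

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_parts.append(data)


def html_to_text(html):
    """Strip tags and collapse whitespace.

    On Python 3.5+ entities such as &amp; are decoded before handle_data;
    on 2.7 they are dropped because handle_entityref is not overridden.
    """
    parser = _TextExtractor()
    parser.feed(html or "")
    parser.close()
    return " ".join("".join(parser.text_parts).split())


def fill_plain_rewrite(plain_rewrite, html_rewrite):
    """Derive plain text from HTML only when plain_rewrite is empty."""
    if plain_rewrite and plain_rewrite.strip():
        return plain_rewrite
    return html_to_text(html_rewrite)
```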
🧠 Bonus: What Can Gensim Do With Text?
Gensim is an NLP library focused on semantic modeling, topic discovery, and similarity analysis. It is particularly useful for large collections of unstructured text such as contact descriptions, real estate listings, or text extracted from scraped HTML.
Feature | Tool/Method | Use Case |
---|---|---|
Summarization | `summarize()` | Auto-snippets, TL;DRs, meta descriptions |
Keyword Extraction | `keywords()` | Auto-tagging, search filtering, highlights |
Topic Modeling | `LdaModel`, `LsiModel` | Discover themes in ads or descriptions |
Similarity Search | `MatrixSimilarity` | Detect duplicates, recommend similar items |
Word Similarity | `Word2Vec`, `FastText` | Semantic search, user intent detection |
Document Embedding | `Doc2Vec` | Content recommendation, ML clustering |
TF-IDF Modeling | `TfidfModel` | Identify unique or weighted keywords |
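As an illustration of the TF-IDF row, here is a plain-Python sketch of the weighting that Gensim's `TfidfModel` applies by default: raw term frequency times log2(N / df), followed by unit-length normalization. It avoids a Gensim dependency so it runs anywhere; with Gensim installed, the library builds the same kind of model from a `corpora.Dictionary` and a `doc2bow()` corpus.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Weight tokens per document: tf * log2(N / df), then L2-normalize.

    docs: list of token lists. Terms that occur in every document get
    weight 0 (log2(N / N) == 0) and are dropped from the result.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in docs:
        weights = {}
        for term, tf in Counter(tokens).items():
            idf = math.log(float(n_docs) / doc_freq[term], 2)
            if idf > 0:
                weights[term] = tf * idf
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm:
            weights = {t: w / norm for t, w in weights.items()}
        vectors.append(weights)
    return vectors
```

Sorting a document's weights in descending order surfaces its most distinctive terms, a simple starting point for keyword-based auto-tagging.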
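Pro tip: Even on a legacy Python 2.7 stack, Gensim 3.x is a practical choice for NLP processing that does not require heavy ML infrastructure. Note that the `gensim.summarization` module, which provides `summarize()` and `keywords()`, was removed in Gensim 4.0, so the summary commands above depend on staying on 3.x.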
🚀 Ready to Expand
With these tools in place, you now have:
- Clean, structured data (`count`, `contact_count`, `ad_count`, `description`)
- Enriched content from HTML
- NLP summaries, plus a starting point for keyword extraction and auto-tagging
This lays the foundation for smart features like:
- Related listings
- Contact deduplication
- AI-assisted content suggestions
- Real-time domain health dashboards
Let me know if you’d like to expand this setup with TF-IDF, clustering, auto-tagging, or multi-language summaries next!